arundhaj

all that is technology

Read and Write DataFrame from Database using PySpark

To load a DataFrame from a MySQL table in PySpark, read it through the JDBC data source:

source_df = sqlContext.read.format('jdbc').options(
          url='jdbc:mysql://localhost/database_name',
          driver='com.mysql.jdbc.Driver',
          dbtable='SourceTableName',
          user='your_user_name',
          password='your_password').load()
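
The same settings can also be collected in a dictionary and passed with `options(**opts)`, which helps when several reads share one connection. This is only a minimal sketch: `mysql_jdbc_options` and `load_table` are hypothetical helper names, not part of PySpark, and the credentials are the same placeholders used above.

```python
def mysql_jdbc_options(database, table, user, password, host='localhost'):
    """Build the option dict for Spark's JDBC data source (hypothetical helper)."""
    return {
        'url': 'jdbc:mysql://{}/{}'.format(host, database),
        'driver': 'com.mysql.jdbc.Driver',
        'dbtable': table,
        'user': user,
        'password': password,
    }


def load_table(sqlContext, opts):
    """Load a table as a DataFrame; sqlContext is an existing SQLContext."""
    return sqlContext.read.format('jdbc').options(**opts).load()


# Placeholder values, as in the snippet above.
opts = mysql_jdbc_options('database_name', 'SourceTableName',
                          'your_user_name', 'your_password')
print(opts['url'])  # jdbc:mysql://localhost/database_name
```

Inside a Spark program, `source_df = load_table(sqlContext, opts)` would then be equivalent to the call above.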

And to write a DataFrame to a MySQL table, use the same options on the writer:

destination_df.write.format('jdbc').options(
          url='jdbc:mysql://localhost/database_name',
          driver='com.mysql.jdbc.Driver',
          dbtable='DestinationTableName',
          user='your_user_name',
          password='your_password').mode('append').save()
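
The `mode('append')` call adds rows to the destination table if it already exists. The writer also accepts `'overwrite'` (replace the existing table contents), `'ignore'` (do nothing if the table exists) and `'error'` or `'errorifexists'` (fail if the table exists, the default). As a rough sketch, a small wrapper can reject a mistyped mode before any data is written; `write_table` and `SAVE_MODES` are hypothetical names introduced here for illustration.

```python
# Save modes accepted by DataFrameWriter.mode()
SAVE_MODES = ('append', 'overwrite', 'ignore', 'error', 'errorifexists')


def write_table(df, opts, mode='append'):
    """Write df through the JDBC data source (hypothetical helper).

    opts is a dict holding url, driver, dbtable, user and password,
    the same keys passed to options() in the snippet above.
    """
    # Check the mode first so a typo fails before Spark is touched.
    if mode not in SAVE_MODES:
        raise ValueError('unknown save mode: {!r}'.format(mode))
    df.write.format('jdbc').options(**opts).mode(mode).save()
```

With this in place, `write_table(destination_df, opts)` would behave like the append shown above, assuming `opts` carries the destination table name.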

When submitting the Spark program, pass the MySQL Connector/J JAR with --jars so that the driver class is available to Spark:

bin/spark-submit --jars external/mysql-connector-java-5.1.40-bin.jar \
      /path_to_your_program/spark_database.py

Hope this helps!
