arundhaj

regression towards the datascience

Import postgres table to HDFS using sqoop

 

Importing data from postgres tables into HDFS using sqoop can be done with the following steps.

Make sure the postgres JDBC connector jar is available in the /usr/share/java directory.
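A quick way to check for the driver is sketched below. The helper name `find_pg_jdbc_jar` and the `postgresql*.jar` file name pattern are assumptions for illustration; the jar on your system may be named differently.

```shell
# Hypothetical helper: look for a PostgreSQL JDBC driver jar in a directory.
# Defaults to /usr/share/java; the "postgresql*.jar" pattern is an assumption.
find_pg_jdbc_jar() {
    dir="${1:-/usr/share/java}"
    found=$(find "$dir" -maxdepth 1 -name 'postgresql*.jar' 2>/dev/null)
    if [ -z "$found" ]; then
        # Nothing matched: report on stderr and signal failure to the caller.
        echo "No PostgreSQL JDBC jar found in $dir" >&2
        return 1
    fi
    echo "$found"
}

# Check the default location; "|| true" keeps a missing jar from aborting a script.
find_pg_jdbc_jar /usr/share/java || true
```

If nothing is printed to standard output, the jar is probably missing or named differently, and sqoop will likely fail to load the driver.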

To list all available tables in the postgres database

$ sqoop list-tables \
    --connect jdbc:postgresql://hostname.com/databaseName \
    --username myUserName \
    --password myPassword
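Passing the password with --password leaves it visible in the shell history and process list. One alternative, assuming your sqoop version supports the `--password-file` option, is to keep the password in a file readable only by its owner. The helper `write_password_file` and the file path below are hypothetical, and the `file://` prefix assumes the file lives on the local filesystem rather than HDFS.

```shell
# write_password_file PATH PASSWORD
# Hypothetical helper: stores PASSWORD in PATH without a trailing newline
# (a trailing newline may be treated as part of the password) and makes
# the file readable by its owner only.
write_password_file() {
    rm -f "$1"
    ( umask 077 && printf '%s' "$2" > "$1" )
    chmod 400 "$1"
}

# Example usage (the path is an assumption, not part of the original setup):
# write_password_file "$HOME/.sqoop-pg.password" 'myPassword'
# sqoop list-tables \
#     --connect jdbc:postgresql://hostname.com/databaseName \
#     --username myUserName \
#     --password-file "file://$HOME/.sqoop-pg.password"
```

For interactive use, sqoop's `-P` flag, which prompts for the password on the console, is a simpler option.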

To import table to HDFS

$ sqoop import \
    --connect jdbc:postgresql://hostname.com/databaseName \
    --username myUserName \
    --password myPassword \
    --table myTableName \
    --columns "col_1,col_2" \
    -m 1 \
    --target-dir /user/hue/myTableName \
    --enclosed-by '\"'

After successful completion of the above command, the imported data should be available in the target-dir. Because the import runs with a single mapper (-m 1), the output is a single file: /user/hue/myTableName/part-m-00000
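To confirm the import, the target directory can be listed and the first few rows printed. This is a minimal sketch that assumes the `hdfs` client is on the PATH and that the current user can read the target directory; the `TARGET_DIR` variable is introduced here only for illustration.

```shell
# Target directory used in the import command above.
TARGET_DIR=/user/hue/myTableName

if command -v hdfs >/dev/null 2>&1; then
    # List the files produced by the import.
    hdfs dfs -ls "$TARGET_DIR"
    # Show the first few imported rows.
    hdfs dfs -cat "$TARGET_DIR/part-m-00000" | head -n 5
else
    echo "hdfs client not found on PATH" >&2
fi
```

With --enclosed-by '\"', each field in the printed rows should appear wrapped in double quotes.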

This file can be used to create an HCatalog table for querying with Hive or Pig.

I'm using Hortonworks HDP v2.1.1 on AWS EC2.

This is a simple use case for sqoop. Hope this helps.
