regression towards the datascience

Pig script to process CSV file with quotes and multiline


When writing a Pig script, we usually load a CSV file with PigStorage.

Consider a sample CSV file in the following format.

2,Loading successfull,2014-09-25
3,Loading successfull,2014-09-25
4,Loading successfull,2014-09-25

This file can be loaded as follows:

logs = LOAD 'log_folder/log_file.csv' USING PigStorage(',') AS (id: long, message: chararray, timestamp: chararray);

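However, I had a CSV file containing double-quoted fields, with some records spanning multiple lines. PigStorage cannot parse such a file correctly, because it splits each line on the delimiter without interpreting quotes. The file was in the format shown below.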


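This sort of CSV file can be loaded using CSVExcelStorage from Piggybank, with the 'YES_MULTILINE' option:

logs = LOAD 'log_folder/log_file.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE') AS (id: long, message: chararray, timestamp: chararray);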
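The 'YES_MULTILINE' option belongs to the CSVExcelStorage loader, which ships in Piggybank rather than in core Pig, so the jar usually has to be registered before the LOAD statement runs. Here is a minimal sketch of a complete script. The jar path is an assumption and will differ between installations, and the LIMIT/DUMP step is only added here as a quick check:

```pig
-- Assumption: piggybank.jar lives at this path; adjust it for your installation.
REGISTER '/usr/lib/pig/piggybank.jar';

-- CSVExcelStorage understands double-quoted fields; 'YES_MULTILINE' lets a
-- quoted field contain line breaks, so one record may span several lines.
logs = LOAD 'log_folder/log_file.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE')
       AS (id: long, message: chararray, timestamp: chararray);

-- Print a few records to confirm that each multi-line message is a single field.
first_logs = LIMIT logs 5;
DUMP first_logs;
```

If each multi-line message appears as a single field in the output, the loader is handling the quotes and line breaks as intended.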

Hope this helps.