When writing a Pig script, we usually use PigStorage
to load a CSV file.
Consider a sample CSV file in the following format.
2,Loading successfull,2014-09-25
3,Loading successfull,2014-09-25
4,Loading successfull,2014-09-25
It can be loaded as follows:
logs = LOAD 'log_folder/log_file.csv' USING PigStorage(',') AS (id: long, message: chararray, timestamp: chararray);
However, I had a CSV file whose fields were wrapped in double quotes and in which a single record spanned multiple lines, in the format shown below.
"2","Loading
successfull","2014-09-25"
"3","Loading
successfull","2014-09-25"
"4","Loading
successfull","2014-09-25"
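This sort of CSV file can be loaded using org.apache.pig.piggybank.storage.CSVExcelStorage, passing 'YES_MULTILINE' so that line breaks inside quoted fields are treated as part of the record: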
logs = LOAD 'log_folder/log_file.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE') AS (id: long, message: chararray, timestamp: chararray);
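CSVExcelStorage is part of Piggybank rather than Pig core, so the Piggybank jar normally has to be registered before the LOAD statement will resolve the class. Here is a minimal sketch of a complete script, assuming the jar sits at the hypothetical path /path/to/piggybank.jar (adjust this for your installation):

```pig
-- Assumption: the jar location below is a placeholder, not a real path.
REGISTER /path/to/piggybank.jar;

-- ',' is the field delimiter; 'YES_MULTILINE' lets a quoted field
-- contain line breaks without ending the record.
logs = LOAD 'log_folder/log_file.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE')
       AS (id: long, message: chararray, timestamp: chararray);

-- Print the loaded records to check that each one has three fields.
DUMP logs;
```

With this in place, the surrounding double quotes should be stripped and each message should come through as a single chararray that still contains its embedded line break.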
Hope this helps.