Friday, 12 September 2014

Process Avro File with Pig

Use the piggybank.jar that ships with your Hadoop distribution (CDH, for example), so that it is compatible with the installed Hadoop version. To locate it:

find /opt/cloudera/parcels/ -name '*piggybank*.jar'
/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/pig/piggybank.jar

Otherwise, the job fails with the following error:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.piggybank.storage.avro.PigAvroInputFormat.listStatus(PigAvroInputFormat.java:)
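One way to avoid the mismatch is to copy the distribution's own jar into the lib directory that the sample script registers from. The following is a minimal sketch, not a definitive procedure: the function name stage_piggybank, its two arguments, and the default paths are assumptions made for illustration.

```shell
# Sketch: copy the distribution's piggybank.jar into ./lib so that
# "REGISTER lib/piggybank.jar" picks up the compatible build.
# stage_piggybank and its default paths are illustrative assumptions.
stage_piggybank() {
    local search_root="${1:-/opt/cloudera/parcels}"   # where the CDH parcels live
    local dest_dir="${2:-lib}"                        # directory used by REGISTER
    local jar

    # Quote the pattern so the shell does not expand it before find sees it.
    jar="$(find "$search_root" -name '*piggybank*.jar' 2>/dev/null | head -n 1)"

    if [ -z "$jar" ]; then
        echo "no piggybank jar found under $search_root" >&2
        return 1
    fi

    mkdir -p "$dest_dir"
    cp "$jar" "$dest_dir/piggybank.jar"

    # Print the source path so the caller can see which build was staged.
    echo "$jar"
}
```

Calling stage_piggybank with no arguments searches /opt/cloudera/parcels and writes lib/piggybank.jar. If several matching jars exist, this sketch simply takes the first one that find reports, so check the printed path.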

Notes:
1. The .avsc schema file is optional. If you do specify one, upload it to HDFS as well.
2. The output file is in binary format, probably because Avro is a binary format.
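For note 1, the upload can be done with hadoop fs -put. The sketch below assumes the hadoop command is on the PATH; the function name upload_schema and the HADOOP_CMD override variable are hypothetical names introduced here, not part of Hadoop.

```shell
# Sketch: upload an Avro schema file to HDFS so AvroStorage can read it.
# upload_schema and HADOOP_CMD are illustrative names, not part of Hadoop.
upload_schema() {
    local schema_file="$1"                     # local path, e.g. xxxx.avsc
    local hdfs_dir="${2:-.}"                   # "." is the user's HDFS home directory
    local hadoop_cmd="${HADOOP_CMD:-hadoop}"   # override to dry-run or stub the command

    if [ ! -f "$schema_file" ]; then
        echo "schema file not found: $schema_file" >&2
        return 1
    fi

    # hadoop fs -put <local source> <HDFS destination>
    "$hadoop_cmd" fs -put "$schema_file" "$hdfs_dir"
}
```

For example, upload_schema xxxx.avsc places the schema in the HDFS home directory, which is where a relative path such as 'xxxx.avsc' in the Pig script is resolved.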

Sample code:

REGISTER lib/avro-1.7.3.jar
REGISTER lib/json-simple-1.1.1.jar
REGISTER lib/piggybank.jar
REGISTER lib/jackson-core-asl-1.8.5.jar
REGISTER lib/jackson-mapper-asl-1.8.5.jar
REGISTER lib/snappy-java-1.1.1.jar

--delete the output folder
rmf output
avro = LOAD 'xxxx.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage
('no_schema_check',  'schema_file', 'xxxx.avsc');
STORE avro INTO 'output' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');



The rmf line needs no trailing ";" or quotation marks because rmf is a Grunt shell command, not a Pig Latin statement.

Reference:
http://www.michael-noll.com/blog/2013/07/04/using-avro-in-mapreduce-jobs-with-hadoop-pig-hive/
