Thursday 4 September 2014

Load Avro Files into Hive Table


1. Retrieve the schema from the Avro file
$ java -jar avro-tools-1.7.5.jar getschema object.snappy.avro > object.avsc

2. Upload the Avro and .avsc files to HDFS


3. Create the table. All Avro files under the same folder must have the same schema, and the .avsc schema file is required.

    CREATE EXTERNAL TABLE avrotable
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS
    INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/xxx/AvroFiles/'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///user/xxx/xxx.avsc');
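
Step 2 gives no commands, so here is a minimal sketch of the upload, plus a quick check to run once the table exists. It assumes the `hdfs` and `hive` command-line clients are on the PATH. The function names `upload_avro` and `check_table` are made up for illustration, and the paths are only the placeholders from the DDL above, so replace them with your own.

```shell
# Placeholder locations (assumptions): they mirror LOCATION and
# avro.schema.url in the CREATE TABLE statement.
DATA_DIR=/user/xxx/AvroFiles
SCHEMA_PATH=/user/xxx/xxx.avsc

# Step 2: copy one Avro data file and its schema file to HDFS.
# Usage: upload_avro <local .avro file> <local .avsc file>
upload_avro() {
    avro_file=$1
    schema_file=$2
    hdfs dfs -mkdir -p "$DATA_DIR" &&
    hdfs dfs -put "$avro_file" "$DATA_DIR/" &&
    hdfs dfs -put "$schema_file" "$SCHEMA_PATH"
}

# After step 3: show the columns Hive derived from the schema
# and read a few rows back.
check_table() {
    hive -e 'DESCRIBE avrotable; SELECT * FROM avrotable LIMIT 5;'
}
```

With the files from step 1, this would be called as `upload_avro object.snappy.avro object.avsc` before creating the table, and `check_table` afterwards. If `DESCRIBE` lists the expected columns, Hive was able to read the schema file.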



References:

http://www.michael-noll.com/blog/2013/07/04/using-avro-in-mapreduce-jobs-with-hadoop-pig-hive/
https://gist.github.com/MicTech/12f1950ee174ac1095ad
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
