Thursday 4 September 2014
Load Avro Files into Hive Table
1. Retrieve the schema from an Avro file
$ java -jar avro-tools-1.7.5.jar getschema object.snappy.avro > object.avsc
2. Upload the Avro files and the .avsc schema file to HDFS
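A minimal sketch of the upload, assuming the standard `hdfs dfs` client is on the PATH. The HDFS paths reuse the placeholder paths from the CREATE TABLE statement in this post. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration: by default the script only prints the commands, and setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upload_avro() {
  # Placeholder paths, matching LOCATION and avro.schema.url in the table definition.
  data_dir="/user/xxx/AvroFiles"
  schema_path="/user/xxx/xxx.avsc"

  # Create the data folder, then copy the Avro file and the schema file.
  run hdfs dfs -mkdir -p "$data_dir"
  run hdfs dfs -put object.snappy.avro "$data_dir/"
  run hdfs dfs -put object.avsc "$schema_path"
}

upload_avro
```

Keeping the schema file outside the data folder, as the placeholder paths suggest, means the table's LOCATION contains only Avro data files.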
3. Create an external Hive table over the HDFS folder. All Avro files in that folder must share the same schema, and the table definition must point to the .avsc schema file.
CREATE EXTERNAL TABLE avrotable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/xxx/AvroFiles/'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/xxx/xxx.avsc');
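Once the table exists, a quick sanity check might look like the sketch below. It assumes the `hive` command-line client is installed; the LIMIT of 10 is an arbitrary choice. DESCRIBE should list the columns derived from the .avsc file, and the SELECT should return rows read from the Avro files.

```shell
#!/bin/sh
# Inspect the columns Hive derived from the schema and sample a few rows.
QUERY="DESCRIBE avrotable; SELECT * FROM avrotable LIMIT 10;"

if command -v hive >/dev/null 2>&1; then
  hive -e "$QUERY"
else
  # Fall back to printing the statements so they can be pasted into another Hive client.
  echo "hive CLI not found; run this in your Hive client: $QUERY"
fi
```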
References:
http://www.michael-noll.com/blog/2013/07/04/using-avro-in-mapreduce-jobs-with-hadoop-pig-hive/
https://gist.github.com/MicTech/12f1950ee174ac1095ad
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe