Friday, 18 July 2014

Use HCatalog to store into Hive Table in Pig

To run it on a CDH 4.7 cluster:
*  export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/ (this setting is not needed in an Oozie action)
*  pig -useHCatalog xxx.pig

In the Pig script, register the HCatalog jars:
REGISTER /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.7.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hcatalog/share/hcatalog/storage-handlers/hbase/lib/hbase-storage-handler-0.1.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hcatalog/share/hcatalog/hcatalog-pig-adapter-0.5.0-cdh4.7.0.jar
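A minimal sketch of the store step itself, written as a shell helper that generates the Pig script. It assumes a comma-separated input file at `/tmp/device.csv` and a text-format Hive table `default.device_text` with the same columns and partitions as the `device` table further down (both hypothetical, since the Parquet `device` table can't be written this way). The function `write_store_script` is also hypothetical. It uses `org.apache.hcatalog.pig.HCatStorer`, the class from the HCatalog 0.5 jars registered above, with a static partition spec.

```shell
#!/usr/bin/env bash
# write_store_script OUT_FILE TABLE PARTITION_SPEC   (hypothetical helper)
# Writes a small Pig script that loads a comma-separated file and stores it
# into a partitioned Hive table through HCatalog.
# Assumptions: input path /tmp/device.csv and a table with columns
# (pid bigint, track string) partitioned by (year int, month int).
write_store_script() {
  local out="$1" table="$2" partition="$3"
  # Unquoted EOF: $table and $partition are expanded by the shell.
  cat > "$out" <<EOF
-- On CDH 4.7, put the three REGISTER lines here.
A = LOAD '/tmp/device.csv' USING PigStorage(',') AS (pid:long, track:chararray);
-- Static partition: year and month are supplied in the HCatStorer argument,
-- so A's schema holds only the non-partition columns.
STORE A INTO '$table' USING org.apache.hcatalog.pig.HCatStorer('$partition');
EOF
}

# Usage: generate the script, then run it with the HCatalog flag.
# write_store_script store_device.pig default.device_text 'year=2014,month=7'
# pig -useHCatalog store_device.pig
```

Pig's `long` and `chararray` map to Hive's `bigint` and `string`. On newer Hive releases the same storer is also available as `org.apache.hive.hcatalog.pig.HCatStorer`.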

Note: In CDH 5.1 (Hive 0.12), you do not need to register the three jars above. In fact, they must be deleted from the lib folder; otherwise the libraries conflict with each other and the job fails with:
FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.plan.TableDesc.<init>(Ljava/lang/Class;Ljava/lang/Class;Ljava/lang/Class;Ljava/util/Properties;)V
    at org.apache.hcatalog.common.HCatUtil.configureOutputStorageHandler(HCatUtil.java:481)
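To see which of the three jars are still present before deleting them, a small shell sketch can list them. The function name `list_conflicting_hcat_jars` is hypothetical, and which lib folder to scan depends on where the jars were copied (an assumption, since the note does not name it).

```shell
#!/usr/bin/env bash
# list_conflicting_hcat_jars DIR   (hypothetical helper)
# Prints, one per line and sorted, the jars directly inside DIR whose names
# match the three CDH 4.7 HCatalog jars registered above.
list_conflicting_hcat_jars() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f \
    \( -name 'hcatalog-core-*.jar' \
       -o -name 'hcatalog-pig-adapter-*.jar' \
       -o -name 'hbase-storage-handler-*.jar' \) \
    -print | sort
}

# Usage (the directory is an assumption; point it at your lib folder):
# list_conflicting_hcat_jars /path/to/lib
```

`-name` matches the whole file name, so a jar whose name merely contains `hcatalog-core`, e.g. one prefixed with `hive-`, is not listed.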

HCatalog can't write into a Hive table stored in Parquet format, such as:

CREATE TABLE device (pid bigint, track string)
PARTITIONED BY(year int, month int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";

