Tuesday, 8 July 2014

Create Hive Tables and Load Data from HDFS and Parquet

1. External table over tab-delimited text files on HDFS
CREATE EXTERNAL TABLE device (
pid bigint, device_type string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/cloudera/xxx';
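Because the table is EXTERNAL with a LOCATION, Hive immediately reads whatever files already sit under /user/cloudera/xxx. Data staged elsewhere on HDFS can also be moved in with LOAD DATA. A minimal sketch, assuming a hypothetical staging path:

```sql
-- Moves (not copies) the file from the staging path into the table's
-- location on HDFS. The staging path below is hypothetical.
LOAD DATA INPATH '/user/cloudera/staging/devices.tsv' INTO TABLE device;

-- Quick sanity check that rows parse against the schema:
SELECT pid, device_type FROM device LIMIT 10;
```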



2. Using Parquet in Hive on CDH 4.3
add jar lib/parquet-common-1.3.1.jar;
add jar lib/parquet-encoding-1.3.1.jar;
add jar lib/parquet-hadoop-1.3.1.jar;
add jar lib/parquet-hive-1.0.0.jar;
add jar lib/parquet-pig-1.3.1.jar;
add jar lib/parquet-format-2.0.0.jar;


CREATE EXTERNAL TABLE device (
pid bigint, device_type string
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '/user/xxx/device';
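With the jars registered and the table defined, an existing text-backed table can be converted to Parquet by writing through the deprecated output format. A sketch, assuming the tab-delimited table from step 1 has been named device_text (hypothetical name):

```sql
-- Rewrites the rows of the hypothetical text table as Parquet files
-- under the table's location /user/xxx/device.
INSERT OVERWRITE TABLE device
SELECT pid, device_type FROM device_text;
```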

3. Native Parquet support in Hive 0.13 / CDH 5
Hive 0.13 (shipped with CDH 5) supports Parquet natively, so a Parquet table can be created with a plain STORED AS PARQUET clause:
CREATE TABLE parquet_test (
 id int,
 str string,
 mp MAP<STRING,STRING>,
 lst ARRAY<STRING>,
 strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
STORED AS PARQUET;
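Since the table is partitioned, inserts must name or derive the partition value. A sketch using dynamic partitioning; the source table source_test is hypothetical and is assumed to have the same columns with part last:

```sql
-- Allow partitions to be created dynamically from the SELECT output.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The last column of the SELECT populates the 'part' partition column.
INSERT OVERWRITE TABLE parquet_test PARTITION (part)
SELECT * FROM source_test;
```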

Reference:

http://cmenguy.github.io/blog/2013/10/30/using-hive-with-parquet-format-in-cdh-4-dot-3/
http://blog.cloudera.com/blog/2014/02/native-parquet-support-comes-to-apache-hive/
