Tuesday, 27 January 2015

Loading JSON Files into Hive

The Hive JSON SerDe is a serialization/deserialization module for Apache Hive. It can:
  • Read data stored in JSON format, one JSON object per line
  • Convert rows to JSON format when you INSERT INTO a table
  • Handle arrays and maps
  • Handle nested data structures, such as a map whose values are arrays
For example:

add jar ../lib/json-serde-1.0-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE app_store (
    id string,
    app_name string,
    version string,
    bundle_id string,
    genres array<string>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/user/abc/app_store/';
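
Once the external table exists, the JSON files under its location can be queried like any other Hive table. The following is a minimal sketch, assuming that genres is declared as an array of strings and that the files contain one JSON object per line; the sample record in the comment is hypothetical and only illustrates the expected shape.

```sql
-- Hypothetical record in a file under /user/abc/app_store/ (one object per line):
-- {"id":"1","app_name":"Example App","version":"1.0","bundle_id":"com.example.app","genres":["Games","Puzzle"]}

-- The SerDe jar must be added in every session that reads the table.
add jar ../lib/json-serde-1.0-SNAPSHOT-jar-with-dependencies.jar;

-- Array elements are addressed by zero-based index.
SELECT app_name, version, genres[0] AS first_genre
FROM app_store
WHERE size(genres) > 0;

-- One output row per (app, genre) pair.
SELECT app_name, genre
FROM app_store
LATERAL VIEW explode(genres) g AS genre;
```

Because the table is EXTERNAL, dropping it removes only the table definition and leaves the JSON files in place.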



CREATE TABLE json_nested_test (
    country string,
    languages array<string>,
    religions map<string,array<int>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
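
Nested fields are read with Hive's usual index and key syntax. The sketch below assumes that languages is an array of strings and religions is a map from string to an array of integers; the file name nesteddata.txt and its single record are hypothetical sample data.

```sql
-- Hypothetical contents of nesteddata.txt (a single line):
-- {"country":"Switzerland","languages":["German","French","Italian"],"religions":{"catholic":[10,20],"protestant":[40,50]}}

LOAD DATA LOCAL INPATH 'nesteddata.txt' OVERWRITE INTO TABLE json_nested_test;

-- First element of an array column.
SELECT languages[0] FROM json_nested_test;            -- German

-- Look up a map key, then index into the array stored under it.
SELECT religions['catholic'][0] FROM json_nested_test; -- 10
```

Looking up a key that is absent from the map returns NULL rather than raising an error, so queries over irregular records still run.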


References:
https://github.com/AlvinCJin/Hive-JSON-Serde
http://www.congiu.net/hive-json-serde/1.3/cdh5/
