Alvin's Big Data Notebook : Hive Table Load files from multi directories

Wednesday, 9 July 2014

Hive Table Load files from multi directories

Option 1:
You can move all the CSV files into another HDFS directory and create a Hive table on top of it. If it suits you better, create a subdirectory (say, csv) inside your present directory to hold all the CSV files, then create the Hive table on top of that subdirectory. Keep in mind that, by default, a Hive table created on top of the parent directory will NOT contain the data from the subdirectory.
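As a rough sketch of Option 1, the helper below only prints the commands involved so they can be reviewed before anything is run. The function name print_option1_commands, the directory /testdata/mixed, the table name csv_data and its two columns are all assumptions made for illustration; adjust them to your own layout.

```shell
#!/bin/bash
# Sketch only: prints (does not run) the commands for Option 1.
# All names below are hypothetical placeholders, not taken from the post.
print_option1_commands() {
    local parent_dir="${1%/}"       # parent HDFS directory, trailing slash removed
    local table="$2"                # name of the Hive table to create
    local csv_dir="$parent_dir/csv" # subdirectory that will hold only CSV files

    echo "hadoop fs -mkdir -p '$csv_dir'"
    # The glob is quoted so that HDFS, not the local shell, expands it.
    echo "hadoop fs -mv '$parent_dir/*.csv' '$csv_dir/'"
    # Assumed schema: two comma-separated columns.
    echo "hive -e \"CREATE EXTERNAL TABLE $table (id BIGINT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '$csv_dir';\""
}

print_option1_commands /testdata/mixed csv_data
```

Printing the commands first is a deliberate choice here: moving files in HDFS is easy to get wrong, and the printed lines can be run by hand once they look right.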

Option 2:

You can create an external table, then add subfolders as partitions.

CREATE EXTERNAL TABLE test (id BIGINT) PARTITIONED BY ( yymmdd STRING);

ALTER TABLE test ADD PARTITION (yymmdd = '20120921') LOCATION 'loc1';

ALTER TABLE test ADD PARTITION (yymmdd = '20120922') LOCATION 'loc2';

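The same idea can be scripted so that every subdirectory of the table location is added as a partition:

#!/bin/bash
# Create the external, partitioned table on top of the parent directory.
hive -e "CREATE EXTERNAL TABLE users (id int, name string) PARTITIONED BY (month string) STORED AS TEXTFILE LOCATION '/testdata/user/';"

hscript=""

# List the parent directory, drop the "Found N items" header, and keep the
# three letters at the end of each line as the partition value.
for part in $(hadoop fs -ls /testdata/user/ | grep -v -P "^Found" | grep -o -P "[a-zA-Z]{3}$")
do
    echo "$part"
    # Without a LOCATION clause Hive maps the partition to
    # /testdata/user/month=<value>; if the directories are named differently,
    # add LOCATION '<dir>' to the statement.
    tmp="ALTER TABLE users ADD PARTITION (month='$part');"
    hscript="$hscript$tmp"
done

# Run all ALTER TABLE statements in a single Hive session.
hive -e "$hscript"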
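The listing step of the script can be tried on its own, without a cluster. The sketch below swaps the two grep calls for a single awk filter that keeps only directory entries and prints the last path component. The function name list_partition_dirs is hypothetical, and the sample listing is made up, written in the layout that hadoop fs -ls normally prints.

```shell
#!/bin/bash
# Hypothetical helper: reads `hadoop fs -ls` output on stdin and prints the
# last path component of every directory entry (mode string starts with "d").
# Assumes paths contain no spaces, since awk splits fields on whitespace.
list_partition_dirs() {
    awk '$1 ~ /^d/ { n = split($NF, parts, "/"); print parts[n] }'
}

# Made-up sample listing, for illustration only.
sample_listing='Found 3 items
drwxr-xr-x   - hive hadoop          0 2014-07-09 10:00 /testdata/user/jan
drwxr-xr-x   - hive hadoop          0 2014-07-09 10:00 /testdata/user/feb
-rw-r--r--   3 hive hadoop       1024 2014-07-09 10:00 /testdata/user/README'

# Prints "jan" and "feb"; the header line and the plain file are skipped.
printf '%s\n' "$sample_listing" | list_partition_dirs
```

On a real cluster the helper would be fed from hadoop fs -ls /testdata/user/ instead of the sample text. Unlike the three-letter pattern, it does not depend on the length of the directory names.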

Reference:

http://stackoverflow.com/questions/9039414/hive-table-creation-w-multi-files-w-multiple-directories
