You can move all the CSV files into a separate HDFS directory and create a Hive table on top of it. If it works better for you, create a subdirectory (say, csv) inside your current directory, move all the CSV files into it, and create the Hive table on top of that subdirectory. Keep in mind that, by default, any Hive table created on top of the parent directory will NOT include the data in the subdirectory.
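As a rough sketch of the subdirectory variant, the steps might look like the following. The directory /data/mixed, the table name csv_data, and its two columns are hypothetical placeholders, not values from the original question. The script only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/bash
# Hypothetical paths and names; adjust them to your cluster.
SRC_DIR="/data/mixed"     # assumed directory holding CSV files mixed with other files
CSV_DIR="$SRC_DIR/csv"    # new subdirectory that will hold only the CSV files

# Print (do not run) the three steps: make the subdirectory,
# move the CSV files into it, and create an external table on top of it.
plan_csv_table() {
  echo "hadoop fs -mkdir $CSV_DIR"
  # The glob is single-quoted in the printed command so that HDFS, not the local shell, expands it.
  echo "hadoop fs -mv '$SRC_DIR/*.csv' $CSV_DIR/"
  # Column list (id, name) is an assumption made for illustration.
  echo "hive -e \"CREATE EXTERNAL TABLE csv_data (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '$CSV_DIR';\""
}

plan_csv_table
```

Because the table is EXTERNAL, dropping it later removes only the metadata and leaves the files in the csv subdirectory in place.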
Option 2:
You can create an external table, then add subfolders as partitions.
CREATE EXTERNAL TABLE test (id BIGINT) PARTITIONED BY ( yymmdd STRING);
ALTER TABLE test ADD PARTITION (yymmdd = '20120921') LOCATION 'loc1';
ALTER TABLE test ADD PARTITION (yymmdd = '20120922') LOCATION 'loc2';
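You can also script the partition creation, for example:
#!/bin/bash
# Create the external table on top of the parent directory.
hive -e "CREATE EXTERNAL TABLE users (id INT, name STRING) PARTITIONED BY (month STRING) STORED AS TEXTFILE LOCATION '/testdata/user/';"
# Build one ALTER TABLE statement per subdirectory, using the last three letters of each listed path as the partition value.
hscript=""
for part in $(hadoop fs -ls /testdata/user/ | grep -v -P "^Found" | grep -o -P "[a-zA-Z]{3}$"); do
  echo "$part"
  # With no LOCATION clause, Hive expects the partition data under /testdata/user/month=<value>.
  tmp="ALTER TABLE users ADD PARTITION (month='$part');"
  hscript="$hscript$tmp"
done
# Run all the collected statements in a single Hive session.
hive -e "$hscript"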
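The name-extraction step in the loop relies on grep -P, which not every grep build supports. A minimal sketch of an alternative, assuming the usual hadoop fs -ls listing with the path in the last column and no spaces in paths, is to take the last path component with awk. The function names partition_names and build_partition_ddl are hypothetical, and the listing lines fed in below are made-up sample input. Unlike the three-letter pattern, this uses the whole directory name as the partition value.

```shell
#!/bin/bash
# Read `hadoop fs -ls` output on stdin and print the last path component of each entry.
# Assumes the path is the last whitespace-separated field and contains no spaces.
partition_names() {
  awk '$1 !~ /^Found/ && NF >= 8 { n = split($NF, parts, "/"); print parts[n] }'
}

# Read partition values on stdin and print one ALTER TABLE statement per value.
# $1 = table name, $2 = partition column.
build_partition_ddl() {
  while IFS= read -r name; do
    printf "ALTER TABLE %s ADD PARTITION (%s='%s');\n" "$1" "$2" "$name"
  done
}

# Dry run on made-up sample listing lines; on a cluster, pipe in
# `hadoop fs -ls /testdata/user/` instead and pass the result to hive -e.
printf '%s\n' \
  "Found 2 items" \
  "drwxr-xr-x   - hive hive          0 2012-09-21 10:00 /testdata/user/jan" \
  "drwxr-xr-x   - hive hive          0 2012-09-22 10:00 /testdata/user/feb" \
  | partition_names | build_partition_ddl users month
```

Printing the statements first makes it easy to check the partition values before handing them to Hive.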
Reference:
http://stackoverflow.com/questions/9039414/hive-table-creation-w-multi-files-w-multiple-directories