hcatalog.pig.HCatStorer doesn't support appending to Hive tables, overwriting non-partitioned tables, Parquet-format tables, or external tables with partitions.
1. Immutable Tables in HCatalog
HCatalog currently treats all tables as "immutable" - i.e. all tables and partitions can be written to only once, and not appended to. The nuances of what this means are as follows:
- A non-partitioned table can be written to once, and the data in it is never updated from then on unless you drop and recreate the table.
- A partitioned table supports "appending" of a sort, by adding new partitions to the table, but once a partition is written, no new data can be added to it.
Hive, on the other hand, does allow us to "INSERT INTO" a table, which gives us append semantics. There is benefit to both of these models.
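The write-once rule can be sketched as a sequence of stores. This is only an illustration: 'raw_web_logs', 'userid' and 'url' are hypothetical names, datestamp is assumed to be a chararray, and 'web_data' is assumed to be a table partitioned by datestamp.

```pig
-- Hypothetical source table; its columns (userid, url, datestamp) are assumptions.
raw = LOAD 'raw_web_logs' USING org.apache.hcatalog.pig.HCatLoader();

-- Split the input by day and drop the partition column from the data,
-- since the partition value is given explicitly in the store call.
f1   = FILTER raw BY datestamp == '20110924';
day1 = FOREACH f1 GENERATE userid, url;
f2   = FILTER raw BY datestamp == '20110925';
day2 = FOREACH f2 GENERATE userid, url;

-- First write into a partition that does not exist yet: allowed.
STORE day1 INTO 'web_data' USING org.apache.hcatalog.pig.HCatStorer('datestamp=20110924');

-- A different, new partition: allowed. This is the only "append" HCatalog offers.
STORE day2 INTO 'web_data' USING org.apache.hcatalog.pig.HCatStorer('datestamp=20110925');

-- A later store into datestamp=20110924 would be expected to fail,
-- because that partition has already been written:
-- STORE day2 INTO 'web_data' USING org.apache.hcatalog.pig.HCatStorer('datestamp=20110924');
```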
2. Store Examples
You can overwrite a non-partitioned table simply by using HCatStorer. The contents of the table will be overwritten:
store z into 'web_data' using org.apache.hcatalog.pig.HCatStorer();
In practice, however, the above statement may fail with the error below, which means that the non-partitioned table (an external table in this case) can't be overwritten once it contains data.
ERROR: org.apache.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : xxx_table
The same statement does work against a partitioned table.
To add one new partition to a partitioned table, specify the partition value in the store function. Pay careful attention to the quoting: the whole string must be single quoted, with the partition key and value separated by an equals sign:
store z into 'web_data' using org.apache.hcatalog.pig.HCatStorer('datestamp=20110924');
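For a table with more than one partition key, the same quoted string can carry several key=value pairs separated by commas. A minimal sketch, assuming a hypothetical table 'device_state' partitioned by year and month, a hypothetical source table 'raw_device_state', and non-partition columns named device_id and state:

```pig
-- Hypothetical source; column names are assumptions for illustration only.
raw = LOAD 'raw_device_state' USING org.apache.hcatalog.pig.HCatLoader();
z   = FOREACH raw GENERATE device_id, state;

-- Both partition keys are fixed in the store call, inside one single-quoted string.
STORE z INTO 'device_state' USING org.apache.hcatalog.pig.HCatStorer('year=2014,month=07');
```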
To write into multiple partitions at once, make sure that the partition column is present in your data, then call HCatStorer with no argument:
store z into 'web_data' using org.apache.hcatalog.pig.HCatStorer(); -- datestamp must be a field in the relation z
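Put together, a multi-partition store might look like the sketch below. 'raw_web_logs', 'userid' and 'url' are hypothetical names; the only requirement taken from the text above is that datestamp stays in the relation as an ordinary field, under the same name as the table's partition column.

```pig
-- Hypothetical source table that already carries a datestamp column.
raw = LOAD 'raw_web_logs' USING org.apache.hcatalog.pig.HCatLoader();

-- Keep the partition column in the projection; without it HCatStorer
-- has no value to route each record to a partition.
z = FOREACH raw GENERATE userid, url, datestamp;

-- No argument: the partition for each record is taken from its datestamp field.
STORE z INTO 'web_data' USING org.apache.hcatalog.pig.HCatStorer();
```

On installations that ship the Pig/HCatalog integration, a script like this is usually launched with pig -useHCatalog so that the HCatalog jars are on the classpath. Every partition created this way is still subject to the write-once rule from section 1.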
3. External Table
Starting in HCatalog 0.5, dynamic partitioning on external tables was broken (HCATALOG-500). This issue was fixed in Hive 0.12.0 by creating dynamic partitions of external tables in locations based on metadata rather than user specifications (HIVE-5011). A store that uses dynamic partitions can fail with an error like the following:
ERROR 2997: Unable to recreate exception from backed error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/hive/warehouse/silver_dollar_tmp.db/device_state_tmp/_DYN0.9997542998165297/year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201407190658_0067_m_000000_1/part-m-00000: File is not open for writing. Holder DFSClient_NONMAPREDUCE_1915661429_1 does not have any open files.
Reference:
https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-HCatStorer
https://issues.apache.org/jira/browse/HIVE-6405
https://cwiki.apache.org/confluence/display/Hive/HCatalog+DynamicPartitions#HCatalogDynamicPartitions-ExternalTables
https://issues.apache.org/jira/browse/HCATALOG-551
https://issues.apache.org/jira/browse/HIVE-6897