Sunday 31 August 2014

Update Stale Records in Hive Table

Hive is not designed for online transaction processing and does not offer
real-time queries and row level updates. It is best used for batch jobs over
large sets of immutable data (like web logs).

There is no operation in Hive supported for deletion or update of a particular record or particular set of records. However, two solutions can achieve this in a partitioned hive table.

Solution 1: Regular Tables.

Select updated records from one stage table to overwrite a partition of the target table to achieve updates.

 INSERT OVERWRITE TABLE tabletop PARTITION(partname = 'valueIwantToSet')
 SELECT things FROM databases.tables WHERE whereclause;


Solution 2: External Tables

Declare a partitioned external table.
Replace the stale partition with the updated data in hdfs folder to achieve updates.


Reference:
http://stackoverflow.com/questions/6200871/update-set-option-in-hive
http://blog.zhengdong.me/2012/02/22/hive-external-table-with-partitions
http://stackoverflow.com/questions/17810537/how-to-delete-and-update-a-record-in-hive?rq=1

No comments:

Post a Comment