Thursday, 8 January 2015

Recover Partitions in Hive Metastore

Hive stores a list of partitions for each table in its metastore. If, however, new partitions are directly added to HDFS (manually by hadoop fs -put command), the metastore will not be aware of these partitions. 
you need to add partition for every partition
ALTER TABLE <tablename> ADD PARTITION

Or in short you can run

MSCK REPAIR TABLE <tablename>; available since Hive 0.11
It will add any partitions that exist on HDFS but not in metastore to the metastore.

  • Copy the partition folders and data to a table folder.
  • Create a table with dynamic partitions on the table folder.
  • Run MCSK Repair on the new table.

For example,

CREATE TABLE IF NOT EXISTS new_table (
  host STRING,
  agent STRING
)
PARTITIONED BY (
  year INT,
  month INT
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '/user/me/path/table_folder';

MSCK REPAIR TABLE new_table;

MSCK REPAIR is


Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
http://www-01.ibm.com/support/knowledgecenter/#!/SSPT3X_3.0.0/
com.ibm.swg.im.infosphere.biginsights.commsql.doc/doc/biga_msckrep.html
http://stackoverflow.com/questions/26753602/hive-0-13-msck-repair-table-only-lists-partitions-not-in-metastore

No comments:

Post a Comment