Wednesday 24 September 2014

NULL Partition Columns Cause Errors in HCatalog in CDH5

When we used HCatalog in Pig to write into a partitioned Hive table on CDH 5, the job failed with the following error:

2014-09-22 11:30:37,923 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:168)

Checking the source code of FileRecordWriterContainer.java, the relevant section is:
    if (dynamicPartitioningUsed) {
        // calculate which writer to use from the remaining values - this needs to be done before we delete cols
        List<String> dynamicPartValues = new ArrayList<String>();
        for (Integer colToAppend : dynamicPartCols) {
            dynamicPartValues.add(value.get(colToAppend).toString());
        }

This shows where the error comes from: when a dynamic partition column is NULL, value.get(colToAppend) returns null, and calling toString() on it throws the NullPointerException.
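
If the writer followed Hive's convention of substituting a default partition name, a minimal null-safe version of the loop might look like the sketch below. This is only an illustration, not the actual CDH fix; it reuses the method's value and dynamicPartCols fields and hard-codes Hive's default partition name.

    List<String> dynamicPartValues = new ArrayList<String>();
    for (Integer colToAppend : dynamicPartCols) {
        Object partVal = value.get(colToAppend);
        // Assumption: mirror Hive's behaviour by mapping NULL to the default
        // partition name instead of calling toString() on a null reference.
        dynamicPartValues.add(partVal == null
                ? "__HIVE_DEFAULT_PARTITION__"
                : partVal.toString());
    }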

However, the same scripts did not hit this problem on CDH 4.7.
Hive itself does not allow NULL or empty strings as partition key values: when a dynamic partition value is NULL, it substitutes a placeholder string such as __HIVE_DEFAULT_PARTITION__, and queries against that partition column return the placeholder string instead of NULL.
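
For example, assuming a table mytable partitioned by a column dt in the default warehouse location, a row whose dt is NULL ends up under a directory like:

    /user/hive/warehouse/mytable/dt=__HIVE_DEFAULT_PARTITION__/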

So the cause is probably one of two things: either Pig hands the NULL partition value to HCatalog as a real null rather than a placeholder string, or this version of HCatalog simply does not support NULL partition values the way Hive does.
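
Until that is fixed, one workaround is to make sure no NULL ever reaches the partition column on the Pig side. Below is a hypothetical Java UDF sketch (the class and package names are ours, not from any library) that maps NULL to Hive's default partition name before the data is handed to HCatStorer.

    package example;

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical UDF: returns Hive's default partition name when the incoming
    // partition value is NULL, and the value's string form otherwise.
    public class DefaultPartitionName extends EvalFunc<String> {
        private static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return HIVE_DEFAULT_PARTITION;
            }
            return input.get(0).toString();
        }
    }

After registering the jar, the partition column can be wrapped before the STORE, e.g. GENERATE id, name, example.DefaultPartitionName(dt) AS dt, so the writer never receives a null partition key.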


Reference:
https://apache.googlesource.com/hcatalog/+/branch-0.4/src/java/org/apache/hcatalog/mapreduce/FileRecordWriterContainer.java
