Tuesday, 22 July 2014

Using HCatalog to Load/Store Hive Tables in an Oozie Pig Action

These notes cover how to use HCatalog in an Oozie Pig action to load and store Hive tables, without adding <argument>-useHCatalog</argument> to the Pig action.

Add all jars from /usr/lib/hive/lib and
/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hcatalog/share/hcatalog/*.jar
to $AppPath/lib.

1. [main] ERROR org.apache.pig.PigServer  - exception during parsing: Error during parsing. Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
  Failed to parse: Can not retrieve schema from loader org.apache.hcatalog.pig.HCatLoader@13ac518a

Solution:
Copy /etc/hive/conf/hive-site.xml to workflow/lib/.
The path to hive-site.xml must then be specified for the Pig action in workflow.xml, for example:
<file>lib/hive-site.xml</file>
The file name hive-site.xml cannot be changed in the Pig action.
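
One way to make the local file name explicit, offered only as a sketch: Oozie's <file> element accepts a "#" suffix that names the symlink created in the task's working directory. The fragment below assumes hive-site.xml was copied to lib/ under the application path as described above. The "#hive-site.xml" suffix is an illustrative addition and was not part of the original setup.

```xml
<!-- Sketch: ship hive-site.xml with the Pig action.
     The path is relative to the workflow application directory.
     The part after "#" names the symlink in the task's working directory,
     so the file is visible there as hive-site.xml (assumed addition). -->
<file>lib/hive-site.xml#hive-site.xml</file>
```

Without the "#" suffix, the symlink takes the name of the file itself, so the plain <file>lib/hive-site.xml</file> form shown above should behave the same way as long as the file in HDFS is already called hive-site.xml.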

On CDH 5.1.2:

2. [main] ERROR org.apache.pig.PigServer  - exception during parsing: Error during parsing. Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Solution:
Include the "hcatalog" sharelib by adding the following property to the Pig action's configuration:
<property>
     <name>oozie.action.sharelib.for.pig</name>
     <value>pig,hcatalog</value>
</property>

3. [main] ERROR org.apache.pig.PigServer  - exception during parsing: Error during parsing. Pig script failed to parse:
  <file aggregate_phase.pig, line 31, column 6> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.hcatalog.pig.HCatLoader' with arguments 'null'

This happens when hive-metastore.jar or libthrift.jar is not registered or included.
The fix is the same as solution 2.

A working example:

<action name="pig">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>oozie.action.sharelib.for.pig</name>
                <value>pig,hcatalog</value>
            </property>
        </configuration>
        <script>${pig_script}</script>
        <file>conf/hive-site.xml</file>
    </pig>
    <ok to="end"/>
    <error to="kill"/>
</action>
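
The action above transitions to nodes named "end" and "kill", which have to exist somewhere in workflow.xml. A minimal skeleton around it might look like the following. The application name "pig-hcatalog-wf", the schema version 0.4 and the kill message are assumptions for illustration only; the action body is the one from the example above.

```xml
<!-- Minimal sketch of a workflow.xml around the Pig action.
     The name "pig-hcatalog-wf" and schema version 0.4 are assumptions. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="pig-hcatalog-wf">
    <start to="pig"/>

    <action name="pig">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- Pull in the hcatalog sharelib next to the pig one -->
                <property>
                    <name>oozie.action.sharelib.for.pig</name>
                    <value>pig,hcatalog</value>
                </property>
            </configuration>
            <script>${pig_script}</script>
            <file>conf/hive-site.xml</file>
        </pig>
        <ok to="end"/>
        <error to="kill"/>
    </action>

    <!-- Target of the error transition; the message text is illustrative -->
    <kill name="kill">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The variables ${jobTracker}, ${nameNode}, ${queueName} and ${pig_script} are expected to come from job.properties. For the sharelib property to take effect, the job typically also needs oozie.use.system.libpath=true set there.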

