Thursday 12 June 2014

Error when Parsing XML Using a Pig UDF

After adding a UDF that parses XML to your project, you may encounter the error below when running Pig.

ERROR [main] conf.Configuration(1151): Failed to set setXIncludeAware(true) for parser org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@1787038:java.lang.UnsupportedOperationException:


The reason is that the JDK-supplied XML libraries are a bit out of date. To get rid of this error, you need to provide recent versions of both Xalan and Xerces with your job configuration, which means making them available on your classpath.
Option 1:
If you’re using Maven, it takes just a few lines in the pom file.
<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.9.1</version>
</dependency>
<dependency>
    <groupId>xalan</groupId>
    <artifactId>xalan</artifactId>
    <version>2.7.1</version>
</dependency>
Then, in your code, add this line so that it runs before any XML parser is created.
System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
  "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");

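To see the property in context, here is a minimal sketch of setting it before a parser factory is requested and then parsing a small document. The class name XmlFactoryCheck, the method parseRootName, and the sample XML are hypothetical, chosen only for illustration; in a real UDF the property would need to be set before the first DocumentBuilderFactory is created.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of Pig or Hadoop.
class XmlFactoryCheck {
    // The JDK's built-in factory implementation, as named in the post.
    static final String JDK_FACTORY =
        "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // Parses the given XML string and returns the name of its root element.
    static String parseRootName(String xml) throws Exception {
        // Must be set before DocumentBuilderFactory.newInstance() is called,
        // otherwise the factory lookup may pick another implementation.
        System.setProperty("javax.xml.parsers.DocumentBuilderFactory", JDK_FACTORY);

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Sample input is made up for this sketch.
        System.out.println(parseRootName("<record><id>1</id></record>"));
        // Shows which factory implementation was actually selected.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());
    }
}
```

Printing the factory class name is a quick way to confirm that the property took effect in the JVM that runs the job.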
Option 2: Delete "xercesImpl.jar" and "xalan.jar" from the classpath.
For example, they may be located in /user/lib/pig/lib/
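If it is unclear which jar is supplying the parser, one way to check (a sketch, not something the original fix requires) is to ask the JVM where the factory class was loaded from. The class XmlFactoryLocator and the method describe are hypothetical names. A null code source is assumed here to mean the class came from the JDK itself rather than from a jar on the classpath.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic class, not part of Pig or Hadoop.
class XmlFactoryLocator {
    // Returns "<class name> <- <location it was loaded from>".
    static String describe(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader may have no code source.
        String where = (src == null || src.getLocation() == null)
            ? "JDK runtime (no jar)"
            : src.getLocation().toString();
        return cls.getName() + " <- " + where;
    }

    public static void main(String[] args) {
        // Reports the factory implementation in use and where it came from.
        System.out.println(describe(DocumentBuilderFactory.newInstance().getClass()));
    }
}
```

If the reported location is a xercesImpl jar inside the Pig lib directory, that is the jar Option 2 refers to.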

References:
1. http://caffeinbean.wordpress.com/2011/03/01/hadoop-failed-to-set-setxincludeawaretrue-for-parser-error-and-how-to-resolve-it/
