Wednesday, 6 August 2014

Hive Metastore Issue after Changing Namenode HA Name

Hive and Pig scripts (with HCatalog) could not be run after changing the NameNode HA name, because of a metastore issue.

After changing the HA name from hdfs://nameservice1 to hdfs://testname,
every attempt to query the existing tables in Hive, or to access them from Pig through HCatalog,
failed with the same error:

"FAILED: IllegalArgumentException java.net.UnknownHostException: nameservice1"

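However, tables created after the name change produced no error, which suggests that the metastore records of the older tables still referenced hdfs://nameservice1.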

Solution:



1. We attempted to run the /opt/cloudera/parcels/CDH/lib/hive/bin/metatool -listFSRoot to show the old nameservice names in the metastore, but the command wasn't returning any results.

The metatool command line utility may not run correctly as-is. To run the -listFSRoot command properly, the following needs to be in place:

2. If using an external database, the AUX_CLASSPATH env variable must be specified. This can be retrieved from the stderr.log of the running hive metastore instance (under /var/run/cloudera-scm-agent/process/nnn-hive-HIVEMETASTORE/logs), and should look like:

export AUX_CLASSPATH=/usr/share/cmf/lib/plugins/event-publish-4.8.2-shaded.jar:/usr/share/cmf/lib/plugins/tt-instrumentation-4.8.2.jar:/usr/share/cmf/lib/plugins/navigator-plugin-4.8.2-shaded.jar:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar

Using that same config directory, running the listFSRoot command is as follows:
/opt/cloudera/parcels/CDH/lib/hive/bin/hive --config /var/run/cloudera-scm-agent/process/777-hive-metastore-updatelocation --service metatool -listFSRoot

3. We backed up the metastore db with the pg_dump command, stopped the Hive service, ran "Update Hive Metastore NameNodes" from the action menu, started the service again, and verified that the metastore was working correctly.
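The backup in step 3 can be sketched roughly as below. This is only an illustration: the function name backup_metastore_db, the database name, and the output directory are assumptions, and connection details are assumed to come from the standard libpq variables (PGHOST, PGPORT, PGUSER) or ~/.pgpass.

```shell
# Hypothetical helper: dump the metastore database to a timestamped file and
# print the file name on success.
backup_metastore_db() {
    db=$1        # metastore database name (assumption: pass your own)
    out_dir=$2   # directory on persistent storage
    out="$out_dir/${db}-$(date +%Y%m%d-%H%M%S).sql"
    if pg_dump "$db" > "$out"; then
        echo "$out"
    else
        rm -f "$out"   # do not leave a partial dump behind
        return 1
    fi
}

# Example (names are placeholders):
#   backup_metastore_db hive /root/metastore-backups
```

Stopping the Hive service before taking the dump keeps the backup consistent with what the update will later modify.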

Lesson:
The metatool command will only run outside of CM if the following env variables are set:

HIVE_CONF_DIR - needed to point to the proper hive-site.xml configuration; the client configuration in /etc/hive/conf will not work
AUX_CLASSPATH - needed for external DB jars (postgres, mysql, etc)

Similar to what we saw in our testing, if you set the env variables to the values shown in the logs for the hive-METASTORE service, the metatool command will work correctly:

As the root user on the node where hive is running:

export AUX_CLASSPATH=/usr/share/cmf/lib/plugins/event-publish-4.8.2-shaded.jar:/usr/share/cmf/lib/plugins/tt-instrumentation-4.8.2.jar:/usr/share/cmf/lib/plugins/navigator-plugin-4.8.2-shaded.jar:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar
export HIVE_CONF_DIR=$(ls -1trd /var/run/cloudera-scm-agent/process/*hive-HIVEMETASTORE |tail -1)
/opt/cloudera/parcels/CDH/lib/hive/bin/metatool -listFSRoot
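
Since an empty or wrong HIVE_CONF_DIR is an easy mistake to make here, the directory lookup can be wrapped with a small check before metatool is called. The following is a minimal sketch, not part of CM or Hive: the helper names newest_process_dir and check_hive_conf_dir are made up for illustration, and it assumes the metastore process directory contains a hive-site.xml.

```shell
# Hypothetical helper: print the most recently modified directory under $1
# whose name ends in $2 (same idea as `ls -1trd ... | tail -1`).
newest_process_dir() {
    base=$1      # e.g. /var/run/cloudera-scm-agent/process
    suffix=$2    # e.g. hive-HIVEMETASTORE
    # -t sorts newest first; errors are hidden when nothing matches
    ls -1td "$base"/*"$suffix" 2>/dev/null | head -n 1
}

# Hypothetical helper: fail early if the directory has no hive-site.xml.
check_hive_conf_dir() {
    dir=$1
    if [ -z "$dir" ] || [ ! -f "$dir/hive-site.xml" ]; then
        echo "no hive-site.xml found in '$dir'" >&2
        return 1
    fi
}

# Example:
#   export HIVE_CONF_DIR=$(newest_process_dir /var/run/cloudera-scm-agent/process hive-HIVEMETASTORE)
#   check_hive_conf_dir "$HIVE_CONF_DIR" && /opt/cloudera/parcels/CDH/lib/hive/bin/metatool -listFSRoot
```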

Upgrading the Hive Metastore to Use HDFS HA

For CDH 4.1 and later, the Hive Metastore can be configured to use HDFS High Availability. See Hive Installation.
To configure the Hive metastore to use HDFS HA, change the records to reflect the location specified in the dfs.nameservices property, using the Hive metatool to obtain and change the locations.
  Note: Before attempting to upgrade the Hive metastore to use HDFS HA, shut down the metastore and back it up to a persistent store.
If you are unsure which version of Avro SerDe is used, use both the serdePropKey and tablePropKey arguments. For example:
$ metatool -listFSRoot
hdfs://oldnamenode.com/user/hive/warehouse
$ metatool -updateLocation hdfs://nameservice1 hdfs://oldnamenode.com -tablePropKey avro.schema.url -serdePropKey schema.url
$ metatool -listFSRoot
hdfs://nameservice1/user/hive/warehouse
where:
  • hdfs://oldnamenode.com/user/hive/warehouse is the FS root currently recorded in the metastore; it identifies the old NameNode location.
  • hdfs://nameservice1 specifies the new location and should match the value of the dfs.nameservices property.
  • tablePropKey is a table property key whose value field may reference the HDFS NameNode location and hence may require an update. To update the Avro SerDe schema URL, specify avro.schema.url for this argument.
  • serdePropKey is a SerDe property key whose value field may reference the HDFS NameNode location and hence may require an update. To update the Haivvero schema URL, specify schema.url for this argument.
  Note: The Hive MetaTool is a best effort service that tries to update as many Hive metastore records as possible. If it encounters an error during the update of a record, it skips to the next record.
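
Because the tool is best effort, it can help to preview the change first and to check for leftovers afterwards. The sketch below is one possible way to do that, under some assumptions: the helper names preview_update and stale_fs_roots are hypothetical, METATOOL is a variable introduced here for the path to the tool, and the FS roots are assumed to be printed one per line as in the example above. It relies on the -dryRun option of metatool, which displays -updateLocation changes without persisting them.

```shell
# Path to the tool; override with the full path if metatool is not on PATH.
METATOOL=${METATOOL:-metatool}

# Show what -updateLocation would change without writing to the metastore.
preview_update() {
    new_loc=$1   # e.g. hdfs://testname
    old_loc=$2   # e.g. hdfs://nameservice1
    "$METATOOL" -updateLocation "$new_loc" "$old_loc" -dryRun
}

# Read `metatool -listFSRoot` output on stdin and print only the roots that
# still point at the old nameservice; all other lines are ignored.
stale_fs_roots() {
    old_ns=$1    # e.g. nameservice1 (treated as a regular expression)
    # Require '/', ':' or end of line after the name so that
    # "nameservice1" does not also match "nameservice10".
    grep -E "^hdfs://${old_ns}(/|:|\$)" || true
}

# Example:
#   preview_update hdfs://testname hdfs://nameservice1
#   "$METATOOL" -listFSRoot | stale_fs_roots nameservice1
```

If the second command prints nothing after the real update, no FS root references the old nameservice any more.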
