Monday 18 August 2014

Sqoop Metastore


Running sqoop-metastore launches a shared HSQLDB database instance on the current machine. Clients can connect to this metastore and create jobs which can be shared between users for execution.
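A rough sketch of that workflow is shown here. The job name nightly_import and the MySQL connection details are hypothetical placeholders. The host and port come from the example URL used in this post. The sqoop calls are guarded, so the script does nothing on a machine where Sqoop is not installed.

```shell
#!/bin/sh
# On the metastore host, start the shared metastore. It keeps running
# until it is stopped with `sqoop metastore --shutdown`:
#   sqoop metastore

# Metastore address; adjust host and port to your own setup.
META_HOST="metaserver.example.com"
META_PORT=16000
META_URL="jdbc:hsqldb:hsql://${META_HOST}:${META_PORT}/sqoop"
echo "Metastore URL: ${META_URL}"

# On a client: only attempt the calls when sqoop is actually installed.
if command -v sqoop >/dev/null 2>&1; then
  # Save a job definition in the shared metastore (placeholder source DB).
  sqoop job --meta-connect "${META_URL}" --create nightly_import \
    -- import --connect jdbc:mysql://db.example.com/corp --table employees

  # Any user pointing at the same metastore can now list and run the job.
  sqoop job --meta-connect "${META_URL}" --list
  sqoop job --meta-connect "${META_URL}" --exec nightly_import
fi
```

Passing --meta-connect on every call is the explicit form; it makes clear which metastore each command talks to.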

In conf/sqoop-site.xml, you can set sqoop.metastore.client.autoconnect.url to the metastore's address, so that you do not have to supply --meta-connect to use a remote metastore. For example: jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop

If you configure sqoop.metastore.client.enable.autoconnect with the value true, then you don't have to explicitly supply --meta-connect.

Note that you have to set sqoop.metastore.client.record.password to true if you are executing saved jobs via Oozie because Sqoop cannot prompt the user to enter passwords while being executed as Oozie tasks.
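The three client-side properties mentioned above could be put together roughly as follows. This is only a sketch: it writes the properties to a scratch file (the file name is a hypothetical choice) so you can review them and merge them by hand into the configuration element of conf/sqoop-site.xml. The URL is the example address from this post.

```shell
#!/bin/sh
# Write the client-side properties to a scratch file for review; merge them
# manually into the <configuration> element of conf/sqoop-site.xml.
FRAGMENT="sqoop-site-client-fragment.xml"   # hypothetical scratch file name

cat > "${FRAGMENT}" <<'EOF'
<!-- Let clients find the metastore without passing meta-connect -->
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop</value>
</property>
<!-- For saved jobs run under Oozie, which cannot prompt for passwords -->
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>
EOF

echo "Wrote ${FRAGMENT}"
```

Bear in mind that recording passwords means they are stored in the metastore, so enable that property only where it is needed.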

The Sqoop metastore works only with HSQLDB; see the CDH 5 requirements and supported versions page:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Requirements-and-Supported-Versions/cdhrsv_db.html

HSQLDB can run as a server or in-process, and it can keep a database purely in memory or persist it to files. Sqoop uses HSQLDB in a way that persists the metastore's data to disk.
The location of the metastore's files on disk is controlled by the sqoop.metastore.server.location property in conf/sqoop-site.xml. This should point to a directory on the local filesystem.

In CDH, the default location for the Sqoop metastore is:
/tmp/sqoop-metastore/shared.db
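Files under /tmp are often cleaned up on reboot, so you may prefer to point the metastore somewhere more durable. A minimal sketch follows, assuming a hypothetical path /var/lib/sqoop/metastore/shared.db that follows the same form as the CDH default. It writes the property to a scratch file (the name is also hypothetical) for you to merge into conf/sqoop-site.xml on the metastore host.

```shell
#!/bin/sh
# Hypothetical persistent path; same form as the CDH default
# (/tmp/sqoop-metastore/shared.db) but outside /tmp.
METASTORE_LOCATION="/var/lib/sqoop/metastore/shared.db"
FRAGMENT="sqoop-site-server-fragment.xml"   # hypothetical scratch file name

# Unquoted heredoc delimiter so ${METASTORE_LOCATION} is expanded.
cat > "${FRAGMENT}" <<EOF
<property>
  <name>sqoop.metastore.server.location</name>
  <value>${METASTORE_LOCATION}</value>
</property>
EOF

echo "Wrote ${FRAGMENT}"
```

The parent directory has to exist and be writable by the user who runs the metastore.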


Reference:

http://archive.cloudera.com/cdh5/cdh/5/sqoop/SqoopUserGuide.html#_literal_sqoop_metastore_literal
http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_metastore_literal
