Monday, 18 August 2014

Sqoop Saved Job with Incremental Imports

1. Saved jobs remember the parameters used to specify a job, so they can be re-executed by invoking the job by its handle.


Saved jobs are created with the --create action. It expects a standalone -- followed by a tool name (such as import) and that tool's arguments, which together form the basis of the saved job. Consider:
$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable

The --exec action lets you override arguments of the saved job by supplying them after a --. For example, if the database were changed to require a username, we could supply the username and be prompted for the password with:
$ sqoop job --exec myjob -- --username someuser -P
Enter password:
...


2. Incremental imports work by comparing the values in a check column (--check-column) against a reference value from the most recent import (--last-value).
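
To make the comparison concrete, here is a toy bash simulation of the append-mode idea. It is not Sqoop's implementation. It assumes the table has been dumped to a headerless CSV file whose check column holds integers, and the helper names select_new_rows and next_last_value are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of an append-mode incremental import (not Sqoop's own code).
# Assumptions: rows live in a headerless CSV file and the check column
# holds integer values. Both helpers below are hypothetical.

# select_new_rows FILE COLUMN LAST_VALUE
# Prints every row whose check column is strictly greater than LAST_VALUE.
select_new_rows() {
  local file=$1 col=$2 last=$3
  awk -F',' -v c="$col" -v last="$last" '$c + 0 > last + 0' "$file"
}

# next_last_value FILE COLUMN LAST_VALUE
# Prints the largest check-column value seen, i.e. the reference value the
# following run should compare against. If no row is newer, LAST_VALUE is kept.
next_last_value() {
  local file=$1 col=$2 last=$3
  awk -F',' -v c="$col" -v last="$last" '
    BEGIN { max = last + 0 }
    $c + 0 > max { max = $c + 0 }
    END { print max }' "$file"
}
```

For a file containing rows with ids 1, 2 and 3, calling select_new_rows rows.csv 1 1 returns only the rows with ids 2 and 3, and next_last_value rows.csv 1 1 returns 3, which becomes the reference value for the next run.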

If an incremental import is run from the command line, Sqoop prints the value you should pass as --last-value to the next incremental import, so you can record it yourself.
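
A minimal sketch of such a one-off run, wrapped in a shell function. The --incremental append, --check-column and --last-value arguments are standard Sqoop import options; the function name incremental_import, the check column id and the SQOOP variable (a hypothetical dry-run hook, e.g. SQOOP=echo) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# One-off incremental import from the command line.
# Hypothetical helper: the function name and the SQOOP override are
# illustrative only; Sqoop itself knows nothing about them.
# Arguments: connect string, table, check column, and the last value
# that Sqoop printed at the end of the previous run.
incremental_import() {
  local connect=$1 table=$2 check_col=$3 last_value=$4
  "${SQOOP:-sqoop}" import \
    --connect "$connect" \
    --table "$table" \
    --incremental append \
    --check-column "$check_col" \
    --last-value "$last_value"
}
```

For example, incremental_import jdbc:mysql://example.com/db mytable id 100 would import only rows whose id is greater than 100. Keeping track of the printed value between runs is left to you in this mode.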

If an incremental import is run from a saved job, this value is retained in the saved job. Subsequent runs of sqoop job --exec someIncrementalJob will import only rows newer than those previously imported.
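
Putting both halves together, a saved incremental job might be created and run roughly like this. This is a sketch under assumptions: the job name, table, check column id and starting value 0 are placeholders, the helper function names are hypothetical, and SQOOP is again only a dry-run hook.

```shell
#!/usr/bin/env bash
# Create a saved job whose import runs in incremental (append) mode.
# Job name, table, check column and start value are placeholders; the
# SQOOP override is a hypothetical dry-run hook, not a Sqoop feature.
create_incremental_job() {
  local job=$1 connect=$2 table=$3 check_col=$4 start_value=$5
  "${SQOOP:-sqoop}" job --create "$job" -- import \
    --connect "$connect" \
    --table "$table" \
    --incremental append \
    --check-column "$check_col" \
    --last-value "$start_value"
}

# Each execution imports rows newer than the value stored in the job;
# Sqoop then updates the stored value for the next execution.
run_incremental_job() {
  "${SQOOP:-sqoop}" job --exec "$1"
}
```

Usage: run create_incremental_job someIncrementalJob jdbc:mysql://example.com/db mytable id 0 once, then call run_incremental_job someIncrementalJob as often as needed; unlike the command-line case, there is no last value to copy between runs. sqoop job --show someIncrementalJob displays the job's saved parameters if you want to inspect them.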

