Tuesday 8 July 2014

$CONDITIONS in Sqoop

Its worth nothing the importance of $CONDITIONS in free form query ( this thread explains well , getting info from there)
If you run a parallel import, the map tasks will execute your query with different values substituted in for $CONDITIONS. e.g., one mapper may execute "select bla from foo WHERE (id >=0 AND id < 10000)", and the next mapper may execute "select bla from foo WHERE (id >= 10000 AND id < 20000)" and so on.
Sqoop does not parse your SQL statement into an abstract syntax tree which would allow it to modify your query without textual hints. You are free to add further constraints like you suggested in your initial example (read the thread), but the literal string "$CONDITIONS" does need to appear in the WHERE clause of your query so that Sqoop can textually replace it with its own refined constraints.
Setting -m 1 is the only way to force a non-parallel import. You still need $CONDITIONS in there because it queries the database
about column type information, etc in the client before executing the import job, but does not want actual rows returned to the client. So
it will execute your query with $CONDITIONS set to '1 = 0' to ensure that it receives type information, but not records.
Notes from Sqoop documentation
If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable. For example, a double quoted query may look like: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"
The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.

Notes for Oozie action:  Use $CONDITIONS without '\'

No comments:

Post a Comment