Sqoop by default uses 4 concurrent map tasks to transfer data to Hadoop.
You can increase it after talking to your DBA. Don’t make it too much keep it around 10 to not to overwhelm the database.
In most cases, you will get the specified number of mappers, but it's not guaranteed. If your data set is very small, swoop might resort to using a smaller number of mappers.
The optimal number of mappers depends on many variables: you need to take into account your database type, the hardware that is used for your database server, and the impact to other requests that your database needs to serve. There is no optimal number of mappers that works for all scenarios. Instead, you’re encouraged to experiment to find the optimal degree of parallelism for your environment and use case. It’s a good idea to start with a small number of mappers, slowly ramping up, rather than to start with a large number of mappers, working your way down.
----'Apache Sqoop Cookbook'
Nice post ! Thanks for sharing valuable information with us. Keep sharing.. Big Data Hadoop Online Course Hyderabad
ReplyDelete