Monday 8 September 2014

Avoid Installing Python Modules in Map-Reduce with ZipImport

When working with Hadoop streaming in Python, we want to avoid installing Python modules on every machine in the cluster.
There are two reasons.


  1. Normally, we have no permission to install modules on the cluster.
  2. We don't want to repeat the installation on each node in the cluster.

This can be achieved with zipimport, which loads Python modules directly from a ZIP archive.

1. Prepare the pysftp library.
    Download pysftp-0.2.8.tar.gz and extract it: tar -xzvf pysftp-0.2.8.tar.gz
    From the directory that contains pysftp.py, build the archive: zip -r pysftp.mod pysftp.py
    Copy pysftp.mod to app/lib
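
If the zip command is not available, the same archive can be built from Python. The following is a minimal sketch, assuming the standard zipfile module is acceptable for this step; the helper name build_module_archive and the placeholder pysftp.py file are made up for illustration and are not part of the original workflow.

```python
import os
import tempfile
import zipfile


def build_module_archive(source_path, archive_path):
    """Write one .py file to the root of a new zip archive and return its entry names."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps only the file name, so the module sits at the archive root
        archive.write(source_path, arcname=os.path.basename(source_path))
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


# Placeholder file standing in for the extracted pysftp.py (assumption for the demo).
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "pysftp.py")
with open(source_path, "w") as handle:
    handle.write("# placeholder for the real pysftp source\n")

entries = build_module_archive(source_path, os.path.join(work_dir, "pysftp.mod"))
print(entries)
```

The module file has to sit at the root of the archive, otherwise it cannot be imported by its bare name.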

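2. Load the module from the archive in the streaming script.
import zipimport
# The archive path is relative to the script's working directory.
importer = zipimport.zipimporter('lib/pysftp.mod')
pysftp = importer.load_module('pysftp')
# ftp_host, ftp_username and ftp_password must be defined before this call.
sftp = pysftp.Connection(ftp_host, username=ftp_username, password=ftp_password)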
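
As a variation, the archive can also be placed on sys.path and imported with a normal import, which goes through the same zipimport machinery. The sketch below is only an illustration under that assumption: greeter_demo is a made-up stand-in module (not pysftp), and the helper names are hypothetical.

```python
import os
import sys
import tempfile
import zipfile


def build_module_archive(source_path, archive_path):
    """Pack a single .py file at the root of a zip archive."""
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.write(source_path, arcname=os.path.basename(source_path))
    return archive_path


def import_from_archive(archive_path, module_name):
    """Import module_name by putting the archive itself on sys.path."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    return __import__(module_name)


# Made-up stand-in module; in the post this would be pysftp.py.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "greeter_demo.py")
with open(source_path, "w") as handle:
    handle.write("def greet(name):\n    return 'hello ' + name\n")

archive_path = build_module_archive(
    source_path, os.path.join(work_dir, "greeter_demo.mod"))
os.remove(source_path)  # only the archive is left to import from

greeter_demo = import_from_archive(archive_path, "greeter_demo")
print(greeter_demo.greet("hadoop"))
```

The archive does not need a .zip extension for this to work, which is why the .mod name used above is fine.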


Reference:

https://docs.python.org/2/library/zipimport.html
http://atbrox.com/2009/11/11/how-to-combine-elastic-mapreducehadoop-with-other-amazon-web-services/
