Friday, 15 January 2016

Spark loads data from Oracle/DB2/SQLServer

Oracle:

url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host)(PORT=0000))
(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=servicename)))"
user=""
password=""
driver="oracle.jdbc.driver.OracleDriver"


DB2:

url="jdbc:db2://host:port/dbname"
user=""
password=""
driver="com.ibm.db2.jcc.DB2Driver"

SQLServer:

 url = "jdbc:sqlserver://host:port"
 driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
 user = ""
 password = ""
 dbname = ""

    val prop = new java.util.Properties
    prop.setProperty("user", username)
    prop.setProperty("password", password)
    val tableDf = sqlContext.read.jdbc(url, "schema.tableB", prop)
  
sqlContext.read.jdbc expects a table name, it does not accept a select statement.
The select statement can be turned into a subquery with alias name
For example, "(select * from schema.tableBW) as tbl"
This works for SqlServer and DB2, but not Oracle.

Reference:
http://www.oracledistilled.com/java/jdbc/connect-to-an-oracle-database-using-java-and-jdbc/
http://www.sparkexpert.com/2015/04/17/save-apache-spark-dataframe-to-database/
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/

No comments:

Post a Comment