Sunday, 21 December 2014

Install and Configure Spark in CDH4.7

I tried to install Spark on the CDH4.7 VM through a parcel.
However, after activating it the cluster could not be started, so I installed Spark manually instead.


1. Configure conf/spark-defaults.conf

spark.eventLog.enabled           true
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              2g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.executor.memory            6g
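If editing the file is inconvenient, the same settings can also be passed per session; a sketch using standard spark-shell flags (values copied from the file above):

```shell
# Launch spark-shell with the same settings as spark-defaults.conf
# (one-off override; the file itself is untouched)
spark-shell \
  --driver-memory 2g \
  --conf spark.eventLog.enabled=true \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.executor.memory=6g
```

Note that spark.driver.memory goes through a dedicated flag, since the driver JVM is already running by the time --conf values are read.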

2. EOFException when trying to access HDFS

java.io.IOException: Call to localhost/10.85.85.17:9000 failed on local exception: java.io.EOFException

This means the HDFS URL is not set properly: the client is pointing at port 9000, while the CDH QuickStart NameNode listens on port 8020. Use the URI from core-site.xml:

val dataRDD = sc.textFile("hdfs://localhost.localdomain:8020/user/cloudera/data.txt")
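The correct URI can be read out of Hadoop's own configuration; a sketch using a sample file (the contents below mirror the CDH4 QuickStart default, which is an assumption — on the VM the real file is /etc/hadoop/conf/core-site.xml):

```shell
# Sample core-site.xml standing in for /etc/hadoop/conf/core-site.xml
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost.localdomain:8020</value>
  </property>
</configuration>
EOF

# Pull the NameNode URI out of the file: grab the line after the
# fs.default.name property and keep only the hdfs:// value
grep -A1 'fs.default.name' /tmp/core-site-sample.xml | grep -o 'hdfs://[^<]*'
```

Whatever that prints is the prefix your sc.textFile() paths should start with.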

3. org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4

This means the server speaks a newer IPC protocol than the client: IPC version 7 corresponds to Hadoop 2.x (which CDH4 is based on), while version 4 is Hadoop 1.x.
Download the Spark package pre-built for CDH4, rather than the one pre-built for Hadoop 1.x.
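Before downloading, it helps to confirm the cluster side; a sketch (the exact version string is what the CDH4 VM is expected to report, which is an assumption):

```shell
# Ask the cluster which Hadoop it is running;
# CDH4 reports something like 2.0.0-cdh4.7.0 (Hadoop 2.x, i.e. IPC version 7)
hadoop version
```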

4. In the CDH5.2 VM: Permission denied: user=cloudera, access=EXECUTE, inode="/user/spark":spark:spark:drwxr-x---
sudo -u hdfs hadoop fs -chmod -R 777 /user/spark
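Opening the directory to everyone (777) is the quick fix; a narrower sketch (the spark group name and the usermod step are assumptions about the VM's local accounts):

```shell
# Check what the permissions look like now
sudo -u hdfs hadoop fs -ls /user | grep spark
# Narrower alternative: grant group access only, then add cloudera to the spark group
sudo -u hdfs hadoop fs -chmod -R 775 /user/spark
sudo usermod -a -G spark cloudera
```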

Reference:
http://stackoverflow.com/questions/23634985/error-when-trying-to-write-to-hdfs-server-ipc-version-9-cannot-communicate-with
https://gist.github.com/berngp/10793284
