$wget http://public-repo-1.hortonworks.com/HDP-LABS/Projects/spark/1.2.0/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041.tgz
2. Copy the downloaded Spark tarball to your Hadoop cluster.
$scp spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041.tgz root@127.0.0.1:/root
3. Set up the environment
1. Set environment variable: export YARN_CONF_DIR=/etc/hadoop/conf
2. Create a file SPARK_HOME/conf/spark-defaults.conf and add the following settings:
- spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
- spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
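The two sub-steps above can be scripted in one go. This is a minimal sketch; the SPARK_HOME path assumes the tarball was unpacked under the login user's home directory (as in the sandbox session above), so adjust it to your layout:

```shell
# Point Spark at the Hadoop/YARN client configuration (step 3.1)
export YARN_CONF_DIR=/etc/hadoop/conf

# Assumed unpack location of the Spark tarball -- adjust as needed
SPARK_HOME="${SPARK_HOME:-$HOME/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041}"
mkdir -p "$SPARK_HOME/conf"

# Pass the HDP version to the driver and the YARN application master,
# so the hdp.version placeholders in YARN's classpath resolve (step 3.2)
cat > "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
EOF
```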
4. Run the Spark Pi example
[root@sandbox spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
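The SparkPi example estimates pi by Monte Carlo sampling: throw random points into the unit square and count the fraction that land inside the quarter circle. A minimal local sketch of the same computation, runnable without a cluster (object and method names here are illustrative, not from the Spark jar):

```scala
import scala.util.Random

object PiSketch {
  // Estimate pi from `samples` random points; a fixed seed keeps runs repeatable
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y <= 1.0   // point falls inside the quarter circle
    }
    4.0 * inside / samples   // area ratio of quarter circle to unit square is pi/4
  }

  def main(args: Array[String]): Unit =
    println(estimatePi(100000))
}
```

The spark-submit argument `10` plays a similar role to `samples` here: it scales how many points SparkPi throws, traded off across the executors.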
5. Run Spark WordCount
Copy data
$hadoop fs -copyFromLocal /etc/hadoop/conf/log4j.properties /tmp/data
$./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
At the Scala REPL:
val file = sc.textFile("hdfs://hdfsip:8020/tmp/data")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://hdfsip:8020/tmp/wordcount")
counts.toArray().foreach(println)
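The flatMap/map/reduceByKey pipeline above can be checked locally on plain Scala collections before running it against HDFS; this sketch mirrors each RDD step (the object name is illustrative, and `groupBy` plus a sum stands in for `reduceByKey`):

```scala
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))              // like file.flatMap(line => line.split(" "))
      .map(word => (word, 1))             // like .map(word => (word, 1))
      .groupBy(_._1)                      // reduceByKey(_ + _) on a local collection:
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // group by word, then sum the 1s

  def main(args: Array[String]): Unit =
    wordCount(Seq("to be or not", "to be")).foreach(println)
}
```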
6. Use Spark Job History Server
1. Add History Services to SPARK_HOME/conf/spark-defaults.conf
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.yarn.historyServer.address localhost:18080
2. Start/Stop the Spark History Server
$./sbin/start-history-server.sh
$./sbin/stop-history-server.sh
7. Install gfortran for MLlib
$ sudo yum install gcc-gfortran
Otherwise, MLlib's native linear algebra calls fail with:
java.lang.UnsatisfiedLinkError:
org.jblas.NativeBlas.dposv(CII[DII[DII)I
at org.jblas.NativeBlas.dposv(Native Method)
at org.jblas.SimpleBlas.posv(SimpleBlas.java:369)
at org.jblas.Solve.solvePositive(Solve.java:68)
Reference:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
http://spark.apache.org/docs/latest/running-on-yarn.html