Running Apache Spark on YARN

Submitting applications to Spark
Running Spark on YARN requires the Hadoop configuration environment variables (HADOOP_CONF_DIR and/or YARN_CONF_DIR) to be set so Spark can locate the cluster; the exports further below show a typical setup.
To pre-load the Spark runtime JAR so it is not re-uploaded on every job submission, copy the assembly JAR into HDFS:
hdfs dfs -copyFromLocal lib/spark-assembly-1.1.0-hadoop2.3.0.jar /yarn
hdfs dfs -ls /yarn   # verify the upload
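On a shared cluster it may also help to make the pre-loaded JAR readable by every user's jobs; a minimal sketch, assuming copyFromLocal placed the file under /yarn as above and that world-readable permissions suit your cluster:
# Open read access on the pre-loaded assembly JAR (permission mode is a suggestion)
hdfs dfs -chmod 644 /yarn/spark-assembly-1.1.0-hadoop2.3.0.jar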
export YARN_CONF_DIR=/path/to/yarn/conf   # often the same as HADOOP_CONF_DIR below
export SPARK_HOME=/usr/share/spark
export HADOOP_CONF_DIR=/etc/gphd/hadoop/conf
A cleaner layout is to keep the assembly under a shared HDFS path and point SPARK_JAR at it:
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put $SPARK_HOME/assembly/lib/spark-assembly_*.jar /user/spark/share/lib/spark-assembly.jar
export SPARK_JAR=hdfs://pivhdsne.localdomain:8020/user/spark/share/lib/spark-assembly.jar
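To avoid re-typing these exports in every session, they can be appended to Spark's environment script; a minimal sketch, assuming this install keeps its configuration in /usr/share/spark/conf:
# Persist the environment setup (conf path assumed; adjust to your install)
cat >> /usr/share/spark/conf/spark-env.sh <<'EOF'
export HADOOP_CONF_DIR=/etc/gphd/hadoop/conf
export SPARK_JAR=hdfs://pivhdsne.localdomain:8020/user/spark/share/lib/spark-assembly.jar
EOF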
An example submission of the bundled SparkPi job in yarn-cluster mode:
/usr/share/spark/bin/spark-submit \
  --num-executors 10 \
  --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  /usr/share/spark/jars/spark-examples-1.1.0-hadoop2.2.0-gphd-3.0.1.0.jar 10
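In yarn-cluster mode the driver runs inside the cluster, so the job is tracked through YARN rather than the local console; one way to check on it, assuming the yarn CLI is on the PATH (the application ID here is the example one from the logs section below):
# List running YARN applications to find the job's application ID
yarn application -list -appStates RUNNING
# Check the final status (SUCCEEDED/FAILED) once the job completes
yarn application -status application_1418749874519_0001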
To run an interactive shell against YARN instead, launch in yarn-client mode and pass spark.yarn.jar via --conf:
/usr/share/spark/bin/spark-shell --master yarn-client \
  --conf spark.yarn.jar=hdfs://pivhdsne.localdomain:8020/user/spark/share/lib/spark-assembly.jar --verbose
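Once the shell comes up, any small job confirms that YARN executors are actually running; a quick non-interactive smoke test, assuming the exports above are in place:
# Count a small in-memory range on the cluster; expect "res0: Long = 1000"
echo 'sc.parallelize(1 to 1000).count()' | /usr/share/spark/bin/spark-shell --master yarn-client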
Checking Logs for a YARN App (such as a Spark job)
yarn logs -applicationId application_1418749874519_0001
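Note that yarn logs only retrieves output after the application finishes, and only when log aggregation is enabled (yarn.log-aggregation-enable=true in yarn-site.xml); piping through grep is a quick way to narrow the output:
# Pull the aggregated logs and show only error lines
yarn logs -applicationId application_1418749874519_0001 | grep -i error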