Sunday 28 February 2016

Set Log4j Conf in Spark Job

Normally, a Spark job prints a lot of log output to the console, which buries our own output. To quiet it down, put the log4j.properties below under the resources folder of your project (e.g. src/main/resources in a Maven/sbt layout).

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN

To use a custom log4j configuration for the driver or the executors, add -Dlog4j.configuration=<location of configuration file> to spark.driver.extraJavaOptions (for the driver) or spark.executor.extraJavaOptions (for the executors). Note that if using a file, the file: protocol should be explicitly provided, and the file needs to exist locally on all the nodes.
Note that these spark-submit options must come before --class: anything after the application jar is passed to the application itself as arguments, not to spark-submit.
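For the executor side, one common pattern is to ship the file with --files (it then lands in each executor's working directory) and reference it by its bare name in spark.executor.extraJavaOptions. A sketch, assuming the same variable names as the driver example below:

```shell
$SPARK_HOME/bin/spark-submit \
  --master $SPARK_MASTER \
  --files ${HOME}/conf/log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class $APP_CLASS $APP_JAR
```

Here no file: URI is needed for the executors, because --files copies log4j.properties into each executor's working directory, where the bare relative name resolves.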

$SPARK_HOME/bin/spark-submit --jars $JARS \
  --master $SPARK_MASTER \
  --files ${HOME}/conf/log4j.properties \
  --driver-java-options "-Dlog4j.configuration=file://${HOME}/conf/log4j.properties" \
  --class $APP_CLASS $APP_JAR \
  "$@"


