Be careful not to accidentally close over objects instantiated in your driver program, like the log object below.
import org.apache.log4j.Logger

val logger = Logger.getLogger(getClass)  // instantiated on the driver
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    logger.info("message")  // the closure captures the non-serializable logger
  }
}
You will see an error like this:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.log4j.Logger
Spark serializes the closures you pass to RDD operations and ships them to the executors, and org.apache.log4j.Logger is not serializable. The usual solution to this type of problem is to instantiate the objects you want to use inside your map functions. You can define a factory object from which to create your log object:
object Holder extends Serializable {
  // @transient keeps the logger out of the serialized closure;
  // lazy re-creates it on first use in each executor JVM
  @transient lazy val log = Logger.getLogger(getClass.getName)
}

sc.parallelize(List(1, 2, 3)).foreach { element =>
  Holder.log.info(element.toString)
}
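Applied to the original streaming example, the same Holder object keeps the closure free of non-serializable state (a minimal sketch, reusing the dstream from above):

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Holder.log is @transient lazy, so it is built on the executor
    Holder.log.info("processing a new partition")
    records.foreach { record =>
      // ... process each record ...
    }
  }
}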
Alternatively, use the slf4j LoggerFactory and instantiate the logger inside the partition:
import org.slf4j.LoggerFactory

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // created inside the partition, on the executor, so the closure
    // captures nothing non-serializable
    val logger = LoggerFactory.getLogger(getClass)
    logger.info("message")
  }
}
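If you prefer not to repeat the getLogger call in every closure, the holder idea can also be packaged as a mix-in trait. This is only a sketch of the pattern, not an existing library: the names ExecutorSafeLogging and LineCounter are hypothetical, and it assumes slf4j is on the classpath of both driver and executors.

import org.apache.spark.rdd.RDD
import org.slf4j.{Logger, LoggerFactory}

// Hypothetical mix-in: any class or object extending it gets a logger
// that is rebuilt lazily in each executor JVM instead of being
// serialized with the closure.
trait ExecutorSafeLogging extends Serializable {
  @transient protected lazy val logger: Logger =
    LoggerFactory.getLogger(getClass)
}

object LineCounter extends ExecutorSafeLogging {
  def countNonEmpty(rdd: RDD[String]): Long = {
    val n = rdd.filter { line =>
      logger.debug(s"inspecting: $line")  // runs on the executors
      line.nonEmpty
    }.count()
    logger.info(s"counted $n non-empty lines")  // runs on the driver
    n
  }
}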
Reference:
http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala