How to read and write Avro files in Spark using the spark-avro library.
libraryDependencies += "com.databricks" %% "spark-avro" % "2.0.1"
import com.databricks.spark.avro._

val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")
Issue: ClassNotFoundException: org.apache.avro.mapreduce.AvroJob
Solution: add the avro-mapred library to the project dependencies.
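A minimal sbt dependency sketch for this fix (the 1.7.7 version is an assumption; use the Avro version matching your cluster):

```scala
// avro-mapred provides org.apache.avro.mapreduce.AvroJob,
// the class missing in the ClassNotFoundException above
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.7"
```

Note that this default artifact is built against the old Hadoop API, which leads to the next issue.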
Issue: IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
Avro 1.7.x provides a new mapreduce API, but the default avro-mapred JAR is compiled against Hadoop 0.20.x, where TaskAttemptContext is a class rather than an interface. This incompatibility prevents it from being used in MapReduce jobs on Hadoop 2.
Solution: use the hadoop2 classifier for avro-mapred.
"org.apache.avro" % "avro-mapred" % "1.7.7" classifier "hadoop2"
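Putting both dependencies together, a build.sbt sketch (the Spark and Avro versions shown are the ones used above; adjust to your environment):

```scala
// spark-avro adds the .avro read/write methods;
// avro-mapred with the hadoop2 classifier supplies the
// mapreduce-API classes built against Hadoop 2
libraryDependencies ++= Seq(
  "com.databricks" %% "spark-avro" % "2.0.1",
  "org.apache.avro" % "avro-mapred" % "1.7.7" classifier "hadoop2"
)
```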
Reference:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Release-Notes/cdh4ki_topic_2_9.html