Monday, 23 November 2015

Spark MLlib causes a duplicate dependency issue in sbt assembly

After I added spark-mllib to the dependencies:

object Spark {
  val sparkCore = "org.apache.spark" %% "spark-core" % SparkVersion % "provided"
  val sparkML = "org.apache.spark" %% "spark-mllib" % SparkVersion commonsLoggingExclude
  val sparkSql = "org.apache.spark" %% "spark-sql" % SparkVersion commonsLoggingExclude
  val sparkHive = List(
    "org.apache.spark" %% "spark-hive" % SparkVersion exclude("com.twitter", "parquet-hadoop-bundle") commonsLoggingExclude,
    // Bump up dependency versions to avoid conflicts
    "com.esotericsoftware.kryo" % "kryo" % "2.24.0",
    "commons-configuration" % "commons-configuration" % "1.10" commonsLoggingExclude)
}


When I ran "$ sbt clean assembly", it failed with the following errors:

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/cjin/.ivy2/cache/stax/stax-api/jars/stax-api-1.0.1.jar:javax/xml/XMLConstants.class
[error] /Users/cjin/.ivy2/cache/javax.xml.bind/jsr173_api/jars/jsr173_api-1.0.jar:javax/xml/XMLConstants.class

Or
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/cjin/.ivy2/cache/com.esotericsoftware.kryo/kryo/bundles/kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class

To fix the issue, add the following to build.sbt:

import MergeStrategy._

// File names to drop from the fat jar; paths are lower-cased before comparison
val excludedFiles = Seq("pom.xml", "pom.properties", "manifest.mf", "package-info.class")

assemblyMergeStrategy in assembly := {
  case PathList("javax", "xml", xs @ _*) => MergeStrategy.first
  case f if excludedFiles.exists(f.toLowerCase.endsWith(_)) => discard
  case "org/apache/spark/unused/UnusedStubClass.class" | "plugin.xml" | "META-INF/aop.xml" => first
  case f if f.startsWith("com/google/common/base/") => first
  case f =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(f)
}
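
Note that none of the cases above matches com/esotericsoftware/minlog/..., so the second error falls through to the default strategy; the kryo version bump in the dependency list appears to be what addresses it. If the bump alone is not enough, one possible sketch (not part of the original build) is an extra rule placed after the block above. It assumes the duplicated minlog classes are interchangeable, which should be checked against the actual jars.

```scala
// build.sbt: hypothetical extra rule, chained after the existing assemblyMergeStrategy setting
assemblyMergeStrategy in assembly := {
  // Assumption: the minlog classes found in the conflicting jars are equivalent
  case PathList("com", "esotericsoftware", "minlog", xs @ _*) => MergeStrategy.first
  case f =>
    // Delegate to the previously defined strategy (the block above, then the plugin default)
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(f)
}
```

The same case could equally be added as one more line inside the existing block.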

// Disable tests in the assembly phase; run "sbt test-only" before assembly in CI.
test in assembly := {}
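
On sbt 1.1 and later, the "key in scope" form is written with slash syntax instead. Below is a sketch of the same settings in that style, meant to be used in place of the settings above; it assumes a recent sbt-assembly release and has not been run against this Spark build.

```scala
// build.sbt, slash syntax (sbt 1.1+)
// MergeStrategy and PathList come from sbt-assembly's auto-imports
val excludedFiles = Seq("pom.xml", "pom.properties", "manifest.mf", "package-info.class")

assembly / assemblyMergeStrategy := {
  case PathList("javax", "xml", xs @ _*) => MergeStrategy.first
  // Lower-case the path so that "manifest.mf" also matches MANIFEST.MF
  case f if excludedFiles.exists(f.toLowerCase.endsWith(_)) => MergeStrategy.discard
  case "org/apache/spark/unused/UnusedStubClass.class" | "plugin.xml" | "META-INF/aop.xml" => MergeStrategy.first
  case f if f.startsWith("com/google/common/base/") => MergeStrategy.first
  case f =>
    // Fall back to the strategy that was in effect before this setting
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(f)
}

// Skip tests while building the fat jar
assembly / test := {}
```

As far as I know, sbt-assembly 1.0 and later no longer runs tests during assembly by default, so the last line may be unnecessary there.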


References:
http://stackoverflow.com/questions/17089047/sbt-assembly-and-multiple-class-defs-in-dependencies
https://github.com/sbt/sbt-assembly
http://stackoverflow.com/questions/25744050/error-while-running-sbt-assembly-sbt-deduplication-error
