【Question Title】: error while running sbt assembly: sbt deduplication error
【Posted】: 2014-11-02 20:11:03
【Question Description】:

I am facing the exact problem described in the post below, and the suggested answer did not help: sbt-assembly: deduplication found error

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.transaction\orbits\javax.transaction-1.1.1.v201105210645.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.servlet\orbits\javax.servlet-3.0.0.v201112011016.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.mail.glassfish\orbits\javax.mail.glassfish-1.4.1.v201005082020.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.activation\orbits\javax.activation-1.1.0.v201105071233.jar:META-INF/ECLIPSEF.RSA
[error] Total time: 14 s, completed Sep 9, 2014 5:21:01 PM

My build.sbt file contains:

name := "Simple"

version := "0.1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.twitter4j" % "twitter4j-stream" % "3.0.3"
)

//libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2"

libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "0.4.2"

libraryDependencies ++= Seq(
    ("org.apache.spark" %% "spark-core" % "1.0.2").
    exclude("org.eclipse.jetty.orbit", "javax.servlet").
    exclude("org.eclipse.jetty.orbit", "javax.transaction").
    exclude("org.eclipse.jetty.orbit", "javax.mail").
    exclude("org.eclipse.jetty.orbit", "javax.activation").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-collections", "commons-collections").
    exclude("com.esotericsoftware.minlog", "minlog")
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*)         => MergeStrategy.first
    case PathList("javax", "transaction", xs @ _*)     => MergeStrategy.first
    case PathList("javax", "mail", xs @ _*)            => MergeStrategy.first
    case PathList("javax", "activation", xs @ _*)      => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt"     => MergeStrategy.discard
    case x => old(x)
  }
}

Any pointers on how to resolve the above issue?

【Question Discussion】:

    Tags: scala sbt apache-spark sbt-assembly


    【Solution 1】:

    If you plan to run your program on Spark, then I strongly recommend adding all Spark dependencies as provided, so that they are excluded from the assembly task.

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"              % "1.0.2" % "provided",
      "org.apache.spark" %% "spark-streaming"         % "1.0.2" % "provided",
      "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2" % "provided")
    

    Otherwise, you need to remove those jars from the classpath, or add an appropriate line to the mergeStrategy — in your case:

    case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
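    Dropped into the question's existing mergeStrategy block (keeping the old sbt-assembly 0.11.x `<<=` syntax the question uses), that extra case would look roughly like this:

    ```scala
    mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
      {
        // pick one copy of the conflicting signature file reported in the error output
        case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
        // fall back to the previous strategy for everything else
        case x => old(x)
      }
    }
    ```

    Keeping the `case x => old(x)` fallback is what preserves the plugin's default handling for all other paths.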
    

    If you still want to fight through Spark's dependency hell, the sbt-dependency-graph plugin should help. Also note that other Spark dependencies, such as spark-streaming and spark-streaming-twitter, may need exclude directives as well.
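    sbt-dependency-graph is installed via project/plugins.sbt; the version below is an assumption for an sbt 0.13-era build like this one:

    ```scala
    // project/plugins.sbt — plugin version is a guess for this sbt generation
    addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.7.4")
    ```

    After reloading, running `dependency-tree` from the sbt shell prints the resolved dependency tree, which makes it easier to see which artifacts pull in the conflicting jetty-orbit jars.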

    【Discussion】:

    • Could you please elaborate on "If you plan to run your program on Spark, then I strongly recommend adding all Spark dependencies as provided"? How do I add them?
    • @Siva It means that when Spark runs your job, those jars are already available wherever it is deployed, so there is no need to ship them with the application. See my updated answer.
    • When I tried adding the merge strategy above, I got yet another error: [error] C:\Users\xxx\.ivy2\cache\com.esotericsoftware.kryo\kryo\bundles\kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class [error] C:\Users\xxx\.ivy2\cache\com.esotericsoftware.minlog\minlog\jars\minlog-1.2.jar:com/esotericsoftware/minlog/Log$Logger.class
    • The problem with doing this is that it defeats the whole point of assembly... which is to build a fat jar so you never have to worry about the classpath.
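    For the kryo/minlog clash mentioned in the comments (the same Log$Logger.class shipped by two jars), one possible fix in the spirit of this answer — a sketch, not something the answerer verified — is another first-wins case:

    ```scala
    // keep whichever copy of the minlog classes is seen first;
    // risky if the two jars ship incompatible versions of the class
    case PathList("com", "esotericsoftware", "minlog", xs @ _*) => MergeStrategy.first
    ```

    Alternatively, the question's existing `.exclude("com.esotericsoftware.minlog", "minlog")` directive attacks the same conflict at the dependency level rather than at merge time.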
    【Solution 2】:

    So, to get rid of the annoying "deduplicate" messages, I didn't bother with excludes — they didn't seem to help me. I copied and pasted the defaultMergeStrategy from the sbt-assembly code and just changed the lines that said deduplicate to first. I also had to add a catch-all at the end that insists on first. Honestly, I have no idea what this means or why it makes the annoying messages go away... I don't have time to get a PhD in sbt; I just want my code to build! So the merge strategy becomes:

    mergeStrategy in assembly <<= (mergeStrategy in assembly) ((old) => {
      case x if Assembly.isConfigFile(x) =>
        MergeStrategy.concat
      case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
        MergeStrategy.rename
      case PathList("META-INF", xs @ _*) =>
        (xs map {_.toLowerCase}) match {
          case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
            MergeStrategy.discard
          case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
            MergeStrategy.discard
          case "plexus" :: xs =>
            MergeStrategy.discard
          case "services" :: xs =>
            MergeStrategy.filterDistinctLines
          case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
            MergeStrategy.filterDistinctLines
          case _ => MergeStrategy.first // Changed deduplicate to first
        }
      case PathList(_*) => MergeStrategy.first // added this line
    })
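    As an aside, `<<=` is deprecated in later sbt releases; assuming a newer sbt-assembly (0.14.x) and sbt 0.13.6+, the same "first wins" override can be written with `:=` instead (a sketch, not part of the original answer):

    ```scala
    assemblyMergeStrategy in assembly := {
      // take one copy of the conflicting signature file from the question's error output
      case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
      // delegate everything else to the plugin's default strategy
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }
    ```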
    

    【Discussion】:

    • Does this actually work? I tried it and the resulting fat jar was suspiciously small.