【Posted】: 2018-07-03 13:05:56
【Question】:
This strange exception is killing my Spark job. Any ideas?
I am submitting many small jobs to the Spark context via sc.parallelize(... seq of 256 items ...). (Don't ask me why, but this is what I need.)
Exception in thread "main" java.util.zip.ZipException: invalid LOC header (bad signature)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.Utils$.copyStream(Utils.scala:347)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$getClassReader(ClosureCleaner.scala:40)
at org.apache.spark.util.ClosureCleaner$.getInnerClasses(ClosureCleaner.scala:84)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:107)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
at org.apache.spark.rdd.RDD.flatMap(RDD.scala:295)
at com.stratified.pdfingestion.CermineJob$.extractPdfText(CermineJob.scala:53)
at com.stratified.pdfingestion.CermineJob$.execute(CermineJob.scala:41)
at com.stratified.pdfingestion.CermineJob$$anonfun$main$1.apply(CermineJob.scala:31)
at com.stratified.pdfingestion.CermineJob$$anonfun$main$1.apply(CermineJob.scala:29)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at com.stratified.pdfingestion.CermineJob$.main(CermineJob.scala:29)
at com.stratified.pdfingestion.CermineJob.main(CermineJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
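【Comments】:
-
Did you find a solution? I ran into this problem in a different context, and so far my guess is that the zip archive was created with a newer specification than the library can handle.
-
Unfortunately, I don't remember this problem or whether I found a solution.
-
OK, I am trying to find an answer to this, in case it comes up again or someone with the same problem stumbles upon this post: stackoverflow.com/questions/33480085/…
-
Any updates? I seem to be getting this in my Spark Streaming application, and I'm not sure where it comes from...
Tags: apache-spark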
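
The trace above fails inside java.util.zip while ClosureCleaner is reading a class file, so one way to narrow this down is to check whether the jar on the classpath can be read end to end. A commonly reported cause of "invalid LOC header (bad signature)" is a jar that is corrupt, or that was rewritten on disk while the JVM still had it open; whether that applies to this job is an assumption, not something the post confirms. The sketch below is a hypothetical helper (the class name JarIntegrityCheck and the method firstUnreadableEntry are made up for illustration and are not part of Spark) that reads every entry of an archive and reports the first one that fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper, not part of Spark or the original post.
final class JarIntegrityCheck {

    /**
     * Reads every entry of the archive to the end.
     * Returns the name of the first entry that cannot be read,
     * or null if all entries were read without an I/O error.
     */
    static String firstUnreadableEntry(String jarPath) throws IOException {
        byte[] buffer = new byte[8192];
        // Opening can itself throw if the central directory is damaged.
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (InputStream in = zip.getInputStream(entry)) {
                    // Drain the stream; a bad local header or bad data
                    // surfaces here as a ZipException (an IOException).
                    while (in.read(buffer) != -1) {
                        // nothing to do, only checking readability
                    }
                } catch (IOException e) {
                    return entry.getName();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the application jar.
        String bad = firstUnreadableEntry(args[0]);
        System.out.println(bad == null
                ? "all entries readable"
                : "unreadable entry: " + bad);
    }
}
```

If every entry reads cleanly when the job is not running, the file itself is probably intact, and it may be worth checking whether something replaces or rebuilds the jar while the long-running driver is still using it.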