【问题标题】:Java heap space OutOfMemoryError in pyspark spark-submit?pyspark spark-submit中的Java堆空间OutOfMemoryError?
【发布时间】:2018-06-09 17:16:53
【问题描述】:

我的数据集大小为 10GB(例如 Test.txt)。

我编写了如下所示的 pyspark 脚本(Test.py):

from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
spark = SparkSession.builder.appName("FilterProduct").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
lines = spark.read.text("C:/Users/test/Desktop/Test.txt").rdd
lines.collect()

然后我使用以下命令执行上述脚本:

spark-submit Test.py --executor-memory  12G 

然后我收到如下错误:

17/12/29 13:27:18 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/Test.txt, range: 402653184-536870912, partition values: [empty row]
17/12/29 13:27:18 INFO CodeGenerator: Code generated in 22.743725 ms
17/12/29 13:27:44 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/12/29 13:27:44 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93

请告诉我如何解决这个问题?

【问题讨论】:

  • 当你使用collect时,整个文件被拉到驱动节点上以获得结果,所以如果你将执行器内存设置为50GB也没关系。删除它,你的代码就可以工作了。
  • 我删除了仍然遇到同样的错误。
  • 您是否尝试通过指定分区来读取文件;像这样的东西..sc.textFile(file, numPartitions)
  • 是的,我试过了,但我仍然遇到同样的错误。
  • 我也尝试了 200Mb 文件。我遇到了同样的错误。请帮助我解决这个问题。我在 Windows10、16GB Ram、I3 处理器上使用 pyspark。我也尝试过增加驱动程序和执行程序的内存。我仍然面临同样的问题。

标签: apache-spark pyspark


【解决方案1】:

你可以试试--conf "spark.driver.maxResultSize=20g"。您应该检查 spark conf page.spark.apache.org/docs/latest/configuration.html 上的配置。

除了这个答案,我想建议你减少你的任务结果,否则你可能会遇到序列化问题。

【讨论】:

    【解决方案2】:

    在您的apache-spark 目录中检查您是否有文件apache-spark/2.4.0/libexec/conf/spark-defaults.conf,其中2.4.0 对应于apache-spark 版本。

    如果此文件不存在,则创建它。

    然后在文件末尾插入:spark.driver.memory 12g

    这应该不需要--executor-memory 12G就可以解决:只需执行spark-submit Test.py

    【讨论】:

      【解决方案3】:

      您是否在 spark-submit 时检查了 jvm 的最大堆大小值。如果您看到传递给 spark-submit 的值,则意味着您可以正确设置最大堆大小。

      例如,如果您的设置为 4G,则为 ./spark-submit --driver-memory 4G Test123.py 您应该在 jvisualvm 屏幕上看到-Xmx4G,如下所示。

      即使您可以正确设置最大堆大小,您也可能会看到新错误 results of 7 tasks (1158.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

      堆栈跟踪在这里

      C:\Users\test\Desktop>spark-submit Test123.py -v --executor-memory  4G --driver-memory 10G  --conf spark.driver.maxResultSize=2g
      Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
      18/01/12 11:00:58 INFO SparkContext: Running Spark version 2.2.0
      18/01/12 11:00:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      18/01/12 11:00:58 INFO SparkContext: Submitted application: FilterProduct
      18/01/12 11:00:58 INFO SecurityManager: Changing view acls to: test
      18/01/12 11:00:58 INFO SecurityManager: Changing modify acls to: test
      18/01/12 11:00:58 INFO SecurityManager: Changing view acls groups to:
      18/01/12 11:00:58 INFO SecurityManager: Changing modify acls groups to:
      18/01/12 11:00:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(test); groups with view permissions: Set(); users  with modify permissions: Set(test); groups with modify permissions: Set()
      18/01/12 11:00:59 INFO Utils: Successfully started service 'sparkDriver' on port 63460.
      18/01/12 11:00:59 INFO SparkEnv: Registering MapOutputTracker
      18/01/12 11:00:59 INFO SparkEnv: Registering BlockManagerMaster
      18/01/12 11:00:59 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
      18/01/12 11:00:59 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
      18/01/12 11:00:59 INFO DiskBlockManager: Created local directory at C:\Users\test\AppData\Local\Temp\blockmgr-49e8e874-1361-4fe3-a8f5-02beca717299
      18/01/12 11:00:59 INFO MemoryStore: MemoryStore started with capacity 4.1 GB
      18/01/12 11:00:59 INFO SparkEnv: Registering OutputCommitCoordinator
      18/01/12 11:01:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      18/01/12 11:01:00 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.179.1:4040
      18/01/12 11:01:00 INFO SparkContext: Added file file:/C:/Users/test/Desktop/Test123.py at file:/C:/Users/test/Desktop/Test123.py with timestamp 1515735060704
      18/01/12 11:01:00 INFO Utils: Copying C:\Users\test\Desktop\Test123.py to C:\Users\test\AppData\Local\Temp\spark-0c5cf93c-e443-4ab1-b2ea-58e1b91fa310\userFiles-c067f30d-21b5-4f60-ab0d-7df3081b1d2a\Test123.py
      18/01/12 11:01:01 INFO Executor: Starting executor ID driver on host localhost
      18/01/12 11:01:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63481.
      18/01/12 11:01:01 INFO NettyBlockTransferService: Server created on 192.168.179.1:63481
      18/01/12 11:01:01 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
      18/01/12 11:01:01 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.179.1, 63481, None)
      18/01/12 11:01:01 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.179.1:63481 with 4.1 GB RAM, BlockManagerId(driver, 192.168.179.1, 63481, None)
      18/01/12 11:01:01 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.179.1, 63481, None)
      18/01/12 11:01:01 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.179.1, 63481, None)
      18/01/12 11:01:01 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Users/test/Desktop/spark-warehouse/').
      18/01/12 11:01:01 INFO SharedState: Warehouse path is 'file:/C:/Users/test/Desktop/spark-warehouse/'.
      18/01/12 11:01:02 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
      18/01/12 11:01:06 INFO FileSourceStrategy: Pruning directories with:
      18/01/12 11:01:06 INFO FileSourceStrategy: Post-Scan Filters:
      18/01/12 11:01:06 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
      18/01/12 11:01:06 INFO FileSourceScanExec: Pushed Filters:
      18/01/12 11:01:07 INFO CodeGenerator: Code generated in 385.116819 ms
      18/01/12 11:01:07 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 277.7 KB, free 4.1 GB)
      18/01/12 11:01:08 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.4 KB, free 4.1 GB)
      18/01/12 11:01:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.179.1:63481 (size: 23.4 KB, free: 4.1 GB)
      18/01/12 11:01:08 INFO SparkContext: Created broadcast 0 from javaToPython at NativeMethodAccessorImpl.java:0
      18/01/12 11:01:08 INFO FileSourceScanExec: Planning scan with bin packing, max size: 134217728 bytes, open cost is considered as scanning 4194304 bytes.
      18/01/12 11:01:08 INFO SparkContext: Starting job: collect at C:/Users/test/Desktop/Test123.py:15
      18/01/12 11:01:09 INFO DAGScheduler: Got job 0 (collect at C:/Users/test/Desktop/Test123.py:15) with 72 output partitions
      18/01/12 11:01:09 INFO DAGScheduler: Final stage: ResultStage 0 (collect at C:/Users/test/Desktop/Test123.py:15)
      18/01/12 11:01:09 INFO DAGScheduler: Parents of final stage: List()
      18/01/12 11:01:09 INFO DAGScheduler: Missing parents: List()
      18/01/12 11:01:09 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at javaToPython at NativeMethodAccessorImpl.java:0), which has no missing parents
      18/01/12 11:01:09 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.2 KB, free 4.1 GB)
      18/01/12 11:01:09 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.9 KB, free 4.1 GB)
      18/01/12 11:01:09 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.179.1:63481 (size: 3.9 KB, free: 4.1 GB)
      18/01/12 11:01:09 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
      18/01/12 11:01:09 INFO DAGScheduler: Submitting 72 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at javaToPython at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
      18/01/12 11:01:09 INFO TaskSchedulerImpl: Adding task set 0.0 with 72 tasks
      18/01/12 11:01:09 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:09 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:09 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:09 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:09 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
      18/01/12 11:01:09 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
      18/01/12 11:01:09 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
      18/01/12 11:01:09 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
      18/01/12 11:01:09 INFO Executor: Fetching file:/C:/Users/test/Desktop/Test123.py with timestamp 1515735060704
      18/01/12 11:01:09 INFO Utils: C:\Users\test\Desktop\Test123.py has been previously copied to C:\Users\test\AppData\Local\Temp\spark-0c5cf93c-e443-4ab1-b2ea-58e1b91fa310\userFiles-c067f30d-21b5-4f60-ab0d-7df3081b1d2a\Test123.py
      18/01/12 11:01:09 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 134217728-268435456, partition values: [empty row]
      18/01/12 11:01:09 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 402653184-536870912, partition values: [empty row]
      18/01/12 11:01:09 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 0-134217728, partition values: [empty row]
      18/01/12 11:01:09 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 268435456-402653184, partition values: [empty row]
      18/01/12 11:01:09 INFO CodeGenerator: Code generated in 19.618119 ms
      18/01/12 11:01:33 INFO MemoryStore: Block taskresult_1 stored as bytes in memory (estimated size 198.1 MB, free 3.9 GB)
      18/01/12 11:01:33 INFO BlockManagerInfo: Added taskresult_1 in memory on 192.168.179.1:63481 (size: 198.1 MB, free: 3.9 GB)
      18/01/12 11:01:33 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 207722505 bytes result sent via BlockManager)
      18/01/12 11:01:33 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:34 INFO MemoryStore: Block taskresult_0 stored as bytes in memory (estimated size 198.0 MB, free 3.7 GB)
      18/01/12 11:01:34 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
      18/01/12 11:01:34 INFO BlockManagerInfo: Added taskresult_0 in memory on 192.168.179.1:63481 (size: 198.0 MB, free: 3.7 GB)
      18/01/12 11:01:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 207614031 bytes result sent via BlockManager)
      18/01/12 11:01:34 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 536870912-671088640, partition values: [empty row]
      18/01/12 11:01:34 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:34 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
      18/01/12 11:01:34 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 671088640-805306368, partition values: [empty row]
      18/01/12 11:01:34 INFO MemoryStore: Block taskresult_2 stored as bytes in memory (estimated size 198.1 MB, free 3.5 GB)
      18/01/12 11:01:35 INFO BlockManagerInfo: Added taskresult_2 in memory on 192.168.179.1:63481 (size: 198.1 MB, free: 3.5 GB)
      18/01/12 11:01:35 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 207773776 bytes result sent via BlockManager)
      18/01/12 11:01:35 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:35 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
      18/01/12 11:01:35 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 805306368-939524096, partition values: [empty row]
      18/01/12 11:01:35 INFO MemoryStore: Block taskresult_3 stored as bytes in memory (estimated size 198.0 MB, free 3.3 GB)
      18/01/12 11:01:35 INFO BlockManagerInfo: Added taskresult_3 in memory on 192.168.179.1:63481 (size: 198.0 MB, free: 3.3 GB)
      18/01/12 11:01:35 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 207626541 bytes result sent via BlockManager)
      18/01/12 11:01:35 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:35 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
      18/01/12 11:01:35 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 939524096-1073741824, partition values: [empty row]
      18/01/12 11:01:35 INFO TransportClientFactory: Successfully created connection to /192.168.179.1:63481 after 345 ms (0 ms spent in bootstraps)
      18/01/12 11:01:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 30921 ms on localhost (executor driver) (1/72)
      18/01/12 11:01:40 INFO BlockManagerInfo: Removed taskresult_0 on 192.168.179.1:63481 in memory (size: 198.0 MB, free: 3.5 GB)
      18/01/12 11:01:42 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 33147 ms on localhost (executor driver) (2/72)
      18/01/12 11:01:42 INFO BlockManagerInfo: Removed taskresult_1 on 192.168.179.1:63481 in memory (size: 198.1 MB, free: 3.7 GB)
      18/01/12 11:01:44 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 34693 ms on localhost (executor driver) (3/72)
      18/01/12 11:01:44 INFO BlockManagerInfo: Removed taskresult_3 on 192.168.179.1:63481 in memory (size: 198.0 MB, free: 3.9 GB)
      18/01/12 11:01:46 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 36860 ms on localhost (executor driver) (4/72)
      18/01/12 11:01:46 INFO BlockManagerInfo: Removed taskresult_2 on 192.168.179.1:63481 in memory (size: 198.1 MB, free: 4.1 GB)
      18/01/12 11:01:55 INFO MemoryStore: Block taskresult_4 stored as bytes in memory (estimated size 198.1 MB, free 3.9 GB)
      18/01/12 11:01:55 INFO BlockManagerInfo: Added taskresult_4 in memory on 192.168.179.1:63481 (size: 198.1 MB, free: 3.9 GB)
      18/01/12 11:01:55 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 207685862 bytes result sent via BlockManager)
      18/01/12 11:01:55 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:55 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
      18/01/12 11:01:55 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 1073741824-1207959552, partition values: [empty row]
      18/01/12 11:01:56 INFO MemoryStore: Block taskresult_5 stored as bytes in memory (estimated size 198.1 MB, free 3.7 GB)
      18/01/12 11:01:56 INFO BlockManagerInfo: Added taskresult_5 in memory on 192.168.179.1:63481 (size: 198.1 MB, free: 3.7 GB)
      18/01/12 11:01:56 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 207774348 bytes result sent via BlockManager)
      18/01/12 11:01:57 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 5290 bytes)
      18/01/12 11:01:57 ERROR TaskSetManager: Total size of serialized results of 6 tasks (1188.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
      18/01/12 11:01:57 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
      18/01/12 11:01:57 INFO BlockManagerInfo: Removed taskresult_5 on 192.168.179.1:63481 in memory (size: 198.1 MB, free: 3.9 GB)
      18/01/12 11:01:57 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/copy/SaleLine/SaleLine, range: 1207959552-1342177280, partition values: [empty row]
      18/01/12 11:01:57 INFO TaskSchedulerImpl: Cancelling stage 0
      18/01/12 11:01:57 INFO TaskSchedulerImpl: Stage 0 was cancelled
      18/01/12 11:01:57 INFO Executor: Executor is trying to kill task 9.0 in stage 0.0 (TID 9), reason: stage cancelled
      18/01/12 11:01:57 INFO Executor: Executor is trying to kill task 6.0 in stage 0.0 (TID 6), reason: stage cancelled
      18/01/12 11:01:57 INFO Executor: Executor killed task 9.0 in stage 0.0 (TID 9), reason: stage cancelled
      18/01/12 11:01:57 INFO DAGScheduler: ResultStage 0 (collect at C:/Users/test/Desktop/Test123.py:15) failed in 47.966 s due to Job aborted due to stage failure: Total size of serialized results of 6 tasks (1188.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
      18/01/12 11:01:57 INFO MemoryStore: Block taskresult_7 stored as bytes in memory (estimated size 198.1 MB, free 3.7 GB)
      18/01/12 11:01:57 INFO Executor: Executor is trying to kill task 7.0 in stage 0.0 (TID 7), reason: stage cancelled
      18/01/12 11:01:57 INFO BlockManagerInfo: Added taskresult_7 in memory on 192.168.179.1:63481 (size: 198.1 MB, free: 3.7 GB)
      18/01/12 11:01:57 INFO Executor: Executor is trying to kill task 8.0 in stage 0.0 (TID 8), reason: stage cancelled
      18/01/12 11:01:57 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 207695395 bytes result sent via BlockManager)
      18/01/12 11:01:57 WARN TaskSetManager: Lost task 9.0 in stage 0.0 (TID 9, localhost, executor driver): TaskKilled (stage cancelled)
      18/01/12 11:01:57 INFO DAGScheduler: Job 0 failed: collect at C:/Users/test/Desktop/Test123.py:15, took 48.556068 s
      18/01/12 11:01:57 INFO Executor: Executor killed task 8.0 in stage 0.0 (TID 8), reason: stage cancelled
      18/01/12 11:01:57 ERROR TaskSetManager: Total size of serialized results of 7 tasks (1386.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
      18/01/12 11:01:57 WARN TaskSetManager: Lost task 8.0 in stage 0.0 (TID 8, localhost, executor driver): TaskKilled (stage cancelled)
      18/01/12 11:01:57 INFO BlockManagerInfo: Removed taskresult_7 on 192.168.179.1:63481 in memory (size: 198.1 MB, free: 3.9 GB)
      Traceback (most recent call last):
        File "C:/Users/test/Desktop/Test123.py", line 15, in <module>
          lines.collect()
        File "D:\workspace\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\rdd.py", line 809, in collect
        File "D:\workspace\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
        File "D:\workspace\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\utils.py", line 63, in deco
        File "D:\workspace\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
      : org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 6 tasks (1188.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
              at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
              at scala.collection.m18/01/12 11:01:57 INFO MemoryStore: Block taskresult_6 stored as bytes in memory (estimated size 198.1 MB, free 3.5 GB)
      utable.ResizableArray$class.foreach(ResizableArray.scala:59)
              at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
              at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
              at scala.Option.foreach(Option.scala:257)
              at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
              at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
              at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
              at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
              at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
              at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
              at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
              at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
              at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:458)
              at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.ref18/01/12 11:01:57 INFO BlockManagerInfo: Added taskresult_6 in memory on 192.168.179.1:63481 (size: 198.1 MB, free: 3.7 GB)
      lect.Method.invoke(Method.java:483)
              at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
              at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
              at py4j.Gateway.invoke(Gateway.java:280)
              at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
              at py4j.commands.CallCommand.execute(CallCommand.java:79)
              at py4j.GatewayConnection.run(GatewayConnection.java:214)
              at java.lang.Thread.run(Thread.java:745)
      
      18/01/12 11:01:57 INFO SparkContext: Invoking stop() from shutdown hook
      18/01/12 11:01:57 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 207681096 bytes result sent via BlockManager)
      18/01/12 11:01:57 ERROR TaskSetManager: Total size of serialized results of 8 tasks (1584.6 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
      18/01/12 11:01:57 INFO SparkUI: Stopped Spark web UI at http://192.168.179.1:4040
      18/01/12 11:01:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
      18/01/12 11:01:57 INFO BlockManagerInfo: Removed taskresult_6 on 192.168.179.1:63481 in memory (size: 198.1 MB, free: 3.9 GB)
      18/01/12 11:01:57 ERROR Utils: Uncaught exception in thread task-result-getter-1
      java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
              at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
              at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
              at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
              at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
              at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:105)
              at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:642)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      Exception in thread "task-result-getter-1" java.lang.Error: java.lang.InterruptedException
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
              at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
              at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
              at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
              at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
              at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:105)
              at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:642)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
              at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              ... 2 more
      18/01/12 11:01:57 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, 192.168.179.1, 63481, None),taskresult_6,StorageLevel(1 replicas),0,0))
      18/01/12 11:01:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
      18/01/12 11:01:58 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=882113328004, chunkIndex=0}, buffer=org.apache.spark.storage.BlockManagerManagedBuffer@f00f35c} to /192.168.179.1:63484; closing connection
      java.nio.channels.ClosedChannelException
              at io.netty.channel.AbstractChannel$AbstractUnsafe.close(...)(Unknown Source)
      18/01/12 11:01:58 INFO MemoryStore: MemoryStore cleared
      18/01/12 11:01:58 INFO BlockManager: BlockManager stopped
      18/01/12 11:01:58 INFO BlockManagerMaster: BlockManagerMaster stopped
      18/01/12 11:01:58 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /192.168.179.1:63481 is closed
      18/01/12 11:01:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
      18/01/12 11:01:58 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
      18/01/12 11:01:58 INFO SparkContext: Successfully stopped SparkContext
      18/01/12 11:01:58 INFO ShutdownHookManager: Shutdown hook called
      18/01/12 11:01:58 INFO ShutdownHookManager: Deleting directory C:\Users\test\AppData\Local\Temp\spark-0c5cf93c-e443-4ab1-b2ea-58e1b91fa310\pyspark-90628145-a2e4-437f-bbe5-ef41ed4ba64f
      18/01/12 11:01:58 INFO ShutdownHookManager: Deleting directory C:\Users\test\AppData\Local\Temp\spark-0c5cf93c-e443-4ab1-b2ea-58e1b91fa310
      

      您可以调整 spark.driver.maxResultSize 值,如spark-submit --driver-memory 2g --conf "spark.driver.maxResultSize=2g" Test.py

      即使您遇到错误,您也可以从该命令中分享详细信息。 spark-submit --driver-memory 10g --conf "spark.driver.maxResultSize=2g" Test.py 以便我们查看您平台的具体情况。

      【讨论】:

      • 感谢您的回复,即使我使用 pyspark spark-2.2.0,我也尝试了所有选项,通过增加驱动程序和执行程序内存仍然出现错误。
      • 修改了答案。你可以检查一下。
      • 你可以在启动 spark-submit 并共享完整输出时添加参数 -v(用于详细输出)吗?
      • 添加了详细输出,请看一下。
      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2013-08-08
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多