【Question Title】: Spark application java.lang.OutOfMemoryError: Direct buffer memory
【Posted】: 2016-01-21 11:45:02
【Question】:
  1. I am running with the following runtime Spark configuration values:

spark-submit --executor-memory 8G --spark.yarn.executor.memoryOverhead 2G

But it still throws the following out-of-memory error:

I have a pairRDD with 8362269460 rows and 128 partitions. The error is thrown during pairRDD.groupByKey.saveAsTextFile. Any clues?
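As a side note on the submit command above: spark.yarn.executor.memoryOverhead is a Spark configuration key, not a first-class spark-submit flag, so `--spark.yarn.executor.memoryOverhead 2G` would not be accepted. In Spark 1.x it is passed via `--conf` and interpreted in MiB. A sketch of the intended invocation (myApp.jar is a placeholder):

```shell
spark-submit \
  --executor-memory 8G \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  myApp.jar
```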

Update: I added a filter and the data is now 2300000000 rows; it runs in the spark shell without errors. My cluster: 19 data nodes, 1 name node

             Min Resources: <memory:150000, vCores:150>
             Max Resources: <memory:300000, vCores:300>

Thanks for your help.

org.apache.spark.shuffle.FetchFailedException: java.lang.OutOfMemoryError: Direct buffer memory
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:321)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:306)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
  at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:132)
  at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
  at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:89)
  at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.handler.codec.DecoderException:  Direct buffer memory
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:234)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
  at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
  ... 1 more
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:651)
  at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
  at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
  at io.netty.buffer.PoolArena.reallocate(PoolArena.java:358)
  at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:121)
  at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
  at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:228)
  ... 10 more

I would like to know how to configure the direct memory size correctly. Best regards.

【Comments】:

  • Please format your question properly and provide some context
  • @ssyue -XX:MaxDirectMemorySize
  • @manRo Sorry, English is my weak point.
  • @Marek-A- Thanks, but how do I set it on a Spark application?
  • Post your spark-defaults.conf file; it will give the question context, along with the garbage collector in use. G1GC should be used.

Tags: java apache-spark out-of-memory


【Solution 1】:

I don't know the details of your Spark app, but I found the memory configuration here. You need to set -XX:MaxDirectMemorySize just like any other JVM memory setting (the options passed via -XX:). Try passing it through spark.executor.extraJavaOptions.

If you are using spark-submit, you can use:

./bin/spark-submit --name "My app" ...
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxDirectMemorySize=512m" myApp.jar
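Equivalently, the same JVM option can be set once for all jobs in spark-defaults.conf instead of on every submit. A sketch (the 512m value is illustrative and should be sized to the shuffle volume actually handled in direct memory):

```properties
spark.executor.extraJavaOptions   -XX:MaxDirectMemorySize=512m
```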

【Comments】:

  • But this memory error means your application itself has a memory problem, e.g. you are reading entire stream contents into a memory buffer
  • @ravindra I have a pairRDD with 8362269460 rows (e.g.: (867196025682574,(A10000456C2DA1,0.0010017530678687703))) and 128 partitions. The error is thrown during pairRDD.groupByKey.saveAsTextFile
  • As mentioned before, and as others here have suggested, please post your code and the spark-defaults.conf file. Your comments do not provide enough context for the question
  • @ssyue - The value you use for the parameter needs to correspond to the amount of data you process in direct memory; if the data volume is large, it may still throw the same problem. Please post your code; the solution lies in changing the processing algorithm.
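On the point about changing the algorithm: groupByKey materializes every value for a key before anything is reduced, while a reduceByKey-style aggregation combines values map-side first, so far less data crosses the shuffle and far less is buffered per key. A minimal sketch of the difference in plain Python (no Spark; names and data are illustrative):

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# groupByKey-style: every value for a key is buffered in memory,
# so a hot key with billions of values can exhaust the buffers.
grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)

# reduceByKey-style: only one running value per key is ever held,
# keeping per-key memory bounded regardless of how many values arrive.
reduced = defaultdict(int)
for k, v in pairs:
    reduced[k] += v
```

If the goal is just a summed or combined value per key, the second shape is the one to reach for before raising any memory limits.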