【Question Title】: GC overhead while running Pig job, after the Hadoop job ends
【Posted】: 2019-09-11 07:53:25
【Question】:

I am running a very simple Pig script (Pig 0.14, Hadoop 2.4):

customers = load '/some/hdfs/path' using SomeUDFLoader();
customers2 = foreach (group customers by customer_id) generate FLATTEN(group) as customer_id, MIN(dw_customer.date) as date;
store customers2 into '/hdfs/output' using PigStorage(',');

This launches a map-reduce job with roughly 60,000 mappers and 999 reducers.

After the map-reduce job has finished its work (I know because the output has been written and the job manager reports the job as successful), there is a long pause and I get the following error in the Pig output:

2015-11-24 11:45:29,394 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at *********
2015-11-24 11:45:29,403 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-11-24 11:46:03,533 [Service Thread] INFO  org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call- Usage threshold init = 698875904(682496K) used = 520031456(507843K) committed = 698875904(682496K) max = 698875904(682496K)
2015-11-24 11:46:04,473 [Service Thread] INFO  org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 698875904(682496K) used = 575405920(561919K) committed = 698875904(682496K) max = 698875904(682496K)
2015-11-24 11:47:36,255 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. GC overhead limit exceeded

The stack trace looks like this (the exception is in a different function each time):

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. Java heap space

java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.initCounters(CounterGroupPBImpl.java:136)
    at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.getAllCounters(CounterGroupPBImpl.java:121)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:367)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:388)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskReports(ClientServiceDelegate.java:448)
    at org.apache.hadoop.mapred.YARNRunner.getTaskReports(YARNRunner.java:551)
    at org.apache.hadoop.mapreduce.Job$3.run(Job.java:533)
    at org.apache.hadoop.mapreduce.Job$3.run(Job.java:531)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:531)
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:235)
    at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:352)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:233)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
    ...

The SET statements in my Pig script:

SET mapreduce.map.java.opts '-server -Xmx6144m -Djava.net.preferIPv4Stack=true -Duser.timezone=UTC'
SET mapreduce.reduce.java.opts '-server -Xmx6144m -Djava.net.preferIPv4Stack=true -Duser.timezone=UTC'
SET mapreduce.map.memory.mb '8192'
SET mapreduce.reduce.memory.mb '8192'
SET mapreduce.map.speculative 'true'
SET mapreduce.reduce.speculative 'true'
SET mapreduce.jobtracker.maxtasks.perjob '100000'
SET mapreduce.job.split.metainfo.maxsize '-1'

Why does this happen, and how can I fix it?

Thanks in advance for your help.

【Discussion】:

    标签: java hadoop garbage-collection apache-pig


    【Solution 1】:

    It looks like this originates in your ApplicationMaster, since you mention that the error is returned after all the mappers/reducers have finished executing. Try increasing the memory of the ApplicationMaster.

    In a YARN cluster, you can control the amount of memory available to the ApplicationMaster with the following two properties:

    1. yarn.app.mapreduce.am.command-opts

    2. yarn.app.mapreduce.am.resource.mb

    As with the other memory settings, set -Xmx (in the former) to about 75% of the resource.mb value.

    Details about these parameters can be found here.
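
    These two properties can be set from within the Pig script itself, alongside the SET statements shown in the question. A minimal sketch, assuming an 8 GB ApplicationMaster container and the ~75% heap rule from above (the values are illustrative, not tuned for this cluster):

    SET yarn.app.mapreduce.am.resource.mb '8192'
    SET yarn.app.mapreduce.am.command-opts '-Xmx6144m'

    Here 6144m is roughly 75% of 8192 MB, leaving the remaining quarter of the container for non-heap memory (thread stacks, metaspace, native buffers).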

    【Discussion】:

    • Just edited my answer. Let me know if you think I should add more details.