【Posted】: 2015-10-22 07:17:48
【Problem Description】:
I am running a Pig script in a Google Cloud Hadoop environment with pig -useHCatalog -x mapreduce -f profile.pig. I have two tables of 50,000 records each that are crossed and then joined with a table of 1,000,000 records. The same script runs fine with fewer records, but when I increase the record count it throws the error below.
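For context, here is a minimal Pig Latin sketch of the kind of cross-then-join described above; the aliases, field names, and the name of the large table are hypothetical, since the actual profile.pig is not shown in the question. Note that a CROSS of two 50,000-row inputs already produces 2.5 billion intermediate rows, so the data volume grows very quickly as the inputs grow.

-- Hypothetical sketch only; table and field names are made up for illustration.
male   = LOAD 'matrimony.profile_gce_limit' USING org.apache.hive.hcatalog.pig.HCatLoader();
female = LOAD 'matrimony.profile_gce_limit' USING org.apache.hive.hcatalog.pig.HCatLoader();
big    = LOAD 'matrimony.some_large_table'  USING org.apache.hive.hcatalog.pig.HCatLoader();

pairs  = CROSS male, female;                  -- 50,000 x 50,000 = 2.5 billion candidate pairs
joined = JOIN pairs BY male::id, big BY id;   -- then joined with the ~1,000,000-row table
DUMP joined;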
2015-10-22 05:38:56,261 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,266 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2015-10-22 05:38:56,377 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
2015-10-22 05:38:56,377 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2015-10-22 05:38:56,380 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0.2.2.8.0-3150 0.14.0.2.2.8.0-3150 hdfs 2015-10-22 05:34:17 2015-10-22 05:38:56 HASH_JOIN,GROUP_BY,FILTER,CROSS,UNION
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1444401341866_0534 1 0 11 11 11 11 0 0 0 0 t_female,t_male,t_profile MULTI_QUERY,MAP_ONLY
job_1444401341866_0535 2 1 6 6 6 6 91 91 91 91 t_female,t_male,t_raid_female,t_raid_female1
job_1444401341866_0536 2 1 25 25 25 25 89 89 89 89 t_female,t_male,t_raid_male,t_raid_male1
job_1444401341866_0537 2 0 5 5 5 5 0 0 0 0 t_female,t_male,t_mf_union MAP_ONLY
Failed Jobs:
JobId Alias Feature Message Outputs
job_1444401341866_0538 j_ci1,j_mf_education,j_mf_height,j_mf_occupation,j_mf_religion,j_mf_weight,t_ci1,t_mf_transpose,t_mf_union,t_raid HASH_JOIN Message: Job failed!
Input(s):
Successfully read 10001 records (1509451 bytes) from: "matrimony.profile_gce_limit"
Successfully read 10001 records (1509451 bytes) from: "matrimony.profile_gce_limit"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1444401341866_0534 -> job_1444401341866_0535,job_1444401341866_0536,job_1444401341866_0537,
job_1444401341866_0535 -> job_1444401341866_0538,
job_1444401341866_0536 -> job_1444401341866_0538,
job_1444401341866_0537 -> job_1444401341866_0538,
job_1444401341866_0538 -> null,
null -> null,
null -> null,
null
2015-10-22 05:38:56,455 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:56,456 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,459 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:56,572 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:56,572 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,576 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:56,675 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:56,676 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,679 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:56,780 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:56,780 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,783 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:56,883 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:56,883 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,886 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:56,981 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
Backend error message
2015-10-22 05:38:56,982 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:56,985 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,083 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:57,083 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:57,086 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,182 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:57,182 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:57,185 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,275 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:57,275 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:57,278 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,370 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:57,370 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:57,373 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,475 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:57,475 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:57,478 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,570 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hadoop-w-0.c.horton-cluster-3.internal:8188/ws/v1/timeline/
2015-10-22 05:38:57,570 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop-w-0.c.horton-cluster-3.internal/10.240.0.3:8050
2015-10-22 05:38:57,574 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-10-22 05:38:57,601 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2015-10-22 05:38:57,602 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
Details at logfile: /home/hdfs/workfile/pig_1445492051329.log
2015-10-22 05:38:57,603 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/hdfs/workfile/pig_1445492051329.log
2015-10-22 05:38:57,603 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/hdfs/workfile/pig_1445492051329.log
2015-10-22 05:38:57,603 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/hdfs/workfile/pig_1445492051329.log
2015-10-22 05:38:57,623 [main] INFO org.apache.pig.Main - Pig script completed in 4 minutes, 46 seconds and 401 milliseconds (286401 ms)
And this is what is in the log file:
================================================================================
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:179)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
================================================================================
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:179)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
================================================================================
【Discussion】:
-
This could be a memory allocation problem with your job. Are you able to set the MapReduce Java options to give it more memory? If you can get at the job logs in the MR web UI, you may see a more specific message about what is going on. A sketch of one way to do this follows.
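(Exit code 137 usually means the process was killed with SIGKILL (128 + 9); on YARN this is most often the NodeManager killing a container that exceeded its memory limit. One way to try the suggestion above is to raise the container and JVM heap sizes from inside the Pig script. The property names below are the standard Hadoop 2.x ones; the values are only illustrative and would need tuning for the cluster.)

-- Illustrative values only: container size, with a heap set to roughly 80% of it.
SET mapreduce.map.memory.mb 4096;
SET mapreduce.map.java.opts '-Xmx3276m';
SET mapreduce.reduce.memory.mb 8192;
SET mapreduce.reduce.java.opts '-Xmx6553m';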
-
Hey @JasonS, I'm new to Hadoop and have only gotten this far with the help of Stack Overflow. Could you be more specific about what to do?
-
I ran it as pig -useHCatalog -x local script.pig and it gave: java.lang.Exception: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:248) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
-
Maharaj: I'm just wondering why we see so many jobs in Job Stats: job_1444401341866_0534, job_1444401341866_0535, job_1444401341866_0536, job_1444401341866_0537. Can you provide more information? And the error you posted above, "No space left on device", only happens when your local server's disk is nearly full.
-
So the issue is that I'm loading two tables: table1 has 100,000 records and table2 has 1,000,000 records. I unpivot table1, do some basic Hive stuff, then join it with table2 and aggregate them. When I run the same routine on a smaller dataset it works fine, but not on the larger dataset.
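(A rough Pig Latin sketch of the pipeline described in this comment; the aliases, column names, and HCatalog table names are hypothetical, since the real script is not posted.)

-- Hypothetical sketch: unpivot table1, join with the large table2, then aggregate.
t1 = LOAD 'matrimony.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
t2 = LOAD 'matrimony.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- "Unpivot": turn the wide attribute columns of table1 into (id, attr, value) rows.
unpivoted = FOREACH t1 GENERATE id,
            FLATTEN(TOBAG(TOTUPLE('height', height),
                          TOTUPLE('weight', weight))) AS (attr, value);

-- Join against the 1,000,000-row table and aggregate per attribute.
joined  = JOIN unpivoted BY id, t2 BY id;
grouped = GROUP joined BY unpivoted::attr;
result  = FOREACH grouped GENERATE group AS attr, COUNT(joined) AS matches;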
Tags: hadoop mapreduce apache-pig