【Question title】: Apache Pig - Illustrate command error
【Posted】: 2014-10-07 11:22:16
【Question】:
$ cat webaccess.txt
mark,yahoo.com,6
sam,google.com,7
john,yahoo.com,3
patrick,cnn.com,8
mary,facebook.com,1
mark,yahoo.com,4
john,bbc.com,10
andrew,twitter.com,3
patrick,twitter.com,9

I am running the following job in the Hue Pig shell (Grunt) on the Cloudera Quick VM:

grunt> stage1 = LOAD '/user/cloudera/webaccess.txt' USING PigStorage(',') AS (name:chararray, website:chararray, access:int);
grunt> DUMP stage1;
grunt> stage2 = FILTER stage1 by access >= 8;
grunt> stage3 = GROUP stage1 by name;
grunt> stage4 = FOREACH stage3 GENERATE group as GROUPS, MAX(stage1.access);
grunt> DUMP stage4;

Output:

(sam,7)
(john,10)
(mark,6)
(mary,1)
(andrew,3)
(patrick,9)
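
As a sanity check on the values above, the same "maximum access per name" aggregation can be reproduced outside Pig. This is only a minimal sketch in plain shell and awk, not part of the original job: the helper names `sample_data` and `max_access_per_name` are hypothetical, and the data is the sample file from the question embedded inline.

```shell
#!/bin/sh
# Hypothetical helper: emit the sample rows from the question (name,website,access).
sample_data() {
  cat <<'EOF'
mark,yahoo.com,6
sam,google.com,7
john,yahoo.com,3
patrick,cnn.com,8
mary,facebook.com,1
mark,yahoo.com,4
john,bbc.com,10
andrew,twitter.com,3
patrick,twitter.com,9
EOF
}

# Hypothetical helper: read CSV rows on stdin and print "(name,max)" per name,
# mirroring GROUP ... BY name followed by MAX(access). Output is sorted by name.
max_access_per_name() {
  awk -F',' '
    {
      # Keep the largest numeric value of the third field seen for each name.
      if (!($1 in max) || $3 + 0 > max[$1]) max[$1] = $3 + 0
    }
    END { for (name in max) printf "(%s,%d)\n", name, max[name] }
  ' | sort
}

sample_data | max_access_per_name
```

The tuples should match the DUMP output, only in sorted order, which suggests the script logic itself is sound and the problem lies elsewhere.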

Everything works fine up to this point.

When I run the ILLUSTRATE command on the relation stage4, I get the error shown below:

grunt> ILLUSTRATE stage4;

2014-10-07 04:02:43,639 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-07 04:02:43,642 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost.localdomain:8020
2014-10-07 04:02:43,643 [main] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-10-07 04:02:43,643 [main] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2014-10-07 04:02:43,643 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost.localdomain:8021
2014-10-07 04:02:43,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-10-07 04:02:43,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-10-07 04:02:43,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-10-07 04:02:43,804 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-10-07 04:02:43,805 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration.
java.lang.RuntimeException: Internal error creating job configuration.
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:160)
at org.apache.pig.PigServer.getExamples(PigServer.java:1182)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:739)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2014-10-07 04:02:43,868 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception
Details at logfile: /dev/null

I am still learning, and this error is keeping me from moving on to the next topic.

Before starting this task, when I first opened the Hue Pig shell (Grunt), I noticed the following warning:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit.
which: no hadoop in ((null))
which: no /usr/lib/hadoop/bin/hadoop in ((null))
dirname: missing operand
Try `dirname --help' for more information.
2014-10-07 03:18:27,802 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.7.0 (rexported) compiled May 28 2014, 11:05:48
2014-10-07 03:18:27,803 [main] INFO org.apache.pig.Main - Logging error messages to: /dev/null
2014-10-07 03:18:28,758 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/cloudera/.pigbootup not found
2014-10-07 03:18:30,436 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-07 03:18:30,444 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost.localdomain:8020
2014-10-07 03:18:37,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost.localdomain:8021
2014-10-07 03:18:37,842 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS

【Comments】:

    Tags: hadoop apache-pig high-level


    【Answer 1】:

    I did not run into any problem; the ILLUSTRATE command works fine for me. Can you try running it in local mode first?

        $ pig -x local
        grunt> stage1 = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, website:chararray, access:int);
        grunt> stage2 = FILTER stage1 by access >= 8;
        grunt> stage3 = GROUP stage1 by name;
        grunt> stage4 = FOREACH stage3 GENERATE group as GROUPS, MAX(stage1.access);
        grunt> DUMP stage4;
        (sam,7)
        (john,10)
        (mark,6)
        (mary,1)
        (andrew,3)
        (patrick,9)
        grunt> ILLUSTRATE stage4;
        ----------------------------------------------------------------------------
        | stage1     | name:chararray     | website:chararray     | access:int     | 
        ----------------------------------------------------------------------------
        |            | john               | yahoo.com             | 3              | 
        |            | john               | bbc.com               | 10             | 
        ----------------------------------------------------------------------------
        --------------------------------------------------------------------------------------------------------------------------
        | stage3     | group:chararray     | stage1:bag{:tuple(name:chararray,website:chararray,access:int)}                     | 
        --------------------------------------------------------------------------------------------------------------------------
        |            | john                | {(john, yahoo.com, 3), (john, bbc.com, 10)}                                         | 
        |            | john                | {(john, yahoo.com, 3), (john, bbc.com, 10)}                                         | 
        --------------------------------------------------------------------------------------------------------------------------
        ------------------------------------------------
        | stage4     | GROUPS:chararray     | :int     | 
        ------------------------------------------------
        |            | john                 | 10       | 
        ------------------------------------------------
    

    【Comments】:

    • Thanks for the reply, siva. I ran it in local mode but still get the error. Please post any suggestions.
    【Answer 2】:

    This looks like a classpath problem. Please check that all the required jars are specified on the classpath. See this thread for details.
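
    One way to start checking is sketched below. The startup log in the question contains `which: no hadoop in ((null))`, which suggests (but does not prove) that PATH was empty in the environment that launched Pig. The helper name `check_on_path` is hypothetical; `hadoop classpath` is the standard Hadoop subcommand that prints the classpath Hadoop would use, and `HADOOP_HOME` and `PIG_CLASSPATH` are environment variables the Pig launcher script consults. Run it from the same environment that starts the Grunt shell, if possible.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be resolved through PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

# Show the environment values the launcher depends on.
echo "PATH=${PATH:-<unset>}"
echo "HADOOP_HOME=${HADOOP_HOME:-<unset>}"
echo "PIG_CLASSPATH=${PIG_CLASSPATH:-<unset>}"

# Check that both launchers are reachable; keep going even if one is missing.
for cmd in hadoop pig; do
  check_on_path "$cmd" || true
done

# If hadoop is available, list its classpath entries one per line for inspection.
if command -v hadoop >/dev/null 2>&1; then
  hadoop classpath | tr ':' '\n'
fi
```

    If `hadoop` is reported as not found, fixing PATH for that environment is likely a prerequisite before looking at individual jars.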

    【Comments】:
