【Question Title】: Why can't Hadoop Streaming find my script?
【Posted】: 2015-08-19 11:20:00
【Question】:

I am streaming two scripts, wordCountMap.pl and wordCountReduce.pl, in Hadoop; they are supposed to count the occurrences of each word in a file.

But Hadoop keeps complaining about wordCountMap.pl. My command and its output are below.

Command:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input wordCount/words.txt -output output -mapper wordCount/wordCountMap.pl -file wordCount/wordCountMap.pl -reducer wordCount/wordCuntReduce.pl -file wordCount/wordCountReduce.pl

Output:

15/08/18 20:09:50 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
15/08/18 20:09:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
File: /home/hduser/wordCount/wordCountMap.pl does not exist, or is not readable.
Try -help for more information
Streaming Command Failed!

But wordCountMap.pl looks fine (to me), because when I typed:

hadoop fs -cat wordCount/wordCountMap.pl

I got:

15/08/18 20:21:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    #!/usr/bin/perl -w
    while(<STDIN>) {
        chomp;
        @words = split;
        foreach $w (@words) {
            $key = $w;
            $value = "1";
            print "$key\t$value\n";
        }
    }

Can anyone tell me what is wrong with my command? (I think we can safely ignore the WARN messages above.)

FYI, wordCountReduce.pl is:

#!/usr/bin/perl -w
$count = 0;
while(<STDIN>) {
    chomp;
    ($key,$value) = split "\t";

    if (!defined($oldkey)) {
        $oldkey = $key;
        $count  = $value;
    } else {
        if ($oldkey eq $key) {
            $count = $count + $value;
        } else {
            print "$oldkey\t$count\n";
            $oldkey = $key;
            $count  = $value;
        }
    }
}
print "$oldkey\t$count\n";

and words.txt is:

a a b
b c
a

The result of "hadoop fs -ls wordCount" is:

15/08/18 21:27:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   1 hduser supergroup        145 2015-08-18 20:04 wordCount/wordCountMap.pl
-rw-r--r--   1 hduser supergroup        346 2015-08-18 20:04 wordCount/wordCountReduce.pl
-rw-r--r--   1 hduser supergroup         12 2015-08-18 20:04 wordCount/words.txt

Thanks in advance!

【Comments】:

  • Keep the scripts on the local filesystem

Tags: perl hadoop


【Solution 1】:

If you look carefully at the instructions at http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output

it clearly shows that there is no need to copy mapper.py and reducer.py to HDFS; you can reference both files from the local filesystem, as /path/to/mapper. I believe this will let you avoid the error above.
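Applied to the question's setup, that means `-file` takes a local path (it ships the script into each task's working directory), while `-mapper`/`-reducer` then name the shipped file. A sketch, assuming the scripts live under /home/hduser/wordCount on the local disk (that local path is an assumption; substitute your own):

```shell
# -file ships a LOCAL file into the job's working directory;
# -mapper/-reducer then refer to it by its bare name.
# /home/hduser/wordCount is an assumed local path, not from the original command.
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
  -input wordCount/words.txt \
  -output output \
  -mapper wordCountMap.pl \
  -file /home/hduser/wordCount/wordCountMap.pl \
  -reducer wordCountReduce.pl \
  -file /home/hduser/wordCount/wordCountReduce.pl
```

Only `-input` and `-output` refer to HDFS paths here; the original command failed because `-file wordCount/wordCountMap.pl` was resolved against the local home directory, where the script did not exist.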

【Discussion】:

  • Could you give the exact command for "referencing both files from the local filesystem, as /path/to/mapper"? I actually saw this answer in another post, but I don't know how to do it.
  • Pointing to the local filesystem does not work. I tested "hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input words.txt -output output -mapper /path/to/wordCount -file /path/to/wordCountMap.pl -reducer /path/to/wordCountReduce.pl -file /path/to/wordCountReduce.pl" and Hadoop gave "15/08/19 00:48:30 ERROR streaming.PipeMapRed: configuration exception java.io.IOException: Cannot run program "/path/to": error=13, Permission denied". But I did chmod 777 /path/to. Am I missing something?
  • I finally figured it out. The command should be: hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input words.txt -output output -mapper wordCountMap.pl -file /local_path/to/wordCountMap.pl -reducer wordCountReduce.pl -file /local_path/to/wordCountReduce.pl