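【Posted】: 2015-08-19 11:20:00
【Question】:
I am running two scripts, wordCountMap.pl and wordCountReduce.pl, with Hadoop Streaming; together they should count the occurrences of each word in a file.
Hadoop, however, keeps complaining about wordCountMap.pl. My command and its output are below.
Command: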
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input wordCount/words.txt -output output -mapper wordCount/wordCountMap.pl -file wordCount/wordCountMap.pl -reducer wordCount/wordCuntReduce.pl -file wordCount/wordCountReduce.pl
Output:
15/08/18 20:09:50 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
15/08/18 20:09:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
File: /home/hduser/wordCount/wordCountMap.pl does not exist, or is not readable.
Try -help for more information
Streaming Command Failed!
But wordCountMap.pl looks fine to me, because when I run:
hadoop fs -cat wordCount/wordCountMap.pl
I get:
15/08/18 20:21:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
#!/usr/bin/perl -w
while(<STDIN>) {
    chomp;
    @words = split;
    foreach $w (@words) {
        $key = $w;
        $value = "1";
        print "$key\t$value\n";
    }
}
Can anyone tell me what is wrong with my command? (I think the WARN messages above can safely be ignored.)
For reference, wordCountReduce.pl is
#!/usr/bin/perl -w
$count = 0;
while(<STDIN>) {
    chomp;
    ($key,$value) = split "\t";
    if (!defined($oldkey)) {
        $oldkey = $key;
        $count = $value;
    } else {
        if ($oldkey eq $key) {
            $count = $count + $value;
        } else {
            print "$oldkey\t$count\n";
            $oldkey = $key;
            $count = $value;
        }
    }
}
print "$oldkey\t$count\n";
and words.txt is
a a b
b c
a
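Whether the two scripts themselves are correct can be checked without Hadoop. The sketch below emulates a streaming job on a single machine by piping the input through the mapper, a sort (standing in for the shuffle phase), and the reducer. The helper name simulate_streaming is hypothetical, and the sketch assumes local copies of the scripts and the input file.

```shell
#!/bin/sh
# simulate_streaming INPUT MAPPER_CMD REDUCER_CMD
# Hypothetical helper (not part of Hadoop): approximates a streaming job
# locally as  mapper | sort | reducer.  The sort stands in for the shuffle,
# which hands the reducer all lines with the same key consecutively.
simulate_streaming() {
  input=$1
  mapper_cmd=$2
  reducer_cmd=$3
  sh -c "$mapper_cmd" < "$input" | LC_ALL=C sort | sh -c "$reducer_cmd"
}
```

With local copies of the three files in the current directory, simulate_streaming words.txt 'perl wordCountMap.pl' 'perl wordCountReduce.pl' should print a 3, b 2 and c 1 (tab-separated) for the sample input above. If it does, the scripts are fine and the problem lies in how the job is submitted.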
The result of "hadoop fs -ls wordCount" is
15/08/18 21:27:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 hduser supergroup 145 2015-08-18 20:04 wordCount/wordCountMap.pl
-rw-r--r-- 1 hduser supergroup 346 2015-08-18 20:04 wordCount/wordCountReduce.pl
-rw-r--r-- 1 hduser supergroup 12 2015-08-18 20:04 wordCount/words.txt
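Thank you in advance!
【Comments】:
- Keep the scripts on the local file system (the -file option reads local paths, not HDFS paths).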
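A minimal sketch of what this suggestion could look like in practice, assuming the scripts currently exist only in HDFS and that perl is on the PATH of the task nodes. The function name run_wordcount and the local directory ./wordCount are assumptions; the jar path is the one from the question, and -files is the replacement named in the deprecation warning.

```shell
#!/bin/sh
# Sketch only: stage the scripts locally, then submit the streaming job.
# STREAMING_JAR is the path used in the question.
STREAMING_JAR=/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar

run_wordcount() {
  # Assumed local staging directory, relative to the current directory.
  mkdir -p wordCount || return 1

  # Copy both scripts out of HDFS onto the local file system.
  hadoop fs -get wordCount/wordCountMap.pl wordCount/wordCountMap.pl || return 1
  hadoop fs -get wordCount/wordCountReduce.pl wordCount/wordCountReduce.pl || return 1

  # Generic options such as -files must come before the streaming options.
  # The shipped files appear in each task's working directory under their
  # base names, so the mapper and reducer are referenced without a path.
  hadoop jar "$STREAMING_JAR" \
    -files wordCount/wordCountMap.pl,wordCount/wordCountReduce.pl \
    -input wordCount/words.txt \
    -output output \
    -mapper "perl wordCountMap.pl" \
    -reducer "perl wordCountReduce.pl"
}
```

Calling run_wordcount from the directory where the job is submitted would stage the scripts and start the job. The input path still refers to HDFS, and the HDFS directory output must not exist yet. Invoking the scripts through perl avoids depending on their execute bit.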