Can you explain the word count MapReduce program step by step? (Answers)

【Question title】: Can you explain the word count MapReduce program step by step?
【Posted】: 2015-08-25 07:26:27
【Question description】:

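Can you explain a MapReduce program? For example, in the word count program the classes inside the WordCount class are nested (inner) classes. Can you explain this program step by step? What do the angle brackets mean? Why do we also have to write the output parameters? What is the Context object? I understand the logic, but I cannot follow some of the Java statements.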

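import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper<input key, input value, output key, output value>
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Called once per input line; emits (word, 1) for every token in the line.
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer<input key, input value, output key, output value>
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Called once per distinct word; sums the counts emitted for that word.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job.getInstance replaces the deprecated "new Job(conf, name)" constructor.
        Job job = Job.getInstance(conf, "wordcount");
        // Tells Hadoop which jar to ship to the cluster.
        job.setJarByClass(WordCount.class);

        // Types of the final (reducer) output key and value.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input path, args[1] the output path (must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job, wait for it to finish, and report success via the exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}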

【Question discussion】:

Tags: java hadoop mapreduce

【Solution 1】:

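Your Map class extends Hadoop's Mapper class, and the types in angle brackets are its generic type parameters. The first two are the input key and value types; the last two are the output key and value types. So Mapper<LongWritable, Text, Text, IntWritable> takes a LongWritable key (with TextInputFormat, the byte offset of the line in the file) and a Text value (the line itself), and produces Text keys (words) with IntWritable values (counts). A Mapper subclass overrides the map() method, and your mapper logic goes there. The method takes the input key, the input value, and a Context; it returns void and emits its output key-value pairs by calling context.write(). The Context is the object through which the task hands its output to the framework, which buffers it in memory and spills it to disk as needed.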
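To make the angle brackets and the Context less abstract, here is a minimal plain-Java sketch that needs no Hadoop at all. SketchMapper, SketchContext, TokenMapper, and GenericsSketch are hypothetical stand-ins I made up for illustration; they are not Hadoop classes, and the real Mapper and Context do much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Entry point first, so "java GenericsSketch.java" can run the file directly.
class GenericsSketch {
    public static void main(String[] args) {
        System.out.println(run(0L, "to be or not to be"));
    }

    // Feeds one input record to the mapper and returns everything it wrote.
    static List<String> run(long offset, String line) {
        SketchContext<String, Integer> context = new SketchContext<>();
        new TokenMapper().map(offset, line, context);
        return context.pairs;
    }
}

// Hypothetical stand-in for Hadoop's Context: it only collects output pairs.
class SketchContext<KOUT, VOUT> {
    final List<String> pairs = new ArrayList<>();

    void write(KOUT key, VOUT value) {
        pairs.add(key + "=" + value);
    }
}

// Hypothetical stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
// The four type parameters fix the types that map() accepts and emits.
abstract class SketchMapper<KIN, VIN, KOUT, VOUT> {
    abstract void map(KIN key, VIN value, SketchContext<KOUT, VOUT> context);
}

// Input: (Long offset, String line). Output: (String word, Integer count).
class TokenMapper extends SketchMapper<Long, String, String, Integer> {
    @Override
    void map(Long key, String value, SketchContext<String, Integer> context) {
        StringTokenizer tokenizer = new StringTokenizer(value);
        while (tokenizer.hasMoreTokens()) {
            // Writing a (String, String) pair here would not compile,
            // because the class header promised (String, Integer) output.
            context.write(tokenizer.nextToken(), 1);
        }
    }
}
```

The point of the sketch is that the compiler checks map() against the types named in the class header, which is why the output types have to be written out even though map() returns void.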

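Your Reduce class extends the Reducer class. The Reducer's input key and value types must match the output key and value types of the Mapper (or of the Combiner, if one is set). A Reducer subclass overrides the reduce() method, and your reducer logic goes there. The method receives a key together with an Iterable of all the values emitted for that key; it returns void and writes its result through the Context with context.write().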
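The type-matching rule is easier to see with a combiner in the picture. The following plain-Java sketch (no Hadoop; CombinerSketch and its methods are hypothetical names for illustration) uses one summing function both as a per-task combiner and as the final reducer. That only works because the function's output pairs have the same types as its input pairs, and because partial sums can safely be summed again.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    public static void main(String[] args) {
        List<String> task1 = Arrays.asList("a", "b", "a");
        List<String> task2 = Arrays.asList("b", "a");
        System.out.println(withoutCombiner(task1, task2));
        System.out.println(withCombiner(task1, task2));
    }

    // Mapper output of one task: one (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> mapTask(List<String> words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) {
            out.add(new SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Serves as combiner and reducer: (word, int) pairs in, (word, int) totals out.
    static Map<String, Integer> sumByKey(Iterable<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // All raw (word, 1) pairs go straight to the reducer.
    static Map<String, Integer> withoutCombiner(List<String> t1, List<String> t2) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>(mapTask(t1));
        all.addAll(mapTask(t2));
        return sumByKey(all);
    }

    // Each task pre-sums its own pairs; the reducer then sums the partial sums.
    static Map<String, Integer> withCombiner(List<String> t1, List<String> t2) {
        List<Map.Entry<String, Integer>> partial = new ArrayList<>();
        partial.addAll(sumByKey(mapTask(t1)).entrySet());
        partial.addAll(sumByKey(mapTask(t2)).entrySet());
        return sumByKey(partial);
    }
}
```

Both calls in main print {a=3, b=2}. The job in the question does not set a combiner, so this is only an illustration of why the types have to line up.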

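Between these two methods, Hadoop partitions, sorts, and shuffles the map output so that all values for a given key arrive at the same reduce() call. If a combiner is configured, it also runs at this stage.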
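As a rough picture of what happens between map() and reduce(), here is a single-process sketch in plain Java. It is not how Hadoop implements the shuffle (there are no partitions, no network, and no disk spills); ShuffleSketch and wordCount are hypothetical names, and a TreeMap stands in for the sort-and-group step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class ShuffleSketch {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        System.out.println(wordCount(lines));
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Map + shuffle: every token becomes a (word, 1) pair, and pairs are
        // grouped by word. TreeMap keeps the keys sorted, like the sort step.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                grouped.computeIfAbsent(tokenizer.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce: one pass per key, seeing all of that key's values together.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }
}
```

Running it prints {hadoop=1, hello=2, world=1}. The grouped map is the part Hadoop does for you: by the time reduce() is called for "hello", both of its 1s have already been collected into a single Iterable.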

Your main method contains the code that configures and submits the Hadoop job.

Further explanation is available from macalester.edu and javacodegeeks.

【Discussion】: