What exactly is the output of the mapper and reducer functions

【Question title】: What exactly is the output of the mapper and reducer functions
【Posted】: 2016-05-07 20:05:30
【Question description】:

This is a follow-up question to "Extracting rows containing specific value using mapReduce and hadoop".
Mapper function

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    for(String word: words )
    {
        if(words[3].equals("40")){  
            saleValue.set(Integer.parseInt(words[0]));
            rangeValue.set(words[3]);
            con.write( rangeValue , saleValue );
        }
    }
}   
}

Reducer function

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        for(IntWritable value : values)  
        {  
            result.set(value.get());  
            con.write(word, result);  
        }  
    }  
}

The output obtained is

Edit 1: But the expected output is

40 102  
40 104  
40 105

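What am I doing wrong?

What exactly is happening in the mapper and reducer functions?

【Question discussion】:

You are writing out key-value pairs... what else do you want to know?
Thanks for the suggestion, @cricket_007, I will definitely try it... I actually want to know what exactly the mapper returns and what the reducer does: what it accepts and what it prints.
When you extend them, the type parameters of both classes are in the order <KeyIn, ValueIn, KeyOut, ValueOut>. The mapper's output key-value types must also match the reducer's input key-value types.
To add some more information - the mapper uses the context object to write values to the reducer (rather than "returning" them), and the reducer sends values to the output (again through the context, not by "returning"). All values with the same "key" are "sent" to the same reducer (this actually happens in the shuffle phase), so each reducer "runs" over a set of values that share the same key.
Thanks @It-Z, that is exactly what I wanted.

Tags: hadoop mapreduce hadoop2 feature-extraction mapper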

【Solution 1】:

In the context of the original question - you don't need a loop in either the mapper or the reducer, because the loop is what duplicates your entries:

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    if(words[3].equals("40")){  
       saleValue.set(Integer.parseInt(words[0]));
       rangeValue.set(words[3]);
       con.write(rangeValue , saleValue );
    }
}   
}

In the reducer, as @Serhiy suggested in the original question, you need only one line of code:

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
private IntWritable result = new IntWritable();  
public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
{  
    // Write the key once per group; the values are ignored here.
    con.write(word, null);  
}
}

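Regarding "Edit 1" - I'll leave that as a simple exercise :)

【Discussion】:

You can refer to @cricket_007's answer to see how you are duplicating entries.

【Solution 2】:

What exactly is happening

You are taking comma-separated lines of text, splitting them on the commas, and filtering out some values. If all you are doing is extracting those values, con.write() should be called only once per line.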

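The framework then groups all of the "40" keys that your mapper wrote (this happens in the shuffle phase) and forms a list of all the values written with that key. That list is what the reducer is reading.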
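The grouping step can be illustrated without Hadoop at all. The following is a minimal plain-Java sketch, not Hadoop code: `ShuffleSketch` and `groupByKey` are hypothetical names, and each emitted pair is modelled as a two-element `String[]`. It only shows the shape of what reaches `reduce()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {
    // Groups emitted (key, value) pairs by key, roughly what the shuffle
    // phase does before reduce() is called. Each pair is {key, value}.
    static Map<String, List<Integer>> groupByKey(List<String[]> emitted) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // keys kept sorted
        for (String[] pair : emitted) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(pair[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> emitted = new ArrayList<>();
        emitted.add(new String[] {"40", "102"});
        emitted.add(new String[] {"40", "104"});
        emitted.add(new String[] {"40", "105"});
        // One reduce() call per key, each receiving that key's values.
        System.out.println(groupByKey(emitted)); // {40=[102, 104, 105]}
    }
}
```

In a real job the order of the values within one key is not guaranteed, so a reducer should not rely on it.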

You should probably try this for your map function.

// Set the values to write 
saleValue.set(Integer.parseInt(words[0]));
rangeValue.set(words[3]);

// Filter out only the 40s
if(words[3].equals("40")) {
    // Write out "(40, saleValue)" words.length times 
    for(String word: words )
    {
        con.write( rangeValue , saleValue );
    }
}

If you don't want each value repeated once per field of the split string, get rid of the for loop.
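To make the duplication concrete, here is a small plain-Java sketch (no Hadoop types). The class name, the method names, and the sample row `102,a,b,40` are all made up for illustration. It compares how many records one matching line produces with and without the loop.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateEmitSketch {
    // Mirrors the question's map(): the filter sits inside a loop over the
    // fields, so a matching line is emitted once per field.
    static List<String> mapWithLoop(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        for (String word : words) { // 'word' is never used, as in the question
            if (words[3].equals("40")) {
                out.add(words[3] + "\t" + words[0]);
            }
        }
        return out;
    }

    // Same filter without the loop: at most one record per line.
    static List<String> mapWithoutLoop(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        if (words[3].equals("40")) {
            out.add(words[3] + "\t" + words[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "102,a,b,40"; // hypothetical row with four fields
        System.out.println(mapWithLoop(line).size());    // 4
        System.out.println(mapWithoutLoop(line).size()); // 1
    }
}
```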

All your reducer does is print out what it received from the mapper.

【Discussion】:

【Solution 3】:

The mapper output will look like this:
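A pass-through reducer of this kind can be sketched in plain Java as follows. `PassThroughReduceSketch` is a hypothetical stand-in for the real `reduce()` method, and the tab separator is an assumption based on Hadoop's default text output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PassThroughReduceSketch {
    // Stand-in for reduce(): writes one output line per received value,
    // without combining the values in any way.
    static List<String> reduce(String key, List<Integer> values) {
        List<String> lines = new ArrayList<>();
        for (int value : values) {
            lines.add(key + "\t" + value);
        }
        return lines;
    }

    public static void main(String[] args) {
        // With a mapper that emits each matching row once, the output has
        // one line per row: 40 102, 40 104, 40 105 (tab-separated).
        for (String line : reduce("40", Arrays.asList(102, 104, 105))) {
            System.out.println(line);
        }
    }
}
```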

<word,count>

The reducer output will look like this:

<unique word, its total count>

For example: a line is read, all the words in it are counted, and each is put into a <key,value> pair:

<40,1>
<140,1>
<50,1>
<40,1> ..

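Here 40, 50, 140, ... are all keys, and the value is the number of times that key occurs in a line. This happens in the mapper.

Then these key,value pairs are sent to the reducer, where identical keys are reduced to a single key and all the values associated with that key are summed to give the value of the resulting key-value pair. So the reducer's result would look like this: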

<40,10>
<50,5>
...
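The summing step described above might look like the following in plain Java. This is a sketch under assumptions: `SumReduceSketch` is a hypothetical name and the values are ordinary integers rather than `IntWritable`. In a Hadoop reducer the same loop would end with `result.set(sum)` and `con.write(word, result)`.

```java
import java.util.Arrays;
import java.util.List;

class SumReduceSketch {
    // Stand-in for reduce(): collapses all values for one key into their sum.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three <40,1> pairs from the mapper become a single <40,3> pair.
        System.out.println("<40," + reduce("40", Arrays.asList(1, 1, 1)) + ">"); // <40,3>
    }
}
```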

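In your case, the reducer isn't doing anything. The unique values/words found by the mapper are simply passed through as output.

Ideally, you should reduce and get output such as: "40,150" was found 5 times on the same line.

【Discussion】: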