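Condition on MapReduce counters to control the map output

【Question title】: Condition on map reduce counters to control the map output
【Posted】: 2015-10-08 10:28:30
【Question】:

Is it possible to put a condition on a user-defined Java counter at the mapper level in order to control the mapper output?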

       Long l = context.getCounter(Counters.COUNT).getValue();

        if(5L >= l) {
            context.getCounter(Counters.COUNT).increment(1);
            context.write((LongWritable)key, value);
        } else {
            System.out.println("MAP ELSE");
            return;
        }

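More than 5 records still reach the reducer. Is there any way to control this?

【Comments】:

The value of a counter is only known after all mappers have finished. How are you getting the counter value on the map side? What exactly is your requirement?
I am using a user-defined counter named Counters.COUNT, and I want to use it to control my map output: the total number of map output records should be at most 5. I even put my context.write call inside an if condition, but the reducer still receives more than 5 records from the mappers. My intent is to limit the map phase once the counter reaches its MAX value (5).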

Tags: java hadoop mapreduce counter

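【Solution 1】:

You can't do this. If your input file has 3 splits, 3 mappers will run. Each mapper has its own separate count value (how it grows depends on your increment logic), and the combined value is only known on the reduce side, once all mappers have finished and the shuffle phase is done.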
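To make the point about per-mapper counts concrete, here is a minimal plain-Java simulation with no Hadoop dependency. It assumes each map task only sees its own local count while it runs; `LocalCounterDemo`, `SimulatedMapper`, and `totalEmitted` are hypothetical names used for illustration and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (assumption: not real Hadoop code).
// Each SimulatedMapper stands in for one map task with its own local count.
final class LocalCounterDemo {

    /** Hypothetical stand-in for a single map task. */
    static final class SimulatedMapper {
        private long localCount = 0; // the only count this task can see
        private final List<String> output = new ArrayList<>();

        void map(String record, long limit) {
            // Same idea as the question's if-condition, applied per task.
            if (localCount < limit) {
                localCount++;
                output.add(record);
            }
        }

        List<String> output() {
            return output;
        }
    }

    /** Runs one independent mapper per split and returns the total number of emitted records. */
    static int totalEmitted(int splits, int recordsPerSplit, long limit) {
        int total = 0;
        for (int s = 0; s < splits; s++) {
            SimulatedMapper mapper = new SimulatedMapper(); // fresh count per task
            for (int r = 0; r < recordsPerSplit; r++) {
                mapper.map("split" + s + "-record" + r, limit);
            }
            total += mapper.output().size();
        }
        return total;
    }

    public static void main(String[] args) {
        // 3 splits, 10 records each, limit 5 per task
        System.out.println(totalEmitted(3, 10, 5));
    }
}
```

With 3 splits and a limit of 5, each simulated task emits 5 records, so 15 records reach the reduce side even though no single task exceeded the limit.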

If you want to limit the output, use a single reducer with job.setNumReduceTasks(1) and limit the reducer's output instead, like this:

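import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Output types match what is written below: (IntWritable key, Text value).
public static class WLReducer2 extends
        Reducer<IntWritable, Text, IntWritable, Text> {
    // Shared across all reduce() calls of this single reduce task.
    private int count = 0;

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {
        for (Text x : values) {
            // Emit only the first 5 records overall.
            if (count < 5) {
                context.write(key, x);
                count++;
            }
        }
    }
}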
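The effect of keeping one count in a single reducer can be sketched without Hadoop as well. The following is a plain-Java illustration under the assumption that one reduce task receives every key group in turn; `SingleReducerLimitDemo`, `SimulatedReducer`, and `run` are hypothetical names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (assumption: not real Hadoop code) of one reduce task
// that keeps a running count across all of its reduce() calls.
final class SingleReducerLimitDemo {

    /** Hypothetical stand-in for the single reduce task. */
    static final class SimulatedReducer {
        private final int limit;
        private int count = 0; // survives across reduce() calls
        private final List<String> output = new ArrayList<>();

        SimulatedReducer(int limit) {
            this.limit = limit;
        }

        void reduce(int key, List<String> values) {
            for (String v : values) {
                if (count >= limit) {
                    return; // global cap reached, drop the rest
                }
                output.add(key + "\t" + v);
                count++;
            }
        }

        List<String> output() {
            return output;
        }
    }

    /** Feeds every key group (index = key) to one reducer and returns what it emitted. */
    static List<String> run(List<List<String>> groupedValues, int limit) {
        SimulatedReducer reducer = new SimulatedReducer(limit);
        for (int key = 0; key < groupedValues.size(); key++) {
            reducer.reduce(key, groupedValues.get(key));
        }
        return reducer.output();
    }
}
```

Because there is only one reducer instance, its count is effectively global for the job, which is what the per-mapper counts cannot provide.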

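If you want to read the counter value on the reduce side, you can add this to the reducer's setup method.

    // Field of the reducer that holds the counter value read in setup().
    private long mapperCounter;

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Cluster cluster = new Cluster(conf);
        // Look up the running job so its counters can be read.
        Job currentJob = cluster.getJob(context.getJobID());
        // COUNTER_NAME is the counter's enum constant (presumably Counters.COUNT from the question).
        mapperCounter = currentJob.getCounters().findCounter(COUNTER_NAME).getValue();
    }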

【Comments】:

Thanks for your answer.