【Question Title】: Hadoop/MapReduce reducer is not working
【Posted】: 2014-05-21 11:27:33
【Description】:

I downloaded a K-means implementation (for Hadoop) from GitHub. However, only the mapper seems to run: the output file is named "part-m-00000". I want the output to come from the reducer.

The command I run: ./bin/hadoop jar Kmeans.jar Main input output

Please, somebody help me!

Here is the Main class:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class Main{
    /**
     * @param args
     * @throws IOException 
     * @throws ClassNotFoundException 
     * @throws InterruptedException 
     */

    static enum Counter{
        CONVERGED
    }

    public static final String CENTROIDS = "centroids";

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        int iteration = 1;
        long changes = 0;
        Path dataPath = new Path(args[0]);

        //read in the initial cluster centroids.
        Configuration centroidConf = new Configuration();
        Job centroidInputJob = new Job(centroidConf);
        centroidInputJob.setJobName("KMeans Centroid Input");
        centroidInputJob.setJarByClass(Main.class);

        Path centroidsPath = new Path("centroids_0");

        centroidInputJob.setMapperClass(KmeansCentroidInputMapper.class);

        // No Combiner, no Reducer.

        centroidInputJob.setMapOutputKeyClass(Text.class);
        centroidInputJob.setMapOutputValueClass(Text.class);
        centroidInputJob.setOutputKeyClass(Text.class);
        centroidInputJob.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(centroidInputJob,new Path(args[1]) );
        FileOutputFormat.setOutputPath(centroidInputJob, centroidsPath);
        centroidInputJob.setNumReduceTasks(0);

        if (!centroidInputJob.waitForCompletion(true)) {
            System.err.println("Centroid input job failed!");
            System.exit(1);
        }

        while(true){
            Configuration conf = new Configuration();
            Path nextIter = new Path(String.format("centroids_%s", iteration));
            Path prevIter = new Path(String.format("centroids_%s", iteration - 1));
            conf.set(Main.CENTROIDS, prevIter.toString());

            Job job = new Job(conf);
            job.setJobName("KMeans " + iteration);
            job.setJarByClass(Main.class);

            //Set Mapper, Combiner, and Reducer
            job.setMapperClass(MapClass.class);
            job.setReducerClass(ReduceClass.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setNumReduceTasks(1);
            job.setCombinerClass(CombineClass.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            //Set input/output paths
            FileInputFormat.addInputPath(job, dataPath);
            FileOutputFormat.setOutputPath(job, nextIter);

            job.waitForCompletion(true);
            iteration++;
            changes = job.getCounters().findCounter(Main.Counter.CONVERGED).getValue();
            job.getCounters().findCounter(Main.Counter.CONVERGED).setValue(0);
            if(changes<=0){
                break;
            }       
        }   
    }

}
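The iterative job above hands each centroid key and its assigned points to ReduceClass, which is not shown here (it lives in the linked repository). Assuming it does the usual K-means reduce step, i.e. averaging all points shuffled to one centroid key, the core of that logic can be sketched in plain Java without any Hadoop types (ReduceSketch and its reduce method are illustrative names, not part of the repo):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    // Hypothetical reduce step for one centroid key: average the points
    // assigned to that centroid. A real Hadoop ReduceClass would receive
    // the points as an Iterable of Text values and emit the new centroid.
    static double[] reduce(List<double[]> points) {
        int dim = points.get(0).length;
        double[] mean = new double[dim];
        for (double[] p : points) {
            for (int d = 0; d < dim; d++) {
                mean[d] += p[d];
            }
        }
        for (int d = 0; d < dim; d++) {
            mean[d] /= points.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        // Two 2-D points assigned to the same centroid key after the shuffle.
        List<double[]> assigned = Arrays.asList(
                new double[]{1.0, 3.0},
                new double[]{3.0, 5.0});
        System.out.println(Arrays.toString(reduce(assigned))); // prints [2.0, 4.0]
    }
}
```

If no reducer runs, this averaging never happens, which is why a map-only run of a K-means job is not just misnamed output but an incomplete iteration.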

Source: https://github.com/yezhang1989/K-Means-Clustering-on-MapReduce

【Comments】:

  • In your Main class you set the number of reduce tasks to zero. That is why your job does not execute a reduce task.

Tags: java hadoop mapreduce k-means


【Solution 1】:

Try setting

job.setNumReduceTasks(1);

and check whether it works.

A MapReduce job runs one reducer by default, so there is normally no need to set it to 1 explicitly.

With job.setNumReduceTasks(0); no reduce task runs at all, and the output files come straight from the map tasks, one per mapper (hence names like part-m-00000).
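To illustrate the difference, here is a minimal job-configuration sketch (using the non-deprecated Job.getInstance factory rather than the new Job(conf) constructor the question's code uses; MapClass, CombineClass, and ReduceClass are the classes from the question):

```java
Job job = Job.getInstance(conf, "KMeans " + iteration);
job.setJarByClass(Main.class);
job.setMapperClass(MapClass.class);
job.setCombinerClass(CombineClass.class);
job.setReducerClass(ReduceClass.class);

job.setNumReduceTasks(1);   // reducer runs -> output file part-r-00000
// job.setNumReduceTasks(0); // map-only job -> one part-m-NNNNN file per mapper
```

Note that setReducerClass alone is not enough: if the number of reduce tasks is 0, the configured reducer is simply never invoked.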

【Comments】:
