【发布时间】:2016-10-18 14:59:48
【问题描述】:
我正在运行一个 mapreduce 作业,它读取输入并使用多个 reduce 对其进行排序。 我能够将输出排序为 5 个减速器。但是,输出仅写入 1 个文件,并且有 4 个空文件。 我正在使用输入采样器和 totalorderpartitioner 进行全局排序。
我的驱动如下:
int numReduceTasks = 5;
Configuration conf = new Configuration();
Job job = new Job(conf, "DictionarySorter");
job.setJarByClass(SampleEMR.class);
job.setMapperClass(SortMapper.class);
job.setReducerClass(SortReducer.class);
job.setPartitionerClass(TotalOrderPartitioner.class);
job.setNumReduceTasks(numReduceTasks);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(job, input);
FileOutputFormat.setOutputPath(job, new Path(output
+ ".dictionary.sorted." + getCurrentDateTime()));
job.setPartitionerClass(TotalOrderPartitioner.class);
Path inputDir = new Path("/others/partitions");
Path partitionFile = new Path(inputDir, "partitioning");
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
partitionFile);
double pcnt = 1.0;
int numSamples = numReduceTasks;
int maxSplits = numReduceTasks - 1;
if (0 >= maxSplits)
maxSplits = Integer.MAX_VALUE;
InputSampler.Sampler<LongWritable, Text> sampler = new InputSampler.RandomSampler<LongWritable, Text>(pcnt,
numSamples, maxSplits);
InputSampler.writePartitionFile(job, sampler);
job.waitForCompletion(true);
【问题讨论】:
标签: hadoop totalorderpartitioner