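【Posted】: 2014-04-25 04:58:34
【Question】:
I want to create a chain of three Hadoop jobs, where the output of one job is fed as input to the next job, and so on. I want to do this without using Oozie.
I have written the following code to achieve it: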
public class TfIdf {

    public static void main(String args[]) throws IOException, InterruptedException, ClassNotFoundException
    {
        TfIdf tfIdf = new TfIdf();
        tfIdf.runWordCount();
        tfIdf.runDocWordCount();
        tfIdf.TFIDFComputation();
    }

    public void runWordCount() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();
        job.setJarByClass(TfIdf.class);
        job.setJobName("Word Count calculation");
        job.setMapperClass(WordFrequencyMapper.class);
        job.setReducerClass(WordFrequencyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("ouput"));
        job.waitForCompletion(true);
    }

    public void runDocWordCount() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();
        job.setJarByClass(TfIdf.class);
        job.setJobName("Word Doc count calculation");
        job.setMapperClass(WordCountDocMapper.class);
        job.setReducerClass(WordCountDocReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("output"));
        FileOutputFormat.setOutputPath(job, new Path("ouput_job2"));
        job.waitForCompletion(true);
    }

    public void TFIDFComputation() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();
        job.setJarByClass(TfIdf.class);
        job.setJobName("TFIDF calculation");
        job.setMapperClass(TFIDFMapper.class);
        job.setReducerClass(TFIDFReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("output_job2"));
        FileOutputFormat.setOutputPath(job, new Path("ouput_job3"));
        job.waitForCompletion(true);
    }
}
But I get this error:
Input path does not exist: hdfs://localhost.localdomain:8020/user/cloudera/output
Can anyone help me solve this?
【Comments】:
-
What does hadoop fs -ls /user/cloudera/ show?
-
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera
Found 4 items
drwx------   - cloudera cloudera          0 2013-10-31 01:37 /user/cloudera/.Trash
drwx------   - cloudera cloudera          0 2013-11-13 11:02 /user/cloudera/.staging
drwxr-xr-x   - cloudera cloudera          0 2013-11-07 19:20 /user/cloudera/input
drwxr-xr-x   - cloudera cloudera          0 2013-11-13 11:02 /user/cloudera/ouput
-
What about using hadoop fs -ls hdfs://localhost.localdomain:8020/user/cloudera/ instead?
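-
A likely cause, judging only from what is posted: the first job writes to "ouput" (the directory that appears in the listing as /user/cloudera/ouput), while the second job reads from "output", which does not exist. The sketch below is plain Java with no Hadoop dependency. It takes the path strings exactly as they appear in the question and checks whether each job's input matches the previous job's output. The names ChainCheck, Stage, findBrokenLinks and pathsFromQuestion are made up for this illustration and are not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChainCheck {

    /** One job in the chain: its name plus its input and output directories. */
    static final class Stage {
        final String name;
        final String input;
        final String output;

        Stage(String name, String input, String output) {
            this.name = name;
            this.input = input;
            this.output = output;
        }
    }

    /** Returns one message for every job whose input is not the previous job's output. */
    static List<String> findBrokenLinks(List<Stage> chain) {
        List<String> problems = new ArrayList<>();
        for (int i = 1; i < chain.size(); i++) {
            Stage prev = chain.get(i - 1);
            Stage cur = chain.get(i);
            if (!cur.input.equals(prev.output)) {
                problems.add("'" + cur.name + "' reads '" + cur.input
                        + "' but '" + prev.name + "' writes '" + prev.output + "'");
            }
        }
        return problems;
    }

    /** Job names and path strings copied verbatim from the question. */
    static List<Stage> pathsFromQuestion() {
        return Arrays.asList(
                new Stage("Word Count calculation", "input", "ouput"),
                new Stage("Word Doc count calculation", "output", "ouput_job2"),
                new Stage("TFIDF calculation", "output_job2", "ouput_job3"));
    }

    public static void main(String[] args) {
        // Print every link in the chain where reader and writer disagree.
        for (String problem : findBrokenLinks(pathsFromQuestion())) {
            System.out.println(problem);
        }
    }
}
```

Run against the posted paths, this reports two mismatches: the second job reads "output" where the first wrote "ouput", and the third job reads "output_job2" where the second wrote "ouput_job2". So fixing only the first typo would probably move the same error to the third job. One way to avoid this class of mistake is to declare each directory once as a constant and pass that same value to FileOutputFormat.setOutputPath for one job and to FileInputFormat.setInputPaths for the next. It may also help to check the boolean returned by job.waitForCompletion(true) and stop the chain when a job fails.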
Tags: java apache hadoop mapreduce