【Title】: How to create a chain of Hadoop jobs without using Oozie
【Posted】: 2014-04-25 04:58:34
【Question】:

I want to create a chain of three Hadoop jobs, where the output of one job is fed as the input to the next, and so on. I want to do this without using Oozie.

I have written the following code to achieve it:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TfIdf {
    public static void main(String args[]) throws IOException, InterruptedException, ClassNotFoundException
    {
        TfIdf tfIdf = new TfIdf();
        tfIdf.runWordCount();
        tfIdf.runDocWordCount();
        tfIdf.TFIDFComputation();
    }

    public void runWordCount() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();


        job.setJarByClass(TfIdf.class);
        job.setJobName("Word Count calculation");

        job.setMapperClass(WordFrequencyMapper.class);
        job.setReducerClass(WordFrequencyReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("ouput"));

        job.waitForCompletion(true);
    }

    public void runDocWordCount() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();

        job.setJarByClass(TfIdf.class);
        job.setJobName("Word Doc count calculation");

        job.setMapperClass(WordCountDocMapper.class);
        job.setReducerClass(WordCountDocReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("output"));
        FileOutputFormat.setOutputPath(job, new Path("ouput_job2"));

        job.waitForCompletion(true);
    }

    public void TFIDFComputation() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();

        job.setJarByClass(TfIdf.class);
        job.setJobName("TFIDF calculation");

        job.setMapperClass(TFIDFMapper.class);
        job.setReducerClass(TFIDFReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("output_job2"));
        FileOutputFormat.setOutputPath(job, new Path("ouput_job3"));

        job.waitForCompletion(true);
    }
}

But I am getting the error:

Input path does not exist: hdfs://localhost.localdomain:8020/user/cloudera/output

Can anyone help me solve this?

【Comments】:

  • What does `hadoop fs -ls /user/cloudera/` show?
  • [cloudera@localhost ~]$ hadoop fs -ls /user/cloudera
    Found 4 items
    drwx------   - cloudera cloudera  0 2013-10-31 01:37 /user/cloudera/.Trash
    drwx------   - cloudera cloudera  0 2013-11-13 11:02 /user/cloudera/.staging
    drwxr-xr-x   - cloudera cloudera  0 2013-11-07 19:20 /user/cloudera/input
    drwxr-xr-x   - cloudera cloudera  0 2013-11-13 11:02 /user/cloudera/ouput
  • What about using `hadoop fs -ls hdfs://localhost.localdomain:8020/user/cloudera/` instead?

【Tags】: java apache hadoop mapreduce


【Solution 1】:

This answer comes a bit late, but... it is just a simple typo in your directory names. You wrote the first job's output to the directory "ouput", while your second job is looking for it in "output".
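Beyond renaming the directory, this class of bug is easier to avoid if each intermediate path is defined exactly once and each job's exit status is checked before the next one starts (`waitForCompletion` returns `false` on failure, which the question's code silently ignores). A minimal sketch of that pattern against the Hadoop `mapreduce` API, reusing the mapper/reducer class names from the question (`TfIdfChain` and `runStage` are names introduced here for illustration):

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TfIdfChain {
    // Each directory is named exactly once, so a typo cannot put a job's
    // input out of sync with the previous job's output.
    private static final Path INPUT    = new Path("input");
    private static final Path JOB1_OUT = new Path("output");
    private static final Path JOB2_OUT = new Path("output_job2");
    private static final Path JOB3_OUT = new Path("output_job3");

    public static void main(String[] args) throws Exception {
        // Run the stages in order; runStage aborts the chain on failure,
        // since the next stage's input would be missing or incomplete.
        runStage("Word Count calculation", WordFrequencyMapper.class,
                 WordFrequencyReducer.class, IntWritable.class, INPUT, JOB1_OUT);
        runStage("Word Doc count calculation", WordCountDocMapper.class,
                 WordCountDocReducer.class, Text.class, JOB1_OUT, JOB2_OUT);
        runStage("TFIDF calculation", TFIDFMapper.class,
                 TFIDFReducer.class, Text.class, JOB2_OUT, JOB3_OUT);
    }

    @SuppressWarnings("rawtypes")
    private static void runStage(String name,
                                 Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper,
                                 Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer,
                                 Class<?> valueClass, Path in, Path out)
            throws IOException, InterruptedException, ClassNotFoundException {
        Job job = Job.getInstance();   // preferred over the deprecated new Job()
        job.setJarByClass(TfIdfChain.class);
        job.setJobName(name);
        job.setMapperClass(mapper);
        job.setReducerClass(reducer);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(valueClass);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        if (!job.waitForCompletion(true)) {   // false means the stage failed
            System.exit(1);
        }
    }
}
```

This keeps the three stages sequential, as in the question; for more complex dependency graphs, Hadoop also ships a `JobControl`/`ControlledJob` API that runs jobs once their declared dependencies have completed.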

【Discussion】:
