【Question Title】: Independent MapReduce jobs to be executed one after the other
【Posted】: 2025-12-09 21:10:01
【Question Description】:

Is it possible to execute independent MapReduce jobs that:

  1. are not chained (i.e., the reducer's output does not become the next mapper's input), and
  2. can be executed one after the other?

【Question Discussion】:

  • Which environment are you executing in? In an Amazon EMR configuration you have the option to add jobs.
  • Hi @VanajaJayaraman, I am running on Apache Hadoop 2.4.1.
  • I was asking about the environment, i.e., a local Linux machine or cloud storage?
  • A Linux cluster with 5 nodes, not cloud storage.
  • How are you executing the map/reduce jobs?

Tags: hadoop mapreduce bigdata


【Solution 1】:

In your driver code, call two methods, runFirstJob and runSecondJob, like this. This is only a hint; modify it according to your needs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExerciseDriver {

    static Configuration conf;

    public static void main(String[] args) throws Exception {

        ExerciseDriver driver = new ExerciseDriver();
        conf = new Configuration();

        if (args.length < 4) {
            System.out.println("Too few arguments. Arguments should be: "
                + "<hdfs input folder 1> <hdfs output folder 1> "
                + "<hdfs input folder 2> <hdfs output folder 2>");
            System.exit(1);
        }

        String pathin1stmr = args[0];
        String pathout1stmr = args[1];
        String pathin2ndmr = args[2];
        String pathout2ndmr = args[3];

        // runFirstJob() blocks until the first job finishes, so the second
        // job starts only afterwards -- but its input is completely
        // independent of the first job's output.
        driver.runFirstJob(pathin1stmr, pathout1stmr);
        driver.runSecondJob(pathin2ndmr, pathout2ndmr);
    }

    public int runFirstJob(String pathin, String pathout) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper1.class);
        job.setCombinerClass(ExerciseCombiner.class);
        job.setReducerClass(ExerciseReducer1.class);
        job.setInputFormatClass(ParagrapghInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));

        // waitForCompletion() submits the job and blocks until it completes;
        // do not also call job.submit(), or the job is submitted twice.
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }

    public int runSecondJob(String pathin, String pathout) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper2.class);
        job.setReducerClass(ExerciseReducer2.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }
}
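The driver above can be packaged into a jar and launched with the `hadoop jar` command. The jar name and HDFS paths below are placeholder assumptions, not from the original answer:

```shell
# Run the driver; jar name and HDFS paths are example placeholders
hadoop jar exercise-driver.jar ExerciseDriver \
    /user/hadoop/input1 /user/hadoop/output1 \
    /user/hadoop/input2 /user/hadoop/output2
```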

【Discussion】:

【Solution 2】:

If you want to execute the jobs one after the other, you can chain them as described in the following link:

http://unmeshasreeveni.blogspot.in/2014/04/chaining-jobs-in-hadoop-mapreduce.html

【Discussion】:

  • In the link, the reducer's output becomes the second mapper's input. I don't want that; the second mapper should execute independently with its own separate input.
  • Try removing the line job.waitForCompletion(true); from the code given in the link above.
  • Hi, the code above from @sravan makes the program run two independent MapReduce jobs. Thanks anyway for the discussion @Vanaja Jayaraman.
  • I think the link I mentioned does the same... please check it if possible.
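As an alternative to the chaining shown in the link, Hadoop's built-in JobControl/ControlledJob API (in org.apache.hadoop.mapreduce.lib.jobcontrol) handles both cases: jobs with no declared dependencies run independently, while addDependingJob() enforces one-after-the-other ordering. A minimal sketch, assuming job1 and job2 are fully configured elsewhere (mappers, reducers, paths):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class TwoIndependentJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // job1 and job2 are assumed to be fully configured elsewhere
        Job job1 = Job.getInstance(conf, "first-job");
        Job job2 = Job.getInstance(conf, "second-job");

        // No dependency list is passed, so neither job waits on the other
        ControlledJob cj1 = new ControlledJob(job1, null);
        ControlledJob cj2 = new ControlledJob(job2, null);
        // To force job2 to start only after job1 succeeds, add instead:
        // cj2.addDependingJob(cj1);

        JobControl control = new JobControl("independent-jobs");
        control.addJob(cj1);
        control.addJob(cj2);

        // JobControl is a Runnable; run it in its own thread and poll
        Thread t = new Thread(control);
        t.setDaemon(true);
        t.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();
    }
}
```

With the dependency line commented out, both jobs are eligible to run as soon as the thread starts; uncommenting it gives the strictly sequential behavior asked about in the question, without piping one job's output into the other.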
    【Solution 3】:

    You can also run the jobs in parallel. Sample code is given below:
    
    // submitJob(...) is a helper (not shown in the answer) that is assumed to
    // configure a Job for the given input/output paths and call job.submit(),
    // which returns immediately instead of waiting for completion.
    Configuration conf = new Configuration();
    Path job1InputDir = new Path(args[0]);
    Path job2InputDir = new Path(args[1]);
    Path job1OutputDir = new Path(args[2]);
    Path job2OutputDir = new Path(args[3]);
    Job job1 = submitJob(conf, job1InputDir, job1OutputDir);
    Job job2 = submitJob(conf, job2InputDir, job2OutputDir);

    // While both jobs are not finished, sleep
    while (!job1.isComplete() || !job2.isComplete()) {
        Thread.sleep(5000);
    }

    if (job1.isSuccessful()) {
        System.out.println("Job1 completed successfully!");
    } else {
        System.out.println("Job1 failed!");
    }
    if (job2.isSuccessful()) {
        System.out.println("Job2 completed successfully!");
    } else {
        System.out.println("Job2 failed!");
    }
    System.exit(job1.isSuccessful() && job2.isSuccessful() ? 0 : 1);

    【Discussion】:
