【Question Title】: Type mismatch from mappers while using MultipleInputs class
【Posted】: 2016-04-26 05:03:26
【Question】:

I have defined two mappers to process two different files.

Output of the first mapper - Output of the second mapper -

In the driver code, I added both mappers using addInputPath() of the MultipleInputs class.

When running the jar, I get a Type mismatch error.

16/04/24 18:40:28 INFO mapreduce.Job: Task Id : attempt_1461435780053_0008_m_000001_0, Status : FAILED
Error: java.io.IOException: Type mismatch in value from map: expected hadoop.StationObj, received org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)

Below is the code:

public static class customerMapper extends Mapper<LongWritable,Text,IntWritable,StationObj>
    {
        IntWritable outkey=new IntWritable();
        StationObj outvalue=new StationObj();

        //2,Russia,Jhonson,10000

        public void map(LongWritable key,Text values,Context context) throws IOException, InterruptedException
        {
            String []cols=values.toString().split(",");
            outkey.set(Integer.parseInt(cols[0]));
            outvalue.setAmount(Integer.parseInt(cols[3]));
            outvalue.setCountry(cols[1]);
            outvalue.setProduct(cols[2]);

            context.write(outkey, outvalue);
        }
    }

    public static class countryMapper extends Mapper<LongWritable,Text,IntWritable,Text>
    {
        IntWritable outkey=new IntWritable();
        Text outvalue=new Text();
        public void map(LongWritable key,Text values,Context context) throws IOException, InterruptedException
        {
            String []cols=values.toString().split(",");

            outkey.set(Integer.parseInt(cols[0]));
            outvalue.set(cols[1]);
            context.write(outkey,outvalue);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf=new Configuration();
        Job job=new Job(conf,"dsddd");
        job.setJarByClass(stationRedJoin.class);
        job.setMapOutputKeyClass(IntWritable.class);

        //job.setMaxMapAttempts(1);

        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, customerMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, countryMapper.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true)?0:1);

    }

}

【Comments】:

    Tags: hadoop mapreduce


    【Solution 1】:

    It is best to set all the output types for the mapper and the reducer (if any) in the Driver class, for example:

    // output types for the mapper
           job.setMapOutputKeyClass(Text.class);
           job.setMapOutputValueClass(Text.class);  <--- the call missing from your driver
    
    // output types for the reducer (if any)
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
    
    // use MultipleInputs to specify a different InputFormat and Mapper per input path
          MultipleInputs.addInputPath(job, fPath, FirstInputFormat.class, MyFirstMap.class);
          MultipleInputs.addInputPath(job, sPath, SecondInputFormat.class, MySecondMap.class);
    
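The stack trace points at `MapTask$MapOutputBuffer.collect`, which validates every emitted record against the classes configured on the job. A simplified plain-Java sketch of that check (not the actual Hadoop source, just an illustration of why one mapper's output is rejected):

```java
import java.io.IOException;

public class CollectCheck {
    // Simplified sketch of the validation MapTask$MapOutputBuffer.collect
    // performs: every value a mapper emits must be an instance of the
    // job's single configured map output value class; otherwise an
    // IOException reading "Type mismatch in value from map" is thrown.
    static void checkValue(Class<?> expectedValueClass, Object value) throws IOException {
        if (!expectedValueClass.isInstance(value)) {
            throw new IOException("Type mismatch in value from map: expected "
                    + expectedValueClass.getName()
                    + ", received " + value.getClass().getName());
        }
    }

    public static void main(String[] args) {
        try {
            // Pretend the job was configured for String values and a
            // mapper emitted an Integer: the check fails immediately.
            checkValue(String.class, 42);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the job holds only one map output value class, whichever of your two mappers emits the other type fails this check on its first `context.write`.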

    【Discussion】:

    • I have two mappers. Their input and output types are: the 1st mapper, customerMapper, extends Mapper<LongWritable,Text,IntWritable,StationObj>; the 2nd mapper, countryMapper, extends Mapper<LongWritable,Text,IntWritable,Text>. The output value types of the two mappers are different.