【Question Title】: Deserialize MapWritable within custom java class
【Posted】: 2024-01-04 12:08:01
【Question】:

I am currently trying to deserialize a custom object in which one field is a MapWritable and the other is a String. Serialization appears to work fine, but I cannot verify that the object is being recreated correctly. Here are my fields and my write()/readFields() methods:

public class ExchangeDataSample implements DataSample {

    private String labelColumn;

    private MapWritable values = new MapWritable();

    ...other methods...

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear();
        values.readFields(in);
        labelColumn = in.readLine();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        values.write(out);
        out.writeBytes(labelColumn);
    }
}

I keep getting this exception in my MapReduce job:

java.lang.Exception: java.io.EOFException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readUTF(DataInputStream.java:609)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:207)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
    at decisiontree.data.ExchangeDataSample.readFields(ExchangeDataSample.java:98)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:96)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1688)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1637)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any help is greatly appreciated. Thanks.
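A detail worth noting about the stack trace: writeBytes() emits the raw bytes of the String with no length prefix and no newline, while readLine() consumes bytes until it hits a line terminator. When records are written back to back (as they are in the map output buffer feeding the combiner), readLine() for one record can swallow the start of the next, leaving the following readUTF() inside MapWritable.readFields() to hit an unexpected end of stream. A minimal plain-java.io sketch of that desynchronization (hypothetical standalone code, not the poster's job; the class name is made up):

```java
import java.io.*;

public class StreamDesyncDemo {
    public static void main(String[] args) throws IOException {
        // Two "records" written back to back, each the way the poster's
        // write() does it: raw bytes, no length prefix, no newline.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeBytes("label1");
        out.writeBytes("label2");

        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        // With no line terminator to stop at, readLine() swallows both
        // records in one call, leaving nothing for the second record.
        String line = in.readLine();
        System.out.println(line);            // prints "label1label2"
        System.out.println(in.read() == -1); // prints "true"
    }
}
```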

【Question Comments】:

    Tags: java serialization mapreduce deserialization writable


    【Solution 1】:

    You are getting this exception because you read without checking for the end of the file. Try changing the readFields method to:

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear();
        byte[] b = new byte[1024];
        // checks for end of file
        if (((DataInputStream) in).read(b) != -1) {
            values.readFields(in);
            labelColumn = in.readLine();
        }
    }
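For comparison, one way to keep reader and writer aligned is to make the pair symmetric: write the String with the length-prefixed writeUTF() and read it back with readUTF(), so readFields() consumes exactly the bytes write() produced. A standalone sketch under that assumption (plain java.io, MapWritable omitted; the class name SymmetricRoundTrip is hypothetical):

```java
import java.io.*;

public class SymmetricRoundTrip {
    // Symmetric pair: the reader consumes exactly what the writer produced.
    static void write(DataOutput out, String labelColumn) throws IOException {
        out.writeUTF(labelColumn); // length-prefixed, unlike writeBytes()
    }

    static String readFields(DataInput in) throws IOException {
        return in.readUTF();       // reads back exactly one String
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        write(out, "first");
        write(out, "second"); // two records back to back

        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readFields(in)); // prints "first"
        System.out.println(readFields(in)); // prints "second"
    }
}
```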
    

    【Discussion】:

    • Thanks for your help, but I am still getting the same error at "values.readFields(in)". Is that because the data was already consumed in the if statement, and then it returns -1 again? Should I just wrap it in a try/catch? Also, don't I need to explicitly re-set the MapWritable values? I'm not very familiar with serializing/deserializing complex structures.
    • I wrapped it in a try/catch, but just trying to print the String "labelColumn" after in.readLine shows a bunch of garbled data that should be the contents of the MapWritable object. If I can't get this working, I may have to try Java JSON or something else.
    • @AlexRamos Yes, that would be the content that was not deserialized. Something strange is going on here, but if you want a quick solution I suggest switching to JSON, which is really much easier. Take a look at this library: code.google.com/p/json-io
    • I ended up using the Jackson JSON library for this. I upvoted your answer because your comment is what led me there. Thanks!
    • Glad to hear it! ;)