Creating Sequence File Format for Hadoop MR: Answers

【Question Title】: Creating Sequence File Format for Hadoop MR
【Posted】: 2025-12-14 09:05:01
【Question Description】:

I'm working with Hadoop MapReduce and have a question. Currently, my mapper's input KV type is LongWritable, LongWritable, and its output KV type is also LongWritable, LongWritable. The input format is SequenceFileInputFormat. Basically, I want to convert a txt file into a sequence file so that I can use it as input to my mapper.

Here is what I'd like to do.

The input file looks like this:

1\t2 (key = 1, value = 2)

2\t3 (key = 2, value = 3)

and so on...

I looked at the thread "How to convert .txt file to Hadoop's sequence file format", but I believe TextInputFormat only supports Key = LongWritable and Value = Text.

Is there any way to read the txt file and produce a sequence file with KV = LongWritable, LongWritable?

【Question Comments】:

Tags: hadoop mapreduce

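【Solution 1】:

Sure, basically the same way I described in the other thread you linked. But you have to implement your own Mapper.

Here is a quick sketch to get you started: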

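import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LongLongMapper extends
    Mapper<LongWritable, Text, LongWritable, LongWritable> {

  // TextInputFormat passes the byte offset as the key and the line as the value
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    // assuming that your line contains key and value separated by \t
    String[] split = value.toString().split("\t");

    context.write(new LongWritable(Long.parseLong(split[0])),
        new LongWritable(Long.parseLong(split[1])));
  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated constructor;
    // on older 1.x releases use new Job(conf) instead
    Job job = Job.getInstance(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(LongLongMapper.class);

    // use the mapper defined above, not the identity Mapper
    job.setMapperClass(LongLongMapper.class);
    // identity reducer; only used if the number of reduce tasks is > 0
    job.setReducerClass(Reducer.class);

    // increase if you need sorting or a special number of files
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(LongWritable.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));

    // submit, wait for completion, and report failure via the exit code
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}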

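Each call to your map function receives one line of your input as its value, so we just split it on your delimiter (the tab character) and parse each part as a long.

That's it.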
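To make the split-and-parse step concrete without needing a cluster, here is a minimal standalone sketch in plain Java. The class name `LineParseSketch` and the method `parseLine` are hypothetical names introduced only for illustration. The malformed-line check is an added assumption that is not part of the answer's code.

```java
public class LineParseSketch {

  // Hypothetical helper: splits one input line on the tab character
  // and parses both fields as longs, returning {key, value}.
  public static long[] parseLine(String line) {
    String[] parts = line.split("\t");
    if (parts.length != 2) {
      // Assumption: treat lines without exactly two fields as errors
      throw new IllegalArgumentException("expected key<TAB>value: " + line);
    }
    return new long[] {
        Long.parseLong(parts[0].trim()),
        Long.parseLong(parts[1].trim())
    };
  }

  public static void main(String[] args) {
    long[] kv = parseLine("1\t2");
    System.out.println("key=" + kv[0] + ", value=" + kv[1]);
  }
}
```

In a real mapper you might prefer to count and skip malformed lines rather than throw, since one uncaught exception fails the whole task.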

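【Comments】:

Thank you. I got a lot of ideas from the skeleton and was able to create a sequence file writer.
If you have another example, please send it to me by email so I can understand it better. Thanks in advance.
Please tell me what the reducer class's input and output formats are. I mean the key and value types of its input and output.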