Title: FileNotFoundException while adding file in cache - Hadoop - Mapreduce
Posted: 2014-12-21 00:55:15
Question:

Note: I have already gone through posts here with similar problems and tried the different approaches suggested there, but I still cannot resolve the issue.

I want to add a file stored in HDFS to the mapper's cache, so I add it in the driver like this:

// Driver program
public static void main(String[] args) throws Exception {

    Job job = Job.getInstance(new Configuration(), "QuestionOne");
    Configuration conf = job.getConfiguration();

    // I am passing my file path (which is in HDFS) as an argument. Eg: /input/users.dat
    job.addCacheFile(new URI(args[1]));

    job.setJarByClass(QuestionOne.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
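As an aside, `Job.addCacheFile` also accepts a URI with a fragment (`#alias`); Hadoop then exposes the cached file under that symlink name in the task's working directory. The split between path and fragment is plain `java.net.URI` behavior (the `users` alias below is a hypothetical name for illustration):

```java
import java.net.URI;

public class CacheUriDemo {
    public static void main(String[] args) throws Exception {
        // The part before '#' is the HDFS path; the fragment becomes the
        // symlink name Hadoop creates in the task's working directory.
        URI cacheUri = new URI("/input/users.dat#users");
        System.out.println(cacheUri.getPath());     // prints "/input/users.dat"
        System.out.println(cacheUri.getFragment()); // prints "users"
    }
}
```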

This is followed by the Map class, which retrieves the file and uses it like this:

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    protected void setup(Context context) throws IOException, InterruptedException {

        ...

        URI[] files = context.getCacheFiles();

        for (URI p : files) {
            System.out.println(p.getPath()); // prints "/input/users.dat"

            // Exception (FileNotFoundException) at this line
            BufferedReader br = new BufferedReader(new FileReader(new File(p.getPath())));

            // Use br

            br.close();
        }
    }

    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        ...
        ...
    }

    protected void cleanup(Context context) throws IOException, InterruptedException {
        ...
        ...
    }
}

But when I run the program, I get the FileNotFoundException shown below:

14/10/25 03:00:29 WARN mapred.LocalJobRunner: job_local30078493_0001
java.lang.Exception: java.io.FileNotFoundException: /input/users.dat (No such file or directory)

    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.FileNotFoundException: /hw1_input/users.dat (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileReader.<init>(FileReader.java:72)
    at QuestionOne$Map.setup(QuestionOne.java:46)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/10/25 03:00:30 INFO mapreduce.Job: Job job_local30078493_0001 running in uber mode : false
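For context on why this trace bottoms out in the `FileReader` constructor: `java.io.File` and `FileReader` resolve paths against the local filesystem only and know nothing about HDFS, so an HDFS-only path is simply not there. A minimal illustration (the path below is hypothetical and assumed not to exist on the local machine):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;

public class LocalPathDemo {
    public static void main(String[] args) {
        // java.io.File checks the local disk only, never HDFS.
        File f = new File("/no/such/local/path/users.dat"); // hypothetical path
        System.out.println(f.exists()); // prints "false"
        try {
            new FileReader(f); // throws, just like in the mapper's setup()
        } catch (FileNotFoundException e) {
            System.out.println("got FileNotFoundException");
        }
    }
}
```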

Please help me resolve this.

Comments:

    Tags: java file memory hadoop mapreduce


    Solution 1:

    You need to use the distributed filesystem, not the local one:

    FileSystem fs = FileSystem.get(context.getConfiguration());
    for (URI p : files) {
        Path path = new Path(p.toString());
        FSDataInputStream fsin = fs.open(path);
        DataInputStream in = new DataInputStream(fsin);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
    
        //Use br
    
        br.close();
        in.close();
        fsin.close();           
    }
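The reader chain above can also be written with try-with-resources, so every stream is closed even if reading fails. The pattern itself is independent of HDFS, so the sketch below exercises it with an in-memory stream; in the mapper you would pass `fs.open(path)` instead of the `ByteArrayInputStream` (the `users.dat` record format shown is made up for illustration):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamReadDemo {
    // Reads every line from any InputStream; in the mapper, pass fs.open(path).
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped stream automatically
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the cached file; in the real job this comes from HDFS.
        byte[] data = "1::F::alice\n2::M::bob\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(data));
        System.out.println(lines.size()); // prints "2"
    }
}
```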
    

    Comments:
