【发布时间】:2025-11-23 04:50:01
【问题描述】:
您好,我已经在 Ubuntu 上安装了 solr 和 nutch。我能够在某些情况下进行爬网和索引,但并非一直如此。我反复收到此路径错误,无法在线找到解决方案。通常,我会删除有错误的目录并重新运行,它会运行良好。但我不想再这样做了。是什么导致了错误?谢谢。
LinkDb: adding segment: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027231916
LinkDb: adding segment: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027232907
LinkDb: adding segment: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027233840
LinkDb: adding segment: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027224701
LinkDb: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027231916/parse_data
Input path does not exist: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027232907/parse_data
Input path does not exist: file:/home/nutch/nutch/runtime/local/crawl/segments/20111027233840/parse_data
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:290)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:255)
【问题讨论】:
标签: solr nutch web-crawler