【Question title】: Unable to add index files to Solr from Nutch
【Posted】: 2023-03-05 17:23:01
【Question description】:

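I have set up Nutch (1.4) with Solr (4.4.0) on Windows and crawled the default Nutch pages mentioned in the tutorial. However, after the crawl succeeds, I cannot add the pages to the index with the command "bin/nutch solrindex http://xxx.xxx.xxx.xxx:8080/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*".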

Below is an excerpt from the Hadoop log. Any help is greatly appreciated.

2013-09-13 14:50:24,137 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2013-09-13 14:50:24,137 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2013-09-13 14:50:24,137 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: content dest: content
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: site dest: site
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: title dest: title
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: host dest: host
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: segment dest: segment
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: boost dest: boost
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: digest dest: digest
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: url dest: id
2013-09-13 14:50:24,215 INFO  solr.SolrMappingReader - source: url dest: url
2013-09-13 14:50:24,277 INFO  solr.SolrWriter - Adding 11 documents
2013-09-13 14:50:24,511 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://xxx.xxx.xxx.xxx:8080/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2013-09-13 14:50:25,229 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

【Comments】:

  • Please add your Solr logs as well.

Tags: solr lucene nutch


【Solution 1】:

From this information, I suspect that either you forgot to start Solr or Nutch has not been given access to it. Can you open http://xxx.xxx.xxx.xxx:8080/solr/ in a browser?
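If a browser is not handy on the machine that runs Nutch, the same check can be scripted. The following is only a minimal sketch using the Python standard library; the helper names `probe` and `describe` are made up for illustration, and the URL is whatever you pass on the command line (the address in the question is masked, so it is not filled in here). It separates "no HTTP response at all" from "the server answered with an error status".

```python
import sys
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 500).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None


def describe(status):
    """Map a status code (or None) to a short, human-readable category."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "access denied"
    if status >= 500:
        return "server error"
    return "unexpected status"


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_solr.py <solr-url>
    if len(sys.argv) > 1:
        target = sys.argv[1]
        print(target, "->", describe(probe(target)))
```

An "unreachable" result would fit the case where Solr is not running, and "access denied" the case where access is blocked. Note that the log in the question shows "Internal Server Error" coming back from the update request, which means a server did respond there, so the Solr-side log is probably still worth checking.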

【Discussion】:
