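【Question title】: My Lucene search doesn't return results
【Posted】: 2013-04-25 14:01:12
【Question description】:

I'm learning Lucene, and this is my first test class. I'm trying to implement an in-memory search and borrowed some code from an example, but the search doesn't return any hits. Can you help? Thanks.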

    package my.test;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.util.CharArraySet;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class TestInMemorySearch {
      public static void main(String[] args) {
        // Construct a RAMDirectory to hold the in-memory representation of the index.
        RAMDirectory idx = new RAMDirectory();

        try {
          // Make a writer to create the index
          IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42, CharArraySet.EMPTY_SET));
          iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
          IndexWriter writer = new IndexWriter(idx, iwc);

          // Add some Document objects containing quotes
          writer.addDocument(createDocument("Theodore Roosevelt man", "It behooves every man to remember that the work of the "
              + "critic, is of altogether secondary importance, and that, " + "in the end, progress is accomplished by the man who does " + "things."));
          writer.addDocument(createDocument("Friedrich Hayek", "The case for individual freedom rests largely on the "
              + "recognition of the inevitable and universal ignorance " + "of all of us concerning a great many of the factors on "
              + "which the achievements of our ends and welfare depend."));
          writer.addDocument(createDocument("Ayn Rand", "There is nothing to take a man's freedom away from "
              + "him, save other men. To be free, a man must be free " + "of his brothers."));
          writer.addDocument(createDocument("Mohandas Gandhi", "Freedom is not worth having if it does not connote " + "freedom to err."));

          // Close the writer to finish building the index
          writer.close();
          // Build a SearcherManager over the in-memory index
          SearcherManager mgr = new SearcherManager(idx, null);

          try {
            Document[] hits = search(mgr, "man", 100);
            for (Document doc : hits) {
              String title    = doc.get("title");
              String content  = doc.get("content");
              System.out.println("Found match:[Title]" + title + ", [Content]" + content);
            }

          } catch (IOException e) {
            e.printStackTrace();
          }

        } catch (IOException ioe) {
          // In this example we aren't really doing an I/O, so this
          // exception should never actually be thrown.
          ioe.printStackTrace();
        }
      }

      /**
       * Make a Document object with an un-indexed title field and an indexed
       * content field.
       */
      private static Document createDocument(String title, String content) {
        Document doc = new Document();
        doc.add(new StringField("title", title, Field.Store.YES));
        doc.add(new StringField("content", content, Field.Store.YES));

        return doc;
      }

      private static Document[] search(SearcherManager searchManager, String searchString, int maxResults) throws IOException {
        IndexSearcher searcher = null;
        try {
          // Build the query.
          String[] tokens = searchString.split("\\s+");
          BooleanQuery query = new BooleanQuery();
          for (String token : tokens) {
            query.add(new PrefixQuery(new Term("title", token)), BooleanClause.Occur.MUST);
            query.add(new PrefixQuery(new Term("content", token)), BooleanClause.Occur.MUST);
          }

          searcher = searchManager.acquire();
          ScoreDoc[] scoreDocs = searcher.search(query, maxResults).scoreDocs;
          Document[] documents = new Document[scoreDocs.length];
          for (int i = 0; i < scoreDocs.length; i++) {
            documents[i] = searcher.doc(scoreDocs[i].doc);
          }
          return documents;
        } finally {
          if (searcher != null) {
            searchManager.release(searcher);
          }
        }
      }
    }

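【Comments】:

  • You should reduce your code to the simplest thing possible and build up from there. Use a single field, and a plain IndexSearcher rather than a SearcherManager. Build a simple TermQuery instead of a BooleanQuery of PrefixQuerys.
  • Thanks, I've already done those simple tests. I just want to know why this one doesn't work.
  • If that is the answer you want, then you should identify the most complex version that still works and the exact change that makes it stop working.
  • Yes, I ran into different problems while trying different queries. Thanks for your help anyway.

Tags: java lucene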


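【Solution 1】:

StringField may seem like the obvious choice, but it is not what you want here. You want TextField. StringField indexes the whole field value as a single token, essentially a keyword or identifier. TextField analyzes and tokenizes the field for full-text search.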

Fixing it is as simple as changing, in your createDocument method:

    doc.add(new StringField("title", title, Field.Store.YES));
    doc.add(new StringField("content", content, Field.Store.YES));

to:

    doc.add(new TextField("title", title, Field.Store.YES));
    doc.add(new TextField("content", content, Field.Store.YES));

TextField also needs its own import, org.apache.lucene.document.TextField.
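
To see why the StringField version returns no hits for "man", here is a toy model in plain Java with no Lucene dependency. It is only an illustration under simplifying assumptions: `FieldTokenModel`, `keywordTerms`, `analyzedTerms` and `prefixMatches` are hypothetical names, and the "analyzed" path merely splits on non-alphanumeric characters and lowercases, which is far less than a real analyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FieldTokenModel {
  // Hypothetical stand-in for a StringField: the whole value is one term, kept verbatim.
  public static List<String> keywordTerms(String value) {
    List<String> terms = new ArrayList<>();
    terms.add(value);
    return terms;
  }

  // Hypothetical stand-in for an analyzed TextField:
  // split on non-alphanumeric characters and lowercase each piece.
  public static List<String> analyzedTerms(String value) {
    List<String> terms = new ArrayList<>();
    for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
      if (!piece.isEmpty()) {
        terms.add(piece.toLowerCase(Locale.ROOT));
      }
    }
    return terms;
  }

  // A prefix query matches when at least one indexed term starts with the prefix.
  public static boolean prefixMatches(List<String> terms, String prefix) {
    for (String term : terms) {
      if (term.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String title = "Theodore Roosevelt man";
    // Single verbatim term "Theodore Roosevelt man" does not start with "man".
    System.out.println(prefixMatches(keywordTerms(title), "man"));   // false
    // Terms "theodore", "roosevelt", "man": the last one matches.
    System.out.println(prefixMatches(analyzedTerms(title), "man"));  // true
  }
}
```

In this model the only term a keyword-style field offers is the entire string, so a prefix such as "man" can match only if the value itself begins with it.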

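【Comments】:

  • Thanks, it works now. I was trying to use everything at once, and that messed things up.
  • One more question: I can now search with 'Document[] hits = search(mgr, "theodore", 100);', but not with 'Document[] hits = search(mgr, "Theodore", 100);', using the same code as above. Do you know why it is case sensitive? I would have expected the opposite: I should be able to search with "T" rather than "t", right?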
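  • At index time your terms go through StandardAnalyzer, which lowercases them (among other transformations), so the indexed terms are all lowercase. QueryParser normally applies the same transformations, so searches built with it are case insensitive. When you construct TermQuery objects by hand, however, you should lowercase the input yourself first.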
  • Yes, I found it. StandardAnalyzer contains: tok = new LowerCaseFilter(matchVersion, tok); If I understand correctly, that converts everything to lowercase. :)
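
The lowercasing advice in the comments above can be sketched as a small helper that runs before any PrefixQuery or TermQuery is constructed. `QueryTokens` and `normalize` are hypothetical names, not part of Lucene; the sketch assumes the index was built with an analyzer that lowercases terms, as StandardAnalyzer does, and it does not reproduce the analyzer's other steps.

```java
import java.util.Locale;

public class QueryTokens {
  // Split the raw search string on whitespace and lowercase every token,
  // so hand-built queries line up with lowercased index terms.
  public static String[] normalize(String searchString) {
    String trimmed = searchString.trim();
    if (trimmed.isEmpty()) {
      return new String[0];  // avoid a single empty token for blank input
    }
    return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
  }

  public static void main(String[] args) {
    for (String token : normalize("Theodore  Roosevelt")) {
      System.out.println(token);  // prints "theodore", then "roosevelt"
    }
  }
}
```

In the question's search method, the loop would then iterate over normalize(searchString) instead of searchString.split("\\s+").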