[Posted]: 2018-03-29 10:39:57
[Question]:
I have the following Apache Lucene 7 application:
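import java.io.FileReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;

// Index the contents of document.txt into an in-memory directory
StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer);
IndexWriter writer = new IndexWriter(directory, config);
Document document = new Document();
document.add(new TextField("content", new FileReader("document.txt")));
writer.addDocument(document);
writer.close();
// Run a fuzzy search (at most 2 edits) against the "content" field
IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Query fuzzyQuery = new FuzzyQuery(new Term("content", "Company"), 2);
TopDocs results = searcher.search(fuzzyQuery, 5);
System.out.println("Hits: " + results.totalHits);
System.out.println("Max score:" + results.getMaxScore());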
When I use:
new FuzzyQuery(new Term("content", "Company"), 2);
the application works fine and returns the following result:
Hits: 1
Max score:0.35161147
But when I try to search with a multi-word query, for example:
new FuzzyQuery(new Term("content", "Company name"), 2);
it returns the following result:
Hits: 0
Max score:NaN
The phrase "Company name" does exist in the source document.txt file, though.
How do I use FuzzyQuery correctly in this case, so that I can run a fuzzy search on a multi-word phrase?
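A likely explanation for the zero hits, offered as a sketch rather than a definitive diagnosis: FuzzyQuery does not pass its term through the analyzer, so "Company name" is compared as one term against the individual tokens StandardAnalyzer produced at index time (roughly, lowercased words). The code below is plain Java, not Lucene. `roughTokens`, `levenshtein`, and `closestTokenDistance` are hypothetical helpers that only approximate StandardAnalyzer and Lucene's edit-distance matching, and the sample line is the first line of the text quoted later in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MultiWordTermSketch {

    // Crude stand-in for analysis (an assumption, not StandardAnalyzer itself):
    // split on anything that is not a letter or digit, then lowercase.
    static List<String> roughTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Smallest distance from the raw query text to any single token.
    static int closestTokenDistance(String queryText, List<String> tokens) {
        int best = Integer.MAX_VALUE;
        for (String token : tokens) {
            best = Math.min(best, levenshtein(queryText, token));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> tokens = roughTokens("Company name: BlueCross BlueShield Customer Service");
        System.out.println(tokens); // [company, name, bluecross, blueshield, customer, service]
        System.out.println(closestTokenDistance("Company", tokens));      // 1 (only the capital C differs)
        System.out.println(closestTokenDistance("Company name", tokens)); // 6 (no single token is within 2 edits)
    }
}
```

Under these assumptions, no single indexed token can be within 2 edits of a term that contains a space, which would be consistent with Hits: 0. Matching a phrase therefore needs one fuzzy clause per word rather than one Term holding both words.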
UPDATE
Following the suggested solution, I tested it against the following text:
Company name: BlueCross BlueShield Customer Service
1-800-521-2227
of Texas Preauth-Medical 1-800-441-9188
Preauth-MH/CD 1-800-528-7264
Blue Card Access 1-800-810-2583
With the following query:
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueCross"), 2));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueShield"), 2));
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);
the search works fine:
Hits: 1
Max score:0.5753642
But when I try to corrupt the search query (for example, changing BlueCross to BlueCros):
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueCros"), 2));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueShield"), 2));
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);
it stops working and returns:
Hits: 0
Max score:NaN
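One plausible cause, stated as an assumption rather than a confirmed diagnosis: StandardAnalyzer lowercases terms at index time, while FuzzyQuery uses the query term exactly as given, so the two capital letters in "BlueCross" already use up both allowed edits, and dropping the final "s" makes a third. The sketch below is plain Java with a hypothetical `levenshtein` helper. Lucene's FuzzyQuery also counts a transposition as a single edit by default, which makes no difference for these particular strings.

```java
import java.util.Locale;

class FuzzyCaseSketch {

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Assumed form of the term in the index after lowercasing.
        String indexed = "bluecross";
        int maxEdits = 2;

        String[] queries = {
            "BlueCross",                          // 2 edits: B->b, C->c
            "BlueCros",                           // 3 edits: B->b, C->c, missing s
            "BlueCros".toLowerCase(Locale.ROOT)   // 1 edit: missing s
        };
        for (String query : queries) {
            int distance = levenshtein(query, indexed);
            String verdict = distance <= maxEdits ? "within limit" : "over limit";
            System.out.println(query + " -> " + distance + " (" + verdict + ")");
        }
    }
}
```

If this reading is right, lowercasing each query term before wrapping it in a FuzzyQuery (for example with `toLowerCase(Locale.ROOT)`) would leave the full edit budget for actual typos.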
[Comments]:
Tags: lucene fuzzy-search