【Posted】: 2015-12-05 06:16:19
【Question】:
I have some code that looks like this (snippet):
public List<String> search(String streetNumber, String streetDirection, String streetName) throws ParseException, IOException {
    // Open the reader in try-with-resources so it is closed along with the spell checker.
    try (IndexReader ir = DirectoryReader.open(fsDirectory);
         SpellChecker spellchecker = new SpellChecker(fsDirectory)) {
        // Build the spelling dictionary from the terms of the "text" field.
        Dictionary d = new LuceneDictionary(ir, "text");
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        spellchecker.indexDictionary(d, indexWriterConfig, true);
        String text = streetNumber + " " + streetDirection + " " + streetName;
        // The last argument is the minimum accuracy a suggestion must reach to be returned.
        String[] suggestions = spellchecker.suggestSimilar(text, MAX_MATCHES, 0.00001F);
        return Arrays.asList(suggestions);
    }
}
I test it with this:
package ctc.api.web.service.impl;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.queryparser.classic.ParseException;
import org.testng.annotations.Test;

public class LuceneIndexServiceImplTest {

    @Test
    public void f() throws ParseException, IOException {
        LuceneIndexServiceImpl t = new LuceneIndexServiceImpl();
        String[] texts = { "123 n main st", "234 s apple st", "345 w orange st" };
        t.addToIndex(Arrays.asList(texts).stream());

        List<String> r;
        r = t.search("123", "n", "moin");
        org.testng.Assert.assertEquals(r.toString(), "[123 n main st]");
        r = t.search("234", "", "opple");
        org.testng.Assert.assertEquals(r.toString(), "[234 s apple st]");
        r = t.search("345", "", "oge ave");
        org.testng.Assert.assertEquals(r.toString(), "[345 w orange st]");
        r = t.search("", "", "geez");
        org.testng.Assert.assertEquals(r.toString(), "[345 w orange st]");
    }
}
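Unfortunately, I can't get the last assertion to pass. Lucene returns an empty result because the match is too poor (only the letters "ge" match). For my application, however, ANY match is better than NO match.
How can I force the Lucene spell checker to return the nearest string by edit distance?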
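To make "nearest string by edit distance" concrete, here is a minimal plain-Java sketch of the behaviour I am after. It does not use Lucene at all: the class `NearestByEditDistance` and its methods `levenshtein` and `nearest` are hypothetical names made up for illustration, and it assumes the candidate strings fit in memory and can simply be scanned one by one.

```java
import java.util.Arrays;
import java.util.List;

final class NearestByEditDistance {

    // Levenshtein distance via dynamic programming, keeping only two rows.
    // Insertions, deletions and substitutions each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate with the smallest distance to the query.
    // Ties go to the earliest candidate; an empty list yields null.
    static String nearest(String query, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int distance = levenshtein(query, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("123 n main st", "234 s apple st", "345 w orange st");
        System.out.println(nearest("123 n moin", addresses));
    }
}
```

The point is only that `nearest` returns something for any non-empty candidate list, however poor the match. Whole-string distance is dominated by the length difference for very short queries, so this sketch is not a drop-in replacement for the spell checker.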
【Comments】: