Posted: 2012-04-01 10:24:50
Question:
I'm having trouble wrapping my head around the Lucene library. This is what I have so far:
public void shingleMe()
{
    try
    {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        FileReader reader = new FileReader("test.txt");

        ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(analyzer, 2);
        shingleAnalyzer.setOutputUnigrams(false);

        TokenStream stream = shingleAnalyzer.tokenStream("contents", reader);
        CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);

        while (stream.incrementToken())
        {
            System.out.println(charTermAttribute.toString());
        }
    }
    catch (FileNotFoundException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
It fails at stream.incrementToken(). As I understand it, the ShingleAnalyzerWrapper uses another Analyzer to create a shingle analyzer object. From there, I convert it to a token stream, which is then parsed using the attribute filter. However, it always results in this exception:
Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z
Ideas? Thanks in advance!
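For context on what the loop above is expected to print: a ShingleAnalyzerWrapper with a shingle size of 2 and setOutputUnigrams(false) emits only adjacent word pairs (bigrams), joined by a space. The following is a minimal, dependency-free sketch of that bigram behavior, not Lucene code; the shingles helper is a hypothetical name introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {
    // Hypothetical helper mimicking ShingleAnalyzerWrapper(analyzer, size)
    // with unigram output disabled: emit each run of `size` consecutive
    // tokens, joined with a single space.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // Prints the word bigrams: [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2));
    }
}
```

So for an input file containing "please divide this sentence", the working program would print one bigram per loop iteration rather than individual words.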
Tags: lucene stream tokenize n-gram