Posted: 2014-11-18 08:49:36
Problem description:
The naive approach suggested in the documentation under the "Creating delegates" section does not work as expected, because it leads to a contract violation on the delegate Tokenizer:
private static class TokenizerWrapper extends Tokenizer {

    public TokenizerWrapper(Reader _input) {
        super(_input);
        delegate = new WhitespaceTokenizer(input);
    }

    @Override
    public void reset() throws IOException {
        logger.info("TokenizerWrapper.reset()");
        super.reset();
        delegate.setReader(input);
        delegate.reset();
    }

    @Override
    public final boolean incrementToken() throws IOException {
        logger.info("TokenizerWrapper.incrementToken()");
        return delegate.incrementToken();
    }

    private final WhitespaceTokenizer delegate;
}
gives me the following log:
14:30:12.885 [main] INFO test.GapTest - TokenizerWrapper.reset()
14:30:12.886 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
14:30:12.889 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
14:30:12.889 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
14:30:12.897 [main] INFO test.GapTest - TokenizerWrapper.reset()
Exception in thread "main" java.lang.IllegalStateException: TokenStream contract violation: close() call missing
at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
at test.GapTest$TestTokenizer.reset(GapTest.java:152)
at org.apache.lucene.analysis.TokenFilter.reset(TokenFilter.java:70)
at org.apache.lucene.analysis.TokenFilter.reset(TokenFilter.java:70)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:599)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:454)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1246)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1231)
at test.GapTest.main(GapTest.java:67)
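The exception comes from a lifecycle check in Tokenizer.setReader(): a consumer must call close() on the stream before a new Reader may be set. A minimal, self-contained model of that state machine (ModelTokenizer is a made-up illustration, not Lucene's actual code) shows why reusing the delegate for a second document without an intervening close() fails:

```java
// Simplified model of the TokenStream lifecycle that Tokenizer.setReader()
// enforces. ModelTokenizer is a hypothetical stand-in, not a Lucene class.
class LifecycleDemo {

    static class ModelTokenizer {
        // With respect to setReader(), a tokenizer must be in the "closed"
        // state; setting a reader moves it out of that state, close() moves
        // it back.
        private boolean closed = true;

        void setReader() {
            if (!closed) {
                throw new IllegalStateException(
                        "TokenStream contract violation: close() call missing");
            }
            closed = false;
        }

        void reset() { /* ready to consume */ }

        void close() { closed = true; }
    }

    public static void main(String[] args) {
        ModelTokenizer t = new ModelTokenizer();
        t.setReader();     // first document: fine
        t.reset();
        try {
            t.setReader(); // second document without close(): contract violation
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        t.close();
        t.setReader();     // with close() in between it succeeds
        System.out.println("ok");
    }
}
```

In the wrapper above, reset() calls delegate.setReader(input) on every document, but nothing ever calls delegate.close(), so the second document trips exactly this check.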
Overriding the close() method like this:
@Override
public void close() throws IOException {
    logger.info("TokenizerWrapper.close()");
    super.close();
    logger.info("TokenizerWrapper.delegate.close()");
    delegate.close();
    // delegate.setReader(input);
}
does not help either; it fails with a different error:
15:36:49.561 [main] INFO test.GapTest - setting field "text" to "some text"
15:36:49.569 [main] INFO test.GapTest - Adding created document to the index
15:36:49.605 [main] INFO test.GapTest - createComponents()
15:36:49.633 [main] INFO test.GapTest - TokenizerWrapper(_input)
15:36:49.638 [main] INFO test.GapTest - TokenizerWrapper.reset()
15:36:49.639 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
15:36:49.640 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
15:36:49.640 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
15:36:49.641 [main] INFO test.GapTest - TokenizerWrapper.close()
15:36:49.641 [main] INFO test.GapTest - TokenizerWrapper.delegate.close()
15:36:49.648 [main] INFO test.GapTest - setting field "text" to "some text 1"
15:36:49.648 [main] INFO test.GapTest - Adding created document to the index
15:36:49.648 [main] INFO test.GapTest - TokenizerWrapper.reset()
15:36:49.648 [main] INFO test.GapTest - TokenizerWrapper.incrementToken()
15:36:49.649 [main] INFO test.GapTest - TokenizerWrapper.close()
15:36:49.649 [main] INFO test.GapTest - TokenizerWrapper.delegate.close()
Exception in thread "main" java.lang.IllegalArgumentException: first position increment must be > 0 (got 0) for field 'address'
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:617)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:454)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1246)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1231)
at test.GapTest.main(GapTest.java:72)
That is:
- it successfully processed the first document ("some text" in the "text" field),
- then it started processing the second document ("some text 1"),
- it [apparently] processed the first token successfully (the word "some"; I checked in a debugger),
- and then it broke on inconsistent internal state: in DefaultIndexingChain.PerField.invert(IndexableField field, boolean first), invertState.posIncrAttribute.getPositionIncrement() returns 0, whereas its "normal" behavior is to return 1.
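The failing check can be modeled in isolation: the indexing chain reads one position-increment value per token and requires the first one to be positive. PositionCheck below is a hypothetical, simplified stand-in, not Lucene's actual code:

```java
import java.util.List;

// Hypothetical, simplified stand-in for the first-position-increment check
// in DefaultIndexingChain.PerField.invert(); not Lucene's actual code.
class PositionCheck {

    static void invert(List<Integer> positionIncrements) {
        boolean first = true;
        for (int incr : positionIncrements) {
            if (first && incr < 1) {
                throw new IllegalArgumentException(
                        "first position increment must be > 0 (got " + incr + ")");
            }
            first = false;
        }
    }

    public static void main(String[] args) {
        invert(List.of(1, 1));     // normal stream: ok
        try {
            invert(List.of(0, 1)); // stale attribute value: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

One plausible reading of the 0: unless the wrapper and its delegate share an AttributeSource, each has its own attribute instances, so the consumer may be reading a position-increment attribute on the wrapper that the delegate never repopulates for the second document.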
Of course, I could handle this particular error with further wrapping and workarounds, but I am probably heading in the wrong direction for what looks like such a simple task. Please advise.
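One direction commonly suggested for delegating streams is to forward the complete lifecycle (reset(), end(), close(), and the reader change), not only reset() and incrementToken(), so the delegate always sees the same call sequence the consumer applies to the wrapper. A language-level sketch with hypothetical Stream/RecordingDelegate/ForwardingWrapper types (no Lucene dependency):

```java
// Hypothetical sketch of full lifecycle forwarding; Stream, RecordingDelegate
// and ForwardingWrapper are made-up names, not Lucene types.
interface Stream {
    void reset();
    boolean incrementToken();
    void end();
    void close();
}

class RecordingDelegate implements Stream {
    // Records the call sequence so we can verify nothing is skipped.
    final StringBuilder calls = new StringBuilder();
    public void reset()             { calls.append("reset;"); }
    public boolean incrementToken() { calls.append("inc;"); return false; }
    public void end()               { calls.append("end;"); }
    public void close()             { calls.append("close;"); }
}

class ForwardingWrapper implements Stream {
    private final Stream delegate;

    ForwardingWrapper(Stream delegate) { this.delegate = delegate; }

    // Forward EVERY lifecycle method so the delegate's state can never
    // drift from the wrapper's.
    public void reset()             { delegate.reset(); }
    public boolean incrementToken() { return delegate.incrementToken(); }
    public void end()               { delegate.end(); }
    public void close()             { delegate.close(); }
}
```

Applied to the Lucene case, this would mean the wrapper's end() and close() overrides call through to the delegate as well, and the delegate receives a fresh reader at the same point the wrapper does.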
Tags: java lucene delegates tokenize