Apache Lucene: How to get the first matching substring from a Document

【Question title】: Apache Lucene: How to get the first matching substring from a Document
【Posted】: 2010-10-20 14:28:40
【Question】:

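I couldn't find anything on the web or on Stack Overflow about how to get the first matching character subsequence from a Lucene Document.

At the moment I am using this logic to retrieve results from Lucene: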

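        // Load the stored fields of the matching document.
        Document doc = searcher.doc(hit.doc);
        String text = doc.get("text");
        // Naive snippet: keep only the first 80 characters of the stored text.
        if (text.length() > 80) {
            text = text.substring(0, 80);
        }
        results.add(new SearchResult(doc.get("url"), doc.get("title"), text));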

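As you can see, this simply takes the first 80 characters of the stored text and wraps them, together with some other data, into a SearchResult object.

Is it possible to retrieve the first, or even the highest-scoring, subsequence of the text that actually contains any of the search terms?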

【Comments】:

Tags: java lucene

【Solution 1】:

You need the Lucene Highlighter. You can find more information about it here and here.
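To make the idea concrete, here is a minimal plain-Java sketch of what "first matching substring" means. It does not use the Lucene Highlighter API at all: the class `SnippetSketch` and the method `firstMatchSnippet` are hypothetical names invented for this illustration, and the matching is a simple case-insensitive substring search with no analyzer, no stemming, and no scoring. The real Highlighter (in the `org.apache.lucene.search.highlight` package) works on analyzed tokens and can rank fragments, so treat this only as an approximation of the behaviour you are after.

```java
import java.util.List;

final class SnippetSketch {

    // Case-insensitive indexOf that keeps indices aligned with the original text.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns at most maxLen characters of text around the earliest occurrence
    // of any term. Falls back to the leading maxLen characters if nothing matches.
    static String firstMatchSnippet(String text, List<String> terms, int maxLen) {
        int first = -1;
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // an empty term would match at index 0 and tell us nothing
            }
            int idx = indexOfIgnoreCase(text, term);
            if (idx >= 0 && (first < 0 || idx < first)) {
                first = idx;
            }
        }
        if (first < 0) {
            return text.length() > maxLen ? text.substring(0, maxLen) : text;
        }
        // Keep a little leading context (a quarter of the window) before the match.
        int start = Math.max(0, first - maxLen / 4);
        int end = Math.min(text.length(), start + maxLen);
        // If the window ran into the end of the text, slide it back to use the full width.
        start = Math.max(0, end - maxLen);
        return text.substring(start, end);
    }
}
```

In the code from the question, this would replace the `substring(0, 80)` step, for example `text = SnippetSketch.firstMatchSnippet(doc.get("text"), terms, 80);`, where `terms` is a list of the raw query words you would have to supply yourself (an assumption; the question does not show how the query is built). A term longer than the leading context plus the window can still be cut off at the end, which is one of the details the real Highlighter handles for you.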

【Comments】:

Also note that both Lucene 2.x and Lucene 3.0 ship several Highlighter implementations. Choose the one that best suits your task.

【Solution 2】:

This is called a hit highlighter. This may be a duplicate of another highlighter question.

【Comments】: