Categorize results coming from a Lucene index

【Question title】: Categorize results coming from a Lucene index
【Posted】: 2017-10-02 07:19:45
【Question description】:

I have a Lucene index, generated through Hibernate with the help of Hibernate Search annotations, with 3 fields (simplified a bit) describing an article:

id, title, brand

Example content:

id, title, brand
1, "Long skirt", "Sweet and Gabbana"
2, "Sweet neck vest", "Armani"
3, "Sweet feeling shirt", "Armani"

Note how "Sweet and Gabbana", "Sweet neck vest" and "Sweet feeling shirt" all share the word "sweet".

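I would like to write a Lucene query such that, when I search for the keyword "sweet", I get 2 different categories, one for titles and one for brands. For example: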

Titles -> "Sweet neck vest", "Sweet feeling shirt"
Brands -> "Sweet and Gabbana"

In other words, I want to show the user that the system found results in these two different categories.

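When I run the query (a kind of OR between title and brand), I get all three entries (the Lucene documents with id 1, 2 and 3), each of which matches on only one property or the other. But how can I categorize them?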

@PersistenceContext
private EntityManager em;

...

@Override
public List<ArticleByIndexModel> retrieveArticlesSearchQueryResult(final String searchString,
        final String languageIso639) {

    final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
    final org.apache.lucene.search.Query luceneQuery = buildUpArticlesSearchLuceneQuery(searchString,
            languageIso639, fullTextEntityManager);

    final String titleFieldName = ArticleTranslationFieldPrefixes.TITLE + languageIso639;
    final String brandNameFieldName = BrandTaxonomy.BrandTaxonomyNameFieldName.NAME;

    final FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(luceneQuery);
    fullTextQuery.setMaxResults(50);
    fullTextQuery.setProjection(Article_.articleID.getName(), titleFieldName, brandNameFieldName,
            Article_.brandSku.getName(), FullTextQuery.DOCUMENT_ID, FullTextQuery.EXPLANATION, FullTextQuery.THIS);

    @SuppressWarnings("unchecked")
    final List<Object[]> list = (List<Object[]>) fullTextQuery.getResultList();

    final List<ArticleByIndexModel> resultList = list.stream()
            .map(x -> new ArticleByIndexModel((Integer) x[0], (String) x[1])).collect(Collectors.toList());
    return resultList;
}

private org.apache.lucene.search.Query buildUpArticlesSearchLuceneQuery(final String searchString,
        final String languageIso639, final FullTextEntityManager fullTextEntityManager) {

    final String brandSkuName = Article_.brandSku.getName();

    final String analyzerPartName = ArticleTranslationDiscriminator.getAnalyzerPartNameByLanguage(languageIso639);
    final String titleFieldName = ArticleTranslationFieldPrefixes.TITLE + languageIso639;
    final String titleEdgeNGramFieldName = ArticleTranslationFieldPrefixes.TITLE_EDGE_N_GRAM + languageIso639;
    final String titleNGramFieldName = ArticleTranslationFieldPrefixes.TITLE_N_GRAM + languageIso639;

    final String brandNameEdgeNGramFieldName = BrandTaxonomy.BrandTaxonomyNameFieldName.NAME_EDGE_N_GRAM;
    final String brandNameNGramFieldName = BrandTaxonomy.BrandTaxonomyNameFieldName.NAME_N_GRAM;

    final SearchFactory searchFactory = fullTextEntityManager.getSearchFactory();
    final QueryBuilder qb = searchFactory.buildQueryBuilder().forEntity(Article.class)
            .overridesForField(titleFieldName, ArticleTranslationFieldPrefixes.TITLE + analyzerPartName)
            .overridesForField(titleEdgeNGramFieldName,
                    ArticleTranslationFieldPrefixes.TITLE_EDGE_N_GRAM + analyzerPartName)
            .overridesForField(titleNGramFieldName, ArticleTranslationFieldPrefixes.TITLE_N_GRAM + analyzerPartName)
            .get();

    final org.apache.lucene.search.Query luceneQuery =
            /**/
            qb.bool()
                    /**/
                    .should(qb.phrase().withSlop(2).onField(titleNGramFieldName).andField(titleEdgeNGramFieldName)
                            .boostedTo(5).sentence(searchString.toLowerCase()).createQuery())
                    /**/
                    .should(qb.phrase().withSlop(2).onField(brandNameNGramFieldName)
                            .andField(brandNameEdgeNGramFieldName).boostedTo(5).sentence(searchString.toLowerCase())
                            .createQuery())
                    /**/
                    .should(qb.keyword().onField(brandSkuName).matching(searchString.toLowerCase()).createQuery())
                    /**/
                    .createQuery();

    return luceneQuery;
}

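I can't think of any solution other than running 2 different queries and then merging the results.

I read about Facets, but I don't think they fit this case.

Do you have any ideas?

Thanks!!!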

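【Question discussion】:

Why not run just one query, then iterate over the results and build the categories yourself? I think that even if Lucene had something like this (some kind of grouping), it would do it that way...
Unrelated note: you probably don't want to project on FullTextQuery.DOCUMENT_ID, because it is an internal ID that differs from your entity ID. Use FullTextQuery.ID instead.
@Yossy, I can't do what you suggest, because the query in my question returns all the documents, and then there is no way to group the results by category (short of some kind of pattern matching on the results). @Yoann, yes, I know. You are right.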
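For reference, the "iterate and categorize yourself" idea from the first comment could be sketched as below. It assumes each projected row is an Object[] laid out as {articleID, title, brand} (a hypothetical simplification of the projection in the question), and it treats "match" as a case-insensitive whole-word check. That word check is an assumption and only approximates Lucene analysis: it ignores the n-gram fields and phrase slop the real query uses, which is the "pattern matching" gap the asker describes. The class and method names are illustrative only.

```java
import java.util.*;

class CategorizeRows {

    static final String TITLES = "Titles";
    static final String BRANDS = "Brands";

    // True if 'text' contains 'keyword' as a whole word, ignoring case.
    // NOTE: naive approximation, NOT the analyzer configured in the index.
    static boolean containsWord(String text, String keyword) {
        if (text == null) return false;
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(needle)) return true;
        }
        return false;
    }

    // Groups projected rows {id, title, brand} into a title and a brand category.
    static Map<String, List<String>> categorize(List<Object[]> rows, String keyword) {
        Map<String, List<String>> categories = new LinkedHashMap<>();
        categories.put(TITLES, new ArrayList<>());
        categories.put(BRANDS, new ArrayList<>());
        Set<String> seenBrands = new HashSet<>();
        for (Object[] row : rows) {
            String title = (String) row[1];
            String brand = (String) row[2];
            if (containsWord(title, keyword)) {
                categories.get(TITLES).add(title);
            }
            // Many articles share a brand, so list each brand only once.
            if (containsWord(brand, keyword) && seenBrands.add(brand)) {
                categories.get(BRANDS).add(brand);
            }
        }
        return categories;
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
                new Object[] {1, "Long skirt", "Sweet and Gabbana"},
                new Object[] {2, "Sweet neck vest", "Armani"},
                new Object[] {3, "Sweet feeling shirt", "Armani"});
        // prints {Titles=[Sweet neck vest, Sweet feeling shirt], Brands=[Sweet and Gabbana]}
        System.out.println(categorize(rows, "sweet"));
    }
}
```

This works on the sample data, but it re-implements matching outside Lucene, so a hit found through an n-gram or edge-n-gram field may not contain the keyword as a whole word and would silently fall out of both categories.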

Tags: java hibernate lucene hibernate-search

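【Solution 1】:

I assume you need to display the results to the user as a single list, with some description of each item (matched because of the title / because of the brand / both).

I don't think Hibernate Search has a feature that lets you do exactly that. There are probably ways to do it with the low-level Lucene API (collectors), but that would involve some black magic, and I don't think we could plug it into Hibernate Search.

So let's take a simpler path: do it yourself.

Personally, I would simply run multiple queries: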

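The first one exactly as you did in your example
The second one projecting the ID (.setProjection(ProjectionConstants.ID)) and using only two clauses: one forcing the matched ID to be one of the results of the first query (basically must(should(id=<firstID>), should(id=<secondID>), ...)), and one forcing the search string to match on the title (basically must(title=<searchString>))
The third one similar to the second, but with the brand instead of the title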

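Then I would use the results of the second and third queries to determine whether a given result matched because of the title or because of the brand.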
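That final merging step could look roughly like this. It is a sketch assuming the first query's article IDs have already been read from the projection into a List<Integer> (kept in result order), and the ID projections of the second and third queries have been collected into sets. MatchReason and classify are hypothetical names, not Hibernate Search API; the OTHER case covers hits that matched neither field-specific query, for example through the brandSku keyword clause in the original query.

```java
import java.util.*;

class MatchClassifier {

    // Why a hit from the first (combined) query matched.
    enum MatchReason { TITLE, BRAND, BOTH, OTHER }

    // firstQueryIds: IDs from query 1, in result order.
    // titleIds / brandIds: IDs returned by queries 2 and 3.
    static Map<Integer, MatchReason> classify(List<Integer> firstQueryIds,
                                              Set<Integer> titleIds,
                                              Set<Integer> brandIds) {
        Map<Integer, MatchReason> reasons = new LinkedHashMap<>(); // preserves result order
        for (Integer id : firstQueryIds) {
            boolean byTitle = titleIds.contains(id);
            boolean byBrand = brandIds.contains(id);
            MatchReason reason;
            if (byTitle && byBrand) {
                reason = MatchReason.BOTH;
            } else if (byTitle) {
                reason = MatchReason.TITLE;
            } else if (byBrand) {
                reason = MatchReason.BRAND;
            } else {
                // Matched through another clause (e.g. brandSku), or only
                // parts of the search string matched different fields.
                reason = MatchReason.OTHER;
            }
            reasons.put(id, reason);
        }
        return reasons;
    }

    public static void main(String[] args) {
        // Hypothetical IDs mirroring the "sweet" example from the question.
        Map<Integer, MatchReason> reasons = classify(
                List.of(1, 2, 3), Set.of(2, 3), Set.of(1));
        System.out.println(reasons); // prints {1=BRAND, 2=TITLE, 3=TITLE}
    }
}
```

Keeping the first query's order in a LinkedHashMap lets the UI show a single relevance-ordered list with a per-item label, which matches the assumption at the top of this answer.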

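Of course, this only works if the search string is expected to match the title or the brand (or both) in full; it won't work if some parts of the search string match the title and other parts match the brand. But if that's what you want, your current query is wrong anyway...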

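【Discussion】:

@Yoann, you understood my problem perfectly, and I fully agree with your idea of running multiple queries to get what I'm looking for in the index. Thanks!!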