【Question title】: How to retrieve the field which triggered a hit for an elasticsearch query
【Posted】: 2017-08-15 09:39:17
【Question】:

I run a Wagtail site (1.11) with Elasticsearch (5.5) as the search backend and index several fields, for example:

search_fields = Page.search_fields + [
    index.SearchField('body'),
    index.SearchField('get_post_type_display'),
    index.SearchField('document_excerpt', boost=2),
    index.SearchField('get_dark_data_full_text'),
]

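In my search results template I would like to indicate which field the search "hit" occurred in (or, better still, display the snippet that matched, but that seems to be a separate question).

This question seems to address my problem, but I don't know how to integrate it into my Wagtail site.

Any hints on how to get this information and how to integrate it into Wagtail search?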

【Question comments】:

    Tags: django elasticsearch wagtail


    【Solution 1】:

    Elasticsearch has an Explain API, which explains how it internally scores the field hits for a specific record with a given id.

    Here is the documentation:

    https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

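    It shows how each field is boosted and how the score is built up.

    For example, if the max_score of your hits is 2.0588222 and you want to know which fields contributed to that score, you can use the Explain API.

    Here is an example of an explain query response. You can see that the field firstName contributed 1.2321436 to the max score and lastName contributed 0.8266786: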

    {
      "_index" : "customer_test",
      "_type" : "customer",
      "_id" : "597f2b3a79c404fafefcd46e",
      "matched" : true,
      "explanation" : {
        "value" : 2.0588222,
        "description" : "sum of:",
        "details" : [ {
          "value" : 2.0588222,
          "description" : "sum of:",
          "details" : [ {
            "value" : 1.2321436,
            "description" : "weight(firstName:merge in 23) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 1.2321436,
              "description" : "score(doc=23,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [ {
                "value" : 1.2321436,
                "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                "details" : [ {
                  "value" : 3.0,
                  "description" : "docFreq",
                  "details" : [ ]
                }, {
                  "value" : 11.0,
                  "description" : "docCount",
                  "details" : [ ]
                } ]
              }, {
                "value" : 1.0,
                "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                "details" : [ {
                  "value" : 1.0,
                  "description" : "termFreq=1.0",
                  "details" : [ ]
                }, {
                  "value" : 1.2,
                  "description" : "parameter k1",
                  "details" : [ ]
                }, {
                  "value" : 0.75,
                  "description" : "parameter b",
                  "details" : [ ]
                }, {
                  "value" : 1.0,
                  "description" : "avgFieldLength",
                  "details" : [ ]
                }, {
                  "value" : 1.0,
                  "description" : "fieldLength",
                  "details" : [ ]
                } ]
              } ]
            } ]
          }, {
            "value" : 0.8266786,
            "description" : "weight(lastName:doe in 23) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 0.8266786,
              "description" : "score(doc=23,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [ {
                "value" : 0.8266786,
                "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                "details" : [ {
                  "value" : 3.0,
                  "description" : "docFreq",
                  "details" : [ ]
                }, {
                  "value" : 7.0,
                  "description" : "docCount",
                  "details" : [ ]
                } ]
              }, {
                "value" : 1.0,
                "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                "details" : [ {
                  "value" : 1.0,
                  "description" : "termFreq=1.0",
                  "details" : [ ]
                }, {
                  "value" : 1.2,
                  "description" : "parameter k1",
                  "details" : [ ]
                }, {
                  "value" : 0.75,
                  "description" : "parameter b",
                  "details" : [ ]
                }, {
                  "value" : 1.0,
                  "description" : "avgFieldLength",
                  "details" : [ ]
                }, {
                  "value" : 1.0,
                  "description" : "fieldLength",
                  "details" : [ ]
                } ]
              } ]
            } ]
          } ]
        }, {
          "value" : 0.0,
          "description" : "match on required clause, product of:",
          "details" : [ {
            "value" : 0.0,
            "description" : "# clause",
            "details" : [ ]
          }, {
            "value" : 1.0,
            "description" : "_type:customer, product of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "boost",
              "details" : [ ]
            }, {
              "value" : 1.0,
              "description" : "queryNorm",
              "details" : [ ]
            } ]
          } ]
        } ]
      }
    }
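
To turn a response like the one above into a per-field summary, the explanation tree can be walked recursively, picking out the nodes whose description starts with `weight(<field>:`. The following is a minimal sketch, not a definitive implementation: the helper name `field_contributions` and the regular expression are assumptions made for illustration, the `sample` dict is a trimmed copy of the response above, and the description format can differ for other query types or Elasticsearch versions.

```python
import re
from collections import defaultdict

# Matches Lucene explain descriptions such as
# "weight(firstName:merge in 23) [PerFieldSimilarity], result of:".
# Assumption: other query types may use a different description format.
WEIGHT_RE = re.compile(r"^weight\((?P<field>[^:()]+):")


def field_contributions(explanation):
    """Sum the score contributed by each field in an explain tree."""
    totals = defaultdict(float)

    def walk(node):
        match = WEIGHT_RE.match(node.get("description", ""))
        if match:
            totals[match.group("field")] += node["value"]
            return  # the children only break this weight down further
        for child in node.get("details", []):
            walk(child)

    walk(explanation)
    return dict(totals)


# Trimmed copy of the "explanation" object from the response above.
sample = {
    "value": 2.0588222,
    "description": "sum of:",
    "details": [
        {"value": 2.0588222, "description": "sum of:", "details": [
            {"value": 1.2321436,
             "description": "weight(firstName:merge in 23) [PerFieldSimilarity], result of:",
             "details": []},
            {"value": 0.8266786,
             "description": "weight(lastName:doe in 23) [PerFieldSimilarity], result of:",
             "details": []},
        ]},
        {"value": 0.0,
         "description": "match on required clause, product of:",
         "details": []},
    ],
}

print(field_contributions(sample))
# → {'firstName': 1.2321436, 'lastName': 0.8266786}
```

Any field that appears as a key in the result matched the query for that document, which is the information the search results template needs.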
    

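    As for Wagtail: I have no experience with it. But you can certainly access the REST API and parse the JSON of the Explain response.

    【Comments】:

    • That sounds great @gil.fernandes, but I don't know how to use this feature with Wagtail's built-in search and Elasticsearch as the backend. I would be glad if someone could point me to an example implementation.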
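
As a rough sketch of calling the REST API directly from Python, the request below targets the Elasticsearch 5.x explain endpoint (`/{index}/{type}/{id}/_explain`) using only the standard library. The function names, the host `http://localhost:9200`, and the example query are assumptions for illustration. The index, type, and document id that Wagtail actually writes are not covered here, so they would need to be looked up in your own cluster.

```python
import json
import urllib.request
from urllib.parse import quote


def build_explain_request(base_url, index, doc_type, doc_id, query):
    """Build a request for the ES 5.x explain endpoint of one document."""
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    url = "{}/{}/_explain".format(base_url.rstrip("/"), path)
    body = json.dumps({"query": query}).encode("utf-8")
    # The explain endpoint accepts GET and POST; POST avoids sending a GET with a body.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_explanation(request):
    """Send the request and return the "explanation" part of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["explanation"]


# Hypothetical values taken from the sample response above; nothing is sent here.
request = build_explain_request(
    "http://localhost:9200",
    "customer_test",
    "customer",
    "597f2b3a79c404fafefcd46e",
    {"match": {"firstName": "merge"}},
)
print(request.full_url)
# explanation = fetch_explanation(request)  # requires a running cluster
```

The query passed in has to be the same query that produced the hit, otherwise the explanation describes a different score.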