【Question title】: Give weight to fields in Elasticsearch
【Posted】: 2020-10-03 19:40:39
【Question】:

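I have an Elasticsearch index containing products, and I am trying to build a product search based on a text field.

A short sample of the dataset:

{"name": "foo", "count": 10}
{"name": "bar", "count": 5}
{"name": "foo bar"}
{"name": "foo baz", "count": 20}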

At first, I was using this request:

GET /product/_search
{
  "query": {
    "match": {"name": "foo"}
  }
}

This works well, but now I would like to give more weight to some products (using the count field).

I am using this query:

GET /product/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {"name": "foo bar"}
      },
      "field_value_factor": {
        "field": "count",
        "missing": 0
      }
    }
  }
}

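But with this query I get foo first, then bar, then foo bar. It seems the name match matters less than the count. I would like to get foo bar, then foo and bar.

But when searching for foo, I would like to get foo baz, foo, foo bar.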

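【Comments on the question】:

  • I think the easiest way to avoid this behaviour is to boost the name field in the query. You can also tune field_value_factor with "factor": 1.0, "modifier": "sqrt" to reduce its importance.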
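
The suggestion in this comment could be sketched as the request body below, built here as a Python dictionary. This is only an illustration under assumptions: `build_boosted_query` is a hypothetical helper, the boost value of 15 is arbitrary, and `"boost_mode": "sum"` is an added assumption (the comment does not mention it) so that a missing count does not zero out the score.

```python
import json


def build_boosted_query(text, name_boost=15.0):
    """Build a function_score body that boosts the name match and dampens count.

    name_boost is an illustrative value, not a recommendation.
    """
    return {
        "query": {
            "function_score": {
                "query": {
                    # "boost" scales the relevance score of the text match
                    "match": {"name": {"query": text, "boost": name_boost}}
                },
                "field_value_factor": {
                    "field": "count",
                    "factor": 1.0,
                    "modifier": "sqrt",  # sqrt flattens the effect of large counts
                    "missing": 0,
                },
                # assumption: add the count contribution instead of multiplying
                "boost_mode": "sum",
            }
        }
    }


body = build_boosted_query("foo bar")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET /product/_search`. A suitable boost depends on the score ranges in your own index.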

Tags: elasticsearch


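【Answer 1】:

But when searching for foo, I would like to get foo baz, foo, foo bar

Here is a working example with index data, search query, and search result.

For details, refer to the function score query documentation.

Index data: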

{"name": "foo", "count": 10} 
{"name": "bar", "count": 5} 
{"name": "foo bar"} 
{"name": "foo baz", "count": 20}

Search query:

But when searching for foo, I would like to get foo baz, foo, foo bar

{
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "match": {
                                "name": {
                                    "query": "foo"
                                }
                            }
                        }
                    ]
                }
            },
            "functions": [
                {
                    "field_value_factor": {
                        "field": "count",
                        "factor": 1.0,
                        "missing": 0
                    }
                }
            ],
            "boost_mode": "multiply"
        }
    }
}

Search result:

"hits": [
      {
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "4",
        "_score": 6.2774796,
        "_source": {
          "name": "foo baz",
          "count": 20
        }
      },
      {
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "1",
        "_score": 4.1299205,
        "_source": {
          "name": "foo",
          "count": 10
        }
      },
      {
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.0,
        "_source": {
          "name": "foo bar"
        }
      }
    ]
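
To see why foo bar ends up with a score of 0.0 here, the following sketch mimics how `"boost_mode": "multiply"` combines the match score with the count value. The match scores are back-calculated from the results above (final score divided by count). The match score used for foo bar is an assumption, since the output does not show it; any positive value gives the same outcome.

```python
def combine(match_score, count, missing=0.0, boost_mode="multiply"):
    """Combine a match score with a field value (factor 1.0, no modifier)."""
    value = missing if count is None else count
    if boost_mode == "multiply":
        return match_score * value
    if boost_mode == "sum":
        return match_score + value
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


# (name, match score, count); match scores derived from the hits above
docs = [
    ("foo baz", 6.2774796 / 20, 20),
    ("foo", 4.1299205 / 10, 10),
    ("foo bar", 6.2774796 / 20, None),  # assumed match score, count is missing
]

ranked = sorted(
    ((name, combine(score, count)) for name, score, count in docs),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

Because the missing value is 0, multiplying wipes out the text relevance of any document without a count, which is why foo bar is ranked last.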

Update 1:

I would like to get foo bar, then foo and bar

Search query:

{
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "match": {
                                "name": {
                                    "query": "foo bar"
                                }
                            }
                        }
                    ]
                }
            },
            "functions": [
                {
                    "field_value_factor": {
                        "field": "count",
                        "factor": 1.0,
                        "missing": 9,
                        "modifier": "sqrt"
                    }
                }
            ],
            "boost_mode": "sum"
        }
    }
}

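Explain API result:

To understand the search query above, you need to understand how the score of a query is calculated.

  1. The search is run against "name": "foo bar", which ideally should return foo bar, then foo, then bar. A plain match query for foo bar (without a function score query) gives you that result.
  2. Now, according to your use case, you want to add weight based on the count field, so you used a function score query, which lets you modify the score of the documents retrieved by a query.
  3. In addition, several functions can be combined. The function_score query provides several types of score functions. The field_value_factor function lets you use a field from the document to influence the score.
  4. field_value_factor has several options:

factor - optional factor to multiply the field value with, defaults to 1

modifier - modifier to apply to the field value
missing - value used if the document does not have that field

This results in the following scoring formula: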

sqrt(1.0 * doc['count'].value)
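
As a rough illustration of the options listed above, here is a minimal Python sketch of what field_value_factor computes. It is not Elasticsearch code: the function name and the `MODIFIERS` table are hypothetical, and only a few of the available modifiers are included.

```python
import math

# A subset of the modifiers, for illustration only
MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(x + 1),
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Return modifier(factor * value), using `missing` when value is absent."""
    if value is None:
        if missing is None:
            raise ValueError("document has no value and no 'missing' was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


print(field_value_factor(10, modifier="sqrt"))               # count = 10
print(field_value_factor(5, modifier="sqrt"))                # count = 5
print(field_value_factor(None, modifier="sqrt", missing=9))  # no count field
```

The three calls correspond to the foo, bar, and foo bar documents with the query's `"modifier": "sqrt"` and a missing value of 9.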

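Now, the document containing foo bar has no count field, so the missing value is used (defined in the query, i.e. 9). Its function score is sqrt(1.0 * 9) = 3.0

If you pick a missing value lower than 9, the order of the results changes, because the score contributed by the count field is different. (When you set the missing value to 0, foo bar gets its score from the match query only, and nothing is added by field_value_factor.) The final score is calculated as match query score + field_value_factor score (on the count field). So with a missing value of 0, the total score of foo bar is lower than that of the other documents.

For example: for foo bar, the final score is calculated as 0.78038335 + 3.0 = 3.7803833. Please go through the result below to understand in detail how the score is calculated.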
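
The effect of the missing value on the ordering can be checked with a few lines of Python. This is only a sketch: the match scores are copied from the explain output of this answer, and `rank` is a hypothetical helper that mirrors `"boost_mode": "sum"` with the sqrt modifier.

```python
import math

# (name, match score taken from the explain output, count or None)
DOCS = [
    ("foo bar", 0.78038335, None),
    ("foo", 0.52354836, 10),
    ("bar", 0.52354836, 5),
]


def rank(missing):
    """Return (name, score) pairs sorted by match score + sqrt(1.0 * count)."""
    scored = []
    for name, match_score, count in DOCS:
        value = missing if count is None else count
        scored.append((name, match_score + math.sqrt(1.0 * value)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


for missing in (9, 8, 0):
    print(missing, [(name, round(score, 4)) for name, score in rank(missing)])
```

With a missing value of 9, foo bar comes first; with 8 it drops below foo, and with 0 it drops to the last position.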

Please also go through this blog post to understand how scoring works in Elasticsearch.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 3.7803833,
    "hits": [
      {
        "_shard": "[stof_64169215][0]",
        "_node": "fVeabsK0Q1GnCZ_8oROXjA",
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "3",
        "_score": 3.7803833,
        "_source": {
          "name": "foo bar"
        },
        "_explanation": {
          "value": 3.7803833,
          "description": "sum of",
          "details": [
            {
              "value": 0.78038335,
              "description": "sum of:",
              "details": [
                {
                  "value": 0.39019167,
                  "description": "weight(name:foo in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.39019167,
                      "description": "score(freq=1.0), computed as boost * idf * tf from:",
                      "details": [
                        {
                          "value": 2.2,
                          "description": "boost",
                          "details": []
                        },
                        {
                          "value": 0.47000363,
                          "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details": [
                            {
                              "value": 2,
                              "description": "n, number of documents containing term",
                              "details": []
                            },
                            {
                              "value": 3,
                              "description": "N, total number of documents with field",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 0.37735844,
                          "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "freq, occurrences of term within document",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "k1, term saturation parameter",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "b, length normalization parameter",
                              "details": []
                            },
                            {
                              "value": 2.0,
                              "description": "dl, length of field",
                              "details": []
                            },
                            {
                              "value": 1.3333334,
                              "description": "avgdl, average length of field",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 0.39019167,
                  "description": "weight(name:bar in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.39019167,
                      "description": "score(freq=1.0), computed as boost * idf * tf from:",
                      "details": [
                        {
                          "value": 2.2,
                          "description": "boost",
                          "details": []
                        },
                        {
                          "value": 0.47000363,
                          "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details": [
                            {
                              "value": 2,
                              "description": "n, number of documents containing term",
                              "details": []
                            },
                            {
                              "value": 3,
                              "description": "N, total number of documents with field",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 0.37735844,
                          "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "freq, occurrences of term within document",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "k1, term saturation parameter",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "b, length normalization parameter",
                              "details": []
                            },
                            {
                              "value": 2.0,
                              "description": "dl, length of field",
                              "details": []
                            },
                            {
                              "value": 1.3333334,
                              "description": "avgdl, average length of field",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 3.0,
              "description": "min of:",
              "details": [
                {
                  "value": 3.0,
                  "description": "field value function: sqrt(doc['count'].value?:9.0 * factor=1.0)",
                  "details": []
                },
                {
                  "value": 3.4028235E38,
                  "description": "maxBoost",
                  "details": []
                }
              ]
            }
          ]
        }
      },
      {
        "_shard": "[stof_64169215][0]",
        "_node": "fVeabsK0Q1GnCZ_8oROXjA",
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "1",
        "_score": 3.685826,
        "_source": {
          "name": "foo",
          "count": 10
        },
        "_explanation": {
          "value": 3.685826,
          "description": "sum of",
          "details": [
            {
              "value": 0.52354836,
              "description": "sum of:",
              "details": [
                {
                  "value": 0.52354836,
                  "description": "weight(name:foo in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.52354836,
                      "description": "score(freq=1.0), computed as boost * idf * tf from:",
                      "details": [
                        {
                          "value": 2.2,
                          "description": "boost",
                          "details": []
                        },
                        {
                          "value": 0.47000363,
                          "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details": [
                            {
                              "value": 2,
                              "description": "n, number of documents containing term",
                              "details": []
                            },
                            {
                              "value": 3,
                              "description": "N, total number of documents with field",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 0.50632906,
                          "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "freq, occurrences of term within document",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "k1, term saturation parameter",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "b, length normalization parameter",
                              "details": []
                            },
                            {
                              "value": 1.0,
                              "description": "dl, length of field",
                              "details": []
                            },
                            {
                              "value": 1.3333334,
                              "description": "avgdl, average length of field",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 3.1622777,
              "description": "min of:",
              "details": [
                {
                  "value": 3.1622777,
                  "description": "field value function: sqrt(doc['count'].value?:9.0 * factor=1.0)",
                  "details": []
                },
                {
                  "value": 3.4028235E38,
                  "description": "maxBoost",
                  "details": []
                }
              ]
            }
          ]
        }
      },
      {
        "_shard": "[stof_64169215][0]",
        "_node": "fVeabsK0Q1GnCZ_8oROXjA",
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "2",
        "_score": 2.7596164,
        "_source": {
          "name": "bar",
          "count": 5
        },
        "_explanation": {
          "value": 2.7596164,
          "description": "sum of",
          "details": [
            {
              "value": 0.52354836,
              "description": "sum of:",
              "details": [
                {
                  "value": 0.52354836,
                  "description": "weight(name:bar in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.52354836,
                      "description": "score(freq=1.0), computed as boost * idf * tf from:",
                      "details": [
                        {
                          "value": 2.2,
                          "description": "boost",
                          "details": []
                        },
                        {
                          "value": 0.47000363,
                          "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details": [
                            {
                              "value": 2,
                              "description": "n, number of documents containing term",
                              "details": []
                            },
                            {
                              "value": 3,
                              "description": "N, total number of documents with field",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 0.50632906,
                          "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "freq, occurrences of term within document",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "k1, term saturation parameter",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "b, length normalization parameter",
                              "details": []
                            },
                            {
                              "value": 1.0,
                              "description": "dl, length of field",
                              "details": []
                            },
                            {
                              "value": 1.3333334,
                              "description": "avgdl, average length of field",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 2.236068,
              "description": "min of:",
              "details": [
                {
                  "value": 2.236068,
                  "description": "field value function: sqrt(doc['count'].value?:9.0 * factor=1.0)",
                  "details": []
                },
                {
                  "value": 3.4028235E38,
                  "description": "maxBoost",
                  "details": []
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
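
The per-term scores in the explain output above can be reproduced by hand from the formulas printed in its description fields. The sketch below does that in Python; `bm25_term_score` is a hypothetical helper name, and its default parameters (boost 2.2, k1 1.2, b 0.75) are the values shown in the output.

```python
import math


def bm25_term_score(freq, dl, avgdl, n, N, boost=2.2, k1=1.2, b=0.75):
    """Compute boost * idf * tf as described in the explain output."""
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf


AVGDL = 4 / 3  # average field length: (1 + 1 + 2) / 3 terms

# "foo bar" has two terms (dl = 2); each query term matches once
term_in_foo_bar = bm25_term_score(freq=1.0, dl=2.0, avgdl=AVGDL, n=2, N=3)
# "foo" and "bar" each have a single term (dl = 1)
term_in_single = bm25_term_score(freq=1.0, dl=1.0, avgdl=AVGDL, n=2, N=3)

print("foo bar:", 2 * term_in_foo_bar + math.sqrt(9))
print("foo:    ", term_in_single + math.sqrt(10))
print("bar:    ", term_in_single + math.sqrt(5))
```

The results agree with the `_score` values above up to floating-point rounding, since Elasticsearch computes scores in single precision.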

Search result:

"hits": [
      {
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "3",
        "_score": 3.7803833,
        "_source": {
          "name": "foo bar"
        }
      },
      {
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "1",
        "_score": 3.685826,
        "_source": {
          "name": "foo",
          "count": 10
        }
      },
      {
        "_index": "stof_64169215",
        "_type": "_doc",
        "_id": "2",
        "_score": 2.7596164,
        "_source": {
          "name": "bar",
          "count": 5
        }
      }
    ]

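【Comments】:

  • @Ajouve did you get a chance to go through my answer? Looking forward to your feedback :)
  • Thanks, but for foo bar I don't get the expected result; the match should carry more weight
  • @Ajouve thanks for your reply :) Can you confirm whether the second part of your question (But when searching for foo, I would like to get foo baz, foo, foo bar) is resolved by the search query above?
  • @Ajouve please go through my updated answer and let me know if this was your issue. I removed the foo baz document for the updated search query, so that you can see clearly what you want to achieve.
【Answer 2】:

Add this to your search request: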

  "sort": [
    {
      "name.keyword": {
        "order": "desc"
      }
    },
    "_score"
  ], 

Your complete search looks like this:

GET product/_search
{
  "sort": [
    {
      "name.keyword": {
        "order": "desc"
      }
    },
    "_score"
  ], 
  "query": {
    "function_score": {
      "query": {
        "match": {"name": "foo bar"}
      },
      "field_value_factor": {
        "field": "count",
        "missing": 0
      }
    }
  }
}
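
To get a feeling for what this sort does, the sketch below imitates it in Python on the sample data from the question. It assumes that `name.keyword` is a keyword sub-field holding the unanalyzed name (as created by default dynamic mapping) and that a match on "foo bar" returns every document containing foo or bar; `sort_like_request` is a hypothetical helper.

```python
docs = [
    {"name": "foo", "count": 10},
    {"name": "bar", "count": 5},
    {"name": "foo bar"},
    {"name": "foo baz", "count": 20},
]


def sort_like_request(hits):
    """Order hits by name, descending, like the primary sort key.

    The secondary "_score" key only breaks ties between identical names,
    so it is left out of this simulation.
    """
    return sorted(hits, key=lambda d: d["name"], reverse=True)


# documents that share at least one term with the query "foo bar"
query_terms = {"foo", "bar"}
hits = [d for d in docs if query_terms & set(d["name"].split())]

ordered = [d["name"] for d in sort_like_request(hits)]
print(ordered)
```

The primary key is a plain lexicographic order on the name, so the count and the relevance score only matter between documents with identical names. Whether that matches the desired ranking depends on the names in the index.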
