【Question Title】: Elasticsearch match and boosting
【Posted】: 2018-03-27 17:58:28
【Question】:

I have an Elasticsearch mapping like this:

{
  first_name: {
    type: 'text'
  },
  last_name: {
    type: 'text'
  }
}

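I have 2 documents. One has first name Amit and last name Hello; the other has first name Hello and last name Amit.

I boosted the first name by 2 and the last name by 1. However, when I search for the keyword Amit, the document whose last name is Amit appears at the top. When the keyword is Hello, the results are as expected.

The only difference I see in the explain output is a docFreq of 2 for the second record and 1 for the first.

I'm not sure why the second document scores higher. Any help is welcome!

Here is the query: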

{
    "query": {
        "bool": {
            "filter": [{
                "term": {
                    "enabled": true
                }
            }, {
                "terms": {
                    "roles": ["influencer"]
                }
            }],
            "should": [{
                "match": {
                    "first_name": {
                        "query": "Amit",
                        "boost": 1
                    }
                }
            }, {
                "match": {
                    "last_name": {
                        "query": "Amit",
                        "boost": 1
                    }
                }
            }],
            "minimum_should_match": 1
        }
    }
}
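For reference, a minimal Python sketch of how a request body like the one above could be built with per-hit explanations turned on. The helper name `build_search_body` is hypothetical. The boosts of 2 and 1 follow the description in the question (the posted query shows `boost: 1` on both clauses), so treat them as an assumption. The top-level `"explain": true` flag is the standard search-body option that makes each hit carry an `_explanation`.

```python
import json


def build_search_body(keyword, first_name_boost=2, last_name_boost=1):
    """Build a bool query shaped like the one in the question, with explain enabled.

    Hypothetical helper: the boost defaults (2 and 1) come from the prose
    description, not from the posted query.
    """
    return {
        "explain": True,  # ask Elasticsearch to return scoring details per hit
        "query": {
            "bool": {
                # filter clauses restrict the result set but do not affect the score
                "filter": [
                    {"term": {"enabled": True}},
                    {"terms": {"roles": ["influencer"]}},
                ],
                # should clauses are the ones that contribute to the score
                "should": [
                    {"match": {"first_name": {"query": keyword, "boost": first_name_boost}}},
                    {"match": {"last_name": {"query": keyword, "boost": last_name_boost}}},
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = build_search_body("Amit")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index. Comparing the `boost` values in the returned explanation with the ones sent is a quick way to check that the intended boosts are actually being applied.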


{
    "_index": "development-users",
    "_type": "users",
    "_id": "10",
    "matched": true,
    "explanation": {
        "value": 175.57181,
        "description": "sum of:",
        "details": [
            {
                "value": 175.57181,
                "description": "sum of:",
                "details": [
                    {
                        "value": 43.892952,
                        "description": "weight(last_name:gur in 1) [PerFieldSimilarity], result of:",
                        "details": [
                            {
                                "value": 43.892952,
                                "description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
                                "details": [
                                    {
                                        "value": 30,
                                        "description": "boost",
                                        "details": []
                                    },
                                    {
                                        "value": 1.2809339,
                                        "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                        "details": [
                                            {
                                                "value": 2,
                                                "description": "docFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 8,
                                                "description": "docCount",
                                                "details": []
                                            }
                                        ]
                                    },
                                    {
                                        "value": 1.1422123,
                                        "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "termFreq=1.0",
                                                "details": []
                                            },
                                            {
                                                "value": 1.2,
                                                "description": "parameter k1",
                                                "details": []
                                            },
                                            {
                                                "value": 0.75,
                                                "description": "parameter b",
                                                "details": []
                                            },
                                            {
                                                "value": 5.75,
                                                "description": "avgFieldLength",
                                                "details": []
                                            },
                                            {
                                                "value": 4,
                                                "description": "fieldLength",
                                                "details": []
                                            }
                                        ]
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "value": 43.892952,
                        "description": "weight(last_name:gurj in 1) [PerFieldSimilarity], result of:",
                        "details": [
                            {
                                "value": 43.892952,
                                "description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
                                "details": [
                                    {
                                        "value": 30,
                                        "description": "boost",
                                        "details": []
                                    },
                                    {
                                        "value": 1.2809339,
                                        "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                        "details": [
                                            {
                                                "value": 2,
                                                "description": "docFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 8,
                                                "description": "docCount",
                                                "details": []
                                            }
                                        ]
                                    },
                                    {
                                        "value": 1.1422123,
                                        "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "termFreq=1.0",
                                                "details": []
                                            },
                                            {
                                                "value": 1.2,
                                                "description": "parameter k1",
                                                "details": []
                                            },
                                            {
                                                "value": 0.75,
                                                "description": "parameter b",
                                                "details": []
                                            },
                                            {
                                                "value": 5.75,
                                                "description": "avgFieldLength",
                                                "details": []
                                            },
                                            {
                                                "value": 4,
                                                "description": "fieldLength",
                                                "details": []
                                            }
                                        ]
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "value": 43.892952,
                        "description": "weight(last_name:gurjo in 1) [PerFieldSimilarity], result of:",
                        "details": [
                            {
                                "value": 43.892952,
                                "description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
                                "details": [
                                    {
                                        "value": 30,
                                        "description": "boost",
                                        "details": []
                                    },
                                    {
                                        "value": 1.2809339,
                                        "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                        "details": [
                                            {
                                                "value": 2,
                                                "description": "docFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 8,
                                                "description": "docCount",
                                                "details": []
                                            }
                                        ]
                                    },
                                    {
                                        "value": 1.1422123,
                                        "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "termFreq=1.0",
                                                "details": []
                                            },
                                            {
                                                "value": 1.2,
                                                "description": "parameter k1",
                                                "details": []
                                            },
                                            {
                                                "value": 0.75,
                                                "description": "parameter b",
                                                "details": []
                                            },
                                            {
                                                "value": 5.75,
                                                "description": "avgFieldLength",
                                                "details": []
                                            },
                                            {
                                                "value": 4,
                                                "description": "fieldLength",
                                                "details": []
                                            }
                                        ]
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "value": 43.892952,
                        "description": "weight(last_name:gurjot in 1) [PerFieldSimilarity], result of:",
                        "details": [
                            {
                                "value": 43.892952,
                                "description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
                                "details": [
                                    {
                                        "value": 30,
                                        "description": "boost",
                                        "details": []
                                    },
                                    {
                                        "value": 1.2809339,
                                        "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                        "details": [
                                            {
                                                "value": 2,
                                                "description": "docFreq",
                                                "details": []
                                            },
                                            {
                                                "value": 8,
                                                "description": "docCount",
                                                "details": []
                                            }
                                        ]
                                    },
                                    {
                                        "value": 1.1422123,
                                        "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "termFreq=1.0",
                                                "details": []
                                            },
                                            {
                                                "value": 1.2,
                                                "description": "parameter k1",
                                                "details": []
                                            },
                                            {
                                                "value": 0.75,
                                                "description": "parameter b",
                                                "details": []
                                            },
                                            {
                                                "value": 5.75,
                                                "description": "avgFieldLength",
                                                "details": []
                                            },
                                            {
                                                "value": 4,
                                                "description": "fieldLength",
                                                "details": []
                                            }
                                        ]
                                    }
                                ]
                            }
                        ]
                    }
                ]
            },
            {
                "value": 0,
                "description": "match on required clause, product of:",
                "details": [
                    {
                        "value": 0,
                        "description": "# clause",
                        "details": []
                    },
                    {
                        "value": 0,
                        "description": "weight(enabled:T in 1) [], result of:",
                        "details": [
                            {
                                "value": 0,
                                "description": "score(doc=1,freq=1.0), with freq of:",
                                "details": [
                                    {
                                        "value": 1,
                                        "description": "termFreq=1.0",
                                        "details": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            },
            {
                "value": 0,
                "description": "match on required clause, product of:",
                "details": [
                    {
                        "value": 0,
                        "description": "# clause",
                        "details": []
                    },
                    {
                        "value": 0,
                        "description": "weight(roles:influencer in 1) [], result of:",
                        "details": [
                            {
                                "value": 0,
                                "description": "score(doc=1,freq=1.0), with freq of:",
                                "details": [
                                    {
                                        "value": 1,
                                        "description": "termFreq=1.0",
                                        "details": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

At this point there are only 10 documents.
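The numbers in the explain output above can be checked by hand. Below is a small Python sketch that plugs the reported values (boost 30, docFreq 2, docCount 8, k1 1.2, b 0.75, fieldLength 4, avgFieldLength 5.75) into the two formulas quoted in the `description` fields. The function names are hypothetical; the formulas are copied from the output itself.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    # tfNorm, computed as (freq * (k1 + 1)) /
    #   (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


boost = 30
idf = bm25_idf(doc_freq=2, doc_count=8)
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75, field_length=4, avg_field_length=5.75)

# Each of the four last_name terms (gur, gurj, gurjo, gurjot) reports the
# same inputs, so each one contributes the same per-term score.
term_score = boost * idf * tf_norm
total = 4 * term_score

print(f"idf={idf:.7f} tfNorm={tf_norm:.7f} term={term_score:.5f} total={total:.4f}")
```

The results agree with the reported 1.2809339, 1.1422123, 43.892952 and 175.57181 up to floating-point rounding. The two filter clauses contribute 0, so the total comes entirely from the four `last_name` term weights.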

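【Comments】:

  • So, what is your question?
  • @Lupanoide Good point. Updated the question. Thanks!
  • Which query did you use to retrieve docFreq? How many documents are in your index?
  • I used explain to find out how the scoring is done. Added the query and the response to the question. The name "gurjot" was searched for in the explain output.
  • Ah, OK. I have never used the explain API; I thought you were using the termVector API. In any case, docFreq is needed to compute the IDF measure that the retrieval algorithm relies on - docs here: elastic.co/guide/en/elasticsearch/guide/current/… . docFreq is the number of documents that contain the term, not the number of documents that match the query.

Tags: elasticsearch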


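【Solution 1】:

Your query retrieved only one document, "_id": "10"; the second stack comes from the second clause of your query. The first value, "value": 175.57181, is the sum of the values of all clauses. I don't know where you read docFreq 1 - in which stack - but it reflects the frequency of some term - possibly even the value in the enabled field! - across all of your indexed documents.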
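To illustrate how docFreq feeds into the score, here is a small Python sketch using the idf formula quoted in the explain output. It takes docCount = 8 from that output and compares docFreq = 1 with docFreq = 2, the two values mentioned in the question. The function name is hypothetical, and this is only an illustration of the formula, not a diagnosis of the asker's index.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


doc_count = 8  # docCount reported in the explain output
idf_rare = bm25_idf(1, doc_count)    # term found in 1 document of this field
idf_common = bm25_idf(2, doc_count)  # term found in 2 documents of this field

print(f"docFreq=1 -> idf={idf_rare:.4f}")
print(f"docFreq=2 -> idf={idf_common:.4f}")
print(f"ratio={idf_rare / idf_common:.3f}")
```

Because idf is computed per field, the same keyword can get a different idf in first_name than in last_name. The rarer it is within a field, the larger the factor that multiplies that clause's boost.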
