【Posted】: 2018-03-27 17:58:28
【Problem description】:
I have an Elasticsearch mapping like this:
{
  first_name: {
    type: 'text'
  },
  last_name: {
    type: 'text'
  }
}
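I have 2 documents. One has the first name Amit and the last name Hello; the other has the first name Hello and the last name Amit.
I boosted the first name by 2 and the last name by 1. However, when I search for the keyword Amit, the document whose last name is Amit comes out on top. When the keyword is Hello, the results are as expected.
The only difference I can see in the explain output is that docFreq is 2 for the second record and 1 for the first.
I'm not sure why the second document scores higher. Any help is welcome!
Here is the query, followed by the explain output for one document: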
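A minimal sketch of how docFreq alone changes a term's weight, using the idf formula quoted in the explain output further down in this question. The value docCount = 8 is taken from that output and is assumed here to apply to both fields; the function and variable names are illustrative only. Lucene tracks docFreq per field, so the same word can end up with a different idf in first_name than in last_name.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """idf as printed by explain: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumption: docCount is 8 for both fields, as in the explain output.
DOC_COUNT = 8

idf_rare = bm25_idf(1, DOC_COUNT)    # term found in 1 document of the field
idf_common = bm25_idf(2, DOC_COUNT)  # term found in 2 documents of the field

print(f"docFreq=1 -> idf={idf_rare:.7f}")
print(f"docFreq=2 -> idf={idf_common:.7f}")
print(f"ratio     -> {idf_rare / idf_common:.3f}")
```

With so few documents, a change of one in docFreq moves the idf noticeably, which can be enough to outweigh a small boost difference between two fields.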
{
  "query": {
    "bool": {
      "filter": [{
        "term": {
          "enabled": true
        }
      }, {
        "terms": {
          "roles": ["influencer"]
        }
      }],
      "should": [{
        "match": {
          "first_name": {
            "query": "Amit",
            "boost": 1
          }
        }
      }, {
        "match": {
          "last_name": {
            "query": "Amit",
            "boost": 1
          }
        }
      }],
      "minimum_should_match": 1
    }
  }
}
{
"_index": "development-users",
"_type": "users",
"_id": "10",
"matched": true,
"explanation": {
"value": 175.57181,
"description": "sum of:",
"details": [
{
"value": 175.57181,
"description": "sum of:",
"details": [
{
"value": 43.892952,
"description": "weight(last_name:gur in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 43.892952,
"description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 30,
"description": "boost",
"details": []
},
{
"value": 1.2809339,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 2,
"description": "docFreq",
"details": []
},
{
"value": 8,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.1422123,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 5.75,
"description": "avgFieldLength",
"details": []
},
{
"value": 4,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 43.892952,
"description": "weight(last_name:gurj in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 43.892952,
"description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 30,
"description": "boost",
"details": []
},
{
"value": 1.2809339,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 2,
"description": "docFreq",
"details": []
},
{
"value": 8,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.1422123,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 5.75,
"description": "avgFieldLength",
"details": []
},
{
"value": 4,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 43.892952,
"description": "weight(last_name:gurjo in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 43.892952,
"description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 30,
"description": "boost",
"details": []
},
{
"value": 1.2809339,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 2,
"description": "docFreq",
"details": []
},
{
"value": 8,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.1422123,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 5.75,
"description": "avgFieldLength",
"details": []
},
{
"value": 4,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 43.892952,
"description": "weight(last_name:gurjot in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 43.892952,
"description": "score(doc=1,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 30,
"description": "boost",
"details": []
},
{
"value": 1.2809339,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 2,
"description": "docFreq",
"details": []
},
{
"value": 8,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.1422123,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 5.75,
"description": "avgFieldLength",
"details": []
},
{
"value": 4,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 0,
"description": "weight(enabled:T in 1) [], result of:",
"details": [
{
"value": 0,
"description": "score(doc=1,freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 0,
"description": "weight(roles:influencer in 1) [], result of:",
"details": [
{
"value": 0,
"description": "score(doc=1,freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
}
]
}
]
}
]
}
]
}
}
There are only 10 documents in the index at this point.
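For anyone reading the explain output above, here is a rough sketch that recomputes its numbers from the two formulas it prints. All inputs (boost 30, docFreq 2, docCount 8, k1 1.2, b 0.75, fieldLength 4, avgFieldLength 5.75) are copied from that output; the helper names are made up for illustration. The four matched terms (gur, gurj, gurjo, gurjot) each contribute the same per-term score, and the two filter clauses contribute 0.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    # idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    # tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


# Values copied from the explain output in the question.
boost = 30
idf = bm25_idf(doc_freq=2, doc_count=8)
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75,
                       field_length=4, avg_field_length=5.75)

term_score = boost * idf * tf_norm  # one "weight(last_name:...)" entry
matched_terms = ["gur", "gurj", "gurjo", "gurjot"]
total_score = term_score * len(matched_terms)  # filter clauses add 0

print(f"idf         = {idf:.7f}")
print(f"tfNorm      = {tf_norm:.7f}")
print(f"term score  = {term_score:.5f}")
print(f"total score = {total_score:.5f}")
```

The results agree with the explain values (1.2809339, 1.1422123, 43.892952, 175.57181) up to single-precision rounding. Note that docCount is 8 rather than 10, presumably because it counts only documents that have a value for the field on that shard.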
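【Discussion】:
-
So, what is your question?
-
@Lupanoide Good point. Updated the question. Thanks!
-
Which query did you use to retrieve docFreq? How many documents are in your index?
-
I used explain to see how the score is computed. I've added the query and the response to the question. The explain output is for a search on the name "gurjot".
-
Ah, OK. I have never used the explain API; I thought you were using the termVector API. In any case, docFreq is what the retrieval algorithm uses to compute the IDF measure; see the docs at elastic.co/guide/en/elasticsearch/guide/current/… . docFreq is the number of documents that contain the term, not the number of matching documents.
Tags: elasticsearch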