【Posted】: 2014-02-08 10:32:18
【Question】:
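I'm running a query_string query against multiple fields, _all and tags.name, and trying to understand the scoring. The query is {"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}. These are the documents it returns: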
- Document 1 is an exact match on tags.name, but not on _all.
- Document 8 is an exact match on both tags.name and _all.
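Document 8 should win, and it does, but the scores confuse me. Document 1 seems to be penalized by having its tags.name score multiplied by the IDF twice, while Document 8's tags.name score is multiplied by the IDF only once. In short: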
- Both have a component weight(tags.name:animal in 0) [PerFieldSimilarity].
- In Document 1, weight = score = queryWeight x fieldWeight.
- In Document 8, weight = fieldWeight!
Since queryWeight contains the idf, Document 1 ends up penalized by its idf twice.
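Here is the arithmetic written out, assuming Lucene's classic TF-IDF similarity (the DefaultSimilarity used by Lucene 4.x). In that model tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The idf formula matches the explain output exactly: idf(docFreq=1, maxDocs=1) = 1 + ln(1/2) = 0.30685282. Document 1's score is then queryWeight x fieldWeight = (idf x queryNorm) x (tf x idf x fieldNorm), so idf appears squared. The helper names below are hypothetical. fieldNorm and queryNorm are copied from the explain output, not derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Lucene classic idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Lucene classic tf: square root of the term (or phrase) frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, idf: float, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm
    return classic_tf(freq) * idf * field_norm


def query_weight(idf: float, query_norm: float, boost: float = 1.0) -> float:
    # queryWeight = boost * idf * queryNorm
    return boost * idf * query_norm


# Document 1, tags.name:animal; fieldNorm and queryNorm copied from its explain
idf = classic_idf(doc_freq=1, max_docs=1)
fw = field_weight(freq=1.0, idf=idf, field_norm=0.625)
qw = query_weight(idf=idf, query_norm=1.0)
score = qw * fw  # = idf**2 * tf * fieldNorm * queryNorm: idf appears twice
print(f"idf={idf:.8f} queryWeight={qw:.8f} fieldWeight={fw:.8f} score={score:.9f}")
```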
Can anyone make sense of this?
Additional information
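- If I remove _all from the query fields, queryWeight disappears from the explanation entirely.
- Adding "use_dis_max":true as an option has no effect.
  - However, also adding "tie_breaker":0.7 (or any other value) does change Document 8: it then uses the more complex formula seen in Document 1.
- My guess: a boolean query (which is what this is) might do this deliberately, giving more weight to documents that match more than one subquery. That would be reasonable. But it makes no sense for a dis_max query, which should simply return the maximum of its subqueries.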
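For reference, Lucene's disjunction-max scoring takes the best matching sub-query score and adds tie_breaker times the sum of the others. With tie_breaker at 0 (the query_string default in Elasticsearch 1.x, as documented at the time), the result is just the maximum. That fits the "max of:" node at the top of both explains. The hypothetical helper below models only that final combination step, applied to Document 8's per-field scores as reported. It does not model how the per-field weights change once tie_breaker is set, which is the effect described above.

```python
def dis_max(matching_scores, tie_breaker=0.0):
    """Disjunction-max combination: best + tie_breaker * (sum of the rest)."""
    if not matching_scores:
        return 0.0
    best = max(matching_scores)
    return best + tie_breaker * (sum(matching_scores) - best)


# Per-field scores for Document 8, copied from its explain output
all_score = 0.033902764   # weight(_all:anim in 0), via the btq node
tags_score = 0.15342641   # weight(tags.name:animal in 0)

print(dis_max([all_score, tags_score]))  # tie_breaker 0: plain max
# Illustrative only: with a non-zero tie_breaker the real per-field
# scores would also differ, so this is not Document 8's actual score.
print(dis_max([all_score, tags_score], 0.7))
```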
Below are the relevant explain requests. Look for the inline comments.
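Each explain call below is a GET request with a JSON body. To script these calls instead of pasting curl commands, the minimal standard-library sketch below builds the same request without sending it. It mirrors the pre-7.x URL layout used in the curl commands, /{index}/{type}/{id}/_explain. Elasticsearch 7 and later drop the type segment (/{index}/_explain/{id}), and 6.x and later reject JSON bodies that lack a Content-Type: application/json header. build_explain_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_explain_request(host, index, doc_type, doc_id, query):
    """Build (but do not send) an _explain request with a JSON body.

    Uses the pre-7.x URL layout /{index}/{type}/{id}/_explain, matching
    the curl commands in the question.
    """
    url = f"{host}/{index}/{doc_type}/{doc_id}/_explain?pretty"
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


query = {
    "query": {
        "query_string": {
            "query": "animal",
            "fields": ["_all", "tags.name"],
        }
    }
}

req = build_explain_request("http://localhost:9200", "questions", "question", "1", query)
print(req.get_method(), req.full_url)

# Against a running cluster you would then send it, for example:
# with urllib.request.urlopen(req) as resp:
#     explanation = json.load(resp)["explanation"]
```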
Document 1 (matches only tags.name):
curl -XGET 'http://localhost:9200/questions/question/1/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}':
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.058849156,
"description" : "max of:",
"details" : [ {
"value" : 0.058849156,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = score = queryWeight x fieldWeight
"details" : [ {
// score and queryWeight are NOT a part of the other explain!
"value" : 0.058849156,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [ {
"value" : 0.30685282,
"description" : "queryWeight, product of:",
"details" : [ {
// This idf is NOT a part of the other explain!
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 1.0,
"description" : "queryNorm"
} ]
}, {
"value" : 0.19178301,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
} ]
}
}
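The nesting is easier to follow, and to sanity-check, when you walk the explanation object in code. In this minimal sketch, check_explanation is a hypothetical helper. It prints the tree and confirms that each "product of:" node equals the product of its children and each "max of:" node equals their maximum. doc1 is Document 1's explanation from above with the inline // comments removed, since those are annotations and not valid JSON. With a live cluster you would pass the parsed response's "explanation" field instead.

```python
import math


def check_explanation(node, depth=0, rel_tol=1e-5):
    """Print an explanation tree indented and verify its arithmetic.

    Nodes whose description ends in "product of:" should equal the
    product of their children; "max of:" nodes should equal the max.
    Returns True if every such node checks out.
    """
    value = node["value"]
    desc = node["description"].replace("\n", " ")
    children = node.get("details", [])
    ok = True
    if children and desc.endswith("product of:"):
        ok = math.isclose(value, math.prod(c["value"] for c in children), rel_tol=rel_tol)
    elif children and desc.endswith("max of:"):
        ok = math.isclose(value, max(c["value"] for c in children), rel_tol=rel_tol)
    print(f"{'  ' * depth}{value:.9f}  {desc}{'' if ok else '  <-- mismatch'}")
    for child in children:
        ok = check_explanation(child, depth + 1, rel_tol) and ok
    return ok


# Document 1's explanation, copied from the output above without comments
doc1 = {
    "value": 0.058849156, "description": "max of:",
    "details": [{
        "value": 0.058849156,
        "description": "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
        "details": [{
            "value": 0.058849156,
            "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
            "details": [
                {"value": 0.30685282, "description": "queryWeight, product of:",
                 "details": [
                     {"value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)"},
                     {"value": 1.0, "description": "queryNorm"},
                 ]},
                {"value": 0.19178301, "description": "fieldWeight in 0, product of:",
                 "details": [
                     {"value": 1.0, "description": "tf(freq=1.0), with freq of:",
                      "details": [{"value": 1.0, "description": "termFreq=1.0"}]},
                     {"value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)"},
                     {"value": 0.625, "description": "fieldNorm(doc=0)"},
                 ]},
            ],
        }],
    }],
}

print(check_explanation(doc1))
```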
Document 8 (matches both _all and tags.name):
curl -XGET 'http://localhost:9200/questions/question/8/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}':
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "8",
"matched" : true,
"explanation" : {
"value" : 0.15342641,
"description" : "max of:",
"details" : [ {
"value" : 0.033902764,
"description" : "btq, product of:",
"details" : [ {
"value" : 0.033902764,
"description" : "weight(_all:anim in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.033902764,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 0.70710677,
"description" : "tf(freq=0.5), with freq of:",
"details" : [ {
"value" : 0.5,
"description" : "phraseFreq=0.5"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.15625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}, {
"value" : 1.0,
"description" : "allPayload(...)"
} ]
}, {
"value" : 0.15342641,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = fieldWeight
// No score or queryWeight in sight!
"details" : [ {
"value" : 0.15342641,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
}
}
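One hypothesis worth checking, offered as an assumption and not a confirmed answer, concerns Lucene 4.x's TFIDFSimilarity.explainScore (verify this against the Lucene version your Elasticsearch release ships with). That method appears to compute queryWeight = boost x idf x queryNorm. When the value is exactly 1.0, it returns only the fieldWeight node instead of the full "score(...), product of:" node. Under that reading, Document 8's score would still be multiplied by queryWeight, but the factor equals 1.0 and is not printed. The real difference between the two documents would then be queryNorm: 1.0 for Document 1, and by inference 1 / idf for Document 8. The sketch below models that rule with a hypothetical helper and reproduces the per-field values in both explains. Document 8's queryNorm is inferred, not read from any output.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Lucene classic idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_top_node(freq, idf, field_norm, query_norm, boost=1.0):
    """Simplified model of the TF-IDF explain logic described above.

    Returns (label, value) for the node that would head the weight(...)
    explanation: the full score when queryWeight != 1, otherwise only
    fieldWeight (the score is the same either way, since x * 1 == x).
    """
    field_weight = math.sqrt(freq) * idf * field_norm
    query_weight = boost * idf * query_norm
    # Lucene compares against 1.0f exactly; a tolerance avoids float64 noise here.
    if math.isclose(query_weight, 1.0, rel_tol=1e-9):
        return "fieldWeight", field_weight
    return "queryWeight x fieldWeight", query_weight * field_weight


idf = classic_idf(doc_freq=1, max_docs=1)

# Document 1, tags.name:animal: queryNorm = 1.0 as shown in its explain
print(explain_top_node(freq=1.0, idf=idf, field_norm=0.625, query_norm=1.0))

# Document 8: its explain shows no queryNorm. ASSUMPTION: queryNorm = 1 / idf,
# the value that would make queryWeight exactly 1.0.
print(explain_top_node(freq=1.0, idf=idf, field_norm=0.5, query_norm=1.0 / idf))      # tags.name
print(explain_top_node(freq=0.5, idf=idf, field_norm=0.15625, query_norm=1.0 / idf))  # _all
```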
【Discussion】:
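- Hi, did you ever find the answer yourself, or do you know of any material I could study? I'm struggling with the same thing. In our case this seriously affects some hits, and I need to understand why it happens and how to adjust our queries.
- No, unfortunately I never found an answer. I'd love to hear whatever you find out.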
Tags: elasticsearch lucene