【Question title】: ElasticSearch scoring issue
【Posted】: 2017-09-22 19:47:51
【Question description】:

I'm trying to figure out the logic ElasticSearch uses when it ranks results by score.

I have 4 indices in total, and I'm querying all of them for a single term. The query I'm using is:

GET /_all/static/_search
{
  "query": {
    "match": {
      "name": "chinese"
    }
  }
}

Here is (part of) the response I get:

{
   "took": 17,
   "timed_out": false,
   "_shards": {
      "total": 40,
      "successful": 40,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 2.96844,
      "hits": [
         {
            "_shard": 1,
            "_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
            "_index": "restaurant",
            "_type": "static",
            "_id": "XecLkyYNQWihuR2atFc5JQ",
            "_score": 2.96844,
            "_source": {
               "name": "Just Chinese"
            },
            "_explanation": {
               "value": 2.96844,
               "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:",
               "details": [
                  {
                     "value": 2.96844,
                     "description": "fieldWeight in 1, product of:",
                     "details": [
                        {
                           "value": 1,
                           "description": "tf(freq=1.0), with freq of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "termFreq=1.0"
                              }
                           ]
                        },
                        {
                           "value": 4.749504,
                           "description": "idf(docFreq=3, maxDocs=170)"
                        },
                        {
                           "value": 0.625,
                           "description": "fieldNorm(doc=1)"
                        }
                     ]
                  }
               ]
            }
         },
         {
            "_shard": 1,
            "_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
            "_index": "restaurant",
            "_type": "static",
            "_id": "IAUpkC55ReySjvl9Xr5MVw",
            "_score": 2.96844,
            "_source": {
               "name": "The Chinese Hut"
            },
            "_explanation": {
               "value": 2.96844,
               "description": "weight(name:chinese in 5) [PerFieldSimilarity], result of:",
               "details": [
                  {
                     "value": 2.96844,
                     "description": "fieldWeight in 5, product of:",
                     "details": [
                        {
                           "value": 1,
                           "description": "tf(freq=1.0), with freq of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "termFreq=1.0"
                              }
                           ]
                        },
                        {
                           "value": 4.749504,
                           "description": "idf(docFreq=3, maxDocs=170)"
                        },
                        {
                           "value": 0.625,
                           "description": "fieldNorm(doc=5)"
                        }
                     ]
                  }
               ]
            }
         },
         {
            "_shard": 2,
            "_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
            "_index": "cuisine",
            "_type": "static",
            "_id": "6",
            "_score": 2.7047482,
            "_source": {
               "name": "Chinese"
            },
            "_explanation": {
               "value": 2.7047482,
               "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:",
               "details": [
                  {
                     "value": 2.7047482,
                     "description": "fieldWeight in 1, product of:",
                     "details": [
                        {
                           "value": 1,
                           "description": "tf(freq=1.0), with freq of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "termFreq=1.0"
                              }
                           ]
                        },
                        {
                           "value": 2.7047482,
                           "description": "idf(docFreq=1, maxDocs=11)"
                        },
                        {
                           "value": 1,
                           "description": "fieldNorm(doc=1)"
                        }
                     ]
                  }
               ]
            }
         },
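The scores in this explain output can be reproduced by hand. Below is a minimal Python sketch, assuming the classic Lucene TF-IDF similarity that this Elasticsearch version appears to use (the explanation reports tf, idf and fieldNorm): tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and a single-term score of tf × idf × fieldNorm. The fieldNorm values are copied from the output rather than recomputed, because Lucene stores them in a lossy one-byte encoding. The helper names are hypothetical.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """Classic Lucene TF-IDF tf: square root of the term frequency."""
    return math.sqrt(freq)

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """Single-term score: tf * idf * fieldNorm (fieldNorm taken from the explain output)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Factors copied from the _explanation blocks above.
just_chinese = field_weight(freq=1, doc_freq=3, max_docs=170, field_norm=0.625)  # restaurant shard
chinese = field_weight(freq=1, doc_freq=1, max_docs=11, field_norm=1.0)          # cuisine shard

print(f"Just Chinese: {just_chinese:.5f}")
print(f"Chinese:      {chinese:.5f}")
```

This reproduces the reported 2.96844 and 2.7047482. It also shows where the ranking comes from: the shorter field gives "Chinese" a fieldNorm advantage of 1.0 / 0.625 = 1.6×, but the restaurant shard's idf is about 4.7495 / 2.7047 ≈ 1.76× larger, so "Just Chinese" wins.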

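My question is: I understand that Elasticsearch gives higher scores to shorter field values, so why do results such as "Just Chinese" and "The Chinese Hut" from the restaurant index rank above "Chinese" from the cuisine index, which I expected to be the best match? As far as I know, I didn't use any special analyzer when inserting these documents into the indices; everything is default.

What am I missing, and how can I get the expected results?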

【Question comments】:

Tags: elasticsearch indexing nosql


【Solution 1】:

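One of the most important inputs to the score is inverse document frequency (IDF). By default, each Elasticsearch shard estimates the global IDF from its own local statistics. That works when you have many similar records spread evenly across shards. But when you have only a few records, or when you combine results from several shards that hold very different kinds of records (cuisine names and restaurant names), the estimated IDF can produce strange results. In your output, the restaurant shard saw "chinese" in 3 of 170 documents (idf ≈ 4.75), while the cuisine shard saw it in 1 of 11 (idf ≈ 2.70), and that gap outweighs the higher fieldNorm of the exact match "Chinese". The fix is to use Elasticsearch's dfs_query_then_fetch search type, which first collects term statistics from all shards and then scores every document with the same global IDF.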
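To see why pooling statistics changes the ranking, here is a simplified Python sketch. For illustration only, it assumes the two shards shown in the explain output are the only ones involved: it pools their docFreq and maxDocs the way a DFS phase would, then rescores both documents with the classic TF-IDF formula. A real dfs_query_then_fetch request aggregates statistics from every shard it queries, so the actual numbers will differ.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Per-shard statistics copied from the explain output (hypothetical two-shard world).
shards = {
    "restaurant": {"doc_freq": 3, "max_docs": 170},
    "cuisine": {"doc_freq": 1, "max_docs": 11},
}
# Shard and fieldNorm of each matching document, as reported by explain; tf is 1 for both.
docs = {
    "Just Chinese": ("restaurant", 0.625),
    "Chinese": ("cuisine", 1.0),
}

def scores(use_global_stats: bool) -> dict:
    """Score each doc with local (per-shard) or pooled (DFS-style) idf."""
    pooled_df = sum(s["doc_freq"] for s in shards.values())
    pooled_max = sum(s["max_docs"] for s in shards.values())
    result = {}
    for name, (shard, norm) in docs.items():
        if use_global_stats:
            idf = classic_idf(pooled_df, pooled_max)
        else:
            idf = classic_idf(shards[shard]["doc_freq"], shards[shard]["max_docs"])
        result[name] = idf * norm  # tf = sqrt(1) = 1
    return result

local = scores(use_global_stats=False)
pooled = scores(use_global_stats=True)
print("query_then_fetch-style:", sorted(local, key=local.get, reverse=True))
print("dfs-style:             ", sorted(pooled, key=pooled.get, reverse=True))
```

With a shared idf, the idf factor cancels out of the comparison and only fieldNorm decides, so the single-word "Chinese" moves to the top.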

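By the way, to see how Elasticsearch computes its scores, you can set the explain parameter in your search request body or in the URL. When you ask questions about scoring, including the output with explain set to true is very helpful.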
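For reference, here is a minimal Python sketch (standard library only) that builds the question's search request with both suggestions applied: `search_type=dfs_query_then_fetch` in the URL and `"explain": true` in the body. The host `http://localhost:9200` and the helper name `build_search_request` are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

def build_search_request(base_url: str, path: str, term: str) -> urllib.request.Request:
    """Build (but do not send) a match query that uses DFS scoring and explain output."""
    query_string = urllib.parse.urlencode({"search_type": "dfs_query_then_fetch"})
    body = {
        "explain": True,  # include an _explanation for each hit
        "query": {"match": {"name": term}},
    }
    return urllib.request.Request(
        url=f"{base_url}{path}?{query_string}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical local cluster address.
req = build_search_request("http://localhost:9200", "/_all/static/_search", "chinese")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually run it against a cluster: urllib.request.urlopen(req).read()
```

The DFS phase adds an extra round trip to every shard to gather term statistics, which is why it is not the default search type.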

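【Comments】:

• dfs_query_then_fetch works! Now I also understand why it behaves this way. Thanks for the explanation! I've also edited the question so the response includes the explain output from the original request.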