Average of subdocument fields by document in Elasticsearch

【Question title】: Average of subdocument fields by document in elasticsearch
【Posted】: 2021-05-07 03:58:56
【Question description】:

I have an Elasticsearch mapping that represents students, with a property that holds their marks as an array of objects:

properties: {
  name: { type: "text" },
  /* ... */
  marks: {
    properties: {
      value: { type: "float" }
    }
  }
}

Based on this mapping, documents are stored in this form:

"hits" : [{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "John Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 4
      },
      {
        "_id" : "...",
        "value" : 0
      }
    ]
  }
}, 
{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "Jane Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 5
      },
      {
        "_id" : "...",
        "value" : 4
      }
    ]
  }
}, /* ... */]

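Each student has many marks. I would like the Elasticsearch results to include the average of the mark values per student (that is, per document indexed in Elasticsearch).

I tried an aggregation: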

"aggs": {
  "avg_mark": {
    "avg": { "field": "marks.value" }
  }
}

But this gives me a single average across all students:

aggregations: { avg_mark: { value: 3.25 } }
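The 3.25 figure is the mean over every mark in the matched documents, not a per-student value. A minimal sketch of the arithmetic, assuming the index holds only the two sample documents shown earlier (which is consistent with 3.25); the `students` array and the `mean` helper are illustrative names, not part of any Elasticsearch API:

```javascript
// Sample data copied from the two documents shown in the question.
const students = [
  { name: "John Doe", marks: [{ value: 4 }, { value: 0 }] },
  { name: "Jane Doe", marks: [{ value: 5 }, { value: 4 }] },
];

// Hypothetical helper: arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// What the top-level avg aggregation computes: one mean over all marks.
const allMarks = students.flatMap((s) => s.marks.map((m) => m.value));
const globalAverage = mean(allMarks); // (4 + 0 + 5 + 4) / 4

// What the question asks for: one mean per student document.
const perStudent = students.map((s) => ({
  name: s.name,
  average: mean(s.marks.map((m) => m.value)),
}));

console.log(globalAverage); // 3.25
console.log(perStudent); // John Doe -> 2, Jane Doe -> 4.5
```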

Then I tried sorting:

"sort": [{
  "marks.value": {
    "order": "desc",
    "mode": "avg"
  }
}]

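This does give me the correct average mark for each student, but:

- it sorts my results, which I do not always want;
- it stores the average in an array with no key to retrieve it by. That is not what I need, because the order of the sort properties may change depending on the user's search.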

"hits" : [{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "John Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 4
      },
      {
        "_id" : "...",
        "value" : 0
      }
    ]
  },
  "sort" : [ 2.0 ]
}, 
{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "Jane Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 5
      },
      {
        "_id" : "...",
        "value" : 4
      }
    ]
  },
  "sort" : [ 4.5 ]
}, /* ... */]

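This sort array can be [ 4.5, value_b, value_c, ... ] or [ value_b, value_c, 4.5 ], depending on the sort properties of the search request.

I also tried using the nested type, without success.

How can I get the average per document/student without sorting the results, and in a form that is easy to retrieve?

Thank you in advance.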

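【Question discussion】:

Tags: javascript node.js database elasticsearch

【Solution 1】:

Your first attempt was a step in the right direction. Just make sure to group by student name before computing the average mark: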

GET students/_search
{ 
  "size": 0,
  "aggs": {
    "by_student": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "avg_mark": {
          "avg": {
            "field": "marks.value"
          }
        }
      }
    }
  }
}
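To read the per-student averages out of the response in Node.js, one possible sketch follows. It assumes the response body exposes the buckets under `aggregations.by_student.buckets`, each with a `key` and an `avg_mark.value`, matching the aggregation names used in the request above. The `averagesByStudent` function and the `sampleResponse` object are hypothetical: the sample is written by hand from the question's data, not captured from a real cluster. Keep in mind that with "size": 10 the terms aggregation returns at most ten buckets.

```javascript
// Hypothetical helper: turn the terms + avg aggregation result into a
// plain { studentName: averageMark } object keyed by bucket key.
function averagesByStudent(responseBody) {
  const buckets = responseBody?.aggregations?.by_student?.buckets ?? [];
  const result = {};
  for (const bucket of buckets) {
    result[bucket.key] = bucket.avg_mark.value;
  }
  return result;
}

// Hand-written sample in the assumed response shape (illustrative only).
const sampleResponse = {
  aggregations: {
    by_student: {
      buckets: [
        { key: "Jane Doe", doc_count: 1, avg_mark: { value: 4.5 } },
        { key: "John Doe", doc_count: 1, avg_mark: { value: 2.0 } },
      ],
    },
  },
};

console.log(averagesByStudent(sampleResponse));
```

Because the averages are keyed by student name, they can be looked up directly, independent of any sort order applied to the hits.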

The .keyword field suffix comes from this slightly adjusted mapping:

PUT students
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "marks": {
        "properties": {
          "value": {
            "type": "float"
          }
        }
      }
    }
  }
}

By the way, if you want to narrow the search down to just a few students, include a top-level query like this:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "name.keyword": [
              "John Doe",
              "Jane Doe"
            ]
          }
        }
      ]
    }
  },
  "aggs": { ... }
}

The aggregation will then only consider the filtered set of documents.
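As a rough illustration, the filter and the aggregation from this answer could be assembled into one request body in JavaScript as follows. The function name `buildAverageRequest` and its `names` parameter are assumptions made for this sketch; sizing the terms aggregation to the number of requested names is also a choice made here, whereas the answer uses a fixed size of 10.

```javascript
// Hypothetical helper: build a search body that filters on the given
// student names and computes the average mark for each of them.
function buildAverageRequest(names) {
  return {
    size: 0, // skip the hits, only the aggregation buckets are needed
    query: {
      bool: {
        filter: [{ terms: { "name.keyword": names } }],
      },
    },
    aggs: {
      by_student: {
        terms: {
          field: "name.keyword",
          // One bucket per requested name; the terms aggregation
          // requires a size greater than zero.
          size: Math.max(names.length, 1),
        },
        aggs: {
          avg_mark: { avg: { field: "marks.value" } },
        },
      },
    },
  };
}

const requestBody = buildAverageRequest(["John Doe", "Jane Doe"]);
console.log(JSON.stringify(requestBody, null, 2));
```

The printed JSON can be sent as the body of GET students/_search.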

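【Comments】:

Thanks. I was editing my question when I saw your notification ;) I did almost the same thing, but using the _id field: "terms": { "field": "_id" }, "aggs": { "avg_marks": { "avg": { "field": "marks.value" }}}
For anyone reading my comment above: I just found in the documentation here: elastic.co/guide/en/elasticsearch/reference/current/… that _id should not be used for aggregations.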