【Question Title】: How to get doc value in Elasticsearch Bucket Aggregation query instead of doc count
【Posted】: 2021-07-06 01:24:01
【Question Description】:

I have four documents in my ES index.

       {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:12:00",
                "message": "INFO GET /search HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        },
        {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:15:00",
                "message": "Error GET /search HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        },
       {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:20:00",
                "message": "INFO GET /parse HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        },
        {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:26:00",
                "message": "Error GET /parse HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        }

I'm writing a bucket aggregation query that uses filters to group all documents in the index by message type (Info or Error). In my example above, the index contains 4 documents: two with "Info" messages and two with "Error" messages.

I want the bucket aggregation query to return the groups of documents by message type. The expected result is two buckets with two documents each, but my query only returns the doc count of each bucket, not the actual documents.

The query I'm using is:

    {
      "size": 0,
      "aggs": {
        "messages": {
          "filters": {
            "filters": {
              "info": { "match": { "message": "Info" } },
              "error": { "match": { "message": "Error" } }
            }
          }
        }
      }
    }

The output of the above query is:

    {
        "took": 3,
        "timed_out": false,
        "_shards": {
            "total": 1,
            "successful": 1,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 2,
                "relation": "eq"
            },
            "max_score": null,
            "hits": []
        },
        "aggregations": {
            "messages": {
                "buckets": {
                    "errors": {
                        "doc_count": 2
                    },
                    "info": {
                        "doc_count": 2
                    }
                }
            }
        }
    }

But what I need is the actual documents, with their field values, inside each bucket group. Is there any way to change the filters bucket aggregation query so that each bucket also returns its matching documents?

【Question Discussion】:

    Tags: elasticsearch elasticsearch-5 elasticsearch-aggregation elasticsearch-dsl


    【Solution 1】:

    You can use the top_hits aggregation to fetch the corresponding documents inside each bucket group:

    {
      "size": 0,
      "aggs": {
        "messages": {
          "filters": {
            "filters": {
              "info": {
                "match": {
                  "message": "Info"
                }
              },
              "error": {
                "match": {
                  "message": "Error"
                }
              }
            }
          },
          "aggs": {
            "top_filters_hits": {
              "top_hits": {
                "_source": {
                  "includes": [
                    "message",
                    "user.id"
                  ]
                }
              }
            }
          }
        }
      }
    }
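    One caveat: by default `top_hits` returns at most 3 hits per bucket, so if a message type matched more than three documents the buckets would be truncated. A minimal sketch of the same request body as a Python dict with an explicit `size` (the value 10 is illustrative; the `es.search` call in the comment assumes the official Python client is in use):

    ```python
    # Sketch: the filters + top_hits aggregation from the answer as a Python
    # dict, with an explicit top_hits "size" so buckets holding more than the
    # default 3 hits are not truncated.
    body = {
        "size": 0,
        "aggs": {
            "messages": {
                "filters": {
                    "filters": {
                        "info": {"match": {"message": "Info"}},
                        "error": {"match": {"message": "Error"}},
                    }
                },
                "aggs": {
                    "top_filters_hits": {
                        "top_hits": {
                            # default is 3; raise it to return more docs per bucket
                            "size": 10,
                            "_source": {"includes": ["message", "user.id"]},
                        }
                    }
                },
            }
        },
    }

    # With the official Python client this could be sent as, e.g.:
    # es.search(index="my-index", body=body)
    ```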
    

    The search result will be:

    "aggregations": {
        "messages": {
          "buckets": {
            "error": {
              "doc_count": 2,
              "top_filters_hits": {
                "hits": {
                  "total": {
                    "value": 2,
                    "relation": "eq"
                  },
                  "max_score": 1.0,
                  "hits": [
                    {
                      "_index": "67033379",
                      "_type": "_doc",
                      "_id": "2",
                      "_score": 1.0,
                      "_source": {
                        "message": "Error GET /search HTTP/1.1 200 1070000",
                        "user": {
                          "id": "test@gmail.com"
                        }
                      }
                    },
                    {
                      "_index": "67033379",
                      "_type": "_doc",
                      "_id": "4",
                      "_score": 1.0,
                      "_source": {
                        "message": "Error GET /parse HTTP/1.1 200 1070000",
                        "user": {
                          "id": "test@gmail.com"
                        }
                      }
                    }
                  ]
                }
              }
            },
            "info": {
              "doc_count": 2,
              "top_filters_hits": {
                "hits": {
                  "total": {
                    "value": 2,
                    "relation": "eq"
                  },
                  "max_score": 1.0,
                  "hits": [
                    {
                      "_index": "67033379",
                      "_type": "_doc",
                      "_id": "1",
                      "_score": 1.0,
                      "_source": {
                        "message": "INFO GET /search HTTP/1.1 200 1070000",
                        "user": {
                          "id": "test@gmail.com"
                        }
                      }
                    },
                    {
                      "_index": "67033379",
                      "_type": "_doc",
                      "_id": "3",
                      "_score": 1.0,
                      "_source": {
                        "message": "INFO GET /parse HTTP/1.1 200 1070000",
                        "user": {
                          "id": "test@gmail.com"
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    
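    On the client side, the per-bucket documents then live under `aggregations.messages.buckets.<name>.top_filters_hits.hits.hits`. A small sketch (plain Python, no client library; the trimmed-down `response` dict below mirrors the shape of the result above) of unpacking that into per-bucket document lists:

    ```python
    # Sketch: unpack the filters + top_hits aggregation response into a
    # {bucket_name: [source_doc, ...]} mapping.
    def docs_by_bucket(response):
        """Map each filters bucket name to the _source docs of its top_hits."""
        buckets = response["aggregations"]["messages"]["buckets"]
        return {
            name: [hit["_source"] for hit in bucket["top_filters_hits"]["hits"]["hits"]]
            for name, bucket in buckets.items()
        }

    # Trimmed-down response in the same shape as the search result above.
    response = {
        "aggregations": {
            "messages": {
                "buckets": {
                    "error": {
                        "doc_count": 2,
                        "top_filters_hits": {
                            "hits": {
                                "hits": [
                                    {"_id": "2", "_source": {"message": "Error GET /search HTTP/1.1 200 1070000"}},
                                    {"_id": "4", "_source": {"message": "Error GET /parse HTTP/1.1 200 1070000"}},
                                ]
                            }
                        },
                    },
                    "info": {
                        "doc_count": 2,
                        "top_filters_hits": {
                            "hits": {
                                "hits": [
                                    {"_id": "1", "_source": {"message": "INFO GET /search HTTP/1.1 200 1070000"}},
                                    {"_id": "3", "_source": {"message": "INFO GET /parse HTTP/1.1 200 1070000"}},
                                ]
                            }
                        },
                    },
                }
            }
        }
    }

    grouped = docs_by_bucket(response)
    print(len(grouped["error"]), len(grouped["info"]))  # 2 2
    ```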

    【Discussion】:

    • @Piyush N Please go through the answer carefully and let me know if it solves your issue?