Elasticsearch aggregation of aggregation: answers

【Question title】: Elasticsearch aggregation of aggregation
【Posted】: 2017-09-24 19:51:17
【Question】:

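I would like to know whether there is a way to do something similar to bucket_selector, but with a test based on matching a bucket key rather than on a numeric metric.

To give more context, here is my use case.

Data sample: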

[
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:28:23.589Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "requestactivation"
      }
    },
    "id": "668"
  },
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:32:23.589Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "requestactivation"
      }
    },
    "id": "669"
  },
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:30:00.802Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "activationrequested"
      }
    },
    "id": "668"
  }
]

I want to retrieve all ids whose last event type is requestactivation.
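To make the target concrete, here is a minimal client-side sketch in Python of the intended logic, run over the three sample documents above. The function name `ids_with_last_event` is hypothetical, and nothing here talks to Elasticsearch; it only illustrates the result the aggregation is expected to produce.

```python
from typing import Dict, List

# The three sample documents from the question, reduced to the fields used here.
SAMPLE_DOCS: List[dict] = [
    {"@timestamp": "2017-04-27T04:28:23.589Z", "id": "668",
     "headers": {"message": {"type": "requestactivation"}}},
    {"@timestamp": "2017-04-27T04:32:23.589Z", "id": "669",
     "headers": {"message": {"type": "requestactivation"}}},
    {"@timestamp": "2017-04-27T04:30:00.802Z", "id": "668",
     "headers": {"message": {"type": "activationrequested"}}},
]


def ids_with_last_event(docs: List[dict], wanted_type: str) -> List[str]:
    """Return the ids whose most recent document has the wanted event type.

    Hypothetical helper for illustration only.
    """
    latest: Dict[str, dict] = {}
    for doc in docs:
        doc_id = doc["id"]
        # ISO 8601 UTC timestamps in one fixed format sort correctly as strings.
        if doc_id not in latest or doc["@timestamp"] > latest[doc_id]["@timestamp"]:
            latest[doc_id] = doc
    return sorted(
        doc_id
        for doc_id, doc in latest.items()
        if doc["headers"]["message"]["type"] == wanted_type
    )


print(ids_with_last_event(SAMPLE_DOCS, "requestactivation"))  # ['669']
```

Id 668 is excluded because its most recent event (04:30:00) is activationrequested, while id 669 has only a requestactivation event.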

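I already have an aggregation that retrieves the last event type for each id, but I have not figured out how to filter the buckets based on their key.

Here is the query: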

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "id"
          }
        },
        {
          "terms": {
            "headers.message.type": [
              "requestactivation",
              "activationrequested"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "@timestamp"
          }
        },
        "hmtype": {
          "terms": {
            "field": "headers.message.type",
            "size": 1
          }
        }
      }
    }
  }
}

Here is a sample result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "id": {
      "doc_count_error_upper_bound": 3,
      "sum_other_doc_count": 46,
      "buckets": [
        {
          "key": "986",
          "doc_count": 4,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 2,
            "buckets": [
              {
                "key": "activationrequested",
                "doc_count": 2
              }
            ]
          },
          "latest": {
            "value": 1493238253603,
            "value_as_string": "2017-04-26T20:24:13.603Z"
          }
        },
        {
          "key": "967",
          "doc_count": 2,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": "requestactivation",
                "doc_count": 1
              }
            ]
          },
          "latest": {
            "value": 1493191161242,
            "value_as_string": "2017-04-26T07:19:21.242Z"
          }
        },
        {
          "key": "554",
          "doc_count": 7,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 5,
            "buckets": [
              {
                "key": "requestactivation",
                "doc_count": 5
              }
            ]
          },
          "latest": {
            "value": 1493200196871,
            "value_as_string": "2017-04-26T09:49:56.871Z"
          }
        }
      ]
    }
  }
}

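None of the mapped fields are analyzed (they are all keyword fields).

The goal is to reduce the results to only those whose hmtype bucket key is "requestactivation".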
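If the filtering cannot be done inside Elasticsearch, one fallback is to filter the response on the client. The sketch below is a minimal Python example, assuming the response has the shape shown above; the function name `buckets_with_key` is hypothetical and the response is trimmed to the fields that matter. Keep in mind that a terms aggregation with size 1 returns the most frequent type by default, which is not necessarily the latest one.

```python
from typing import List

# Trimmed copy of the sample response: only the keys needed for filtering.
RESPONSE = {
    "aggregations": {
        "id": {
            "buckets": [
                {"key": "986", "doc_count": 4,
                 "hmtype": {"buckets": [{"key": "activationrequested", "doc_count": 2}]}},
                {"key": "967", "doc_count": 2,
                 "hmtype": {"buckets": [{"key": "requestactivation", "doc_count": 1}]}},
                {"key": "554", "doc_count": 7,
                 "hmtype": {"buckets": [{"key": "requestactivation", "doc_count": 5}]}},
            ]
        }
    }
}


def buckets_with_key(response: dict, wanted_key: str) -> List[str]:
    """Return the id bucket keys whose first hmtype sub-bucket has the wanted key.

    Hypothetical helper; assumes the aggregation names "id" and "hmtype"
    used in the question's query.
    """
    kept = []
    for bucket in response["aggregations"]["id"]["buckets"]:
        sub_buckets = bucket["hmtype"]["buckets"]
        # Skip ids with no sub-bucket at all, then compare on the key.
        if sub_buckets and sub_buckets[0]["key"] == wanted_key:
            kept.append(bucket["key"])
    return kept


print(buckets_with_key(RESPONSE, "requestactivation"))  # ['967', '554']
```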

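Document counts cannot be used, because an activation request can occur several times for the same id.

I only started looking into aggregations recently, so apologies if the question seems obvious; the examples I have found do not seem to match this particular logic.

【Question comments】:

Tags: elasticsearch aggregation elasticsearch-2.0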

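【Solution 1】:

How about using include in the terms aggregation to "filter" the values included in the terms down to only those related to requests? The query below takes a related approach: a filter sub-aggregation with a cardinality count of the event types per id, and a bucket_selector that keeps only the ids where both types are present: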

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "id"
          }
        },
        {
          "terms": {
            "headers.message.type": [
              "requestactivation",
              "activationrequested"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "@timestamp"
          }
        },
        "hmtype": {
          "filter": {
            "terms": {
              "headers.message.type": [
                "requestactivation",
                "activationrequested"
              ]
            }
          },
          "aggs": {
            "count_types": {
              "cardinality": {
                "field": "headers.message.type"
              }
            }
          }
        },
        "filter_buckets": {
          "bucket_selector": {
            "buckets_path": {
              "totalTypes":"hmtype > count_types"
            },
            "script": "params.totalTypes == 2"
          }
        }
      }
    }
  }
}
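As a rough illustration of what this query selects, the sketch below emulates it in plain Python over the sample documents from the question: group by id, count the distinct headers.message.type values (the role of the cardinality sub-aggregation), and keep the ids where that count equals 2 (the role of the bucket_selector script). The helper name `ids_with_type_count` is hypothetical and nothing here calls Elasticsearch.

```python
from collections import defaultdict
from typing import Dict, List, Set

# The three sample documents from the question, reduced to the fields used here.
SAMPLE_DOCS: List[dict] = [
    {"@timestamp": "2017-04-27T04:28:23.589Z", "id": "668",
     "headers": {"message": {"type": "requestactivation"}}},
    {"@timestamp": "2017-04-27T04:32:23.589Z", "id": "669",
     "headers": {"message": {"type": "requestactivation"}}},
    {"@timestamp": "2017-04-27T04:30:00.802Z", "id": "668",
     "headers": {"message": {"type": "activationrequested"}}},
]

ALLOWED_TYPES = {"requestactivation", "activationrequested"}


def ids_with_type_count(docs: List[dict], allowed: Set[str], expected: int) -> List[str]:
    """Return ids whose number of distinct allowed event types equals `expected`.

    Hypothetical helper that mimics: terms on id, cardinality on the type
    field, and a bucket_selector testing the cardinality.
    """
    types_per_id: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        event_type = doc["headers"]["message"]["type"]
        if event_type in allowed:  # mirrors the terms filter
            types_per_id[doc["id"]].add(event_type)
    return sorted(i for i, types in types_per_id.items() if len(types) == expected)


print(ids_with_type_count(SAMPLE_DOCS, ALLOWED_TYPES, 2))  # ['668']
```

On the sample data this keeps only id 668, which has both event types. In other words, the selector answers "which ids have both events" rather than "which ids ended on requestactivation".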

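【Comments】:

I may be missing something, but when I tested the suggested include I ended up with all the ids that have an "activationrequested" event (taken from your example; I am actually looking for "requestactivation"), regardless of whether the id has other types of events.
My bad, it should be "include": "requestactivation"... but I have a feeling there are some limitations along the way.
But include behaves essentially the same as filtering out the activationrequested events in the query (since I do not care about the hits of each query), whereas I want to filter out the ids that have received an activationrequested event.
I don't follow. What is the use case? Do you want to see the ids that have been activated? And can there be several requests and several approvals? Or at least one request and exactly one approval?
Indeed, I want to see which ids were activated. There can be several requests and several approvals; in this specific case I would be happy to filter out any id that has received an approval, whatever the count. I have tried other queries, for example one without the max aggregation and with the hmtype terms aggregation size increased to 2, but I have not found a way to filter out result buckets based on the presence of an activationrequested event.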