【Question title】: ElasticSearch: Order top-level aggregation buckets based on reverse_nested doc_count
【Posted】: 2021-09-19 07:42:45
【Question】:

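I am using ElasticSearch 6.3 and working on an aggregation with several sub-aggregations, in which I need to sort the top-level aggregation buckets by the doc_count of a lower-level reverse_nested aggregation.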

This is how my index was created:

PUT /myindex
{
  "mappings": {
    "default": {
      "properties": {
        "items": {
          "type": "nested",
          "properties": {
            "subitems": {
              "type": "nested",
              "properties": {
                "id": {
                  "type": "long"
                },
                "name": {
                  "type": "keyword"
                }
              }
            }
          }
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

These are the sample documents I indexed:

{
  "name": "Document #1",
  "items": [
    {
      "subitems": [
        {
          "id": 1,
          "name": "Subitem #1"
        },
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        },
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #2",
  "items": [
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    }
  ]
}
{
  "name": "Document #3",
  "items": [
    {
      "subitems": [
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    }
  ]
}
{
  "name": "Document #4",
  "items": [
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        },
        {
          "id": 5,
          "name": "Subitem #5"
        }
      ]
    }
  ]
}
{
  "name": "Document #5",
  "items": [
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    }
  ]
}
{
  "name": "Document #6",
  "items": [
    {
      "subitems": [
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #7",
  "items": [
    {
      "subitems": [
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #8",
  "items": [
    {
      "subitems": [
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #9",
  "items": [
    {
      "subitems": [
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    }
  ]
}
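To reproduce the example, the documents above can be loaded with the `_bulk` API. The following is a minimal Python sketch, assuming the index `myindex` and mapping type `default` from the mapping above; the helpers `make_doc` and `bulk_body` and the compact id-list encoding of the documents are hypothetical and only for illustration. It builds the newline-delimited request body but does not send it.

```python
import json

# Hypothetical compact encoding of the nine sample documents:
# each document is a list of items, each item a list of subitem ids.
# The subitem name is always "Subitem #<id>", as the question assumes.
SAMPLE_DOCS = [
    [[1, 2], [2, 3]],            # Document #1
    [[2]],                       # Document #2
    [[3], [2]],                  # Document #3
    [[2, 5]],                    # Document #4
    [[2]] * 6,                   # Document #5: six items, each holding subitem 2
    [[3]], [[3]], [[3]], [[3]],  # Documents #6 to #9
]


def make_doc(number, items):
    """Expand the compact form into the JSON shape used in the question."""
    return {
        "name": f"Document #{number}",
        "items": [
            {"subitems": [{"id": i, "name": f"Subitem #{i}"} for i in item]}
            for item in items
        ],
    }


def bulk_body(docs, index="myindex", doc_type="default"):
    """Build an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for number, items in enumerate(docs, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(make_doc(number, items)))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the body of `POST /_bulk` with `Content-Type: application/x-ndjson`.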

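I need the aggregation to return, for each subitem id/name pair, the number of documents that contain it. (Assume that a given subitem id always corresponds to the same subitem name.) That is: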

id | name       | count
---+------------+------
2  | Subitem #2 | 5
3  | Subitem #3 | 6
1  | Subitem #1 | 1
5  | Subitem #5 | 1
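The count in this table is the number of root documents that contain a given subitem at least once, not the number of nested subitem entries. A minimal Python sketch of that semantics over the sample data, using a hypothetical compact encoding (each document as a list of items, each item as a list of subitem ids) and a hypothetical helper `root_doc_counts`:

```python
from collections import Counter

# Hypothetical compact encoding of Documents #1 to #9:
# each document is a list of items, each item a list of subitem ids.
SAMPLE_DOCS = [
    [[1, 2], [2, 3]], [[2]], [[3], [2]], [[2, 5]], [[2]] * 6,
    [[3]], [[3]], [[3]], [[3]],
]


def root_doc_counts(docs):
    """Count, per subitem id, how many root documents contain it at least once."""
    counts = Counter()
    for items in docs:
        # A set collapses repeated subitems inside one document (e.g. Document #5).
        ids_in_doc = {sid for item in items for sid in item}
        counts.update(ids_in_doc)
    return counts
```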

This is the original aggregation query:

GET /myindex/default/_search
{
  "size": 0,
  "aggregations": {
    "my_nested_agg": {
      "nested": {
        "path": "items.subitems"
      },
      "aggregations": {
        "subitem_id": {
          "terms": {
            "field": "items.subitems.id"
          },
          "aggregations": {
            "subitem_name": {
              "terms": {
                "field": "items.subitems.name"
              },
              "aggregations": {
                "my_rev_agg": {
                  "reverse_nested": {}
                }
              }
            }
          }
        }
      }
    }
  }
}

The aggregation seems to return all the data I need:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "my_nested_agg": {
      "doc_count": 19,
      "subitem_id": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 2,
            "doc_count": 11,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #2",
                  "doc_count": 11,
                  "my_rev_agg": {
                    "doc_count": 5
                  }
                }
              ]
            }
          },
          {
            "key": 3,
            "doc_count": 6,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #3",
                  "doc_count": 6,
                  "my_rev_agg": {
                    "doc_count": 6
                  }
                }
              ]
            }
          },
          {
            "key": 1,
            "doc_count": 1,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #1",
                  "doc_count": 1,
                  "my_rev_agg": {
                    "doc_count": 1
                  }
                }
              ]
            }
          },
          {
            "key": 5,
            "doc_count": 1,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #5",
                  "doc_count": 1,
                  "my_rev_agg": {
                    "doc_count": 1
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

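However, the buckets are sorted in descending order by the doc_count of the "subitem_id" buckets themselves, which counts nested subitem documents (11 for id 2) rather than root documents.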

Instead, I need the buckets sorted in descending order by the doc_count of the reverse_nested sub-aggregation, like this:

id | name       | count
---+------------+------
3  | Subitem #3 | 6
2  | Subitem #2 | 5
1  | Subitem #1 | 1
5  | Subitem #5 | 1
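The two orderings differ because the two doc_counts measure different things: a "subitem_id" bucket counts nested items.subitems entries (11 for id 2, since Document #5 alone contributes six), while my_rev_agg counts distinct root documents (5 for id 2). A minimal Python sketch that reproduces both numbers and both orderings from the sample data; the compact encoding, the helper names, and the ascending-key tie-break are assumptions, the last chosen to match the bucket order shown in the responses:

```python
from collections import Counter

# Hypothetical compact encoding of Documents #1 to #9:
# each document is a list of items, each item a list of subitem ids.
SAMPLE_DOCS = [
    [[1, 2], [2, 3]], [[2]], [[3], [2]], [[2, 5]], [[2]] * 6,
    [[3]], [[3]], [[3]], [[3]],
]


def bucket_counts(docs):
    """Return (nested_counts, root_counts) per subitem id."""
    nested, root = Counter(), Counter()
    for items in docs:
        ids = [sid for item in items for sid in item]
        nested.update(ids)       # every nested subitem entry counts once
        root.update(set(ids))    # each root document counts once per id
    return nested, root


def ordered_ids(counts):
    """Sort ids by count descending, breaking ties by id ascending."""
    return [k for k, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Sorting the nested counts gives the order of the first response; sorting the root counts gives the order wanted here.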

I tried to achieve this with the following query:

GET /myindex/default/_search
{
  "size": 0,
  "aggregations": {
    "my_nested_agg": {
      "nested": {
        "path": "items.subitems"
      },
      "aggregations": {
        "subitem_id": {
          "terms": {
            "field": "items.subitems.id",
            "order": [
              {
                "subitem_name>my_rev_agg._count": "desc"
              }
            ]
          },
          "aggregations": {
            "subitem_name": {
              "terms": {
                "field": "items.subitems.name"
              },
              "aggregations": {
                "my_rev_agg": {
                  "reverse_nested": {}
                }
              }
            }
          }
        }
      }
    }
  }
}

Then I get this error:

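Invalid aggregation order path [subitem_name>my_rev_agg._count]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end. Sub-path [subitem_name] points to non single-bucket aggregation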
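The rule in this message can be restated as a small check: split the path on `>`; every element before the last must be a single-bucket aggregation, and the last must be single-bucket or metrics. `subitem_name` is a `terms` aggregation, which is multi-bucket, so the path fails at its first element. A minimal Python sketch of that check; the type sets, the flat name-to-type map, and the helper `check_order_path` are hypothetical simplifications, not Elasticsearch's own code:

```python
# Hypothetical, partial classification of aggregation types.
SINGLE_BUCKET = {"reverse_nested", "nested", "filter"}
METRICS = {"max", "min", "avg", "sum", "cardinality"}


def check_order_path(path, agg_types):
    """Return None if `path` is a valid terms order path, else an error string.

    `agg_types` maps each sub-aggregation name on the path to its type, e.g.
    {"subitem_name": "terms", "my_rev_agg": "reverse_nested"}.
    """
    elements = path.split(">")
    # Drop a trailing property key such as "._count" from the last element.
    elements[-1] = elements[-1].split(".")[0]
    for i, name in enumerate(elements):
        agg_type = agg_types.get(name)
        if agg_type is None:
            return f"unknown aggregation [{name}]"
        is_last = i == len(elements) - 1
        if agg_type in SINGLE_BUCKET:
            continue
        if is_last and agg_type in METRICS:
            continue
        return f"sub-path [{name}] points to a non-single-bucket aggregation"
    return None
```

With `{"subitem_name": "terms", "my_rev_agg": "reverse_nested"}`, the path `subitem_name>my_rev_agg._count` is rejected at `subitem_name`, while `my_rev_agg` alone passes.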

Any advice would be appreciated. Thank you very much.

【Discussion】:

    Tags: elasticsearch nested reverse aggregation


    【Solution 1】:

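    I found a solution that meets my requirements. The key is to move the reverse_nested aggregation out of the terms sub-aggregation that retrieves the name, so that it becomes a direct child of "subitem_id". Because reverse_nested is a single-bucket aggregation, the terms order can then reference it by name, and the buckets are sorted by its doc_count: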

    GET /myindex/default/_search
    {
      "size": 0,
      "aggregations": {
        "my_nested_agg": {
          "nested": {
            "path": "items.subitems"
          },
          "aggregations": {
            "subitem_id": {
              "terms": {
                "field": "items.subitems.id",
                "order": [
                  {
                    "my_rev_agg": "desc"
                  }
                ]
              },
              "aggregations": {
                "subitem_name": {
                  "terms": {
                    "field": "items.subitems.name"
                  }
                },
                "my_rev_agg": {
                  "reverse_nested": {}
                }
              }
            }
          }
        }
      }
    }
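For reference, the same request can be assembled programmatically. The Python sketch below uses only the standard library to build this query body and a `urllib.request.Request` for it. The base URL `http://localhost:9200` is an assumption, `POST` is used because `_search` accepts it as well as `GET` and it avoids sending a body with `GET`, and the helpers `build_query` and `build_request` are hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local ElasticSearch 6.x node


def build_query():
    """The query above: my_rev_agg is a direct child of subitem_id,
    a sibling of subitem_name, so the terms order can reference it."""
    return {
        "size": 0,
        "aggregations": {
            "my_nested_agg": {
                "nested": {"path": "items.subitems"},
                "aggregations": {
                    "subitem_id": {
                        "terms": {
                            "field": "items.subitems.id",
                            # Ordering by a single-bucket agg sorts by its doc_count.
                            "order": [{"my_rev_agg": "desc"}],
                        },
                        "aggregations": {
                            "subitem_name": {
                                "terms": {"field": "items.subitems.name"}
                            },
                            # Without a path, reverse_nested joins back to the root document.
                            "my_rev_agg": {"reverse_nested": {}},
                        },
                    }
                },
            }
        },
    }


def build_request(query, index="myindex", doc_type="default"):
    """Build (but do not send) a search request carrying the query body."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running cluster, `urllib.request.urlopen(build_request(build_query()))` would send it.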
    

    This returns the subitem buckets in the correct order:

    {
      "took": 0,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 9,
        "max_score": 0.0,
        "hits": []
      },
      "aggregations": {
        "my_nested_agg": {
          "doc_count": 19,
          "subitem_id": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 3,
                "doc_count": 6,
                "my_rev_agg": {
                  "doc_count": 6
                },
                "subitem_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "Subitem #3",
                      "doc_count": 6
                    }
                  ]
                }
              },
              {
                "key": 2,
                "doc_count": 11,
                "my_rev_agg": {
                  "doc_count": 5
                },
                "subitem_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "Subitem #2",
                      "doc_count": 11
                    }
                  ]
                }
              },
              {
                "key": 1,
                "doc_count": 1,
                "my_rev_agg": {
                  "doc_count": 1
                },
                "subitem_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "Subitem #1",
                      "doc_count": 1
                    }
                  ]
                }
              },
              {
                "key": 5,
                "doc_count": 1,
                "my_rev_agg": {
                  "doc_count": 1
                },
                "subitem_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "Subitem #5",
                      "doc_count": 1
                    }
                  ]
                }
              }
            ]
          }
        }
      }
    }
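To turn these sorted buckets into the id / name / count table from the question, a client only has to walk subitem_id.buckets in the order returned. A minimal Python sketch, assuming the response has already been decoded into a dict; `RESPONSE` is a trimmed copy of the response above that keeps only the fields read here, and `to_rows` is a hypothetical helper:

```python
# Trimmed copy of the response above (only the fields used below).
RESPONSE = {
    "aggregations": {
        "my_nested_agg": {
            "subitem_id": {
                "buckets": [
                    {"key": 3, "my_rev_agg": {"doc_count": 6},
                     "subitem_name": {"buckets": [{"key": "Subitem #3"}]}},
                    {"key": 2, "my_rev_agg": {"doc_count": 5},
                     "subitem_name": {"buckets": [{"key": "Subitem #2"}]}},
                    {"key": 1, "my_rev_agg": {"doc_count": 1},
                     "subitem_name": {"buckets": [{"key": "Subitem #1"}]}},
                    {"key": 5, "my_rev_agg": {"doc_count": 1},
                     "subitem_name": {"buckets": [{"key": "Subitem #5"}]}},
                ]
            }
        }
    }
}


def to_rows(response):
    """Return (id, name, count) tuples in the order the buckets were returned."""
    rows = []
    for bucket in response["aggregations"]["my_nested_agg"]["subitem_id"]["buckets"]:
        # Each id maps to exactly one name (the question's assumption),
        # so the first name bucket is taken.
        name = bucket["subitem_name"]["buckets"][0]["key"]
        rows.append((bucket["key"], name, bucket["my_rev_agg"]["doc_count"]))
    return rows
```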
    

