ElasticSearch: Access outer document fields from within a nested aggregated query

【Question title】: ElasticSearch: Access outer document fields from within a nested aggregated query
【Posted】: 2016-04-01 05:16:23
【Question description】:

I have the following mapping:

{
    "dynamic": "strict",
    "properties": {
        "id": {
            "type": "string"
        },
        "title": {
            "type": "string"
        },
        "things": {
            "type": "nested",
            "properties": {
                "id": {
                    "type": "long"
                },
                "something": {
                    "type": "long"
                }
            }
        }
    }
}

I insert documents as follows (Python script):

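from elasticsearch import Elasticsearch

# Assumption: the official elasticsearch-py client, whose create() matches the calls below.
es = Elasticsearch()  # assumes a node on localhost:9200
index_name = "test"   # index and type names as they appear in the query results
doc_type = "article"

body = {"id": 1, "title": "one", "things": [{"id": 1000, "something": 33}, {"id": 1001, "something": 34}]}
es.create(index_name, doc_type=doc_type, body=body, id=1)

body = {"id": 2, "title": "two", "things": [{"id": 1000, "something": 43}, {"id": 1001, "something": 44}]}
es.create(index_name, doc_type=doc_type, body=body, id=2)

body = {"id": 3, "title": "three", "things": [{"id": 1000, "something": 53}, {"id": 1001, "something": 54}]}
es.create(index_name, doc_type=doc_type, body=body, id=3)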

I run the following aggregation query:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "things": {
      "aggs": {
        "num_articles": {
          "terms": {
            "field": "things.id",
            "size": 0
          },
          "aggs": {
            "articles": {
              "top_hits": {
                "size": 50
              }
            }
          }
        }
      },
      "nested": {
        "path": "things"
      }
    }
  },
  "size": 0
}

(So I want to count how many times each "thing" occurs and, for each thing, list the articles in which it appears.)

The query produces:

"key": 1000,
"doc_count": 3,
"articles": {
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [{
            "_index": "test",
            "_type": "article",
            "_id": "2",
            "_nested": {
                "field": "things",
                "offset": 0
            },
            "_score": 1,
            "_source": {
                "id": 1000,
                "something": 43
            }
        }, {
            "_index": "test",
            "_type": "article",
            "_id": "1",
            "_nested": {
                "field": "things",
                "offset": 0
            },
            "_score": 1,
            "_source": {
                "id": 1000,
                "something": 33
            }

.... (and so on)

I would like each hit to list all fields of the "outer", i.e. top-level, document: in this example, id and title.

Is this actually possible, and if so, how?

【Comments】:

Tags: elasticsearch pyelasticsearch

【Solution 1】:

I'm not sure whether this is what you are after, but let's give it a try:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "nested_things": {
      "nested": {
        "path": "things"
      },
      "aggs": {
        "num_articles": {
          "terms": {
            "field": "things.id",
            "size": 0
          },
          "aggs": {
            "articles": {
              "top_hits": {
                "size": 50
              }
            },
            "reverse_things": {
              "reverse_nested": {},
              "aggs": {
                "title": {
                  "terms": {
                    "field": "title",
                    "size": 0
                  }
                },
                "id": {
                  "terms": {
                    "field": "id",
                    "size": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This produces a result like this:

          "buckets": [
               {
                  "key": 1000,
                  "doc_count": 3,
                  "reverse_things": {
                     "doc_count": 3,
                     "id": {
                        "buckets": [
                           {
                              "key": "1",
                              "doc_count": 1
                           },
                           {
                              "key": "2",
                              "doc_count": 1
                           },
                           {
                              "key": "3",
                              "doc_count": 1
                           }
                        ]
                     },
                     "title": {
                        ...
                     }
                  },
                  "articles": {
                     "hits": {
                        "total": 3,
                        "max_score": 1,
                        "hits": [
                           {
                              "_index": "test",
                              "_type": "article",
                              "_id": "AVPOgQQjgDGxUAMojyuY",
                              "_nested": {
                                 "field": "things",
                                 "offset": 0
                              },
                              "_score": 1,
                              "_source": {
                                 "id": 1000,
                                 "something": 53
                              }
                           },
                           ...
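
Each nested hit in the output above still carries the `_id` of the top-level document it belongs to, so one way to get at the outer fields is to collect those ids per bucket and look the parents up afterwards (for example with a multi-get request). The following is only a sketch under assumptions: the helper name `parent_ids_by_thing` and the hand-written `sample_response` are illustrative, and the aggregation names are the ones used in the query above.

```python
def parent_ids_by_thing(response, nested_agg="nested_things",
                        terms_agg="num_articles", hits_agg="articles"):
    """Map each things.id bucket key to the _id values of its parent documents."""
    buckets = response["aggregations"][nested_agg][terms_agg]["buckets"]
    result = {}
    for bucket in buckets:
        ids = []
        for hit in bucket[hits_agg]["hits"]["hits"]:
            # A nested hit reports the _id of the top-level document that contains it.
            if hit["_id"] not in ids:
                ids.append(hit["_id"])
        result[bucket["key"]] = ids
    return result


# Hand-written sample shaped like the response above (not real server output).
sample_response = {
    "aggregations": {
        "nested_things": {
            "doc_count": 2,
            "num_articles": {
                "buckets": [
                    {
                        "key": 1000,
                        "doc_count": 2,
                        "articles": {
                            "hits": {
                                "total": 2,
                                "hits": [
                                    {"_id": "2",
                                     "_nested": {"field": "things", "offset": 0},
                                     "_source": {"id": 1000, "something": 43}},
                                    {"_id": "1",
                                     "_nested": {"field": "things", "offset": 0},
                                     "_source": {"id": 1000, "something": 33}},
                                ],
                            }
                        },
                    }
                ]
            },
        }
    }
}

print(parent_ids_by_thing(sample_response))
```

The trade-off is a second round trip to fetch the parent documents, but the pairing between a thing and its articles stays explicit.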

【Comments】:

The problem is that the reverse_things part lists both id and title, but in different orders. The keys for id come back as 1, 2, 3:
"id": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [{ "key": "1", "doc_count": 1 }, { "key": "2", "doc_count": 1 }, { "key": "3", "doc_count": 1 }] },
but the keys for title come back as one, three, two:
"title": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [{ "key": "one", "doc_count": 1 }, { "key": "three", "doc_count": 1 }, { "key": "two", "doc_count": 1 }] }
If the ordering could be forced to match the original articles, that would work. Thanks anyway, @kristian-ferkić ...
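
One possible way around the ordering mismatch, offered here only as a sketch and not as part of the original answer, is to make the `title` terms aggregation a sub-aggregation of the `id` terms aggregation instead of its sibling. Each id bucket then carries its own title buckets, so no ordering has to line up. The function names `build_query` and `articles_in_bucket` and the sample bucket are hypothetical; `"size": 0` is kept from the queries in this thread and may not be accepted by newer Elasticsearch versions.

```python
def build_query():
    """Query body with title nested under id inside reverse_nested."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "nested_things": {
                "nested": {"path": "things"},
                "aggs": {
                    "num_articles": {
                        "terms": {"field": "things.id", "size": 0},
                        "aggs": {
                            "reverse_things": {
                                "reverse_nested": {},
                                "aggs": {
                                    "id": {
                                        "terms": {"field": "id", "size": 0},
                                        # title is now a child of id, not a sibling
                                        "aggs": {
                                            "title": {
                                                "terms": {"field": "title", "size": 0}
                                            }
                                        },
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


def articles_in_bucket(thing_bucket):
    """Return (article id, [title terms]) pairs for one things.id bucket."""
    pairs = []
    for id_bucket in thing_bucket["reverse_things"]["id"]["buckets"]:
        titles = [t["key"] for t in id_bucket["title"]["buckets"]]
        pairs.append((id_bucket["key"], titles))
    return pairs


# Hand-written bucket in the expected response shape (not real server output).
sample_bucket = {
    "key": 1000,
    "doc_count": 3,
    "reverse_things": {
        "doc_count": 3,
        "id": {
            "buckets": [
                {"key": "1", "doc_count": 1,
                 "title": {"buckets": [{"key": "one", "doc_count": 1}]}},
                {"key": "2", "doc_count": 1,
                 "title": {"buckets": [{"key": "two", "doc_count": 1}]}},
                {"key": "3", "doc_count": 1,
                 "title": {"buckets": [{"key": "three", "doc_count": 1}]}},
            ]
        },
    },
}

print(articles_in_bucket(sample_bucket))
```

Titles are returned as a list because `title` is an analyzed string field in the mapping above, so a multi-word title would show up as several term buckets rather than one.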