【Question title】: ElasticSearch: Access outer document fields from within a nested aggregation query
【Posted】: 2016-04-01 05:16:23
【Question】:

I have the following mapping:

{
    "dynamic": "strict",
    "properties": {
        "id": {
            "type": "string"
        },
        "title": {
            "type": "string"
        },
        "things": {
            "type": "nested",
            "properties": {
                "id": {
                    "type": "long"
                },
                "something": {
                    "type": "long"
                }
            }
        }
    }
}

I insert documents as follows (Python script):

body = {"id": 1, "title": "one", "things": [{"id": 1000, "something": 33}, {"id": 1001, "something": 34}, ]}
es.create(index_name, doc_type=doc_type, body=body, id=1)

body = {"id": 2, "title": "two", "things": [{"id": 1000, "something": 43}, {"id": 1001, "something": 44}, ]}
es.create(index_name, doc_type=doc_type, body=body, id=2)

body = {"id": 3, "title": "three", "things": [{"id": 1000, "something": 53}, {"id": 1001, "something": 54}, ]}
es.create(index_name, doc_type=doc_type, body=body, id=3)

I run the following aggregation query:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "things": {
      "aggs": {
        "num_articles": {
          "terms": {
            "field": "things.id",
            "size": 0
          },
          "aggs": {
            "articles": {
              "top_hits": {
                "size": 50
              }
            }
          }
        }
      },
      "nested": {
        "path": "things"
      }
    }
  },
  "size": 0
}

(So I want to count how many times each "thing" occurs and, for each thing, list the articles in which it occurs.)
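
To make the desired result concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) that computes it from the three sample documents above. The function name `articles_by_thing` and the shape of the returned dictionary are illustrative assumptions, not anything Elasticsearch produces.

```python
from collections import defaultdict

# The three sample documents indexed above.
docs = [
    {"id": 1, "title": "one",
     "things": [{"id": 1000, "something": 33}, {"id": 1001, "something": 34}]},
    {"id": 2, "title": "two",
     "things": [{"id": 1000, "something": 43}, {"id": 1001, "something": 44}]},
    {"id": 3, "title": "three",
     "things": [{"id": 1000, "something": 53}, {"id": 1001, "something": 54}]},
]


def articles_by_thing(documents):
    """Group the outer (id, title) fields by the id of each nested thing."""
    grouped = defaultdict(list)
    for doc in documents:
        for thing in doc["things"]:
            grouped[thing["id"]].append({
                "id": doc["id"],          # outer document field
                "title": doc["title"],    # outer document field
                "something": thing["something"],
            })
    return dict(grouped)


result = articles_by_thing(docs)
for thing_id, articles in result.items():
    print(thing_id, len(articles), [a["title"] for a in articles])
```

Each thing id maps to one entry per article, and every entry carries the outer `id` and `title` next to the nested `something` value. That pairing is what the aggregation result is missing.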

The query produces:

"key": 1000,
"doc_count": 3,
"articles": {
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [{
            "_index": "test",
            "_type": "article",
            "_id": "2",
            "_nested": {
                "field": "things",
                "offset": 0
            },
            "_score": 1,
            "_source": {
                "id": 1000,
                "something": 43
            }
        }, {
            "_index": "test",
            "_type": "article",
            "_id": "1",
            "_nested": {
                "field": "things",
                "offset": 0
            },
            "_score": 1,
            "_source": {
                "id": 1000,
                "something": 33
            }

... (and so on)

I would like each hit to list all fields from the "outer" (top-level) document as well, i.e. id and title in this example.

Is this actually possible, and if so, how?

【Discussion】:

    Tags: elasticsearch pyelasticsearch


    【Solution 1】:

    I'm not sure whether this is exactly what you want, but let's give it a try:

    {
      "query": {
        "match_all": {}
      },
      "aggs": {
        "nested_things": {
          "nested": {
            "path": "things"
          },
          "aggs": {
            "num_articles": {
              "terms": {
                "field": "things.id",
                "size": 0
              },
              "aggs": {
                "articles": {
                  "top_hits": {
                    "size": 50
                  }
                },
                "reverse_things": {
                  "reverse_nested": {},
                  "aggs": {
                    "title": {
                      "terms": {
                        "field": "title",
                        "size": 0
                      }
                    },
                    "id": {
                      "terms": {
                        "field": "id",
                        "size": 0
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    

    This produces a result like this:

              "buckets": [
                   {
                      "key": 1000,
                      "doc_count": 3,
                      "reverse_things": {
                         "doc_count": 3,
                         "id": {
                            "buckets": [
                               {
                                  "key": "1",
                                  "doc_count": 1
                               },
                               {
                                  "key": "2",
                                  "doc_count": 1
                               },
                               {
                                  "key": "3",
                                  "doc_count": 1
                               }
                            ]
                         },
                         "title": {
                            ...
                         }
                      },
                      "articles": {
                         "hits": {
                            "total": 3,
                            "max_score": 1,
                            "hits": [
                               {
                                  "_index": "test",
                                  "_type": "article",
                                  "_id": "AVPOgQQjgDGxUAMojyuY",
                                  "_nested": {
                                     "field": "things",
                                     "offset": 0
                                  },
                                  "_score": 1,
                                  "_source": {
                                     "id": 1000,
                                     "something": 53
                                  }
                               },
                               ...
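
As the output shows, each `top_hits` hit still carries the `_id` of the enclosing top-level document, even though `_source` only contains the nested object. A minimal sketch of collecting those parent ids per thing follows. The helper name `parent_ids_per_thing` is hypothetical, and `sample_response` is a hand-trimmed dictionary modelled on the question's output, not a real server response.

```python
def parent_ids_per_thing(response,
                         nested_name="nested_things",
                         terms_name="num_articles",
                         hits_name="articles"):
    """Map each things.id bucket key to the _id values of its parent documents."""
    buckets = response["aggregations"][nested_name][terms_name]["buckets"]
    result = {}
    for bucket in buckets:
        hits = bucket[hits_name]["hits"]["hits"]
        # _id identifies the top-level document; _nested.offset says which
        # element of its "things" array produced the hit.
        result[bucket["key"]] = [hit["_id"] for hit in hits]
    return result


# Hand-trimmed sample shaped like the output in the question (assumption).
sample_response = {
    "aggregations": {
        "nested_things": {
            "doc_count": 6,
            "num_articles": {
                "buckets": [{
                    "key": 1000,
                    "doc_count": 3,
                    "articles": {"hits": {"total": 3, "max_score": 1, "hits": [
                        {"_index": "test", "_type": "article", "_id": "2",
                         "_nested": {"field": "things", "offset": 0},
                         "_score": 1, "_source": {"id": 1000, "something": 43}},
                        {"_index": "test", "_type": "article", "_id": "1",
                         "_nested": {"field": "things", "offset": 0},
                         "_score": 1, "_source": {"id": 1000, "something": 33}},
                    ]}},
                }]
            },
        }
    }
}

parent_ids = parent_ids_per_thing(sample_response)
print(parent_ids)
```

With the parent ids in hand, one option is a second lookup by id to fetch `title` and the other outer fields. Whether that extra round trip is acceptable depends on the use case.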
    

    【Comments】:

    • The problem is that the reverse_things part lists id and title, but in different orders. The id keys come back as 1, 2, 3: "id": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [{ "key": "1", "doc_count": 1 }, { "key": "2", "doc_count": 1 }, { "key": "3", "doc_count": 1 }] },
    • But the title keys come back as one, three, two: "title": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [{ "key": "one", "doc_count": 1 }, { "key": "three", "doc_count": 1 }, { "key": "two", "doc_count": 1 }] } If the ordering could be forced to match the original articles, that would work. Thanks, by the way, @kristian-ferkić ...
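
One possible way around the ordering mismatch described in these comments is to stop relying on two sibling `terms` aggregations lining up and instead nest the `title` aggregation under the `id` aggregation, so each id bucket carries its own title bucket. The sketch below is an untested variation on the answer's query, not something proposed in the thread. The helper names `build_query` and `pair_ids_with_titles` are hypothetical, and `sample_bucket` is a hand-written illustration of the expected response shape rather than real output.

```python
def build_query():
    """Variation of the answer's query: 'title' is a sub-aggregation of 'id'."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "nested_things": {
                "nested": {"path": "things"},
                "aggs": {
                    "num_articles": {
                        # "size": 0 (all buckets) is kept from the thread;
                        # Elasticsearch 5.0 and later no longer accept it.
                        "terms": {"field": "things.id", "size": 0},
                        "aggs": {
                            "reverse_things": {
                                "reverse_nested": {},
                                "aggs": {
                                    "id": {
                                        "terms": {"field": "id", "size": 0},
                                        "aggs": {
                                            "title": {
                                                "terms": {"field": "title", "size": 0}
                                            }
                                        },
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


def pair_ids_with_titles(thing_bucket):
    """Return (id, [title terms]) pairs for one things.id bucket."""
    pairs = []
    for id_bucket in thing_bucket["reverse_things"]["id"]["buckets"]:
        # An analyzed string field can yield several terms per title,
        # so the title side is kept as a list.
        titles = [t["key"] for t in id_bucket["title"]["buckets"]]
        pairs.append((id_bucket["key"], titles))
    return pairs


# Hand-written illustration of one bucket in the response (assumption).
sample_bucket = {
    "key": 1000,
    "doc_count": 3,
    "reverse_things": {
        "doc_count": 3,
        "id": {"buckets": [
            {"key": "1", "doc_count": 1,
             "title": {"buckets": [{"key": "one", "doc_count": 1}]}},
            {"key": "2", "doc_count": 1,
             "title": {"buckets": [{"key": "two", "doc_count": 1}]}},
            {"key": "3", "doc_count": 1,
             "title": {"buckets": [{"key": "three", "doc_count": 1}]}},
        ]},
    },
}

query = build_query()
pairs = pair_ids_with_titles(sample_bucket)
print(pairs)
```

Because each title bucket sits inside the id bucket it belongs to, the pairing no longer depends on how either list is sorted.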