【Question title】: Nested filter in Elasticsearch aggregation query
【Posted】: 2020-01-10 22:55:32
【Question】:

I am running the following aggregation query with a nested filter:

GET <indexname>/_search
{
  "aggs": {
    "NAME": {
      "nested": {
        "path": "crm.LeadStatusHistory"
      },
      "aggs": {
        "agg_filter": {
          "filter": {
            "bool": {
              "must": [
                {
                  "nested": {
                    "path": "crm",
                    "query": {
                      "terms": {
                        "crm.City.keyword": [
                          "Rewa"
                        ]
                      }
                    }
                  }
                },
                {
                  "nested": {
                    "path": "crm",
                    "query": {
                      "terms": {
                        "crm.LeadID": [
                          27961
                        ]
                      }
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "agg_terms":{
              "terms": {
                "field": "crm.LeadStatusHistory.StatusID",
                "size": 1000
              }
            }
          }
        }
      }
    }
  }
}

I have the following document:

{
        "_index" : "crm",
        "_type" : "_doc",
        "_id" : "4478",
        "_score" : 1.0,
        "_source" : {
          "crm" : [
            {
              "LeadStatusHistory" : [
                {
                  "StatusID" : 3
                },
                {
                  "StatusID" : 2
                },
                {
                  "StatusID" : 1
                }
              ],
              "LeadID" : 27961,
              "City" : "Rewa"
            },
            {
              "LeadStatusHistory" : [
                {
                  "StatusID" : 1
                },
                {
                  "StatusID" : 3
                },
                {
                  "StatusID" : 2
                }
              ],
              "LeadID" : 27959,
              "City" : "Rewa"
            }
          ]
        }
      }

But in the response I get the following result:

"aggregations" : {
    "NAME" : {
      "doc_count" : 4332,
      "agg_filter" : {
        "doc_count" : 1,
        "agg_terms" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 1,
              "doc_count" : 1
            }
          ]
        }
      }
    }
  }

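Question: According to the source document, there are 3 nested "crm.LeadStatusHistory" documents for crm.LeadID = 27961. However, the result shows a doc_count of 1 for agg_filter instead of 3. Please let me know the reason for this.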

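【Comments】:

  • Please share your mapping.
  • Hi LeBigCat, please let us know the specific problem, as the mapping cannot be shared. I have shared the source document for your reference.

Tags: elasticsearch elasticsearch-aggregation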


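【Answer 1】:

Your agg_filter runs on crm.LeadStatusHistory, so it applies to only 1 document (LeadStatusHistory is a document that, in your case, contains links to other documents).

I built a query to show this, which I think will answer your question. You will see a different doc_count for each aggregation.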

{
  "size": 0,
  "aggs": {
    "NAME": {
      "nested": {
        "path": "crm"
      },
      "aggs": {
        "agg_LeadID": {
          "terms": {
            "field": "crm.LeadID"
          },
          "aggs": {
            "agg_LeadStatusHistory": {
              "nested": {
                "path": "crm.LeadStatusHistory"
              },
              "aggs": {
                "home_type_name": {
                  "terms": {
                    "field": "crm.LeadStatusHistory.StatusID"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

You can count them with the following query, which uses a script (add a filter if needed):

{
  "size": 0,
  "aggs": {
    "NAME": {
      "nested": {
        "path": "crm"
      },
      "aggs": {
        "agg_LeadID": {
          "terms": {
            "field": "crm.LeadID"
          },
          "aggs": {
            "agg_LeadStatusHistory": {
              "nested": {
                "path": "crm.LeadStatusHistory"
              },
              "aggs": {
                "agg_LeadStatusHistory_sum": {
                  "sum": {
                    "script": "doc['crm.LeadStatusHistory.StatusID'].values.length"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Note: if you want to get the number of nested documents, take a look at inner_hits: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-inner-hits
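To relate the queries above to the original question, here is a minimal Python sketch that builds a request body in which the LeadID and City conditions are applied as plain term filters inside the crm nested context, before descending into crm.LeadStatusHistory. It assumes that both crm and crm.LeadStatusHistory are mapped as nested (the mapping was never shared, so this is a guess). The function name build_status_count_request and the aggregation names by_crm, lead_filter, status_history and status_ids are hypothetical. The sketch only builds and prints JSON; it does not contact a cluster.

```python
import json


def build_status_count_request(lead_id, city):
    """Build a search body that filters inside the `crm` nested context.

    Assumption: `crm` and `crm.LeadStatusHistory` are both mapped as `nested`.
    All aggregation names below are illustrative, not taken from the question.
    """
    return {
        "size": 0,
        "aggs": {
            "by_crm": {
                # Step into the outer nested level first.
                "nested": {"path": "crm"},
                "aggs": {
                    "lead_filter": {
                        # Inside the crm context, fields of crm entries can be
                        # filtered directly, without a nested query wrapper.
                        "filter": {
                            "bool": {
                                "filter": [
                                    {"term": {"crm.LeadID": lead_id}},
                                    {"term": {"crm.City.keyword": city}},
                                ]
                            }
                        },
                        "aggs": {
                            "status_history": {
                                # Only now descend to the inner nested level.
                                "nested": {"path": "crm.LeadStatusHistory"},
                                "aggs": {
                                    "status_ids": {
                                        "terms": {
                                            "field": "crm.LeadStatusHistory.StatusID",
                                            "size": 1000,
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted after `GET <indexname>/_search`.
    print(json.dumps(build_status_count_request(27961, "Rewa"), indent=2))
```

Under the stated mapping assumption, lead_filter would be expected to count the matching crm entries and status_history the history entries below them; this has not been verified against the asker's index.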

【Comments】:

    【Answer 2】:

    I disagree with the response that "crm.LeadStatusHistory" is a single document. I ran an aggregation query on crm.LeadStatusHistory without a filter.

    GET crm/_search
    {
      "_source": ["crm.LeadID","crm.LeadStatusHistory.StatusID","crm.City"], 
      "size": 10000,
      "query": {
        "nested": {
          "path": "crm",
          "query": {
            "match": {
              "crm.LeadID": "27961"
            }
          }
        }
      }, 
      "aggs": {
        "agg_statuscount": {
          "nested": {
            "path": "crm.LeadStatusHistory"
          },
          "aggs": {
            "agg_terms": {
              "terms": {
                "field": "crm.LeadStatusHistory.StatusID",
                "size": 1000
              }
            }
          }
        }
      }
    }
    

    I got the following response from the above query, which shows a doc_count of 6 for "agg_statuscount" when no filter is applied:

    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "crm",
            "_type" : "_doc",
            "_id" : "4478",
            "_score" : 1.0,
            "_source" : {
              "crm" : [
                {
                  "LeadStatusHistory" : [
                    {
                      "StatusID" : 3
                    },
                    {
                      "StatusID" : 2
                    },
                    {
                      "StatusID" : 1
                    }
                  ],
                  "LeadID" : 27961,
                  "City" : "Rewa"
                },
                {
                  "LeadStatusHistory" : [
                    {
                      "StatusID" : 1
                    },
                    {
                      "StatusID" : 3
                    },
                    {
                      "StatusID" : 2
                    }
                  ],
                  "LeadID" : 27959,
                  "City" : "Rewa"
                }
              ]
            }
          }
        ]
      },
      "aggregations" : {
        "agg_statuscount" : {
          "doc_count" : 6,
          "agg_terms" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : 1,
                "doc_count" : 2
              },
              {
                "key" : 2,
                "doc_count" : 2
              },
              {
                "key" : 3,
                "doc_count" : 2
              }
            ]
          }
        }
      }
    }
    

    So, with crm.LeadID = 27961 in the aggregation filter, I expect 3 "crm.LeadStatusHistory" documents. Currently the response is 1, as in my original question.
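The arithmetic behind these numbers can be checked outside Elasticsearch. The following Python sketch copies the sample _source of document 4478 and counts StatusID values the way a terms bucket would, once over all crm entries and once only over the entry with LeadID 27961. It is a plain in-memory simulation, not Elasticsearch's implementation, and the helper name count_status_ids is hypothetical.

```python
from collections import Counter

# Sample _source copied from the document with _id 4478 shown in the question.
SOURCE = {
    "crm": [
        {
            "LeadStatusHistory": [{"StatusID": 3}, {"StatusID": 2}, {"StatusID": 1}],
            "LeadID": 27961,
            "City": "Rewa",
        },
        {
            "LeadStatusHistory": [{"StatusID": 1}, {"StatusID": 3}, {"StatusID": 2}],
            "LeadID": 27959,
            "City": "Rewa",
        },
    ]
}


def count_status_ids(source, lead_id=None):
    """Count StatusID values per history entry, optionally for one LeadID only."""
    counts = Counter()
    for lead in source["crm"]:
        # Skip crm entries that do not belong to the requested lead.
        if lead_id is not None and lead["LeadID"] != lead_id:
            continue
        for entry in lead["LeadStatusHistory"]:
            counts[entry["StatusID"]] += 1
    return counts


if __name__ == "__main__":
    unfiltered = count_status_ids(SOURCE)
    filtered = count_status_ids(SOURCE, lead_id=27961)
    # prints: 6 {1: 2, 2: 2, 3: 2}
    print(sum(unfiltered.values()), dict(sorted(unfiltered.items())))
    # prints: 3 {1: 1, 2: 1, 3: 1}
    print(sum(filtered.values()), dict(sorted(filtered.items())))
```

The unfiltered total of 6 matches the agg_statuscount response above, and the filtered total of 3 is the count expected for LeadID 27961.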

    【Comments】:
