【Question title】: Elasticsearch, Nested Aggregations
【Posted】: 2015-10-07 23:50:24
【Question】:

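I'm writing a dynamic query generator that allows aggregating by any combination of fields in the mapping. As the (truncated) mapping below shows, some of the fields are of nested type. Examples: aggregate by [activities.activity, duration], by [activities.activity, activities.duration], or by [applicationName, duration].

Mapping: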

{
"nested": {
    "properties": {
        "@timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
        },
        "activities": {
            "type": "nested",
            "include_in_parent": true,
            "properties": {
                "activity": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "duration": {
                    "type": "long"
                },
                "entry": {
                    "properties": {
                        "blockName": {
                            "type": "string",
                            "index": "not_analyzed"
                        },
                        "blockid": {
                            "type": "string"
                        },
                        "time": {
                            "type": "date",
                            "format": "dateOptionalTime"
                        }
                    }
                },
                "exit": {
                    "properties": {
                        "blockName": {
                            "type": "string",
                            "index": "not_analyzed"
                        },
                        "blockid": {
                            "type": "string"
                        },
                        "time": {
                            "type": "date",
                            "format": "dateOptionalTime"
                        }
                    }
                },
                "seq": {
                    "type": "integer"
                }
            }
        },
        "applicationName": {
            "type": "string",
            "index": "not_analyzed"
        },
        "duration": {
            "type": "long"
        }
    }
}}

Sample document:

{
"@timestamp": "2015-09-15T17:35:24.020Z",
"duration": "37616",
"applicationName": "my application name",
"activities": [{
    "duration": "20362",
    "entry": {
        "blockid": "2",
        "time": "2015-09-15T17:35:24.493Z",
        "blockName": "My Self Service"
    },
    "exit": {
        "blockid": "2",
        "time": "2015-09-15T17:35:44.855Z",
        "blockName": "My Self Service"
    },
    "seq": 1,
    "activity": "Prompter v2.3"
}, {
    "duration": "96",
    "entry": {
        "blockid": "2",
        "time": "2015-09-15T17:35:45.268Z",
        "blockName": "My Self Service"
    },
    "exit": {
        "blockid": "2",
        "time": "2015-09-15T17:35:45.364Z",
        "blockName": "My Self Service"
    },
    "seq": 2,
    "activity": "Start v2.5"
}, {
    "duration": "15931",
    "entry": {
        "blockid": "2",
        "time": "2015-09-15T17:35:45.669Z",
        "blockName": "My Self Service"
    },
    "exit": {
        "blockid": "2",
        "time": "2015-09-15T17:36:01.600Z",
        "blockName": "My Self Service"
    },
    "seq": 3,
    "activity": "System v2.3"
}]}

Sample query:

{
"size": 0,
"aggs": {
    "dim0": {
        "nested" : {
            "path": "activities"
        },
        "aggs": {
            "dim1": {
                "terms": {
                    "field": "activities.activity"
                },
                "aggs": {
                    "dim_reverse":{
                        "reverse_nested":{},
                        "aggs":{
                            "avg_duration": {
                                "avg": {
                                    "field": "duration"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}}

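The problem: as the query shows, when averaging a root-level field underneath a nested field, reverse_nested must be included so that the root-level field "duration" is visible. This means that when generating the query we have to inspect each combination of fields, work out whether the parent and child fields are nested, nested under the same path, or at the root level, and then generate the right query. This gets more complicated when aggregating by more fields, for example by [applicationName, activities.duration, duration, activities.activity]. Does anyone know a more elegant way to do this? If we could specify absolute paths, the logic might be simpler.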
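The check described above can be sketched in Python. This is only a rough illustration of the bookkeeping, not an Elasticsearch feature: the helper names (`nested_path_of`, `context_switch`, `build_aggs`), the generated aggregation names (`ctx0`, `dim1`, ...) and the idea that the caller already knows the list of nested paths are all assumptions made for this example. It only handles single-level nested paths such as `activities`, and it always goes back to the root with `reverse_nested` before entering a different nested path.

```python
import json


def nested_path_of(field, nested_paths):
    """Return the longest nested path that prefixes `field`, or None for a root-level field."""
    matches = [p for p in nested_paths if field.startswith(p + ".")]
    return max(matches, key=len) if matches else None


def context_switch(current, target):
    """Return the wrapper aggregations needed to move from context `current` to `target`."""
    if current == target:
        return []
    steps = []
    if current is not None:
        steps.append({"reverse_nested": {}})        # leave the nested context
    if target is not None:
        steps.append({"nested": {"path": target}})  # enter the target nested context
    return steps


def build_aggs(group_fields, metric_field, nested_paths):
    """Chain one terms aggregation per group field, then an avg on `metric_field`."""
    root = {}
    node = root      # aggregation body that receives the next "aggs" entry
    current = None   # None means root-level context
    counter = 0
    plan = [("terms", f) for f in group_fields] + [("avg", metric_field)]
    for kind, field in plan:
        target = nested_path_of(field, nested_paths)
        for step in context_switch(current, target):
            node["aggs"] = {f"ctx{counter}": step}
            node = step
            counter += 1
        current = target
        name = f"dim{counter}" if kind == "terms" else "avg_" + field.replace(".", "_")
        body = {kind: {"field": field}}
        node["aggs"] = {name: body}
        node = body
        counter += 1
    return {"size": 0, "aggs": root["aggs"]}


query = build_aggs(["activities.activity"], "duration", ["activities"])
print(json.dumps(query, indent=2))
```

For the field pair [activities.activity, duration] this produces the same nesting as the sample query: a `nested` wrapper, the terms aggregation, a `reverse_nested` wrapper, and the average on the root-level `duration`.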

【Discussion】:

    Tags: elasticsearch nested aggregation


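    【Solution 1】:

    This is not really an answer to my question, but I'm adding more examples because they may help others understand nested aggregations better.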

          aggs field  average field 
    case1 yes         yes
    case2 yes         no
    case3 no          yes
    case4 no          no
    

    yes -> nested type, no -> non-nested type
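The yes/no classification in the table can be derived from the mapping rather than hard-coded. The following is a minimal sketch under assumptions: the mapping is available as a Python dict (here an abridged copy of the question's mapping), and the function names `find_nested_paths` and `is_nested` are made up for illustration.

```python
def find_nested_paths(properties, prefix=""):
    """Walk a mapping's "properties" dict and collect the full paths of nested-type fields."""
    paths = []
    for name, spec in properties.items():
        full_name = prefix + name
        if spec.get("type") == "nested":
            paths.append(full_name)
        if "properties" in spec:
            # Recurse into object and nested fields alike.
            paths.extend(find_nested_paths(spec["properties"], full_name + "."))
    return paths


def is_nested(field, nested_paths):
    """True if `field` lives under one of the nested paths."""
    return any(field.startswith(path + ".") for path in nested_paths)


# Abridged copy of the mapping from the question (only the fields used in the cases).
mapping_properties = {
    "activities": {
        "type": "nested",
        "include_in_parent": True,
        "properties": {
            "activity": {"type": "string", "index": "not_analyzed"},
            "duration": {"type": "long"},
        },
    },
    "applicationName": {"type": "string", "index": "not_analyzed"},
    "duration": {"type": "long"},
}

nested_paths = find_nested_paths(mapping_properties)
for aggs_field, avg_field in [
    ("activities.activity", "activities.duration"),
    ("activities.activity", "duration"),
    ("applicationName", "activities.duration"),
    ("applicationName", "duration"),
]:
    flags = ["yes" if is_nested(f, nested_paths) else "no" for f in (aggs_field, avg_field)]
    print(aggs_field, avg_field, *flags)
```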

    Case 1, same path

    Query

    {
    "size": 0,
    "aggs": {
        "dim0": {
            "nested" : {
                "path": "activities"
            },
            "aggs": {
                "dim1": {
                    "terms": {
                        "field": "activities.activity"
                    },
                    "aggs":{
                        "avg_duration": {
                            "avg": {
                                "field": "activities.duration"
                            }
                        }
                    }
                }
            }
        }
    }}
    

    Result:

    {
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "dim0": {
            "doc_count": 3,
            "dim1": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [{
                    "key": "Prompter v2.3",
                    "doc_count": 1,
                    "avg_duration": {
                        "value": 20362.0
                    }
                }, {
                    "key": "Start v2.5",
                    "doc_count": 1,
                    "avg_duration": {
                        "value": 96.0
                    }
                }, {
                    "key": "System v2.3",
                    "doc_count": 1,
                    "avg_duration": {
                        "value": 15931.0
                    }
                }]
            }
        }
    }}
    

    Case 1, both fields nested, but with reverse_nested applied, so every bucket gets the same average over all "activities.duration" values

    Query

    {
    "size": 0,
    "aggs": {
        "dim0": {
            "nested" : {
                "path": "activities"
            },
            "aggs": {
                "dim1": {
                    "terms": {
                        "field": "activities.activity"
                    },
                    "aggs": {
                        "dim_reverse1":{
                            "reverse_nested":{
                            },
                            "aggs":{
                                "avg_duration": {
                                    "avg": {
                                        "field": "activities.duration"
                                    }
                                }
                            }
                        }
                    }
                }                
            }
        }
    }}
    

    Result

    {
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "dim0": {
            "doc_count": 3,
            "dim1": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [{
                    "key": "Prompter v2.3",
                    "doc_count": 1,
                    "dim_reverse1": {
                        "doc_count": 1,
                        "avg_duration": {
                            "value": 12129.666666666666
                        }
                    }
                }, {
                    "key": "Start v2.5",
                    "doc_count": 1,
                    "dim_reverse1": {
                        "doc_count": 1,
                        "avg_duration": {
                            "value": 12129.666666666666
                        }
                    }
                }, {
                    "key": "System v2.3",
                    "doc_count": 1,
                    "dim_reverse1": {
                        "doc_count": 1,
                        "avg_duration": {
                            "value": 12129.666666666666
                        }
                    }
                }]
            }
        }
    }}
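The value repeated in every bucket above is simply the mean of the three "activities.duration" values in the sample document. A quick check in Python (the numbers are copied from the sample document; nothing else is assumed):

```python
import math

# activities[*].duration from the sample document
durations = [20362, 96, 15931]

mean = sum(durations) / len(durations)
print(sum(durations), mean)

# Matches the avg_duration value reported in each bucket.
assert math.isclose(mean, 12129.666666666666)
```

After reverse_nested the average runs on the root document, which presumably carries all three nested values because the mapping sets include_in_parent: true, so the per-activity grouping no longer affects the result.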
    

    Case 3

    Query

    {
    "size": 0,
    "aggs": {
        "dim1": {
            "terms": {
                "field": "applicationName"
            },
            "aggs":{
                "avg_duration": {
                    "avg": {
                        "field": "activities.duration"
                    }
                }
            }
        }
    }}
    

    Result

    {
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "dim1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{
                "key": "my application name",
                "doc_count": 1,
                "avg_duration": {
                    "value": 12129.666666666666
                }
            }]
        }
    }}
    

    Case 2, including reverse_nested to return to the root level

    Query

    {
    "size": 0,
    "aggs": {
        "dim0": {
            "nested" : {
                "path": "activities"
            },
            "aggs": {
                "dim1": {
                    "terms": {
                        "field": "activities.activity"
                    },
                    "aggs": {
                        "dim_reverse":{
                            "reverse_nested":{},
                            "aggs":{
                                "avg_duration": {
                                    "avg": {
                                        "field": "duration"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }}
    

    Result:

    {
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "dim0": {
            "doc_count": 3,
            "dim1": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [{
                    "key": "Prompter v2.3",
                    "doc_count": 1,
                    "dim_reverse": {
                        "doc_count": 1,
                        "avg_duration": {
                            "value": 37616.0
                        }
                    }
                }, {
                    "key": "Start v2.5",
                    "doc_count": 1,
                    "dim_reverse": {
                        "doc_count": 1,
                        "avg_duration": {
                            "value": 37616.0
                        }
                    }
                }, {
                    "key": "System v2.3",
                    "doc_count": 1,
                    "dim_reverse": {
                        "doc_count": 1,
                        "avg_duration": {
                            "value": 37616.0
                        }
                    }
                }]
            }
        }
    }}
    

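    Case 2, without specifying the nested path (this presumably works only because the mapping sets include_in_parent: true, which also indexes the nested fields on the root document)

    Query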

    {
    "size": 0,
    "aggs": {
        "dim1": {
            "terms": {
                "field": "activities.activity"
            },
            "aggs":{
                "avg_duration": {
                    "avg": {
                        "field": "duration"
                    }
                }
            }
        }
    }}
    

    Result: same as the previous one

    {
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "dim1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{
                "key": "Prompter v2.3",
                "doc_count": 1,
                "avg_duration": {
                    "value": 37616.0
                }
            }, {
                "key": "Start v2.5",
                "doc_count": 1,
                "avg_duration": {
                    "value": 37616.0
                }
            }, {
                "key": "System v2.3",
                "doc_count": 1,
                "avg_duration": {
                    "value": 37616.0
                }
            }]
        }
    }}

    Case 2, without reverse_nested: the root-level "duration" is not visible

    Query

    {
    "size": 0,
    "aggs": {
        "dim0": {
            "nested" : {
                "path": "activities"
            },
            "aggs": {
                "dim1": {
                    "terms": {
                        "field": "activities.activity"
                    },
                    "aggs":{
                        "avg_duration": {
                            "avg": {
                                "field": "duration"
                            }
                        }
                    }
                }
            }
        }
    }}
    

    Result

    {
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "dim0": {
            "doc_count": 3,
            "dim1": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [{
                    "key": "Prompter v2.3",
                    "doc_count": 1,
                    "avg_duration": {
                        "value": null
                    }
                }, {
                    "key": "Start v2.5",
                    "doc_count": 1,
                    "avg_duration": {
                        "value": null
                    }
                }, {
                    "key": "System v2.3",
                    "doc_count": 1,
                    "avg_duration": {
                        "value": null
                    }
                }]
            }
        }
    }}
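Because the metric sits at a different depth depending on whether a reverse_nested wrapper was generated, code that reads these responses has to look through the wrappers. A small sketch, assuming the response has already been parsed into Python dicts; `find_metric` is a hypothetical helper, and the two buckets are copied from the results above.

```python
def find_metric(node, metric_name):
    """Depth-first search for `metric_name` below `node`, looking through wrapper aggregations.

    Returns the metric's "value", which is None both when the metric is missing
    and when Elasticsearch reported null for it.
    """
    if metric_name in node:
        return node[metric_name].get("value")
    for value in node.values():
        if isinstance(value, dict):
            found = find_metric(value, metric_name)
            if found is not None:
                return found
    return None


# Bucket from the query that uses reverse_nested.
bucket_with_reverse = {
    "key": "Prompter v2.3",
    "doc_count": 1,
    "dim_reverse": {"doc_count": 1, "avg_duration": {"value": 37616.0}},
}

# Bucket from the query without reverse_nested: the root-level field is not visible.
bucket_without_reverse = {
    "key": "Prompter v2.3",
    "doc_count": 1,
    "avg_duration": {"value": None},
}

print(find_metric(bucket_with_reverse, "avg_duration"))
print(find_metric(bucket_without_reverse, "avg_duration"))
```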
    

    【Discussion】:
