【Question title】: Elasticsearch request
【Posted】: 2020-11-20 02:01:00
【Question】:

I want to run the following request using elasticsearch-dsl or elasticsearch.

Select all users with the same name but different ages.

Example:

Indexed data:

 {  "name": "name1","age": 20  }
 {  "name": "name2","age": 23  } 
 {  "name": "name3","age": 20  }
 {  "name": "name1","age": 22  }
 {  "name": "name4","age": 18  }
 {  "name": "name2","age": 23  }
 {  "name": "name4","age": 18  }
 {  "name": "name4","age": 14  }

I would like a result like this.

Result:

 {  "name": "name4","age": 18 ,"age": 14  }
 {  "name": "name1","age": 22 ,"age": 20  }

【Comments】:

    Tags: python elasticsearch request elasticsearch-dsl


    【Solution 1】:

    You need nested aggregations, that is, one aggregation placed inside another. Since you are working in Python, here is a Python script:

    from elasticsearch import Elasticsearch
    
    # Connect to the elastic cluster
    es=Elasticsearch([{'host':'localhost','port':9200}])
    
    your_data = [
         {  "name": "name1","age": 20  },
         {  "name": "name2","age": 23  },
         {  "name": "name3","age": 20  },
         {  "name": "name1","age": 22  },
         {  "name": "name4","age": 18  },
         {  "name": "name2","age": 23  },
         {  "name": "name4","age": 18  },
         {  "name": "name4","age": 14  }
    ]
    
    your_index_name = "test_index"
    
    # index the example data
    for doc in your_data:
        es.index(index=your_index_name, body=doc)
    
    # make the new documents visible to search before querying
    es.indices.refresh(index=your_index_name)
    
    

    First, create one bucket of documents per name, which I call "buckets_for_name". Then, inside buckets_for_name, apply a nested terms aggregation on age:

    # the nested aggregation query 
    query = {
      "aggs": {
        "buckets_for_name": {
          "terms": { "field": "name.keyword" },
          "aggs": {
            "age_terms": {
              "terms": {
                "field": "age"
              }
            }
          }
        }
      }
    }
    
    res = es.search(index=your_index_name, body=query)
    
    # the results are here
    res["aggregations"]["buckets_for_name"]["buckets"]
    

    The result is not yet in the shape you asked for:

    [{'key': 'name4',
      'doc_count': 3,
      'age_terms': {'doc_count_error_upper_bound': 0,
       'sum_other_doc_count': 0,
       'buckets': [{'key': 18, 'doc_count': 2}, {'key': 14, 'doc_count': 1}]}},
     {'key': 'name1',
      'doc_count': 2,
      'age_terms': {'doc_count_error_upper_bound': 0,
       'sum_other_doc_count': 0,
       'buckets': [{'key': 20, 'doc_count': 1}, {'key': 22, 'doc_count': 1}]}},
     {'key': 'name2',
      'doc_count': 2,
      'age_terms': {'doc_count_error_upper_bound': 0,
       'sum_other_doc_count': 0,
       'buckets': [{'key': 23, 'doc_count': 2}]}},
     {'key': 'name3',
      'doc_count': 1,
      'age_terms': {'doc_count_error_upper_bound': 0,
       'sum_other_doc_count': 0,
       'buckets': [{'key': 20, 'doc_count': 1}]}}]
    

    So clean it up. Here is one suggestion:

    pretty_results = []
    for result in res["aggregations"]["buckets_for_name"]["buckets"]:
        d = dict()
        d["name"] = result["key"]
        d["ages"] = []
        for age in result["age_terms"]["buckets"]:
            d["ages"].append(age["key"])
        pretty_results.append(d)
    
    

    Cleaned-up output:

    [{'name': 'name4', 'ages': [18, 14]},
     {'name': 'name1', 'ages': [20, 22]},
     {'name': 'name2', 'ages': [23]},
     {'name': 'name3', 'ages': [20]}]
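
    The cleaned-up list still contains name2 and name3, which have only one distinct age each, while the question asks only for names with different ages. A minimal sketch of that last filtering step follows. The helper name `names_with_different_ages` and the `sample_buckets` variable are illustrative, not part of the original answer; the sample is a trimmed copy of the aggregation output shown above, so the block runs without a cluster.

```python
def names_with_different_ages(buckets):
    """Keep only the names that have at least two distinct ages.

    `buckets` is expected to look like
    res["aggregations"]["buckets_for_name"]["buckets"].
    """
    results = []
    for bucket in buckets:
        # each inner terms bucket is one distinct age for this name
        ages = [age["key"] for age in bucket["age_terms"]["buckets"]]
        if len(ages) > 1:
            results.append({"name": bucket["key"], "ages": ages})
    return results


# Trimmed copy of the aggregation output above (hypothetical stand-in for `res`)
sample_buckets = [
    {"key": "name4", "doc_count": 3,
     "age_terms": {"buckets": [{"key": 18, "doc_count": 2},
                               {"key": 14, "doc_count": 1}]}},
    {"key": "name1", "doc_count": 2,
     "age_terms": {"buckets": [{"key": 20, "doc_count": 1},
                               {"key": 22, "doc_count": 1}]}},
    {"key": "name2", "doc_count": 2,
     "age_terms": {"buckets": [{"key": 23, "doc_count": 2}]}},
    {"key": "name3", "doc_count": 1,
     "age_terms": {"buckets": [{"key": 20, "doc_count": 1}]}},
]

filtered = names_with_different_ages(sample_buckets)
print(filtered)
# [{'name': 'name4', 'ages': [18, 14]}, {'name': 'name1', 'ages': [20, 22]}]
```

    Note that a terms aggregation returns only the top 10 buckets by default, so with more than 10 names or ages you would probably need to raise "size" in the query.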
    

    【Comments】:

      【Solution 2】:

      Another way to solve this is to aggregate by name and then keep only the name buckets whose minimum and maximum ages differ:

      POST test/_search
      {
        "size": 0,
        "aggs": {
          "names": {
            "terms": {
              "field": "name.keyword",
              "size": 10,
              "min_doc_count": 2
            },
            "aggs": {
              "min_age": {
                "min": {
                  "field": "age"
                }
              },
              "max_age": {
                "max": {
                  "field": "age"
                }
              },
              "all_ages": {
                "terms": {
                  "field": "age",
                  "size": 10
                }
              },
              "diff_ages": {
                "bucket_selector": {
                  "buckets_path": {
                    "min": "min_age",
                    "max": "max_age"
                  },
                  "script": "params.min != params.max"
                }
              }
            }
          }
        }
      }
      

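      Response: you only get name1 and name4. name2 is dropped because its minimum and maximum ages are the same, and name3 is dropped by "min_doc_count": 2 because it has only one document.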

        "buckets" : [
          {
            "key" : "name4",
            "doc_count" : 3,
            "max_age" : {
              "value" : 18.0
            },
            "all_ages" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : 18,
                  "doc_count" : 2
                },
                {
                  "key" : 14,
                  "doc_count" : 1
                }
              ]
            },
            "min_age" : {
              "value" : 14.0
            }
          },
          {
            "key" : "name1",
            "doc_count" : 2,
            "max_age" : {
              "value" : 22.0
            },
            "all_ages" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : 20,
                  "doc_count" : 1
                },
                {
                  "key" : 22,
                  "doc_count" : 1
                }
              ]
            },
            "min_age" : {
              "value" : 20.0
            }
          }
        ]
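
      If you run this query from Python, the response can be reduced to a name-to-ages mapping. The sketch below assumes the full response nests the buckets under ["aggregations"]["names"], which follows from the aggregation name used in the request. `ages_by_name` and `sample_response` are hypothetical names, and the sample is a trimmed copy of the response above.

```python
def ages_by_name(response):
    """Map each surviving name bucket to its sorted list of distinct ages."""
    buckets = response["aggregations"]["names"]["buckets"]
    return {
        bucket["key"]: sorted(age["key"] for age in bucket["all_ages"]["buckets"])
        for bucket in buckets
    }


# Trimmed copy of the response above (only the fields the helper reads)
sample_response = {
    "aggregations": {
        "names": {
            "buckets": [
                {"key": "name4", "doc_count": 3,
                 "max_age": {"value": 18.0}, "min_age": {"value": 14.0},
                 "all_ages": {"buckets": [{"key": 18, "doc_count": 2},
                                          {"key": 14, "doc_count": 1}]}},
                {"key": "name1", "doc_count": 2,
                 "max_age": {"value": 22.0}, "min_age": {"value": 20.0},
                 "all_ages": {"buckets": [{"key": 20, "doc_count": 1},
                                          {"key": 22, "doc_count": 1}]}},
            ]
        }
    }
}

summary = ages_by_name(sample_response)
print(summary)  # {'name4': [14, 18], 'name1': [20, 22]}
```

      The ages come from the "all_ages" sub-aggregation, so the list is not limited to the minimum and maximum; it is limited by the "size": 10 set on that aggregation.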
      

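      【Comments】:

      • Your solution only returns two values (the maximum and the minimum); it does not work when the user name is the same and the age is 20
      • Good luck
      【Solution 3】:

      Not specific to Python: what you need is a terms aggregation on age, restricted to documents whose name is a specific value: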

      GET /_search
      {
        "query" : {
           "bool" : {
              "should" : { "match" : { "name" : "name1"} }
           }
        },
        "aggs": {
          "ages_for_name": {
            "terms": { "field": "age" } 
          }
        }
      }
      

      Run this once for "name1" and once for "name4" to get the "ages_for_name" buckets. Use only the keys (the bucket names) and ignore the bucket values.
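
      Since the question is tagged python, here is a rough sketch of running that query once per name. `build_ages_query` and `ages_for_names` are illustrative helpers, and the added "size": 0 (to skip returning hits) is my own assumption, not part of this answer. The `search` argument is any callable that takes a query body and returns the response dict, for example `lambda body: es.search(index="test_index", body=body)`. To keep the block runnable without a cluster, it uses a small in-memory stand-in (`fake_search`) that only mimics the shape of the response.

```python
from collections import Counter


def build_ages_query(name):
    """Build the request body from this answer for a single name."""
    return {
        "size": 0,  # assumption: we only need the aggregation, not the hits
        "query": {"bool": {"should": {"match": {"name": name}}}},
        "aggs": {"ages_for_name": {"terms": {"field": "age"}}},
    }


def ages_for_names(search, names):
    """Run one query per name and collect the bucket keys (the ages)."""
    result = {}
    for name in names:
        response = search(build_ages_query(name))
        buckets = response["aggregations"]["ages_for_name"]["buckets"]
        result[name] = [bucket["key"] for bucket in buckets]
    return result


# In-memory stand-in for es.search, for illustration only
DOCS = [
    {"name": "name1", "age": 20}, {"name": "name2", "age": 23},
    {"name": "name3", "age": 20}, {"name": "name1", "age": 22},
    {"name": "name4", "age": 18}, {"name": "name2", "age": 23},
    {"name": "name4", "age": 18}, {"name": "name4", "age": 14},
]


def fake_search(body):
    name = body["query"]["bool"]["should"]["match"]["name"]
    counts = Counter(doc["age"] for doc in DOCS if doc["name"] == name)
    buckets = [{"key": age, "doc_count": n} for age, n in counts.most_common()]
    return {"aggregations": {"ages_for_name": {"buckets": buckets}}}


ages = ages_for_names(fake_search, ["name1", "name4"])
print(ages)  # {'name1': [20, 22], 'name4': [18, 14]}
```

      This approach needs one request per name and requires knowing the names in advance, so it probably fits best when the list of candidate names is short.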

      【Comments】:
