【Question title】: ElasticSearch - Combine filters & Composite Query to get unique fields combinations
【Posted】: 2021-07-20 12:44:01
【Question description】:

Well... I'm very much a "newbie" with ES, so when it comes to aggregations... there's no word in the dictionary to describe my level :p

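Today I'm facing a problem: I'm trying to write a query that should do something similar to SQL DISTINCT, but with filters applied. I have this document (an abstraction of the real one, of course):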

{
  "id": "1",
  "createdAt": 1626783747,
  "updatedAt": 1626783747,
  "isAvailable": true,
  "kind": "document",
  "classification": {
    "id": 1,
    "name": "a_name_for_id_1"
  },
  "structure": {
    "material": "cartoon",
    "thickness": 5
  },
  "shared": true,
  "objective": "stackoverflow"
}

Since any of the data in the document above can vary, some values can repeat across documents, such as classification.id, kind and structure.material.

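So, to meet my requirement, I want to "group by" these 3 fields to get every unique combination of them. Going a bit further, with the following data I should get the following possibilities: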

[{
        "id": "1",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 1,
            "name": "a_name_for_id_1"
        },
        "structure": {
            "material": "cartoon",
            "thickness": 5
        },
        "shared": true,
        "objective": "stackoverflow"
    },
    {
        "id": "2",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 2,
            "name": "a_name_for_id_2"
        },
        "structure": {
            "material": "iron",
            "thickness": 3
        },
        "shared": true,
        "objective": "linkedin"
    },
    {
        "id": "3",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": false,
        "kind": "document",
        "classification": {
            "id": 2,
            "name": "a_name_for_id_2"
        },
        "structure": {
            "material": "paper",
            "thickness": 1
        },
        "shared": false,
        "objective": "tiktok"
    },
    {
        "id": "4",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "cartoon",
            "thickness": 5
        },
        "shared": false,
        "objective": "snapchat"
    },
    {
        "id": "5",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "paper",
            "thickness": 1
        },
        "shared": true,
        "objective": "twitter"
    },
    {
        "id": "6",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": false,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "iron",
            "thickness": 3
        },
        "shared": true,
        "objective": "facebook"
    }
]

Based on the above, I should get the following results in the "buckets":

  • document 1 cartoon
  • document 2 iron
  • document 2 paper
  • document 3 cartoon
  • document 3 paper
  • document 3 iron

Of course, this is only for the sake of the example (for convenience, there are no duplicates yet).

On top of that, though, I need some "pre-filters":

  • documents that are available: isAvailable=true
  • documents whose structure thickness is between 2 and 4, inclusive: 2 <= structure.thickness <= 4
  • documents that are shared: shared=true

Compared with the first set of results, I should only get the following combination:

  • document 1 cartoon -> not a valid result, thickness > 4
  • document 2 iron
  • document 2 paper -> not a valid result, isAvailable != true
  • document 3 cartoon -> not a valid result, thickness > 4
  • document 3 paper -> not a valid result, thickness < 2
  • document 3 iron -> not a valid result, isAvailable != true

If you're still reading... thank you! xD

So, as you can see, I need every possible combination of the fields in the static pattern kind <> classification_id <> structure_material, restricted to documents that match the filters on isAvailable, thickness and shared.

As for the output, the hits don't matter to me: I don't need the documents, only the kind <> classification_id <> structure_material combinations :)
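To make the expected output concrete, here is a minimal in-memory emulation in Python (not Elasticsearch itself) of "filter, then SELECT DISTINCT" over the six sample documents, trimmed to the fields involved. The helper names matches_filters and distinct_combinations are hypothetical; the predicate mirrors the three pre-filters listed above, with the thickness range inclusive on both ends.

```python
# In-memory emulation only: the sample documents from the question,
# trimmed to the fields used for filtering and grouping.
DOCS = [
    {"id": "1", "isAvailable": True, "kind": "document", "classification": {"id": 1},
     "structure": {"material": "cartoon", "thickness": 5}, "shared": True},
    {"id": "2", "isAvailable": True, "kind": "document", "classification": {"id": 2},
     "structure": {"material": "iron", "thickness": 3}, "shared": True},
    {"id": "3", "isAvailable": False, "kind": "document", "classification": {"id": 2},
     "structure": {"material": "paper", "thickness": 1}, "shared": False},
    {"id": "4", "isAvailable": True, "kind": "document", "classification": {"id": 3},
     "structure": {"material": "cartoon", "thickness": 5}, "shared": False},
    {"id": "5", "isAvailable": True, "kind": "document", "classification": {"id": 3},
     "structure": {"material": "paper", "thickness": 1}, "shared": True},
    {"id": "6", "isAvailable": False, "kind": "document", "classification": {"id": 3},
     "structure": {"material": "iron", "thickness": 3}, "shared": True},
]


def matches_filters(doc):
    """The three pre-filters: available, 2 <= thickness <= 4, shared."""
    return (
        doc["isAvailable"] is True
        and 2 <= doc["structure"]["thickness"] <= 4
        and doc["shared"] is True
    )


def distinct_combinations(docs, predicate=lambda doc: True):
    """Unique (kind, classification.id, structure.material) tuples, sorted ascending."""
    return sorted({
        (doc["kind"], doc["classification"]["id"], doc["structure"]["material"])
        for doc in docs
        if predicate(doc)
    })


print(distinct_combinations(DOCS))                   # all six combinations
print(distinct_combinations(DOCS, matches_filters))  # [('document', 2, 'iron')]
```

Only document 2 survives all three filters, so a single combination remains.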

Thanks for your help :)

Max

【Question comments】:

    Tags: elasticsearch elasticsearch-aggregation


    【Solution 1】:

    You can use a Cardinality aggregation together with your existing filters. Please check this URL and let me know if you have any questions. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

    【Discussion】:

    • Hi, thanks for your answer. I tried your approach, but without success: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default
    【Solution 2】:

    Thanks to a colleague, I finally got it working as expected!

    Query

    GET index-latest/_search
    {
       "size": 0,
       "query": {
          "bool": {
             "filter": [
                {
                   "term": {
                      "isAvailable": true
                   }
                },
                {
                   "range": {
                      "structure.thickness": {
                         "gte": 2,
                         "lte": 4
                      }
                   }
                },
                {
                   "term": {
                      "shared": true
                   }
                }
             ]
          }
       },
       "aggs": {
          "my_agg_example": {
             "composite": {
                "size": 10,
                "sources": [
                   {
                      "kind": {
                         "terms": {
                            "field": "kind.keyword",
                            "order": "asc"
                         }
                      }
                   },
                   {
                      "classification_id": {
                         "terms": {
                            "field": "classification.id",
                            "order": "asc"
                         }
                      }
                   },
                   {
                      "structure_material": {
                         "terms": {
                            "field": "structure.material.keyword",
                            "order": "asc"
                         }
                      }
                   }
                ]
             }
          }
       }
    }
    

    The result is:

    {
       "took": 11,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "skipped": 0,
          "failed": 0
       },
       "hits": {
          "total": {
             "value": 1,
             "relation": "eq"
          },
          "max_score": null,
          "hits": []
       },
       "aggregations": {
          "my_agg_example": {
             "after_key": {
                "kind": "document",
                "classification_id": 2,
                "structure_material": "iron"
             },
             "buckets": [
                {
                   "key": {
                      "kind": "document",
                      "classification_id": 2,
                      "structure_material": "iron"
                   },
                   "doc_count": 1
                }
             ]
          }
       }
    }
    

    So, as we can see, we get the following bucket:

    {
        "key": {
            "kind": "document",
            "classification_id": 2,
            "structure_material": "iron"
        },
        "doc_count": 1
    }
    

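    Note: pay attention to your field types. Putting .keyword on classification.id returns no buckets at all. .keyword should only be used on string fields (as far as I understand, correct me if I'm wrong): with the default dynamic mapping, only string values get a .keyword sub-field, while a numeric field such as classification.id has none.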
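    A minimal Python sketch of that rule, assuming the default dynamic mapping (string values mapped as text with a .keyword sub-field, numbers mapped as numeric fields). build_sources, SOURCES and TEXT_FIELDS are hypothetical names; the generated list has the same shape as the "sources" array in the query of this answer.

```python
def build_sources(sources, text_fields):
    """Build composite `sources`, appending `.keyword` only to text fields.

    sources: (bucket_key_name, field_path) pairs, in the desired sort order.
    text_fields: field paths assumed to be mapped as `text` with a `.keyword`
    sub-field (an assumption about the mapping, not read from the cluster).
    """
    return [
        {name: {"terms": {
            "field": f"{path}.keyword" if path in text_fields else path,
            "order": "asc",
        }}}
        for name, path in sources
    ]


SOURCES = [
    ("kind", "kind"),
    ("classification_id", "classification.id"),  # numeric: no .keyword sub-field
    ("structure_material", "structure.material"),
]
TEXT_FIELDS = {"kind", "structure.material"}

for source in build_sources(SOURCES, TEXT_FIELDS):
    print(source)
```

    Keeping the text/non-text decision in one place makes it harder to accidentally aggregate on classification.id.keyword, which silently yields no buckets.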

    As expected, we get the following result (compared with the original question):

    • document 2 iron

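    Note: the order of the elements in aggs.<name>.composite.sources does matter: buckets are sorted by the first source, then by the second, and so on, so changing it changes the order in which the combinations are returned (and which ones land on each page when paging with after_key).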
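    To illustrate both the ordering and paging with after_key, here is a hedged Python sketch. iter_composite_buckets pages through a composite aggregation by copying after_key into composite.after until a page comes back empty; search is any callable that sends a request body and returns the parsed response (an assumption: wrap your own client). fake_search and make_body are hypothetical in-memory stand-ins that sort keys in sources order, used only so the example runs without a cluster.

```python
import copy
from collections import Counter


def iter_composite_buckets(search, body, agg_name="my_agg_example"):
    """Yield every bucket of a composite aggregation, following after_key."""
    body = copy.deepcopy(body)  # don't mutate the caller's request body
    while True:
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            return
        body["aggs"][agg_name]["composite"]["after"] = after_key


def fake_search(body, rows, agg_name="my_agg_example"):
    """Hypothetical stand-in for Elasticsearch: sorts keys in sources order."""
    comp = body["aggs"][agg_name]["composite"]
    names = [next(iter(source)) for source in comp["sources"]]
    counts = Counter(tuple(row[n] for n in names) for row in rows)
    keys = sorted(counts)
    if "after" in comp:
        after = tuple(comp["after"][n] for n in names)
        keys = [k for k in keys if k > after]
    buckets = [{"key": dict(zip(names, k)), "doc_count": counts[k]}
               for k in keys[:comp["size"]]]
    agg = {"buckets": buckets}
    if buckets:
        agg["after_key"] = buckets[-1]["key"]
    return {"aggregations": {agg_name: agg}}


def make_body(source_names, size=2):
    # Field names are illustrative; fake_search reads rows by source name.
    return {"size": 0, "aggs": {"my_agg_example": {"composite": {
        "size": size,
        "sources": [{n: {"terms": {"field": n, "order": "asc"}}} for n in source_names],
    }}}}


ROWS = [  # the six unfiltered combinations from the question
    {"kind": "document", "classification_id": c, "structure_material": m}
    for c, m in [(1, "cartoon"), (2, "iron"), (2, "paper"),
                 (3, "cartoon"), (3, "paper"), (3, "iron")]
]


def search(body):
    return fake_search(body, ROWS)


by_kind = [tuple(b["key"].values()) for b in iter_composite_buckets(
    search, make_body(["kind", "classification_id", "structure_material"]))]
by_material = [tuple(b["key"].values()) for b in iter_composite_buckets(
    search, make_body(["structure_material", "kind", "classification_id"]))]
print(by_kind[:2])      # [('document', 1, 'cartoon'), ('document', 2, 'iron')]
print(by_material[:2])  # [('cartoon', 'document', 1), ('cartoon', 'document', 3)]
```

    The overall set of combinations is the same either way; only the order, and therefore the page boundaries, changes.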

    Thanks!

    【Discussion】:
