【Question title】: Count and sort by the number of occurrences in an array
【Posted】: 2020-09-11 02:00:22
【Question description】:

I have a type called account with the following mapping:

        "country" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "followingClientIds" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "fielddata" : true
        },

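followingClientIds is an array of string IDs of the other accounts that I follow.

I want to build a query that fetches every account from a given country and sorts them by the number of accounts that we both follow.

Here is what I have tried so far: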


GET account/_search
{
  "size": 20,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "country.keyword": "AT"
        }
      }
    }
  },
  "sort": [
    {
      "followingClientIds.keyword": {
        "order": "asc",
        "nested_filter": {
          "terms": {
            "followingClientIds.keyword": [
              "dFbEW23hVZ3w8jhH9LeCw3QG33UjuF5C"
            ]
          }
        }
      }
    }
  ]
}

For example, I have these 3 documents of type account:

{
    "username": "user2",
    "country": "AT",
    "followingClientIds": ["abc"]
},
{
    "username": "user3",
    "country": "AT",
    "followingClientIds": ["abc", "bcd", "cde"]
},
{
    "username": "user4",
    "country": "AT",
    "followingClientIds": ["abc"]
}

Suppose I send country and followingClientIds to the query to sort by:

{
    "country": "AT",
    "followingClientIds": ["abc", "bcd", "cde"]
}

I would expect the result to look like this:

{
    "username": "user3",
    "country": "AT",
    "followingClientIds": ["abc", "bcd", "cde"],
    "fields": [ // don't really need this custom field, but would be cool
        "mutual_following_count": 3
    ]
},
{
    "username": "user2",
    "country": "AT",
    "followingClientIds": ["abc"],
    "fields": [
        "mutual_following_count": 1
    ]
},
{
    "username": "user4",
    "country": "AT",
    "followingClientIds": ["abc"],
    "fields": [
        "mutual_following_count": 1
    ]
}

【Comments】:

    Tags: elasticsearch querydsl


    【Solution 1】:

    If you are looking for a standalone computed field called mutual_following_count, you can produce it with a script (for example through script_fields). But you won't be able to sort on it.
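
    A minimal sketch of such a script field, under a few assumptions that are not part of the original answer: the IDs are case-sensitive and are therefore read from the followingClientIds.keyword sub-field, the params values are the sample IDs from the question, and "_source": true is set so that the original document is requested alongside the computed field:

    ```json
    GET account/_search
    {
      "size": 20,
      "_source": true,
      "query": {
        "bool": {
          "filter": {
            "term": {
              "country.keyword": "AT"
            }
          }
        }
      },
      "script_fields": {
        "mutual_following_count": {
          "script": {
            "lang": "painless",
            "params": {
              "followingClientIds": ["abc", "bcd", "cde"]
            },
            "source": """
              // IDs this account follows, read from the non-analyzed keyword sub-field
              def followed = new HashSet(doc['followingClientIds.keyword']);
              int mutual = 0;
              // walk the deduplicated IDs passed in and count the ones this account also follows
              for (def id : new HashSet(params.followingClientIds)) {
                if (followed.contains(id)) {
                  mutual++;
                }
              }
              return mutual;
            """
          }
        }
      }
    }
    ```

    Each hit should then carry the count under fields.mutual_following_count, while the hits themselves are not ordered by it.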

    The only other option is a script sort, which first computes a value and then sorts by it. The resulting query could look like this:

    {
      "size": 20,
      "query": {
        "bool": {
          "filter": {
            "term": {
              "country.keyword": "AT"
            }
          }
        }
      },
      "sort": [
        {
          "_script": {
            "type": "number",
            "order": "desc", 
            "script": {
              "lang": "painless", 
              "params": {
                "followingClientIds": ["abc", "bcd", "cde"]
              },
              "source": """
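                // deduplicate
                // note: doc.followingClientIds reads the analyzed text field (fielddata), whose terms
                // are lowercased; use doc['followingClientIds.keyword'] if the IDs are case-sensitive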
                def fromSource = doc.followingClientIds
                                    .stream()
                                    .distinct()
                                    .collect(Collectors.toList());
                def fromParams = params.followingClientIds
                                       .stream()
                                       .distinct()
                                       .collect(Collectors.toList());
                
                // count the IDs from params that this document also follows;
                // size() already returns an int, so the cast is only defensive
                return (int) fromParams.findAll(x -> fromSource.contains(x)).size();
              """
            }
          }
        }
      ]
    }
    

    The downside is that you cannot "name" that sort: neither mutual_following_count nor anything else.
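
    As the comments below point out, doc.followingClientIds returns lowercased terms, so contains() misses mixed-case IDs. A hedged variant of the sort clause that reads the keyword sub-field instead could look like the sketch below; it assumes the same sample params as above and replaces only the "sort" part of the query:

    ```json
    {
      "sort": [
        {
          "_script": {
            "type": "number",
            "order": "desc",
            "script": {
              "lang": "painless",
              "params": {
                "followingClientIds": ["abc", "bcd", "cde"]
              },
              "source": """
                // keyword doc values keep the original casing of each ID
                def followed = new HashSet(doc['followingClientIds.keyword']);
                int mutual = 0;
                // count the deduplicated IDs from params that this account also follows
                for (def id : new HashSet(params.followingClientIds)) {
                  if (followed.contains(id)) {
                    mutual++;
                  }
                }
                return mutual;
              """
            }
          }
        }
      ]
    }
    ```

    With the three sample documents from the question, user3 would be expected to sort first with a count of 3, followed by user2 and user4 with a count of 1 each.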

    【Comments】:

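    • Thanks for your answer. The query does look like what I need, but it does not sort the results. I also tried fromSource.stream().filter(x -> fromParams.contains(x)).count(), but that did not work either.
    • contains(x) does not seem to compare the right values.
    • Oh, I see: the elements in the List are all lowercase, which is why contains does not work. Thanks!
    • Right, no problem! You may want to look at .keyword to preserve the uppercase characters.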