Elasticsearch - Sort results of Terms aggregation by key string length

【Question title】: Elasticsearch - Sort results of Terms aggregation by key string length
【Posted】: 2021-09-22 05:35:23
【Question description】:

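I am querying ES with a Terms aggregation to find the first N unique values of a string field foo, where the field contains the substring bar and the document matches some other constraints.

Currently I can sort the results alphabetically by the key string: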

{
  "query": {other constraints},
  "aggs": {
    "my_values": {
      "terms": {
        "field": "foo.raw",
        "include": ".*bar.*",
        "order": {"_key": "asc"},
        "size": N
      }
    }
  }
}

This gives results like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,   
      "sum_other_doc_count": 145,           
      "buckets": [                        
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "iii_bar_iii",
          "doc_count": 1
        },
        {
          "key": "z_bar_z",
          "doc_count": 1
        }
      ]
    }
  }
}

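How can I change the order option so that the buckets are sorted by the length of the strings in the foo key field, so that the results look like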

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,   
      "sum_other_doc_count": 145,           
      "buckets": [                        
        {
          "key": "z_bar_z",
          "doc_count": 1
        },
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "iii_bar_iii",
          "doc_count": 1
        }
      ]
    }
  }
}

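This is desired because a shorter string is closer to the search substring, so it is considered a "better" match and should appear earlier in the results than a longer string. Any alternative way of sorting the buckets by how similar they are to the original substring would also be helpful.

I need the sorting to happen in ES so that I only have to load the top N results from ES.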
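
For reference, the ordering being asked for can be sketched outside ES in a few lines of Python. This only illustrates the target order; it is not the requested solution, since it needs every key loaded client-side. The function name `rank_keys` is hypothetical, and breaking ties alphabetically is an assumption that the question itself does not state.

```python
def rank_keys(keys, n):
    """Return the n shortest keys, breaking ties alphabetically (assumed tie-break)."""
    # Sort on the tuple (length, key): shorter keys first, then alphabetical order.
    return sorted(keys, key=lambda k: (len(k), k))[:n]


# Bucket keys taken from the example response in the question.
keys = ["aa_bar_aa", "iii_bar_iii", "z_bar_z"]
print(rank_keys(keys, 3))  # ['z_bar_z', 'aa_bar_aa', 'iii_bar_iii']
```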

【Question comments】:

Tags: sorting elasticsearch elasticsearch-aggregation elasticsearch-6

【Solution 1】:

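I figured out a way to do this. I use a sub-aggregation on each dynamic bucket to compute the length of the key string as an additional value. I can then order first by this new length value and then by the actual key, so keys of the same length are sorted alphabetically.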

{
  "query": {other constraints},
  "aggs": {
    "my_values": {
      "terms": {
        "field": "foo.raw",
        "include": ".*bar.*",
        "order": [
          {"key_length": "asc"},
          {"_key": "asc"}
        ],
        "size": N
      },
      "aggs": {
        "key_length": {
          "max": {"script": "doc['foo.raw'].value.length()" }
        }
      }
    }
  }
}

This gives me results like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,   
      "sum_other_doc_count": 145,           
      "buckets": [                        
        {
          "key": "z_bar_z",
          "doc_count": 1
        },
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "dd_bar_dd",
          "doc_count": 1
        },
        {
          "key": "bbb_bar_bbb",
          "doc_count": 1
        }
      ]
    }
  }
}

This is exactly what I wanted.
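
As a rough sketch, the request body above could be assembled in Python with the `{other constraints}` and `N` placeholders filled in. The function name `build_key_length_request`, its parameters, and the `match_all` default are hypothetical stand-ins that do not come from the original post, and the sketch assumes the substring contains no regular-expression metacharacters.

```python
import json


def build_key_length_request(substring, size, query=None):
    """Build the terms-aggregation body ordered by key length, then by key."""
    if query is None:
        # Stand-in for the "other constraints" placeholder in the original query.
        query = {"match_all": {}}
    return {
        "query": query,
        "aggs": {
            "my_values": {
                "terms": {
                    "field": "foo.raw",
                    # Assumes `substring` needs no regex escaping.
                    "include": ".*" + substring + ".*",
                    # Shortest keys first; equal lengths fall back to alphabetical order.
                    "order": [{"key_length": "asc"}, {"_key": "asc"}],
                    "size": size,
                },
                "aggs": {
                    # Every document in a bucket shares the same key, so max
                    # simply yields that key's length.
                    "key_length": {
                        "max": {"script": "doc['foo.raw'].value.length()"}
                    }
                },
            }
        },
    }


body = build_key_length_request("bar", 10)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body with whichever client is in use; only the dict construction is shown here.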

【Comments】: