Elasticsearch: sort by first element of array (answers)

【Question title】: Elasticsearch sort by first element of array
【Posted】: 2017-11-01 01:22:34
【Question】:

I am using Elasticsearch 5.5 and have an index with the following mapping:

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "my_array": {
            "properties": {
              "array": {
                "type": "float"
              },
              "length": {
                "type": "long"
              }
            }
          }
        }
      }
    }
  }
}

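I want to search by title and sort by the first value in the array. It would also be great to have that first value set as the _score field. So I prepared this request: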

GET my_index/my_type/_search
{
    "query": {
      "term": {
        "title.keyword": "Shorts"
      }
    }, 
    "sort" : {
        "_script" : {
            "type" : "number",
            "script" : {
                "lang": "painless",
                "inline": "doc['my_array.array'][0]"
            },
            "order" : "asc"
        }
    }
}

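Unfortunately, it does not work. I suspect that something is missing or wrong.

【Question comments】:

What error does it give?
@HatimStovewala There is no error, but the order is wrong.
Could you provide some sample documents and the response? What do you expect to see in the response, and what do you actually get? Thanks.
@NikolayVasiliev With this request: GET my_index/my_type/_search { "sort" : { "_script" : { "type" : "number", "script" : { "lang": "painless", "inline": "doc['embeddings.array'][0]" }, "order" : "asc" } } } the response is jsonblob.com/05e5893a-be1c-11e7-9ae8-cdd8d94a615d
Is doc['my_array.array'][0] a valid way to access the first element of the array?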

Tags: elasticsearch

【Solution 1】:

The correct way to do this with a Painless script is:

{
  "query": {
    "term": {
      "title.keyword": "Shorts"
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "params._source.my_array.array[0]"
      },
      "order": "asc"
    }
  }
}

【Comments】:

【Solution 2】:

As Andrei pointed out in his answer, you should reference _source directly in the Painless script.

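This happens because a Lucene index (which is what Elasticsearch is built on) has no notion of the original order of the values in an array. In addition, arrays do not work as you would expect:

"Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array."

Basically, you are sorting by an arbitrary value from the list, which is not necessarily the first one in the original document.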
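To make the difference concrete, here is a toy Python model (it does not call Elasticsearch). It rests on one assumption that is not stated in this thread: that the values of a multi-valued numeric field are exposed through doc values in ascending order, so index 0 is the smallest value rather than the first one written. The function names are made up for illustration.

```python
# Toy model only: nothing here talks to Elasticsearch.
# Assumption: multi-valued numeric doc values come back sorted ascending,
# so the order used in the original JSON document is lost.

def source_first(source_doc):
    """Models params._source.my_array.array[0]: original JSON order."""
    return source_doc["my_array"]["array"][0]

def doc_values_first(source_doc):
    """Models doc['my_array.array'][0] under the sorted-values assumption."""
    return sorted(source_doc["my_array"]["array"])[0]

doc = {"title": "Shorts", "my_array": {"array": [0.7, 0.1, 0.4], "length": 3}}

print(source_first(doc))      # the element that was written first
print(doc_values_first(doc))  # the smallest element, per the assumption
```

If the assumption holds, this would explain why the question's script returns results without an error but in an unexpected order.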

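Andrei's suggestion is to use _source, i.e. to read the original JSON document, parse it, and extract the required value from it (this will work). However, _source is slow: instead of accessing a fast index structure, you read the document from disk every time, for every document.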

You have two other options:

Move the first element into a separate field;
Use the nested data type and define the order explicitly.
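The two options above could be sketched roughly as follows, in Python that only builds documents and request bodies (sending them is left to whatever client you use). The names `first_value`, `items`, `position` and `value` are hypothetical and not part of the original mapping. The second option assumes `my_array.items` is mapped as `nested`, and it uses the `nested_path` / `nested_filter` sort options as they exist in the 5.x line; later versions express this differently, so check the documentation for your version.

```python
import json

# All new field names ("first_value", "items", "position", "value") are
# hypothetical; they do not appear in the original mapping.

def with_first_value(doc):
    """Option 1: copy the first array element into its own top-level field."""
    out = dict(doc)
    values = doc.get("my_array", {}).get("array", [])
    if values:  # leave the field out when the array is missing or empty
        out["first_value"] = values[0]
    return out

def sort_by_first_value(title):
    """Search body that sorts on the extra field, so no script is needed."""
    return {
        "query": {"term": {"title.keyword": title}},
        "sort": [{"first_value": {"order": "asc"}}],
    }

def with_positions(doc):
    """Option 2: store every value together with its explicit position."""
    values = doc.get("my_array", {}).get("array", [])
    out = dict(doc)
    out["my_array"] = {
        "length": len(values),
        "items": [{"position": i, "value": v} for i, v in enumerate(values)],
    }
    return out

def sort_by_nested_first(title):
    """Search body that sorts on the nested value whose position is 0.

    Assumes my_array.items is mapped as "nested" and that the 5.x sort
    options nested_path / nested_filter are available.
    """
    return {
        "query": {"term": {"title.keyword": title}},
        "sort": [{
            "my_array.items.value": {
                "order": "asc",
                "nested_path": "my_array.items",
                "nested_filter": {"term": {"my_array.items.position": 0}},
            }
        }],
    }

doc = {"title": "Shorts", "my_array": {"array": [0.7, 0.1, 0.4], "length": 3}}
print(json.dumps(with_first_value(doc)))
print(json.dumps(sort_by_nested_first("Shorts"), indent=2))
```

The first variant is the simpler one: the extra field is computed once at index time and sorting reads it like any other numeric field. The nested variant keeps all values sortable by position at the cost of a more complex mapping and query.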

Hope that helps!

【Comments】: