【Question title】: Select documents by array of objects when at least one object doesn't contain necessary field (Elasticsearch)
【Posted】: 2021-04-13 14:45:33
【Question description】:

I have documents in Elasticsearch and can't work out how to write a search script that returns a document if any of its attachments lacks a uuid or has a null uuid. The Elasticsearch version is 5.2. The document mapping:

"mappings": {
    "documentType": {
        "properties": {
            "attachment": {
                "properties": {
                    "uuid": {
                        "type": "text"
                    },
                    "path": {
                        "type": "text"
                    },
                    "size": {
                        "type": "long"
                    }
                }
            }}}

In Elasticsearch the documents look like this:

{
        "_index": "documents",
        "_type": "documentType",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "21321321",
                "path": "../uploads/somepath",
                "size":1231
               },
               {
                "path": "../uploads/somepath",
                "size":1231
               }
          ]
        }
},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "223645641321321",
                "path": "../uploads/somepath",
                "size":1231
               },
               {
                "uuid": "22341424321321",
                "path": "../uploads/somepath",
                "size":1231
               }
          ]
        }
},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "22789789341321321",
                "path": "../uploads/somepath",
                "size":1231
               }, 
               {
                "path": "../uploads/somepath",
                "size":1231
               }
          ]
        }
}

As a result I want to get the documents with _id 1 and 3, but I get a script error instead. I tried the following script:

{
    "query": {
        "bool": {
            "must": [
                {
                    "exists": {
                        "field": "attachment"
                    }
                },
                {
                    "script": {
                        "script": {
                            "inline": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                            "lang": "painless"
                        }
                    }
                }
            ]
        }
    }
}

The error is as follows:

 "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
                    "for (item in doc['attachment'].value) { ",
                    "                 ^---- HERE"
                ],
                "script": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                "lang": "painless"
            }
        ],

Is it possible to select documents in which at least one attachment object does not contain a uuid?

【Comments】:

    Tags: elasticsearch elasticsearch-query elasticsearch-painless elasticsearch-scripting


    【Solution 1】:

    Just an addition to this answer. If the mapping for the uuid field was created automatically, Elasticsearch adds it this way:

    "uuid": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
                "ignore_above": 256
            }
        }
    }
    

    The script could then look like this:

    POST documents/_search
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "attachment"
                        }
                    },
                    {
                        "script": {
                            "script": {
                                "inline": "doc['attachment.size'].length > doc['attachment.uuid.keyword'].length",
                                "lang": "painless"
                            }
                        }
                    }
                ]
            }
        }
    }
    

    【Discussion】:

    • You're right, your suggestion works fine even without reindexing.
    【Solution 2】:

    Iterating over an array of objects is not as trivial as one might think. I have written quite a bit about it here and here.

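    Since your attachments are not defined as nested, ES will internally represent them as flattened lists of values (also known as "doc values"). For instance, attachment.uuid in doc#2 will become ["223645641321321", "22341424321321"] and attachment.size will become [1231, 1231].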

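    This means you can simply compare the .length of these flattened representations! I assume attachment.size is always present and can therefore serve as the baseline for the comparison.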
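
The length comparison can be tried out offline. The following is a minimal Python sketch that only simulates the flattening on the three sample documents from the question. The helper names `flatten` and `has_attachment_without_uuid` are hypothetical, and the simulation ignores text analysis and any de-duplication Elasticsearch may apply to stored values.

```python
# Simulation only: mimics how a non-nested array of objects is flattened
# into one list of values per field. This is not Elasticsearch code.
docs = {
    "1": [
        {"uuid": "21321321", "path": "../uploads/somepath", "size": 1231},
        {"path": "../uploads/somepath", "size": 1231},
    ],
    "2": [
        {"uuid": "223645641321321", "path": "../uploads/somepath", "size": 1231},
        {"uuid": "22341424321321", "path": "../uploads/somepath", "size": 1231},
    ],
    "3": [
        {"uuid": "22789789341321321", "path": "../uploads/somepath", "size": 1231},
        {"path": "../uploads/somepath", "size": 1231},
    ],
}


def flatten(attachments, field):
    """Collect the non-null values of one field across all objects."""
    return [a[field] for a in attachments if a.get(field) is not None]


def has_attachment_without_uuid(attachments):
    # Fewer uuid values than size values means some object lacks a uuid,
    # assuming every attachment has a size.
    return len(flatten(attachments, "uuid")) < len(flatten(attachments, "size"))


matching_ids = [
    doc_id for doc_id, atts in docs.items() if has_attachment_without_uuid(atts)
]
print(matching_ids)  # ['1', '3']
```

Documents 1 and 3 each yield one uuid value against two size values, so they match; document 2 yields two of each and does not.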

    One more thing. To make use of these optimized doc values for text fields, it will require one small mapping change:

    PUT documents/documentType/_mappings
    {
      "properties": {
        "attachment": {
          "properties": {
            "uuid": {
              "type": "text",
              "fielddata": true     <---
            },
            "path": {
              "type": "text"
            },
            "size": {
              "type": "long"
            }
          }
        }
      }
    }
    

    Once that's done, reindex your documents. This can be done with this small Update by query trick:

    POST documents/_update_by_query
    

    You can then query with the following script:

    POST documents/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "attachment"
              }
            },
            {
              "script": {
                "script": {
                  "inline": "def size_field_length = doc['attachment.size'].length; def uuid_field_length =  doc['attachment.uuid'].length; return uuid_field_length < size_field_length",
                  "lang": "painless"
                }
              }
            }
          ]
        }
      }
    }
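
Both answers use the same query shape and differ only in the field they compare against attachment.size. As a rough sketch, a small Python helper can assemble that request body. The function name `build_missing_subfield_query` and its parameters are hypothetical, and the body still has to be sent to `documents/_search` with whatever client you use.

```python
import json


def build_missing_subfield_query(object_field, baseline_subfield, checked_subfield):
    """Build a search body matching documents in which at least one object
    under `object_field` lacks `checked_subfield`.

    Assumes `baseline_subfield` is present on every object.
    """
    script = (
        f"doc['{object_field}.{baseline_subfield}'].length > "
        f"doc['{object_field}.{checked_subfield}'].length"
    )
    return {
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": object_field}},
                    {"script": {"script": {"inline": script, "lang": "painless"}}},
                ]
            }
        }
    }


# Variant for a text field with fielddata enabled.
body = build_missing_subfield_query("attachment", "size", "uuid")
print(json.dumps(body, indent=2))
```

Passing "uuid.keyword" as the last argument produces the variant for an automatically created mapping, which needs no mapping change.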
    

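    【Discussion】:

    • Thanks for the reply. I'll check it, and in the future I'll look into reindexing with the attachment type changed to nested.
    • You're welcome. Making the attachments nested is a good approach. Keep in mind that scripting with nested fields is a chapter of its own. The links at the top of my answer should point you in the right direction, though!
    • It works. I've reindexed and run your script, and it works. Thanks a lot.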