【Question title】: Painless Elasticsearch Script for Filtering Nested Array Objects
【Posted】: 2021-06-15 18:26:30
【Question】:

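My use case is similar to the following. I have a nested array of objects, warehouses, and I am trying to filter based on the last element of that array.

I get some results, but not the correct ones. I would also like to understand how it works.

For example,

I want to search for products based on the last element of the warehouses array. This is what a product document looks like: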

{
  "productId": 5,
  "productName": "Shoes",
  "warehouses": [
    {
      "location": "Location A",
      "quantity": 100
    },
    {
      "location": "Location B",
      "quantity": 10
    },
    {
      "location": "Location C",
      "quantity": 50
    }
  ]
}

Its mapping is:

PUT /products
{
  "mappings": {
    "properties": {
      "productId": {
        "type": "integer"
      },
      "productName": {
        "type": "text",
        "fields": {
            "raw": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "warehouses": {
        "properties": {
          "location": {
            "type": "text"
          },
          "quantity": {
            "type": "integer"  
          }
        }
      }
    }
  }
}

Suppose I have indexed the following 7 documents:

POST products/_bulk
{"index":{"_id":1}}
{"productId":1,"productName":"Bags","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":30},{"location":"Location C","quantity":50}]}
{"index":{"_id":2}}
{"productId":2,"productName":"Shirts","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":150},{"location":"Location C","quantity":150}]}
{"index":{"_id":3}}
{"productId":3,"productName":"Shoes","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":10},{"location":"Location C","quantity":50}]}
{"index":{"_id":4}}
{"productId":4,"productName":"Shirt","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":10},{"location":"Location C","quantity":60}, {"location":"Location F","quantity":70}]}
{"index":{"_id":5}}
{"productId":5,"productName":"Socks","warehouses":[{"location":"Location A","quantity":800},{"location":"Location B","quantity":1500},{"location":"Location Z","quantity":1000}]}
{"index":{"_id":6}}
{"productId":6,"productName":"TV","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":150},{"location":"Location C","quantity":123}]}
{"index":{"_id":7}}
{"productId":7,"productName":"Table","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":200},{"location":"Location C","quantity":140}, {"location":"Location D","quantity":123}]}

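Now I want to search and filter products by "quantity": 123. Given the documents indexed above, I want to get the products with id:6 and id:7, because the last element of each has quantity: 123.

Here is my (full) Painless script: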

GET /products/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "script": {
              "script": {
                "lang": "painless",
                "source": """
                  def x = doc['warehouses.quantity'];

                  def flag = false;
                    if(x[x.length - 2 ] == params.limit) {
                      flag = true;
                    }
                  
                  return flag;
                """,
                "params": {
                  "limit": 123
                }
              }
            }
          }
        }
      }
    }
  }
}

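With the script above, I get id:6 (the TV product) as the result. When I replace x[x.length - 2 ] with x[x.length - 3 ], I get id:7 instead.

I am not sure how to get a result that contains both documents [id:6 (TV) and id:7 (Table)].

I am using Elasticsearch version 7.8.1.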

【Comments】:

    Tags: elasticsearch elasticsearch-painless elasticsearch-7


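    【Solution 1】:

    This is because your warehouses array is not of type nested, so the doc values read by the script do not preserve the order of the elements in that array (they are in fact sorted in ascending order of value). You can easily see this by running the following query, which shows that 123 is not necessarily in the last position: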

    GET /products/_search
    {
      "docvalue_fields": ["warehouses.quantity"]
    }
    

    Response:

      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
           ...
        },
        "fields" : {
          "warehouses.quantity" : [
            20,
            123,
            150
          ]
        }
      },
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
           ...
        },
        "fields" : {
          "warehouses.quantity" : [
            20,
            123,
            140,
            200
          ]
        }
      }
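
To make the effect concrete, here is a small Python sketch (not Elasticsearch code) that mimics what the response above shows: for a non-nested array the script sees the quantities in ascending order rather than in source order. The helper names `doc_values` and `matches` are made up for this illustration, and the ascending sort is taken from the observed response, so treat it as a model under that assumption.

```python
# Quantities in source order, copied from the indexed documents 6 and 7.
source_quantities = {
    6: [20, 150, 123],
    7: [20, 200, 140, 123],
}

def doc_values(quantities):
    """Mimic the ascending order in which the script sees the values."""
    return sorted(quantities)

def matches(quantities, offset_from_end, limit):
    """Mimic the check `x[x.length - offset_from_end] == params.limit`."""
    x = doc_values(quantities)
    return x[len(x) - offset_from_end] == limit

for offset in (1, 2, 3):
    hits = [doc_id for doc_id, q in source_quantities.items()
            if matches(q, offset, 123)]
    print(f"x[x.length - {offset}] matches ids {hits}")
```

Under this model, an offset of 2 matches only id 6 and an offset of 3 matches only id 7, which is the behaviour described in the question.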
    

    You need to change the mapping:

    PUT /products
    {
      "mappings": {
        "properties": {
          "productId": {
            "type": "integer"
          },
          "productName": {
            "type": "text",
            "fields": {
                "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "warehouses": {
            "type": "nested",           <--- add this
            "properties": {
              "location": {
                "type": "text"
              },
              "quantity": {
                "type": "integer"  
              }
            }
          }
        }
      }
    }
    

    Your query can then look like this, and it returns documents 6 and 7:

    GET /products/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "nested": {
                "path": "warehouses",
                "query": {
                  "script": {
                    "script": {
                      "source": """
                        def x = doc['warehouses.quantity'];
                        return x[-1] == params.limit;
                      """,
                      "params": {
                        "limit": 123
                      }
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
    

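    Quick tip: x[-1] lets you access the last element of the array regardless of its length.

    【Comments】:

    • Any luck with this?
    【Solution 2】:

    Thanks to @Val for the answer,

    I tried to solve it with a function_score query: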

    GET products/_search
    {
      "min_score": 0.1,
      "query": {
        "function_score": {
          "query": {
            "match_all": {}
          },
          "functions": [
            {
              "script_score": {
                "script": {
                  "source": """
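                    // Read the original array from _source, where element order is preserved.
                    def warehouses = params._source['warehouses'];
                    def lastWarehouse = warehouses.get(warehouses.size() - 1);

                    // Score 1 when the last warehouse matches and 0 otherwise;
                    // min_score then drops the non-matching documents.
                    return lastWarehouse.quantity == params.limit ? 1 : 0;
                  """,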
                  "params": {
                    "limit": 70
                  }
                }
              }
            }
          ]
        }
      }
    }
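
The logic of this script can be checked outside Elasticsearch with a short Python sketch. It assumes that _source keeps the array in its original order, which is what the script relies on, and it simplifies scoring to "the script's return value is the final score". The names `last_quantity_score` and `search` are hypothetical helpers for this illustration only.

```python
# Quantities per product id, in source order, from the bulk request in the question.
products = {
    1: [20, 30, 50],
    2: [100, 150, 150],
    3: [100, 10, 50],
    4: [100, 10, 60, 70],
    5: [800, 1500, 1000],
    6: [20, 150, 123],
    7: [20, 200, 140, 123],
}

def last_quantity_score(quantities, limit):
    """Return 1 when the last element equals limit, else 0 (like the script)."""
    return 1 if quantities[-1] == limit else 0

def search(limit, min_score=0.1):
    """Keep only the products whose score reaches min_score."""
    return [pid for pid, q in products.items()
            if last_quantity_score(q, limit) >= min_score]

print(search(70))   # limit used in the query above
print(search(123))  # limit from the question
```

With the limit of 70 used above, only product 4 survives the min_score cut-off; with 123 the sketch keeps products 6 and 7.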
    

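    【Comments】:

    • I'm curious, what was wrong with my answer? The function_score query is not meant for this purpose. You should go the way I suggested.
    【Solution 3】:

    If you change the mapping to use the nested type, as @Val said, you can avoid the script entirely and use a simple nested query:

    New mapping: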

    PUT /products
    {
      "mappings": {
        "properties": {
          "productId": {
            "type": "integer"
          },
          "productName": {
            "type": "text",
            "fields": {
                "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "warehouses": {
            "type": "nested", 
            "properties": {
              "location": {
                "type": "text"
              },
              "quantity": {
                "type": "integer"  
              }
            }
          }
        }
      }
    }
    

    Query:

    GET products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "warehouses",
                "query": {
                  "term": {
                    "warehouses.quantity": {
                      "value": "123"
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
    

    Result:

    {
      "took" : 14,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 0.0,
        "hits" : [
          {
            "_index" : "products",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 0.0,
            "_source" : {
              "productId" : 6,
              "productName" : "TV",
              "warehouses" : [
                {
                  "location" : "Location A",
                  "quantity" : 20
                },
                {
                  "location" : "Location B",
                  "quantity" : 150
                },
                {
                  "location" : "Location C",
                  "quantity" : 123
                }
              ]
            }
          },
          {
            "_index" : "products",
            "_type" : "_doc",
            "_id" : "7",
            "_score" : 0.0,
            "_source" : {
              "productId" : 7,
              "productName" : "Table",
              "warehouses" : [
                {
                  "location" : "Location A",
                  "quantity" : 20
                },
                {
                  "location" : "Location B",
                  "quantity" : 200
                },
                {
                  "location" : "Location C",
                  "quantity" : 140
                },
                {
                  "location" : "Location D",
                  "quantity" : 123
                }
              ]
            }
          }
        ]
      }
    }
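
One thing worth checking against your own data: as far as I understand nested queries, the inner query is evaluated against each warehouse object separately, so a query like this one matches a product when any warehouse has the given quantity, not only the last one. For the seven sample documents the two conditions happen to agree for 123, but they differ for other values. The Python sketch below compares the two conditions; it is a model of the semantics under that assumption rather than of Elasticsearch internals, and all function names are hypothetical.

```python
# Quantities per product id, in source order, from the bulk request in the question.
products = {
    1: [20, 30, 50],
    2: [100, 150, 150],
    3: [100, 10, 50],
    4: [100, 10, 60, 70],
    5: [800, 1500, 1000],
    6: [20, 150, 123],
    7: [20, 200, 140, 123],
}

def any_warehouse_matches(quantities, value):
    """Model of a per-object match: true if any warehouse has the value."""
    return value in quantities

def last_warehouse_matches(quantities, value):
    """The condition asked for in the question: only the last warehouse counts."""
    return quantities[-1] == value

def ids(predicate, value):
    """Collect the product ids for which the predicate holds."""
    return [pid for pid, q in products.items() if predicate(q, value)]

for value in (123, 150):
    print(value,
          "any:", ids(any_warehouse_matches, value),
          "last:", ids(last_warehouse_matches, value))
```

For 150 the "any" condition also returns product 6, whose last warehouse has quantity 123, so it is worth verifying which behaviour you actually need.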
    
