【Question title】: ElasticSearch: Filter on deeply nested data
【Posted】: 2014-01-23 16:33:34
【Question】:

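Our data is stored in MongoDB 2.4.8 and indexed into ElasticSearch 0.90.7 using ElasticSearch MongoDB River 1.7.3.

The data is indexed correctly, and I can successfully search the fields we want to search. But I also need to filter on permissions: we only want to return results that the calling user is actually allowed to read.

In our server code, I have the calling user's authorizations as an array, for example: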

[ "Role:REGISTERED_USER", "Account:52c74b25da06f102c90d52f4", "Role:USER", "Group:52cb057cda06ca463e78f0d7" ]

A sample of the unit data we are searching looks like this:

{
    "_id" : ObjectId("52dffbd6da06422559386f7d"),
    "content" : "various stuff",
    "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
    "acls" : [
        {
            "accessMap" : {},
            "sourceClass" : "com.bulb.learn.domain.units.PublishedPageUnit",
            "sourceId" : ObjectId("52dffbd6da06422559386f7d")
        },
        {
            "accessMap" : {
                "Role:USER" : {
                    "allow" : [
                        "READ"
                    ]
                },
                "Account:52d96bfada0695fcbdb41daf" : {
                    "allow" : [
                        "CREATE",
                        "READ",
                        "UPDATE",
                        "DELETE",
                        "GRANT"
                    ]
                }
            },
            "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
            "sourceId" : ObjectId("52dffb54da06422559386f57")
        }
    ]
}

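In the sample data above, I have replaced all of the searchable content with "content" : "various stuff".

The authorization data is in the "acls" array. The filter I need to write would do the following (in English):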

pass all units where the "acls" array
contains an "accessMap" object
that contains a property whose name is one of the user's authorization strings
and whose "allow" property contains "READ"
and whose "deny" property does not contain "READ"

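In the example above, the user has the "Role:USER" authorization, and this unit has an accessMap with a "Role:USER" property whose "allow" contains "READ" and which has no "deny". So this unit would pass the filter.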
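
To pin down the intended semantics before expressing them as an ElasticSearch filter, here is a minimal Python sketch of a literal reading of the rule above. The function name `can_read` and the trimmed sample unit (ObjectId values omitted) are illustrative assumptions, not part of the original system.

```python
def can_read(unit, authorizations, permission="READ"):
    """Literal reading of the rule in the question, evaluated in plain Python."""
    auths = set(authorizations)
    for acl in unit.get("acls", []):
        # accessMap is keyed by authorization string, as in the sample unit.
        for principal, rules in acl.get("accessMap", {}).items():
            if principal not in auths:
                continue
            allowed = permission in rules.get("allow", [])
            denied = permission in rules.get("deny", [])
            if allowed and not denied:
                return True
    return False


# Trimmed copy of the sample unit; ObjectId values are omitted.
unit = {
    "content": "various stuff",
    "acls": [
        {"accessMap": {}},
        {"accessMap": {
            "Role:USER": {"allow": ["READ"]},
            "Account:52d96bfada0695fcbdb41daf": {
                "allow": ["CREATE", "READ", "UPDATE", "DELETE", "GRANT"],
            },
        }},
    ],
}
user = [
    "Role:REGISTERED_USER",
    "Account:52c74b25da06f102c90d52f4",
    "Role:USER",
    "Group:52cb057cda06ca463e78f0d7",
]
print(can_read(unit, user))
```

With the sample authorizations the unit passes because of the "Role:USER" entry; a user holding only the "Group:…" authorization would not pass.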

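I don't see how to write this filter with ElasticSearch.

My impression is that there are two ways to handle nested arrays like this: "nested" or "has_child" (or "has_parent").

We are reluctant to use the "nested" filter because it apparently requires reindexing the whole block whenever any of the data changes. Both the searchable content and the authorization data can change at any time in response to user actions.

It seems to me that in order to use "has_child" or "has_parent", the authorization data would have to be kept separate from the unit data (in a different collection?), and each node would have to specify its parent or child when it is indexed. I don't know whether ElasticSearch MongoDB River can do that.

So is this even possible? Or should we rearrange the authorization data?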

【Comments】:

  • I would use separate indexes for the different levels of access, and add access control in a proxy on top of ES.

Tags: elasticsearch


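【Solution 1】:

You need to restructure your data a bit.

Having values in keys is problematic in Elasticsearch. Each key ends up as a separate field, so you get an ever-growing mapping, and with it an ever-growing cluster state.

You probably want accessMap to be a list of objects, with what is currently the key stored as a value. It then has to be nested. Otherwise, you have no way of knowing which accessMap a matching allow belongs to.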
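
As a rough illustration of that restructuring, the sketch below turns a key-addressed accessMap into a list of objects with a "key" field, which is the shape the example documents in this answer use. The helper name `access_map_to_list` is hypothetical, and omitting empty "allow"/"deny" lists is an assumption made here for brevity.

```python
def access_map_to_list(access_map):
    """Convert {"Role:USER": {"allow": [...]}} into [{"key": "Role:USER", "allow": [...]}]."""
    entries = []
    # Sort by key only to make the output deterministic.
    for principal, rules in sorted(access_map.items()):
        entry = {"key": principal}
        for field in ("allow", "deny"):
            if rules.get(field):
                entry[field] = list(rules[field])
        entries.append(entry)
    return entries


original = {
    "Role:USER": {"allow": ["READ"]},
    "Account:52d96bfada0695fcbdb41daf": {"allow": ["READ", "UPDATE"]},
}
print(access_map_to_list(original))
```

Because the authorization string is now a value of the fixed field "key", the mapping no longer grows with every new role, account, or group.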

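Whether the ACL should be nested (resulting in two levels of nesting) or parent-child depends on how often you update the various objects. With the ACLs as nested documents on the object, you pay the cost of the join on every update. With parent-child, you pay the cost of the join on every search.

This quickly gets complicated, so I've prepared a simplified runnable example you can play with: https://www.found.no/play/gist/8582654

Note how the nested and bool filters are nested inside each other. Having two nested filters with a bool in it will not work.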

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {}
    },
    "mappings": {
        "type": {
            "properties": {
                "acls": {
                    "type": "nested",
                    "properties": {
                        "accessMap": {
                            "type": "nested",
                            "properties": {
                                "allow": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "deny": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "key": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type","_id":1}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"type","_id":2}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","deny":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"type","_id":3}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "nested": {
                    "path": "acls",
                    "filter": {
                        "bool": {
                            "must": {
                                "nested": {
                                    "path": "acls.accessMap",
                                    "filter": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term": {
                                                        "allow": "READ"
                                                    }
                                                },
                                                {
                                                    "terms": {
                                                        "key": [
                                                            "Role:USER",
                                                            "Account:52d96bfada0695fcbdb41daf"
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                }
                            },
                            "must_not": {
                                "nested": {
                                    "path": "acls.accessMap",
                                    "filter": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term": {
                                                        "deny": "READ"
                                                    }
                                                },
                                                {
                                                    "terms": {
                                                        "key": [
                                                            "Role:USER",
                                                            "Account:52d96bfada0695fcbdb41daf"
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
'
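
The search body above hard-codes the user's authorization strings. A minimal sketch of how application code might build the same structure for an arbitrary user is shown below; the function names `build_acl_query` and `_entry_filter` are hypothetical, and the body uses the same `filtered`/`nested` filter syntax as the script above (ElasticSearch 0.90 era), which may not apply to other versions.

```python
import json


def _entry_filter(principals, field, permission):
    """Nested filter on acls.accessMap: one entry with a matching key and permission."""
    return {
        "nested": {
            "path": "acls.accessMap",
            "filter": {
                "bool": {
                    "must": [
                        {"term": {field: permission}},
                        {"terms": {"key": list(principals)}},
                    ]
                }
            },
        }
    }


def build_acl_query(principals, permission="READ"):
    """Mirror the hand-written search body: allow must match, deny must not."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "nested": {
                        "path": "acls",
                        "filter": {
                            "bool": {
                                "must": _entry_filter(principals, "allow", permission),
                                "must_not": _entry_filter(principals, "deny", permission),
                            }
                        },
                    }
                }
            }
        }
    }


body = build_acl_query(["Role:USER", "Account:52d96bfada0695fcbdb41daf"])
print(json.dumps(body, indent=4))
```

The printed JSON can be passed as the `-d` payload of the search request in the script above.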

For completeness, here is a similar example using the parent-child approach: https://www.found.no/play/gist/8586840

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {}
    },
    "mappings": {
        "acl": {
            "_parent": {
                "type": "document"
            },
            "properties": {
                "acls": {
                    "properties": {
                        "accessMap": {
                            "type": "nested",
                            "properties": {
                                "key": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "allow": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "deny": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"document","_id":1}}
{"title":"Doc 1"}
{"index":{"_index":"play","_type":"acl","_parent":1}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"document","_id":2}}
{"title":"Doc 2"}
{"index":{"_index":"play","_type":"acl","_parent":2}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","deny":["READ","UPDATE"]}]}]}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "has_child": {
                    "type": "acl",
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "nested": {
                                        "path": "acls.accessMap",
                                        "filter": {
                                            "bool": {
                                                "must": [
                                                    {
                                                        "terms": {
                                                            "key": [
                                                                "Role:USER",
                                                                "Account:52d96bfada0695fcbdb41daf"
                                                            ]
                                                        }
                                                    },
                                                    {
                                                        "term": {
                                                            "allow": "READ"
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    }
                                }
                            ],
                            "must_not": [
                                {
                                    "nested": {
                                        "path": "acls.accessMap",
                                        "filter": {
                                            "bool": {
                                                "must": [
                                                    {
                                                        "terms": {
                                                            "key": [
                                                                "Role:USER",
                                                                "Account:52d96bfada0695fcbdb41daf"
                                                            ]
                                                        }
                                                    },
                                                    {
                                                        "term": {
                                                            "deny": "READ"
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}
'

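【Comments】:

  • I updated the answer to also include a similar example with a parent-child mapping.
  • A significant simplification is possible, because I don't actually care which accessMap an allow or deny comes from: if there is any matching deny, the filter must fail; otherwise, if there is any matching allow, the filter must succeed. Does that let us remove the nesting entirely?
  • That won't work: if there is a deny for a group the user is not a member of, you will still get a match.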
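
The last comment can be illustrated without a running cluster. Arrays of plain (non-nested) objects are flattened into multi-valued fields, so the link between a "key" and its own "deny" is lost. The Python sketch below simulates that flattening; the helper names and the "Group:other" principal are hypothetical, chosen only for this illustration.

```python
def flatten(entries):
    """Simulate non-nested indexing: each field becomes one bag of values."""
    flat = {"key": set(), "allow": set(), "deny": set()}
    for entry in entries:
        flat["key"].add(entry["key"])
        flat["allow"].update(entry.get("allow", []))
        flat["deny"].update(entry.get("deny", []))
    return flat


def denied_flat(entries, auths, permission="READ"):
    """Deny check against flattened fields: key and deny may come from different entries."""
    flat = flatten(entries)
    return bool(flat["key"] & set(auths)) and permission in flat["deny"]


def denied_nested(entries, auths, permission="READ"):
    """Deny check per entry: key and deny must belong to the same entry."""
    return any(
        entry["key"] in auths and permission in entry.get("deny", [])
        for entry in entries
    )


entries = [
    {"key": "Role:USER", "allow": ["READ"]},
    {"key": "Group:other", "deny": ["READ"]},  # the user is not in this group
]
auths = ["Role:USER"]
print(denied_flat(entries, auths), denied_nested(entries, auths))
```

Here the flattened check reports a deny match even though the only deny belongs to a group the user is not in, while the per-entry check does not. That per-entry association is what the nested mapping preserves.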
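【Solution 2】:

Thanks, @Alex Brasetvik. Your suggestion to store the subject IDs as data rather than as keys, and your explanation that nesting means "join on every update" while parent-child means "join on every query", were the most helpful parts.

I found that I would have to "un-nest" the data in order to use the parent-child approach, and we prefer to keep the authorization data nested.

I didn't understand what you meant by "Having two nested filters with a bool in it will not work."

Here is how I restructured the data: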

{
    "_id" : ObjectId("52dffbd6da06422559386f7d"),
    "content" : "various stuff",
    "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
    "accessMaps" : [
        {
            "sourceClass" : "com.bulb.learn.domain.units.PublishedPageUnit",
            "sourceId" : ObjectId("52dffbd6da06422559386f7d")
        },
        {
            "allow" : {
                "CREATE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "READ" : [
                    "Account:52d96bfada0695fcbdb41daf",
                    "Role:USER"
                ],
                "UPDATE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "DELETE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "GRANT" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ]
            },
            "deny" : {},
            "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
            "sourceId" : ObjectId("52dffb54da06422559386f57")
        }
    ]
}
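
For reference, the restructuring above amounts to inverting each accessMap from "principal → allow/deny → permissions" to "allow/deny → permission → principals". A minimal Python sketch under that reading follows; the function name `invert_access_map` is hypothetical, and always emitting both "allow" and "deny" (even when empty) is an assumption.

```python
def invert_access_map(access_map):
    """Invert {"Role:USER": {"allow": ["READ"]}} into {"allow": {"READ": ["Role:USER"]}, "deny": {}}."""
    inverted = {"allow": {}, "deny": {}}
    for principal, rules in access_map.items():
        for effect in ("allow", "deny"):
            for permission in rules.get(effect, []):
                inverted[effect].setdefault(permission, []).append(principal)
    # Sort the principal lists so the output is deterministic.
    for effect in inverted:
        for principals in inverted[effect].values():
            principals.sort()
    return inverted


original = {
    "Role:USER": {"allow": ["READ"]},
    "Account:52d96bfada0695fcbdb41daf": {
        "allow": ["CREATE", "READ", "UPDATE", "DELETE", "GRANT"],
    },
}
print(invert_access_map(original))
```

With this layout the set of field names is limited to the fixed permission names, and the authorization strings appear only as values.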

The new mapping looks like this:

{
  "unit": {
    "properties": {
      "accessMaps": {
        "type": "nested",
        "properties": {
          "allow": {
            "type": "nested",
            "properties": {
              "CREATE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "DELETE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "GRANT": {
                "type": "string",
                "index": "not_analyzed"
              },
              "READ": {
                "type": "string",
                "index": "not_analyzed"
              },
              "UPDATE": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "deny": {
            "type": "nested",
            "properties": {
              "CREATE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "DELETE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "GRANT": {
                "type": "string",
                "index": "not_analyzed"
              },
              "READ": {
                "type": "string",
                "index": "not_analyzed"
              },
              "UPDATE": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "sourceClass": {
            "type": "string"
          },
          "sourceId": {
            "type": "string"
          }
        }
      }
    }
  }
}

The filtered query looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "nested": {
              "path": "accessMaps.allow",
              "filter": {
                "terms": {
                  "accessMaps.allow.READ": [
                    "Role:REGISTERED_USER",
                    "Account:52e6a361da06e4eb64172519",
                    "Role:USER",
                    "Group:52cb057cda06ca463e78f0d7"
                  ]
                }
              }
            }
          },
          "must_not": {
            "nested": {
              "path": "accessMaps.deny",
              "filter": {
                "terms": {
                  "accessMaps.deny.READ": [
                    "Role:REGISTERED_USER",
                    "Account:52e6a361da06e4eb64172519",
                    "Role:USER",
                    "Group:52cb057cda06ca463e78f0d7"
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}

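The biggest problems I ran into were figuring out how to use the "path" property in the nested filter, and discovering that the field names in the terms filter have to be fully qualified. I wish ElasticSearch would put more effort into their documentation.

【Comments】:

  • Seriously? You take someone else's answer, turn it into your own, and then take the credit? What the heck?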