Python - creating a reference tree of objects by their attributes

【Question title】: Python - creating a reference tree of objects by their attributes
【Posted】: 2022-01-15 10:17:30
【Question】:

I have a list of objects. The objects have nested attributes, generated with one of the approaches hpaulj described in his answer to this post: Object-like attribute access for nested dictionary.

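I want to be able to find attributes and attribute values in these objects and manipulate the data they hold. In a real scenario, however, there may be more than a million objects, and the attributes may be deeply nested, which makes searching a flat list expensive when many operations are needed.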

For example, suppose the list of objects is: list_of_objects = [object1, object2, object3, object4]

object1 has the following attributes: self.country = "Kenya", self.disease = "breast cancer"
object2 has the following attributes: self.country = "Kenya", self.disease = "diabetes"
object3 has the following attributes: self.country = "Ireland", self.risk_factor.smoking = "Current"
object4 has the following attributes: self.country = "Kenya", self.risk_factor.smoking = "Previous"

The objects are created from the following State class:

class State:
    def __init__(self, state_dictionary):
        self._state_dictionary = state_dictionary
        for key, value in self._state_dictionary.items():
            if isinstance(value, dict):
                value = State(value)
            setattr(self, key, value)

For object3, an example state_dictionary looks like this:

state_dictionary = {
    "country":"Ireland",
    "risk_factor":{
        "smoking":"Current"
    }
}

Importantly, the nested attributes are State objects as well.
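A quick check of that behaviour, as a minimal sketch that reuses the State class exactly as posted above (the variable name object3 is only for illustration):

```python
class State:
    def __init__(self, state_dictionary):
        self._state_dictionary = state_dictionary
        for key, value in self._state_dictionary.items():
            if isinstance(value, dict):
                value = State(value)  # nested dicts become nested State objects
            setattr(self, key, value)


object3 = State({"country": "Ireland", "risk_factor": {"smoking": "Current"}})

print(object3.country)                         # Ireland
print(object3.risk_factor.smoking)             # Current
print(isinstance(object3.risk_factor, State))  # True
```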

I want to act on all objects that have a given attribute, a given chain of nested attributes, or an attribute with a specified value.

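My idea was to create a "controller" that stores the original list as separate lists on an instance of the controller class. Each original attribute and value would point to a list of the objects that contain that attribute or value. The basic design looks like this: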

class Controller:

    def __init__(self, list_of_objects):
        self.list_of_objects = list_of_objects  # Our list of objects from above
        self.create_hierarchy_of_objects()

    def create_hierarchy_of_objects(self):
        for o in self.list_of_objects:
            pass  # Does something here (placeholder so the class is valid Python)

After create_hierarchy_of_objects has run, I could do the following:

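Controller.country.Kenya would contain a list of all objects whose self.country is "Kenya", i.e. object1, object2 and object4
Controller.disease would contain a list of all objects that have a self.disease attribute, i.e. object1 and object2
Controller.risk_factor.smoking.Current would contain a list of the objects that have that chain of attributes, i.e. object3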

The question is: how would create_hierarchy_of_objects work? A few clarifications:

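The nesting can be arbitrarily deep, and values may be identical, for example
- self.risk_factor.attribute1.attribute2 = "foo"
- self.risk_factor.attribute3.attribute4 = "foo" as well.
There may be a simpler way to do this, and I welcome any suggestions.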

Tags: python object filter hierarchical

【Solution 1】:

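If you have to handle more than a million objects, generating an additional hierarchy may not be the best solution. It would require many extra objects and cost considerable time to build. The hierarchy would also have to be updated whenever list_of_objects changes.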
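To make that cost concrete, here is a minimal sketch of what such an eagerly built hierarchy could look like. It is not part of the approach proposed in this answer: iter_paths and build_index are hypothetical helpers, SimpleNamespace stands in for State, a flat dict keyed by path tuples replaces the nested Controller attributes, and leaf values are assumed to be hashable.

```python
from collections import defaultdict
from types import SimpleNamespace


def iter_paths(obj, prefix=()):
    """Yield every attribute path of obj, plus one extra path per leaf value."""
    for name, value in vars(obj).items():
        if name.startswith("_"):
            continue  # skip private bookkeeping such as _state_dictionary
        path = prefix + (name,)
        yield path
        if hasattr(value, "__dict__"):
            yield from iter_paths(value, path)  # nested object: recurse
        else:
            yield path + (value,)               # leaf: index the value too


def build_index(objects):
    """Map each path tuple to the list of top-level objects containing it."""
    index = defaultdict(list)
    for obj in objects:
        for path in iter_paths(obj):
            index[path].append(obj)
    return index


o1 = SimpleNamespace(country="Kenya", disease="diabetes")
o2 = SimpleNamespace(country="Ireland",
                     risk_factor=SimpleNamespace(smoking="Current"))
index = build_index([o1, o2])

print(index[("country", "Kenya")] == [o1])                   # True
print(index[("risk_factor", "smoking", "Current")] == [o2])  # True
print(len(index))                                            # 8 keys for 2 objects

o1.country = "Ireland"                      # the object changes...
print(index[("country", "Kenya")] == [o1])  # True: ...but the index is now stale
```

Lookups become a single dictionary access, but every object contributes several index entries, and any change to an object or to the list requires the index to be rebuilt or patched.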

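I therefore suggest a more generic and dynamic approach that uses iterators and an XPath-like principle. Let's call it OPath. The OPath class is a lightweight object that simply concatenates attributes into a kind of attribute path. It also keeps a reference to the original list of entry objects. Finally, it is based purely on attributes, so it works with any type of object.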

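The actual lookup happens when we start iterating over an OPath object (for example by passing it to list(), using it in a for loop, ...). OPath returns an iterator that recursively looks up matching objects along the attribute path, based on the actual contents of the originally supplied list. It yields the matching objects one by one, which avoids building an unnecessary, fully populated list of matches.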

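class OPath:
    def __init__(self, objects, path=None):
        self.__objects = objects  # the original list of entry objects
        self.__path = path or []  # attribute names, optionally ending in a value

    def __getattr__(self, name):
        # Only called for names that OPath itself does not define:
        # extend the path by one segment and return a new, lazy OPath.
        return OPath(self.__objects, self.__path + [name])

    def __iter__(self):
        # Lazily yield each entry object that matches the whole path.
        yield from (obj for obj in self.__objects if self.__matches(obj, self.__path))

    @staticmethod
    def __matches(obj, path):
        if path:
            # The next segment names an attribute: descend into it.
            if hasattr(obj, path[0]):
                return OPath.__matches(getattr(obj, path[0]), path[1:])
            # Otherwise the last segment may be the value itself.
            if obj == path[0] and len(path) <= 1:
                return True
            return False
        return True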

if __name__ == '__main__':
    class State:
        def __init__(self, state_dictionary):
            self._state_dictionary = state_dictionary
            for key, value in self._state_dictionary.items():
                if isinstance(value, dict):
                    value = State(value)
                setattr(self, key, value)

    o1 = State({ "country":"Kenya", "disease": "breast cancer" })
    o2 = State({ "country":"Kenya", "disease": "diabetes" })
    o3 = State({ "country":"Ireland", "risk_factor": { "smoking":"Current" } })
    o4 = State({ "country":"Kenya", "risk_factor": { "smoking":"Previous" } })

    # test cases with absolute paths
    print("Select absolute")
    path = OPath([o1, o2, o3, o4])
    print(list(path) == [o1, o2, o3, o4])
    print(list(path.country) == [o1, o2, o3, o4])
    print(list(path.country.Kenya) == [o1, o2, o4])
    print(list(path.disease) == [o1, o2])
    print(list(path.disease.diabetes) == [o2])
    print(list(path.risk_factor.smoking) == [o3, o4])
    print(list(path.risk_factor.smoking.Current) == [o3])
    print(list(path.doesnotexist.smoking.Current) == [])
    print(list(path.risk_factor.doesnotexist.Current) == [])
    print(list(path.risk_factor.smoking.invalidvalue) == [])
    print(list(path.risk_factor.doesnotexist.Current.invalidpath) == [])

    # test cases with relative paths
    country = OPath([o1, o2, o3, o4], ["country"])
    print("Select relative from country:")
    print(list(country) == [o1, o2, o3, o4])
    print(list(country.Kenya) == [o1, o2, o4])

    print("Select all with country=Kenya")
    kenya = OPath([o1, o2, o3, o4], ['country', 'Kenya'])
    print(list(kenya) == [o1, o2, o4])

All test cases are expected to print True.
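Because matching is lazy, the results can be consumed partially or updated in place while iterating. The following is a minimal, self-contained sketch of that usage: matches restates the rule of OPath.__matches above as a plain function, select is a hypothetical helper, and SimpleNamespace with a made-up visits attribute stands in for State.

```python
from itertools import islice
from types import SimpleNamespace


def matches(obj, path):
    """Same rule as OPath.__matches: descend by attribute, else compare the value."""
    if not path:
        return True
    if hasattr(obj, path[0]):
        return matches(getattr(obj, path[0]), path[1:])
    return obj == path[0] and len(path) == 1


def select(objects, *path):
    """Lazily yield the objects matching the path (hypothetical helper)."""
    return (obj for obj in objects if matches(obj, list(path)))


people = [SimpleNamespace(country="Kenya", visits=0) for _ in range(5)]
people.append(SimpleNamespace(country="Ireland", visits=0))

# Take only the first two matches; the generator is not advanced any further.
first_two = list(islice(select(people, "country", "Kenya"), 2))
print(len(first_two))  # 2

# Update every match in place while iterating.
for person in select(people, "country", "Kenya"):
    person.visits += 1
print([p.visits for p in people])  # [1, 1, 1, 1, 1, 0]
```

One consequence of relying on hasattr: a final segment that happens to name a method of the value (for example upper on a string) also counts as a match, so value names that collide with built-in attribute names need extra care.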
