Get unique list items by nested dictionary key: answers

【Question title】: Get unique list items by nested dictionary key
【Posted】: 2016-01-07 16:20:01
【Question description】:

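I have lists nested as dictionary values inside another dictionary called data. I have been trying to find a fast way to get all the unique list items from a specific nested key, such as key1 or key2.

I came up with the following function, which does not seem very efficient. Any ideas on how to speed it up and make it more Pythonic?

Python function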

def get_uniq_by_value(data, val_name):
    results = []
    for key, value in data.iteritems():
        for item in value[val_name]:
            if item not in results:
                results.append(item)
    return results

Sample data

data = {
"top1": {
    "key1": [
        "there is no spoon", "but dictionaries are hard",
    ],
    "key2": [
        "mad max fury road was so good",
    ]
},
"top2": {
    "key1": [
        "my item", "foo bar"
    ],
    "key2": [
        "blah", "more junk"
    ]
},
}

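【Question comments】:

No, the order of the results doesn't matter at all.
Have you considered a set? Why iterate over iteritems if you never use key? Also note that if one of the nested dictionaries has no val_name key, this will crash with a KeyError.
Good point! Also, the data in each nested dictionary will always contain key1 and key2. If there are no items, the list is left empty. I haven't really used set before... can I append to it and only keep unique values?
@deadbits Yes, set.add silently ignores duplicates.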
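
The set.add behaviour described in the comments above can be sketched as an explicit loop. This is only an illustration, not code from the thread: the function name get_uniq_with_set and the sample data are made up here, and .get(val_name, []) is an assumption added so that a missing key does not raise KeyError.

```python
def get_uniq_with_set(data, val_name):
    """Collect the unique items found under val_name in each nested dict."""
    results = set()
    # Only the nested dicts are needed, so iterate over values() directly.
    for nested in data.values():
        # .get() with a default avoids a KeyError when val_name is missing.
        for item in nested.get(val_name, []):
            results.add(item)  # adding an item that is already present is a no-op
    return results


# Hypothetical sample data with a duplicate ("b") and a missing "key2".
sample = {
    "top1": {"key1": ["a", "b"], "key2": ["x"]},
    "top2": {"key1": ["b", "c"]},
}
print(get_uniq_with_set(sample, "key1") == {"a", "b", "c"})  # True
print(get_uniq_with_set(sample, "key2") == {"x"})  # True
```

Membership checks on a set take roughly constant time on average, whereas `item not in results` on a list scans the whole list each time, which is the likely reason the original function feels slow on larger inputs.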

Tags: python dictionary nested

【Solution 1】:

If order doesn't matter, you can use a set / set comprehension to get the result you want:

def get_uniq_by_value(data, val_name):
    return {val for value in data.values() for val in value.get(val_name,[])}

If you want a list as the result, you can call list() on the set comprehension to convert the resulting set to a list before returning it.

Demo:

>>> def get_uniq_by_value(data, val_name):
...     return {val for value in data.values() for val in value.get(val_name,[])}
...
>>> data = {
... "top1": {
...     "key1": [
...         "there is no spoon", "but dictionaries are hard",
...     ],
...     "key2": [
...         "mad max fury road was so good",
...     ]
... },
... "top2": {
...     "key1": [
...         "my item", "foo bar"
...     ],
...     "key2": [
...         "blah", "more junk"
...     ]
... }}
>>> get_uniq_by_value(data,"key1")
{'but dictionaries are hard', 'my item', 'foo bar', 'there is no spoon'}
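
The list conversion mentioned earlier in this answer might look like the sketch below. The name get_uniq_list_by_value and the sample data are hypothetical, chosen so that a duplicate actually occurs. The order of the returned list is arbitrary because it comes from a set, so the example sorts it only to make the printed output predictable.

```python
def get_uniq_list_by_value(data, val_name):
    """Return the unique items under val_name as a list (arbitrary order)."""
    unique = {val for value in data.values() for val in value.get(val_name, [])}
    return list(unique)


# Hypothetical sample data: "foo" appears under key1 in both nested dicts.
sample = {
    "top1": {"key1": ["spoon", "foo"]},
    "top2": {"key1": ["foo", "bar"]},
}
result = get_uniq_list_by_value(sample, "key1")
print(sorted(result))  # ['bar', 'foo', 'spoon']
```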

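As noted in the comments below, if order matters and data is already a collections.OrderedDict of OrderedDicts, you can use a new OrderedDict and add the elements of the lists as its keys. The OrderedDict avoids any duplicates and preserves the order in which the keys were added.

You can also do this in one line with OrderedDict.fromkeys, as shown in the comments. Example: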

from collections import OrderedDict
def get_uniq_by_value(data, val_name):
    return list(OrderedDict.fromkeys(val for value in data.values() for val in value.get(val_name,[])))

Note that this only works if data is a nested OrderedDict; otherwise the elements of data would not be in any particular order to begin with.

Demo:

>>> from collections import OrderedDict
>>> data = OrderedDict([
... ("top1", OrderedDict([
...     ("key1", [
...         "there is no spoon", "but dictionaries are hard",
...     ]),
...     ("key2", [
...         "mad max fury road was so good",
...     ])
... ])),
... ("top2", OrderedDict([
...     ("key1", [
...         "my item", "foo bar"
...     ]),
...     ("key2", [
...         "blah", "more junk"
...     ])
... ]))])
>>>
>>> def get_uniq_by_value(data, val_name):
...     return list(OrderedDict.fromkeys(val for value in data.values() for val in value.get(val_name,[])))
...
>>> get_uniq_by_value(data,"key1")
['there is no spoon', 'but dictionaries are hard', 'my item', 'foo bar']
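
The requirement that data be an OrderedDict reflects the Python versions in use when this answer was written. As a hedged sketch, assuming Python 3.7 or later (where a plain dict is guaranteed to preserve insertion order), the same idea can be written with dict.fromkeys. The name get_uniq_ordered and the sample data are hypothetical.

```python
def get_uniq_ordered(data, val_name):
    """Return unique items under val_name, keeping first-seen order.

    Assumes Python 3.7+, where plain dicts preserve insertion order.
    """
    return list(dict.fromkeys(
        val for value in data.values() for val in value.get(val_name, [])
    ))


# Hypothetical sample data: "foo" appears twice and is kept only once,
# at the position where it was first seen.
sample = {
    "top1": {"key1": ["spoon", "foo"]},
    "top2": {"key1": ["foo", "bar"]},
}
print(get_uniq_ordered(sample, "key1"))  # ['spoon', 'foo', 'bar']
```

On older interpreters, the OrderedDict version shown above remains the safer choice.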

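【Comments】:

If order does matter, converting with collections.OrderedDict.fromkeys and then wrapping the result in list will preserve order while removing duplicates, as a one-liner.
Well, data also needs to be an ordered dictionary for that to work.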