How is this Python 2.7 setdefault/defaultdict code giving a nested result? Answers

【Question title】: How is this Python 2.7 setdefault/defaultdict code giving a nested result?
【Posted】: 2017-04-18 18:14:23
【Question】:

This seems like a very simple use of setdefault and defaultdict that I can't get my head around. It would be great if someone could explain "why" the code below works.

d = {}

for name in ['foo', 'bar', 'bars']:
    t = d
    for char in name:
        t = t.setdefault(char,{}) # Should be an empty {}

print d
# Prints {'b': {'a': {'r': {'s': {}}}}, 'f': {'o': {'o': {}}}}

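I can't understand how this code works. When the line t = t.setdefault(char,{}) executes, it should assign an empty dict to t, so how does that affect d in such a way that d ends up as a nested dict?

Also, what would the defaultdict equivalent of the above be? I came up with this, which is wrong: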

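from collections import defaultdict

d1 = defaultdict(dict)

for name in ['foo', 'bar', 'bars']:
    t1 = d1
    for char in name:
        t1 = t1[char]  # raises KeyError: 'o' (d1['f'] is a plain dict, not a defaultdict)

print d1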

It would be great if someone could point out how defaultdicts should be understood here.

【Question discussion】:

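As with lists, t and d both point to the same dict object. When that object is mutated through t, d sees the change too.
OK, I think I get it now. So when the statement t = t.setdefault(char,{}) executes, the t.setdefault(char,{}) part mutates the dict d, and since the right-hand side evaluates to {}, t is set to {}. Is this understanding correct?
@VikashRajaSamuelSelvin: Your understanding is correct. If the program didn't use the t variable at all and just did _ = d.setdefault(char, {}), it would work the same way and be clearer. The underscore variable would hint that the result of the assignment is simply discarded (which it is).
@Gerrat If I use _ = d.setdefault(char, {}), d ends up as {'a': {}, 'b': {}, 'f': {}, 'o': {}, 's': {}, 'r': {}}, which is what I expected, but the statement t = t.setdefault(char,{}) effectively builds a nested dict of the form {'b': {'a': {'r': {'s': {}}}}, 'f': {'o': {'o': {}}}}. I can't seem to understand this; it would be great if you could explain why this happens. Thanks.
@VikashRajaSamuelSelvin: Ah, I see. My earlier understanding was incorrect. This is actually quite interesting. Let me see if I can give a simple explanation.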
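A small side-by-side sketch of the two variants compared in the comments above may help. The function names `build_flat` and `build_nested` are hypothetical, chosen only for illustration; the code uses print() with a single argument so it runs under both Python 2.7 and 3:

```python
def build_flat(words):
    """Hypothetical helper: always call setdefault on the top-level dict."""
    d = {}
    for name in words:
        for char in name:
            _ = d.setdefault(char, {})  # result discarded; d never gets deeper
    return d


def build_nested(words):
    """Hypothetical helper: rebind t to the returned inner dict, as in the question."""
    d = {}
    for name in words:
        t = d  # restart at the top level for every word
        for char in name:
            t = t.setdefault(char, {})  # t now names the dict one level deeper
    return d


words = ['foo', 'bar', 'bars']
print(build_flat(words))    # one flat level: every character is a top-level key
print(build_nested(words))  # one level per character of each word
```

The only difference is which dict setdefault is called on: in the nested version it is whatever t was rebound to on the previous step.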

Tags: python python-2.7 dictionary nested defaultdict

【Solution 1】:

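For the first part, the line t = d does not make a copy of d. It only creates a new reference to d and stores it in t. t and d now refer to the same object; in other words, you have only one object, but that object has two names. Since the object is mutable (a dict in this case), changing it through t also changes what you see through d, because there is only one object. That is exactly what is needed here, but if in other code you want to copy a mutable object and work on the copy without modifying the original, you need to import copy and use copy.deepcopy() (copy.copy() or dict.copy() only copies the outer dict, so nested dicts would still be shared).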
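A minimal sketch of the difference between a second name, a shallow copy and a deep copy (the variable names are illustrative only):

```python
import copy

d = {'f': {'o': {}}}

t = d                    # no copy: t is a second name for the same dict
t['b'] = {}              # mutate the object through the name t
print(t is d)            # True
print('b' in d)          # True: the change is visible through d

shallow = copy.copy(d)   # new outer dict, but the nested dicts are shared
deep = copy.deepcopy(d)  # new outer dict and new copies of every nested dict

shallow['f']['o']['x'] = {}  # writes into the inner dict that d also holds
deep['f']['o']['y'] = {}     # writes only into the deep copy

print('x' in d['f']['o'])  # True
print('y' in d['f']['o'])  # False
```

For nested dicts like the ones in the question, only deepcopy gives a fully independent structure.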

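For the second part: the defaultdict() constructor expects, as its first argument, a callable that takes no arguments and returns the default value. In this case, however, that return value would need to be another defaultdict whose callable returns another defaultdict, whose callable returns yet another... and so on. That is an infinitely recursive definition.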

So there is no simple defaultdict equivalent of this code. Instead, the original version using setdefault and plain dicts is probably the best and most Pythonic approach.
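As a hedged aside: defaultdict only calls its factory when a key is missing, so a factory that refers to itself does not actually recurse forever. A minimal sketch, using a hypothetical function named `tree`:

```python
from collections import defaultdict


def tree():
    # Hypothetical factory: each call returns a defaultdict whose default
    # factory is tree itself. The recursion is lazy: tree() only runs again
    # when a missing key is looked up, so this does not loop forever.
    return defaultdict(tree)


d1 = tree()
for name in ['foo', 'bar', 'bars']:
    t1 = d1
    for char in name:
        t1 = t1[char]  # a missing key calls tree() and stores a new level

# defaultdict is a dict subclass, so it compares equal to the plain-dict result
print(d1 == {'b': {'a': {'r': {'s': {}}}}, 'f': {'o': {'o': {}}}})  # True
```

Whether this is preferable to setdefault is a matter of taste; one trade-off is that every lookup of a missing key, even a typo, silently creates a new empty level.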

【Discussion】:

【Solution 2】:

How setdefault works on a dict:

# case 1
d = {}
temp = d.setdefault("A")
print "d = ", d
print "temp = ", temp
print "id of d = ", id(d), "id of temp = ", id(temp)
# output
d = {'A': None}
temp = None
id of d = 140584110017624, id of temp = 9545840 # memory locations of d, temp

# case 2
d = {}
temp = d.setdefault("A", "default Value")
print "d = ", d
print "temp = ", temp
print "id of d = ", id(d), "id of temp = ", id(temp)
# output
d = {'A': 'default Value'}
temp = default Value
id of d = ..., id of temp = ... # memory addresses; they vary from run to run
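Two cases the examples above don't show, which matter for the nested-dict code: setdefault returns the very object stored in the dict (not a copy), and if the key already exists the default is ignored. A small sketch with illustrative names:

```python
d = {}

first = d.setdefault('A', [])        # key missing: inserts [] and returns that list
first.append(1)                      # mutating the returned object mutates d['A']
second = d.setdefault('A', ['new'])  # key present: default ignored, stored value returned

print(d)                # {'A': [1]}
print(second is first)  # True
```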

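In your code, t = d means t and d refer to the same memory location (the same object).
So when t = t.setdefault(char,{}) runs, t.setdefault(char,{}) executes first: it changes the contents of the object at t's memory location (inserting the key if it is missing) and returns the value stored under char. That returned value, which is itself an object stored inside the dict, is then assigned to the name t, so t now refers to a different memory location one level deeper. Because t and d start out referring to the same memory location, d is affected.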
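The rebinding described above can be checked with `is`. A minimal sketch using two steps of the word 'foo':

```python
d = {}
t = d                      # t and d are two names for one dict
t = t.setdefault('f', {})  # d gains key 'f'; t is rebound to the dict d['f']
print(t is d)              # False: t no longer names the top-level dict
print(t is d['f'])         # True: t names the dict stored under 'f'
t = t.setdefault('o', {})  # this inserts into d['f'], not into d itself
print(d)                   # {'f': {'o': {}}}
```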

【Discussion】:

【Solution 3】:

I'll go through the loop step by step and explain how it keeps assigning nested dicts:

name = 'foo'
    t = d  # both t and d point to the same empty dict object
    char = 'f'
        t = t.setdefault(char,{})  
        # the first thing evaluated is the right-hand side:
        # 'f' isn't in the dict yet, so setdefault sets d['f'] = {}
        # (t and d are still the same object at this point) and returns that very dict
        # now the result of the right-hand side (the new empty dict) is assigned to `t`.
        # this empty dict is the exact *same* object referenced by d['f']!
        # so at this point d['f'] = {}, and t = {}, and both those dicts are the same object!
    char = 'o'
        t = t.setdefault(char,{})  
        # eval the right side again, so now t['o'] = {}, but remember d['f'] is t
        # so really d['f'] = {'o':{}}
        # and again we rebind `t` to the result of the right side
        # so now d['f']['o'] = {} and t = {}, and these empty dicts are
        # again the same object
    char = 'o'
        t = t.setdefault(char,{})  
        # our `t` from last time is empty, so it gets assigned the same as before
        # and now d['f']['o']['o'] = {}
name = 'bar'
    t = d  # start over from the top-level dict, which already holds d['f']['o']['o'] = {}
    char = 'b'
    #...everything proceeds as before - since 'b' is not in `d`, 
    # we start generating nested dicts again
    # ...
...
name = 'bars'
    # main difference here is that d['b']['a']['r'] exists, 
    # so we end up just adding the 's':{} to the end
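The walkthrough above can also be checked mechanically. This sketch (the names `prefix` and `target` are illustrative) asserts after every step that t is the dict reached by following the characters seen so far from the top of d:

```python
from functools import reduce
import operator

d = {}
for name in ['foo', 'bar', 'bars']:
    t = d
    for i, char in enumerate(name):
        t = t.setdefault(char, {})
        prefix = name[:i + 1]
        # walk down from the top: d[prefix[0]][prefix[1]]...
        target = reduce(operator.getitem, prefix, d)
        assert t is target, prefix  # t always names the innermost dict so far

print(d == {'b': {'a': {'r': {'s': {}}}}, 'f': {'o': {'o': {}}}})  # True
```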

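As for the defaultdict equivalent, that's a bit trickier. The problem is that you need defaultdicts all the way down.

I found a way to do it with a small function here: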

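from collections import defaultdict

def fix(f):
    # returns a callable that calls f with fix(f) as its first argument,
    # so defaultdict gets a default factory that builds more of the same
    return lambda *args, **kwargs: f(fix(f), *args, **kwargs)

d1 = fix(defaultdict)()  # equivalent to defaultdict(fix(defaultdict))

for name in ['foo', 'bar', 'bars']:
    t1 = d1
    for char in name:
        t1 = t1[char]  # a missing key creates another nested defaultdict

print d1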
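To see what fix(defaultdict) produces at each level, here is a sketch that repeats the fix helper so it runs on its own, checks the default factory of an inner level, and converts the result back to plain dicts for comparison with the setdefault version. The helper `to_plain` is hypothetical, added only for illustration:

```python
from collections import defaultdict


def fix(f):
    # same helper as above: calls f with fix(f) prepended as its first argument;
    # for f = defaultdict that first argument becomes the default_factory
    return lambda *args, **kwargs: f(fix(f), *args, **kwargs)


def to_plain(dd):
    """Hypothetical helper: recursively turn nested defaultdicts into plain dicts."""
    return dict((k, to_plain(v)) for k, v in dd.items())


d1 = fix(defaultdict)()
for name in ['foo', 'bar', 'bars']:
    t1 = d1
    for char in name:
        t1 = t1[char]

inner = d1['f']['o']
print(type(inner).__name__)             # defaultdict
print(callable(inner.default_factory))  # True: every level can create the next one
print(to_plain(d1))                     # same structure as the setdefault version
```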

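【Discussion】:

Great explanation! Couldn't ask for more. Thanks.
Awesome! Thanks for the detailed walkthrough. I'm marking this as the answer.
@VikashRajaSamuelSelvin: Thanks. I've also answered the second part of your question. It's a bit trickier, but works on the same principle.
@Gerrat Thanks, I was just about to post about the second part. A plain d1 = defaultdict(defaultdict) doesn't work, and based on your answer I'm guessing it has something to do with how Python unpacks arguments. What I noticed is that on the second lookup, t1 is still a defaultdict, but no longer the same kind of defaultdict. Thanks for clarifying this too; unfortunately my reputation is too low to upvote.
@VikashRajaSamuelSelvin: It has nothing to do with unpacking, but you were on the right track with defaultdict(defaultdict)... the problem is that it would need to be more like defaultdict(defaultdict(defaultdict(... all the way down!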
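A short sketch of why defaultdict(defaultdict) stops working after one level: the outer factory is the defaultdict class itself, and calling it with no arguments builds a defaultdict whose default_factory is None, which behaves like a plain dict for missing keys:

```python
from collections import defaultdict

d1 = defaultdict(defaultdict)  # a missing key calls defaultdict() with no arguments...
level1 = d1['f']               # ...which builds a defaultdict with NO factory
print(level1.default_factory)  # None

try:
    level1['o']                # missing key and no factory: plain dict behaviour
except KeyError as exc:
    print('KeyError for key %r' % (exc.args[0],))
```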