Using tuples as dictionary keys

【Question title】: Using tuples as dictionary keys
【Posted】: 2020-09-25 11:42:24
【Question description】:

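The commented-out code block below produces the answer I want, while the uncommented code block produces the wrong answer.

Can someone explain why these two code blocks behave differently? The keys of self.q are supposed to be (state, action) pairs, so why does self.q[state][action] work at all? Shouldn't self.q accept only a single key?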

    def update_q_value(self, state, action, old_q, reward, future_rewards):
        # Q-values are stored in the dictionary self.q. The keys of self.q should be in the form of (state, action) pairs, where state is a tuple of all piles sizes in order, and action is a tuple (i, j) representing a pile and a number.

        state_pair = (tuple(state), action)
        if state_pair not in self.q:
            self.q[state_pair] = dict()

        print(old_q + self.alpha * (reward + future_rewards - old_q))

        self.q[state_pair] = old_q + self.alpha * (reward + future_rewards - old_q)

        # state = tuple(state)
        # if state not in self.q:
        #     self.q[state] = dict()

        # print(old_q + self.alpha * (reward + future_rewards - old_q))

        # self.q[state][action] = old_q + self.alpha * (reward + future_rewards - old_q)

The output of the first block looks like this:

Playing training game 1
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-0.5
0.5
Playing training game 2
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-0.75
0.75
...
Playing training game 9999
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-1.0
1.0
Playing training game 10000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-1.0
1.0

The output of the second block looks like this:

Playing training game 1
0.0
0.0
0.0
0.0
0.0
0.0
-0.5
0.5
Playing training game 2
0.0
0.0
0.0
0.0
0.0
0.0
-0.25
-0.5
0.5
...
Playing training game 9999
0.0625
0.125
0.125
0.125
0.25
0.25
-0.25
-0.5
0.5
Playing training game 10000
0.0625
0.125
0.125
0.125
0.25
0.25
-0.25
-0.5
0.5

If anyone is willing to take a look, the full code is here: https://d.pr/n/MKE8iH It can be run with something like:

ai = train(10000)
play(ai)

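【Comments on the question】:

Could you add some examples of how the second implementation gives the wrong answer? That is, the input and output values for the two different functions.
self.q[state_pair] uses a tuple as the key. self.q[state][action] is a dictionary inside a dictionary.
@MatsLindh Edited the post to clarify how the second implementation gives the wrong answer. It is part of an ML algorithm that plays Nim.
@MauriceMeyer That's odd, because when I try to access it, for example via print(self.q[state][action]), it gives me a KeyError for the action value.
@JasonC Sorry, your example doesn't show anything useful for debugging the actual function calls; reduce it to just three sets of inputs, with the correct and the wrong values for the function itself. We can't say anything about what happens behind the scenes in a large application. Narrow your question down to the lines you have posted.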

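Tags: python dictionary indexing tuples

【Solution 1】:

As noted in the comments, self.q[state][action] works because you are creating another dictionary as the value stored under state, and that inner dictionary uses action as its key.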

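class foo:
    def __init__(self):
        self.qTuple = {}  # flat layout: keyed by (state, action) tuples
        self.qDict = {}   # nested layout: keyed by state, then by action

    def update_q_value_tuple(self, state, action, value):
        # One tuple key maps directly to the value.
        state_pair = (tuple(state), action)
        if state_pair not in self.qTuple:
            # Redundant: this placeholder dict is overwritten on the next line.
            self.qTuple[state_pair] = dict()
        self.qTuple[state_pair] = value

    def update_q_value_dict(self, state, action, value):
        # The outer key is the state; its value is an inner dict keyed by action.
        state = tuple(state)
        if state not in self.qDict:
            self.qDict[state] = dict()
        self.qDict[state][action] = value


f = foo()
states = ['foo', 'bar']  # tuple('foo') yields ('f', 'o', 'o')
actions = ['hold', 'release']

for s in states:
    for a in actions:
        for v in range(0, 5):
            f.update_q_value_tuple(s, a, v)
            f.update_q_value_dict(s, a, v)

print(f.qTuple)
print(f.qDict)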

Output (key order may vary with the Python version):

{(('f', 'o', 'o'), 'hold'): 4, (('b', 'a', 'r'), 'hold'): 4, (('b', 'a', 'r'), 'release'): 4, (('f', 'o', 'o'), 'release'): 4}
{('f', 'o', 'o'): {'release': 4, 'hold': 4}, ('b', 'a', 'r'): {'release': 4, 'hold': 4}}
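
One plausible reason the two blocks in the question give different results can be sketched as follows. This assumes that the rest of the program reads Q-values back with a flat (state, action) tuple key; the full code is not shown here, so `get_q_value` and the sample state and action are hypothetical. Under that assumption, a value written as q[state][action] is never found by a lookup of q[(state, action)].

```python
# Hypothetical reader: assumed for illustration, not taken from the question's code.
def get_q_value(q, state, action):
    """Look up a Q-value with a flat (state, action) tuple key; default to 0."""
    return q.get((tuple(state), action), 0)


state, action = [1, 3, 5], (0, 1)

# Flat layout: a single dictionary keyed by (state, action) tuples.
flat_q = {}
flat_q[(tuple(state), action)] = 0.5

# Nested layout: a dictionary of dictionaries, keyed by state and then by action.
nested_q = {}
nested_q.setdefault(tuple(state), {})[action] = 0.5

flat_result = get_q_value(flat_q, state, action)      # key exists, value is found
nested_result = get_q_value(nested_q, state, action)  # key missing, default is used

print(flat_result, nested_result)
```

Whichever layout is chosen, the code that writes the values and the code that reads them have to agree on it.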

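Note: be careful when creating a one-element tuple. In a tuple literal the trailing comma is required: (state,). Writing state = tuple(state, ) is the same call as tuple(state); it converts an iterable into a tuple of its items and does not wrap state in a one-element tuple.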
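
A quick check of how the trailing comma behaves inside a call compared with inside a tuple literal. The variable names are illustrative only.

```python
state = [1, 3, 5]

as_tuple = tuple(state)      # converts the list's items into a tuple
with_comma = tuple(state, )  # same call; the trailing comma changes nothing
wrapped = (state,)           # tuple literal with one element: the list itself
no_comma = (state)           # plain parentheses: still the original list

print(as_tuple, with_comma, wrapped, no_comma)
```

Only the first two forms are usable as dictionary keys here, because `wrapped` still contains a list, and lists are not hashable.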

【Comments】: