Override a dict with numpy support: answers

【Question title】: Override a dict with numpy support
【Posted】: 2016-08-08 09:59:55
【Question description】:

Using the basic idea from How to "perfectly" override a dict?, I wrote a dictionary-based class that should support assigning dot-delimited keys, i.e. Extendeddict('level1.level2', 'value') == {'level1':{'level2':'value'}}

The code is

import collections
import numpy

class Extendeddict(collections.MutableMapping):
    """Dictionary overload class that adds functions to support chained keys, e.g. A.B.C          
    :rtype : Extendeddict
    """
    # noinspection PyMissingConstructor
    def __init__(self, *args, **kwargs):
        self._store = dict()
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        keys = self._keytransform(key)
        print 'Original key: {0}\nTransformed keys: {1}'.format(key, keys)
        if len(keys) == 1:
            return self._store[key]
        else:
            key1 = '.'.join(keys[1:])
            if keys[0] in self._store:
                subdict = Extendeddict(self[keys[0]] or {})
                try:
                    return subdict[key1]
                except:
                    raise KeyError(key)
            else:
                raise KeyError(key)

    def __setitem__(self, key, value):
        keys = self._keytransform(key)
        if len(keys) == 1:
            self._store[key] = value
        else:
            key1 = '.'.join(keys[1:])
            subdict = Extendeddict(self.get(keys[0]) or {})
            subdict.update({key1: value})
            self._store[keys[0]] = subdict._store

    def __delitem__(self, key):
        keys = self._keytransform(key)
        if len(keys) == 1:
            del self._store[key]
        else:
            key1 = '.'.join(keys[1:])
            del self._store[keys[0]][key1]
            if not self._store[keys[0]]:
                del self._store[keys[0]]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return self._store.__repr__()

    # noinspection PyMethodMayBeStatic
    def _keytransform(self, key):
        try:
            return key.split('.')
        except:
            return [key]

But with Python 2.7.10 and numpy 1.11.0, running

basic = {'Test.field': 'test'}
print 'Normal dictionary: {0}'.format(basic)
print 'Normal dictionary in a list: {0}'.format([basic])
print 'Normal dictionary in numpy array: {0}'.format(numpy.array([basic], dtype=object))
print 'Normal dictionary in numpy array.tolist(): {0}'.format(numpy.array([basic], dtype=object).tolist())

extended_dict = Extendeddict(basic)
print 'Extended dictionary: {0}'.format(extended_dict)
print 'Extended dictionary in a list: {0}'.format([extended_dict])
print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object))
print 'Extended dictionary in numpy array.tolist(): {0}'.format(numpy.array([extended_dict], dtype=object).tolist())

I get:

Normal dictionary: {'Test.field': 'test'}
Normal dictionary in a list: [{'Test.field': 'test'}]
Normal dictionary in numpy array: [{'Test.field': 'test'}]
Normal dictionary in numpy array.tolist(): [{'Test.field': 'test'}]
Original key: Test
Transformed keys: ['Test']
Extended dictionary: {'Test': {'field': 'test'}}
Extended dictionary in a list: [{'Test': {'field': 'test'}}]
Original key: 0
Transformed keys: [0]
Traceback (most recent call last):
  File "/tmp/scratch_2.py", line 77, in <module>
    print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object))
  File "/tmp/scratch_2.py", line 20, in __getitem__
    return self._store[key]
KeyError: 0

Whereas I would expect print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object)) to produce Extended dictionary in numpy array: [{'Test': {'field': 'test'}}]

Any suggestions on what could be wrong here? Is this even the right way of doing it?

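【Question discussion】:

It looks to me like you are trying to reinvent the pandas library ;)
@MaxU Pandas does something completely different from what I need, and I do use it for plenty of other things. What I want is a "simple" dict-like class that supports dot-delimited fields.
Add some debug prints, e.g. of key and keys near the error.
I would use pdb to check what goes wrong.
What happens with the object in a list? Or with thearray.tolist()? If I were running your code, I would try all kinds of prints and operations, looking for a pattern.

Tags: python numpy inheritance dictionary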

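【Solution 1】:

Numpy is trying to do what it is supposed to do:

Numpy checks each element for being iterable (by using len and iter), because what you pass in may be interpreted as a multidimensional array.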
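The way an integer key ends up in __getitem__ can be imitated without NumPy. The sketch below is written for Python 3 and only illustrates the idea of a sequence-style probe (take the length, then index by position); it is not NumPy's actual compiled logic, and Wrapper and probe_as_sequence are hypothetical names introduced here.

```python
from collections.abc import Mapping


class Wrapper(Mapping):
    """Hypothetical minimal mapping that only knows string keys."""

    def __init__(self, data):
        self._store = dict(data)

    def __getitem__(self, key):
        return self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


def probe_as_sequence(obj):
    """Assumed shape of the probe: read len(obj), then index 0..len-1."""
    return [obj[i] for i in range(len(obj))]


wrapped = Wrapper({'Test': {'field': 'test'}})

try:
    probe_as_sequence(wrapped)
except KeyError as exc:
    failed_key = exc.args[0]  # the positional index, not a real key
else:
    failed_key = None

print(failed_key)
```

A real sequence such as a list survives the same probe, which is why lists of lists become extra dimensions while the mapping fails on the first positional lookup.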

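There is a catch: dict-like classes (meaning isinstance(element, dict) == True) are not interpreted as another dimension (which is why passing in [{}] works). They should probably check whether the element is a collections.Mapping rather than a dict. Maybe you can file a bug on their issue tracker.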
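The difference between the two checks can be seen with the standard library alone. This is a small sketch for Python 3, where the ABCs live in collections.abc. collections.UserDict stands in for a class like Extendeddict because it is a MutableMapping that does not inherit from dict; is_mapping is a hypothetical helper, not something numpy provides.

```python
from collections import OrderedDict, UserDict
from collections.abc import Mapping


def is_mapping(obj):
    """Hypothetical broader check: accept any Mapping, not only dict."""
    return isinstance(obj, Mapping)


plain = {'Test.field': 'test'}
ordered = OrderedDict(plain)  # subclasses dict
wrapped = UserDict(plain)     # a MutableMapping that wraps a dict instead

for obj in (plain, ordered, wrapped):
    print(type(obj).__name__, isinstance(obj, dict), is_mapping(obj))
```

Only the UserDict instance fails the isinstance(obj, dict) test while still passing the Mapping test, which is the situation a MutableMapping subclass is in.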

If you change your class definition to:

class Extendeddict(collections.MutableMapping, dict):
     ...

or change your __len__ method:

    def __len__(self):
        raise NotImplementedError

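it works. Neither of these is something you would want to do, but numpy simply uses duck typing to create the array: unless your class subclasses dict directly or makes len inaccessible, numpy treats it as something that should become another dimension. That is rather clever and convenient if you want to pass in custom sequences (subclasses of collections.Sequence), but inconvenient for collections.Mapping or collections.MutableMapping. I think this is a bug.

【Discussion】:

I did try inheriting from dict, but that caused a bunch of other problems that I could not figure out how to solve properly. But yes, I also think this may be a bug in numpy itself.
@NicolauGonçalves I did not mean to recommend inheriting from dict. It was only meant to show how I came to this conclusion.
As I mentioned in a comment on the other answer, not defining a length would be counterproductive for anyone using this class. But I will open an issue with numpy and see what the developers think.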

【Solution 2】:

The problem is in the np.array constructor step. It digs into its inputs, trying to create a higher-dimensional array.

In [99]: basic={'test.field':'test'}

In [100]: eb=Extendeddict(basic)

In [104]: eba=np.array([eb],object)
<keys: 0,[0]>
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-104-5591a58c168a> in <module>()
----> 1 eba=np.array([eb],object)

<ipython-input-88-a7d937b1c8fd> in __getitem__(self, key)
     11         keys = self._keytransform(key);print key;print keys
     12         if len(keys) == 1:
---> 13             return self._store[key]
     14         else:
     15             key1 = '.'.join(keys[1:])

KeyError: 0

But if I create the array first and then assign the object, it works fine

In [105]: eba=np.zeros((1,),object)

In [106]: eba[0]=eb

In [107]: eba
Out[107]: array([{'test': {'field': 'test'}}], dtype=object)
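The preallocate-then-assign approach from the session above can be wrapped in a small helper. This is a minimal sketch in Python 3; object_array is a hypothetical name, and plain dicts and lists are used as stand-ins for the elements so that the example does not depend on the Extendeddict class.

```python
import numpy as np


def object_array(items):
    """Return a 1-D object array that holds each item as a single element."""
    items = list(items)
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Assigning to one integer index stores the object itself,
        # so no attempt is made to unpack it into extra dimensions.
        arr[i] = item
    return arr


rows = [{'test': {'field': 'test'}}, [1, 2], [3, 4]]
arr = object_array(rows)
print(arr.shape)
```

Because the shape is fixed before any element is stored, the result stays one-dimensional even when the elements are themselves sequences.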

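np.array is a tricky function to use with dtype=object. Compare np.array([[1,2],[2,3]],dtype=object) and np.array([[1,2],[2]],dtype=object). One is (2,2), the other (2,). It tries to make a 2d array, and resorts to a 1d array with list elements only if that fails. Something along those lines is happening here.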
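The two shapes mentioned in that comparison can be checked directly. A short sketch in Python 3, using only np.array and the shape attribute:

```python
import numpy as np

# Equal-length rows: np.array builds a genuine 2-D object array.
regular = np.array([[1, 2], [2, 3]], dtype=object)

# Unequal rows: no rectangular shape fits, so each row stays a list element.
ragged = np.array([[1, 2], [2]], dtype=object)

print(regular.shape)
print(ragged.shape)
print(type(ragged[0]))
```

In the ragged case the elements are the original Python lists, whereas in the regular case the lists have been unpacked into individual cells.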

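I see 2 solutions. One is this roundabout way of constructing the array, which I have used on other occasions. The other is to figure out why np.array does not dig into a dict but does dig into yours. np.array is compiled, so that may require reading some tough code on GitHub.

I tried a solution with f=np.frompyfunc(lambda x:x,1,1), but that does not work (see my edit history for details). But I found that mixing an Extendeddict with a dict does work: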

In [139]: np.array([eb,basic])
Out[139]: array([{'test': {'field': 'test'}}, {'test.field': 'test'}], dtype=object)

So does mixing it with something else, like None or an empty list

In [140]: np.array([eb,[]])
Out[140]: array([{'test': {'field': 'test'}}, []], dtype=object)

In [142]: np.array([eb,None])[:-1]
Out[142]: array([{'test': {'field': 'test'}}], dtype=object)

This is another common trick for constructing an object array of lists.
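For plain lists, the None trick looks roughly like this. The sketch is for Python 3 and passes dtype=object explicitly, since recent NumPy releases reject ragged input without it; the variable names are only illustrative.

```python
import numpy as np

rows = [[1, 2], [3, 4]]  # equal lengths would normally give shape (2, 2)

# Append a None sentinel so the input is no longer rectangular,
# then slice the sentinel off again.
padded = np.array(rows + [None], dtype=object)
arr = padded[:-1]

print(arr.shape)
print(type(arr[0]))
```

The sentinel forces a one-dimensional result, so each remaining element is still a whole list rather than a row of a 2-D array.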

It also works if you give it two or more Extendeddict objects with different lengths

np.array([eb, Extendeddict({})]). In other words, if the len(...) values differ (just as with a mixed list).

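【Discussion】:

Unfortunately, the same thing happens if I remove the dtype argument. :(
The problem is not dtype=object. I think it analyzes the input before it even looks at dtype. Judging from its behavior, I think it only looks at dtype near the end, when actually constructing the result.
I did try the same as you, adding an object of a different length, and it works as you describe. But it also means that everyone using this library needs to be aware of the issue, which seems counterproductive to me. I will leave it as it is for now, but I will upvote your answer in case anyone else runs into the same problem.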