【Question Title】: Why is the size of an empty dict the same as that of a non-empty dict in Python?
【Posted】: 2013-09-04 17:10:19
【Question Description】:

This is probably trivial, but I'm not sure I understand it. I tried googling but found no convincing answer.

>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
        yet_another_dict[i] = i**2

>>> 
>>> sys.getsizeof(yet_another_dict)
98444

How do I make sense of this? Why does an empty dict have the same size as a non-empty dict?

【Comments】:

Tags: python memory python-2.7 dictionary


【Solution 1】:

There are two reasons:

  1. A dictionary holds only references to objects, not the objects themselves, so its size is unrelated to the sizes of the objects it contains; it depends only on the number of references (items) the dictionary holds.

  2. More importantly, a dictionary preallocates memory for references in blocks. When you create a dictionary, it has already preallocated memory for the first n references. When that block fills up, it preallocates a new one.
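The first point can be checked directly (a minimal sketch in Python 3 syntax; the same holds on 2.7, only the byte counts differ): the dict's reported size does not change when a value is swapped for a much larger object, because the dict stores only a pointer either way.

```python
import sys

# A dict stores only references (pointers) to its keys and values, so
# replacing a tiny value with a ~1 MB string does not change the dict's
# own size as reported by sys.getsizeof.
small = {'key': 'x'}
huge = {'key': 'x' * 10**6}  # the value is ~1 MB, but the dict holds only a pointer

print(sys.getsizeof(small))
print(sys.getsizeof(huge))
assert sys.getsizeof(small) == sys.getsizeof(huge)
```

Note that `sys.getsizeof` counts only the object's own memory, not the memory of the objects it refers to.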

You can observe this behavior by running the following code:

import sys

d = {}
size = sys.getsizeof(d)
print size
i = 0
j = 0
while i < 3:  # record the first three resizes
    d[j] = j
    j += 1
    new_size = sys.getsizeof(d)
    if size != new_size:  # size changed => a new block was preallocated
        print new_size
        size = new_size
        i += 1

It prints:

280
1048
3352
12568

on my machine, but that depends on the architecture (32-bit vs. 64-bit).

【Discussion】:

    【Solution 2】:

    Dictionaries in CPython allocate a small amount of key space directly in the dict object itself (4-8 entries, depending on the version and compile options). From dictobject.h:

    /* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
     * allocated directly in the dict object (in the ma_smalltable member).
     * It must be a power of 2, and at least 4.  8 allows dicts with no more
     * than 5 active entries to live in ma_smalltable (and so avoid an
     * additional malloc); instrumentation suggested this suffices for the
     * majority of dicts (consisting mostly of usually-small instance dicts and
     * usually-small dicts created to pass keyword arguments).
     */
    #ifndef Py_LIMITED_API
    #define PyDict_MINSIZE 8
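A small sketch of the effect of that minimum size (Python 3 syntax; exact byte counts are version- and platform-dependent, and 3.x no longer embeds the small table in the object itself, but the 8-slot minimum table behaves similarly): dicts holding 1 through 5 entries all report the same size, and the 6th insertion crosses the 2/3 fill threshold and triggers a resize.

```python
import sys

# With an 8-slot minimum table, dicts of 1-5 entries share one allocation;
# inserting a 6th entry forces a resize, so the reported size jumps.
sizes = []
d = {}
for i in range(6):
    d[i] = i
    sizes.append(sys.getsizeof(d))

print(sizes)
assert len(set(sizes[:5])) == 1   # 1-5 entries: same allocation
assert sizes[5] > sizes[4]        # 6th entry triggers a resize
```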
    

    Note that CPython also resizes dictionaries in batches, to avoid frequently reallocating a growing dictionary. From dictobject.c:

    /* If we added a key, we can safely resize.  Otherwise just return!
     * If fill >= 2/3 size, adjust size.  Normally, this doubles or
     * quaduples the size, but it's also possible for the dict to shrink
     * (if ma_fill is much larger than ma_used, meaning a lot of dict
     * keys have been * deleted).
     *
     * Quadrupling the size improves average dictionary sparseness
     * (reducing collisions) at the cost of some memory and iteration
     * speed (which loops over every possible entry).  It also halves
     * the number of expensive resize operations in a growing dictionary.
     *
     * Very large dictionaries (over 50K items) use doubling instead.
     * This may help applications with severe memory constraints.
     */
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
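This batch resizing is easy to observe from Python (a sketch in Python 3 syntax; the exact growth factors differ between versions): growing a dict item by item changes its reported size only a handful of times, not once per insertion.

```python
import sys

# Because CPython resizes in multiplicative batches, inserting 10000 items
# produces only a logarithmic number of distinct sizes.
d = {}
distinct_sizes = []
for i in range(10000):
    d[i] = i
    s = sys.getsizeof(d)
    if not distinct_sizes or s != distinct_sizes[-1]:
        distinct_sizes.append(s)

print(len(distinct_sizes))  # far fewer than the 10000 insertions
assert len(distinct_sizes) < 20
```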
    

    【Discussion】:
