【Question Title】: Why is the size of an empty dict the same as that of a non-empty dict in Python?
【Posted】: 2013-09-04 17:10:19
【Question Description】:

This is probably trivial, but I'm not sure I understand it. I tried googling but found no convincing answer.

>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
        yet_another_dict[i] = i**2

>>> 
>>> sys.getsizeof(yet_another_dict)
98444

How do I make sense of this? Why does an empty dict have the same size as a non-empty dict?

【Comments】:

Tags: python memory python-2.7 dictionary


【Solution 1】:

There are two reasons:

  1. A dictionary holds only references to objects, not the objects themselves, so its size is unrelated to the sizes of the objects it contains; it depends only on the number of references (items) the dictionary holds.

  2. More importantly, a dictionary preallocates memory for references in blocks. When you create a dictionary, it has already preallocated memory for the first n references. When that block fills up, it preallocates a new one.
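The first point can be checked directly (a minimal sketch in Python 3 syntax; the same holds on 2.7, only the byte counts differ): the dict's reported size does not change when a value is swapped for a much larger object, because the dict stores only a pointer either way.

```python
import sys

# A dict stores only references (pointers) to its keys and values, so
# replacing a tiny value with a ~1 MB string does not change the dict's
# own size as reported by sys.getsizeof.
small = {'key': 'x'}
huge = {'key': 'x' * 10**6}  # the value is ~1 MB, but the dict holds only a pointer

print(sys.getsizeof(small))
print(sys.getsizeof(huge))
assert sys.getsizeof(small) == sys.getsizeof(huge)
```

Note that `sys.getsizeof` counts only the object's own memory, not the memory of the objects it refers to.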

You can observe this behavior by running the following code:

import sys

d = {}
size = sys.getsizeof(d)
print size
i = 0
j = 0
while i < 3:  # record the first three resizes
    d[j] = j
    j += 1
    new_size = sys.getsizeof(d)
    if size != new_size:  # size changed => a new block was preallocated
        print new_size
        size = new_size
        i += 1

It prints:

280
1048
3352
12568

on my machine, but that depends on the architecture (32-bit vs. 64-bit).

【Discussion】:

    【Solution 2】:

    Dictionaries in CPython allocate a small amount of key space directly in the dict object itself (4-8 entries, depending on the version and compile options). From dictobject.h:

    /* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
     * allocated directly in the dict object (in the ma_smalltable member).
     * It must be a power of 2, and at least 4.  8 allows dicts with no more
     * than 5 active entries to live in ma_smalltable (and so avoid an
     * additional malloc); instrumentation suggested this suffices for the
     * majority of dicts (consisting mostly of usually-small instance dicts and
     * usually-small dicts created to pass keyword arguments).
     */
    #ifndef Py_LIMITED_API
    #define PyDict_MINSIZE 8
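A small sketch of the effect of that minimum size (Python 3 syntax; exact byte counts are version- and platform-dependent, and 3.x no longer embeds the small table in the object itself, but the 8-slot minimum table behaves similarly): dicts holding 1 through 5 entries all report the same size, and the 6th insertion crosses the 2/3 fill threshold and triggers a resize.

```python
import sys

# With an 8-slot minimum table, dicts of 1-5 entries share one allocation;
# inserting a 6th entry forces a resize, so the reported size jumps.
sizes = []
d = {}
for i in range(6):
    d[i] = i
    sizes.append(sys.getsizeof(d))

print(sizes)
assert len(set(sizes[:5])) == 1   # 1-5 entries: same allocation
assert sizes[5] > sizes[4]        # 6th entry triggers a resize
```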
    

    Note that CPython also resizes dictionaries in batches, to avoid frequently reallocating a growing dictionary. From dictobject.c:

    /* If we added a key, we can safely resize.  Otherwise just return!
     * If fill >= 2/3 size, adjust size.  Normally, this doubles or
     * quaduples the size, but it's also possible for the dict to shrink
     * (if ma_fill is much larger than ma_used, meaning a lot of dict
     * keys have been * deleted).
     *
     * Quadrupling the size improves average dictionary sparseness
     * (reducing collisions) at the cost of some memory and iteration
     * speed (which loops over every possible entry).  It also halves
     * the number of expensive resize operations in a growing dictionary.
     *
     * Very large dictionaries (over 50K items) use doubling instead.
     * This may help applications with severe memory constraints.
     */
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
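This batch resizing is easy to observe from Python (a sketch in Python 3 syntax; the exact growth factors differ between versions): growing a dict item by item changes its reported size only a handful of times, not once per insertion.

```python
import sys

# Because CPython resizes in multiplicative batches, inserting 10000 items
# produces only a logarithmic number of distinct sizes.
d = {}
distinct_sizes = []
for i in range(10000):
    d[i] = i
    s = sys.getsizeof(d)
    if not distinct_sizes or s != distinct_sizes[-1]:
        distinct_sizes.append(s)

print(len(distinct_sizes))  # far fewer than the 10000 insertions
assert len(distinct_sizes) < 20
```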
    

    【Discussion】:
