Why is creating a set from a concatenated list faster than using .update?

【Question title】: Why is creating a set from a concatenated list faster than using `.update`?
【Posted】: 2015-12-05 15:47:37
【Question】:

While trying to answer "What is the preferred way to compose a set from multiple lists in Python", I did some performance analysis and came to a somewhat surprising conclusion.

Using

python -m timeit -s '
import itertools
import random
n=1000000
random.seed(0)
A = [random.randrange(1<<30) for _ in xrange(n)]
B = [random.randrange(1<<30) for _ in xrange(n)]
C = [random.randrange(1<<30) for _ in xrange(n)]'

for setup, I timed the following snippets:

> $TIMEIT 'set(A+B+C)'
10 loops, best of 3: 872 msec per loop

> $TIMEIT 's = set(A); s.update(B); s.update(C)'
10 loops, best of 3: 930 msec per loop

> $TIMEIT 's = set(itertools.chain(A,B,C))'
10 loops, best of 3: 941 msec per loop

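To my surprise, set(A+B+C) is the fastest, even though it creates an intermediate list of 3000000 elements. .update and itertools.chain are both slower, even though neither of them copies any list.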
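For reference, the three approaches build the same set; they differ only in how the elements reach it. The following is a minimal sketch, assuming Python 3 (so `range` replaces `xrange`) and a much smaller `n` than in the benchmark so that it runs instantly:

```python
import itertools
import random

random.seed(0)
n = 1000  # assumption: far smaller than the benchmark's 1000000, for illustration only
A = [random.randrange(1 << 30) for _ in range(n)]
B = [random.randrange(1 << 30) for _ in range(n)]
C = [random.randrange(1 << 30) for _ in range(n)]

# 1. Concatenate first: A + B + C builds a temporary list of 3*n elements.
s_concat = set(A + B + C)

# 2. Build from A, then add B and C in place; no temporary list is created.
s_update = set(A)
s_update.update(B)
s_update.update(C)

# 3. Feed a lazy iterator over all three lists; no temporary list here either.
s_chain = set(itertools.chain(A, B, C))

# All three hold the same elements.
print(s_concat == s_update == s_chain)
```

Only the first variant allocates the extra list, which is why its speed advantage looks counterintuitive.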

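What's going on here?

EDIT: On a second machine (OS X 10.10.5, Python 2.7.10, 2.5GHz Core i7), I ran the following script (which runs the tests forwards and then backwards to avoid ordering effects):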

SETUP='import itertools
import random
n=1000000
random.seed(0)
A = [random.randrange(1<<30) for _ in xrange(n)]
B = [random.randrange(1<<30) for _ in xrange(n)]
C = [random.randrange(1<<30) for _ in xrange(n)]'

python -m timeit -s "$SETUP" 'set(A+B+C)'
python -m timeit -s "$SETUP" 's = set(A); s.update(B); s.update(C)'
python -m timeit -s "$SETUP" 's = set(itertools.chain(A,B,C))'

python -m timeit -s "$SETUP" 's = set(itertools.chain(A,B,C))'
python -m timeit -s "$SETUP" 's = set(A); s.update(B); s.update(C)'
python -m timeit -s "$SETUP" 'set(A+B+C)'

and obtained the following results:

10 loops, best of 3: 579 msec per loop
10 loops, best of 3: 726 msec per loop
10 loops, best of 3: 775 msec per loop
10 loops, best of 3: 761 msec per loop
10 loops, best of 3: 737 msec per loop
10 loops, best of 3: 555 msec per loop

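Now set(A+B+C) is clearly faster, and the results are quite stable; it is hard to chalk this up to mere measurement error. Running this script repeatedly produces similar results.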
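The forward-then-backward run above can also be driven from Python instead of the shell. This is a sketch only, assuming Python 3; the helper name `run_both_orders` and the smaller default `n` are illustrative choices, not part of the original script:

```python
import timeit

# Same setup as the shell script, with n filled in by str.format.
SETUP = """
import itertools
import random
n = {n}
random.seed(0)
A = [random.randrange(1 << 30) for _ in range(n)]
B = [random.randrange(1 << 30) for _ in range(n)]
C = [random.randrange(1 << 30) for _ in range(n)]
"""

STATEMENTS = [
    "set(A+B+C)",
    "s = set(A); s.update(B); s.update(C)",
    "s = set(itertools.chain(A,B,C))",
]


def run_both_orders(n=10000, number=10, repeat=3):
    """Time each statement in forward order, then in reverse order.

    Returns a list of (statement, best_seconds_per_loop) pairs in run order.
    """
    setup = SETUP.format(n=n)
    results = []
    for stmt in STATEMENTS + STATEMENTS[::-1]:
        # Best of `repeat` runs, each run executing the statement `number` times.
        best = min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
        results.append((stmt, best / number))
    return results


if __name__ == "__main__":
    for stmt, secs in run_both_orders():
        print("{:45s} {:.3f} msec per loop".format(stmt, secs * 1e3))
```

Passing `n=1000000` reproduces the list sizes used in the question, at the cost of a much longer run.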

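【Comments】:

My only guess is that the first case passes in a list of known length, so the set constructor may be able to choose its initial memory allocation more sensibly. In the other two cases the set is either created and then resized twice (second case) or created from an iterator, where it may be resized internally many times.
Unless they changed set_init, that doesn't seem to be how it works. set_init just calls straight through to set_update_internal, which simply loops over the elements. (I'd pull the source from hg.python.org, but that server seems to be down at the moment.)
Related: Combining two sorted lists in Python
Cannot reproduce on Python 2.7 on OS X; all three tests show quite a lot of variation, and none of them is a clear winner. With only 10 repetitions and a very long running time (about 8 seconds for 10 tests), you capture a lot of noise.
When I lower n to 1000 and repeat 10k times, the set.update() version wins fairly consistently.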
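The resizing hypothesis from the first comment can be probed indirectly. The sketch below assumes CPython, where `sys.getsizeof` on a set reflects the size of its hash table; `growth_steps` is a hypothetical helper name. It only shows that the table is reallocated as a set grows one element at a time; it does not by itself establish that resizing explains the timing difference.

```python
import sys


def growth_steps(n):
    """Return (element_count, size_in_bytes) at each point where the
    set's reported memory footprint changes while adding n integers."""
    s = set()
    last = sys.getsizeof(s)
    steps = [(0, last)]
    for i in range(n):
        s.add(i)
        size = sys.getsizeof(s)
        if size != last:  # the underlying table was reallocated
            steps.append((len(s), size))
            last = size
    return steps


if __name__ == "__main__":
    for count, size in growth_steps(100000):
        print("{:>7d} elements -> {:>9d} bytes".format(count, size))
```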

Tags: python performance optimization set

【Solution 1】:

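On my Win 7 SP1 machine, which has a processor similar to yours and runs Python 2.7.10, I get different and unsurprising results: set(A+B+C) appears to be the slowest way to do it, as one would expect. I obtained similar results with garbage collection re-enabled and with Python 3.4.3.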
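On the garbage collection point: `timeit` turns the collector off while timing by default, and it can be switched back on from the setup string. A minimal sketch, assuming Python 3 and an illustrative list size; `best_time` is a hypothetical helper, not part of the test bed below:

```python
import timeit

SETUP = """
import random
random.seed(0)
n = 10000  # assumption: illustrative size, much smaller than the benchmark's 1000000
A = [random.randrange(1 << 30) for _ in range(n)]
B = [random.randrange(1 << 30) for _ in range(n)]
C = [random.randrange(1 << 30) for _ in range(n)]
"""

STMT = "set(A+B+C)"


def best_time(gc_enabled, number=10, repeat=3):
    """Best total time for `number` executions of STMT, with or without GC."""
    # timeit disables GC around the timed loop; calling gc.enable() in the
    # setup string turns it back on for the measurement.
    setup = ("import gc; gc.enable()\n" if gc_enabled else "") + SETUP
    return min(timeit.repeat(STMT, setup=setup, repeat=repeat, number=number))


if __name__ == "__main__":
    print("gc disabled: {:.6f} secs".format(best_time(False)))
    print("gc enabled:  {:.6f} secs".format(best_time(True)))
```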

I used my own timeit-based performance evaluation test bed and got the following results:

fastest to slowest execution speeds (Python 2.7.10)
   (10 executions, best of 3 repetitions)

set(A); s.update(B); s.update(C) :  4.787919 secs, rel speed 1.00x,  0.00% slower
              set(A).update(B,C) :  6.463666 secs, rel speed 1.35x, 35.00% slower
     set(itertools.chain(A,B,C)) :  6.743028 secs, rel speed 1.41x, 40.83% slower
                      set(A+B+C) :  8.030483 secs, rel speed 1.68x, 67.72% slower

Benchmark code:

from __future__ import print_function
import sys
from textwrap import dedent
import timeit

N = 10  # number of executions of each "algorithm"
R = 3  # number of repetitions of those executions

# common setup for all algorithms (not timed)
setup = dedent("""
    import itertools
    import gc
    import random

    try:
        xrange
    except NameError:
        xrange = range

    random.seed(0)
    n = 1000000  # number of elements in each list
    A = [random.randrange(1<<30) for _ in xrange(n)]
    B = [random.randrange(1<<30) for _ in xrange(n)]
    C = [random.randrange(1<<30) for _ in xrange(n)]

    # gc.enable()  # to (re)enable garbage collection if desired
""")

algorithms = {
    "set(A+B+C)": dedent("""
        s = set(A+B+C)
    """),

    "set(A); s.update(B); s.update(C)": dedent("""
        s = set(A); s.update(B); s.update(C)
    """),

    "set(itertools.chain(A,B,C))": dedent("""
        s = set(itertools.chain(A,B,C))
        """),

    "set(A).update(B,C)": dedent("""
        # update() returns None, so s ends up bound to None; the timed work is unchanged
        s = set(A).update(B,C)
        """),
}

# execute and time algorithms, collecting results
timings = [
    (label,
     min(timeit.repeat(algorithms[label], setup=setup, repeat=R, number=N)),
    ) for label in algorithms
]

print('fastest to slowest execution speeds (Python {}.{}.{})\n'.format(
        *sys.version_info[:3]),
        '  ({:,d} executions, best of {:d} repetitions)\n'.format(N, R))

longest = max(len(timing[0]) for timing in timings)  # length of longest label
ranked = sorted(timings, key=lambda t: t[1])  # ascending sort by execution time
fastest = ranked[0][1]
for timing in ranked:
    print("{:>{width}} : {:9.6f} secs, rel speed {:4.2f}x, {:6.2f}% slower".
            format(timing[0], timing[1], round(timing[1]/fastest, 2),
                   round((timing[1]/fastest - 1) * 100, 2), width=longest))

【Discussion】: