Performance of library itertools compared to python code (answers)

[Question title]: Performance of library itertools compared to python code
[Posted]: 2013-03-06 19:38:29
[Question]:

As an answer to my question Find the 1 based position to which two lists are the same, I got the hint to use the C library itertools to speed things up.

To verify this, I wrote the following test using cProfile:

from itertools import takewhile, izip

def match_iter(self, other):
    return sum(1 for x in takewhile(lambda x: x[0] == x[1],
                                        izip(self, other)))

def match_loop(self, other):
    element = -1
    for element in range(min(len(self), len(other))):
        if self[element] != other[element]:
            element -= 1
            break
    return element +1

def test():
    a = [0, 1, 2, 3, 4]
    b = [0, 1, 2, 3, 4, 0]

    print("match_loop a=%s, b=%s, result=%s" % (a, b, match_loop(a, b)))
    print("match_iter a=%s, b=%s, result=%s" % (a, b, match_iter(a, b)))

    i = 10000
    while i > 0:
        i -= 1
        match_loop(a, b)
        match_iter(a, b)

def profile_test():
    import cProfile
    cProfile.run('test()')

if __name__ == '__main__':
    profile_test()

The function match_iter() uses itertools, while the function match_loop() is the one I had implemented earlier in plain Python.

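The function test() defines two lists and prints them together with the results of both functions to verify that everything works. Both results have the expected value 5, which is the length up to which the two lists are equal. It then loops 10,000 times over both functions.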

Finally, the whole thing is profiled with profile_test().

What I learned is that izip is not implemented in the python3 itertools, at least not on the Debian wheezy I am using. So I ran the test with python2.7.
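For readers on Python 3, a minimal port of the two test functions: since `zip()` in Python 3 is already lazy, like Python 2's `itertools.izip()`, it can be swapped in directly. This sketch only checks correctness; the profile figures that follow come from the original Python 2.7 run.

```python
from itertools import takewhile

def match_iter(self, other):
    # Python 3: zip() is lazy, so it plays the role of Python 2's izip()
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1],
                                    zip(self, other)))

def match_loop(self, other):
    # Unchanged from the question: index-based loop
    element = -1
    for element in range(min(len(self), len(other))):
        if self[element] != other[element]:
            element -= 1
            break
    return element + 1

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]
print(match_loop(a, b), match_iter(a, b))  # prints: 5 5
```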

The results are as follows:

python2.7 match_test.py
match_loop a=[0, 1, 2, 3, 4], b=[0, 1, 2, 3, 4, 0], result=5
match_iter a=[0, 1, 2, 3, 4], b=[0, 1, 2, 3, 4, 0], result=5
         180021 function calls in 0.636 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.636    0.636 <string>:1(<module>)
        1    0.039    0.039    0.636    0.636 match_test.py:15(test)
    10001    0.048    0.000    0.434    0.000 match_test.py:3(match_iter)
    60006    0.188    0.000    0.275    0.000 match_test.py:4(<genexpr>)
    50005    0.087    0.000    0.087    0.000 match_test.py:4(<lambda>)
    10001    0.099    0.000    0.162    0.000 match_test.py:7(match_loop)
    20002    0.028    0.000    0.028    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10001    0.018    0.000    0.018    0.000 {min}
    10001    0.018    0.000    0.018    0.000 {range}
    10001    0.111    0.000    0.387    0.000 {sum}

What makes me wonder is that, looking at the cumtime values, my plain Python version takes 0.162 seconds for 10,000 loops, while the match_iter version needs 0.434 seconds.

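On the one hand, Python is very fast, which is great, so I don't have to worry. But can it be right that the C library takes more than twice as long to do the job as simple Python code? Or am I making a fatal mistake?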

To double-check, I also ran the test with python2.6, which seems to be faster overall, but shows the same difference between the loop and itertools.

Is anyone experienced with this and willing to help?

[Comments on the question]:

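izip() doesn't exist in 3.x because zip() there no longer works as it did in 2.x but works like izip(), so there is no need for a duplicate.
Try again with longer lists (e.g. 1000 elements).
As a side note, the timeit module is better suited than cProfile for this kind of test.
Yes, you are interpreting the results correctly. match_loop is faster (even with longer lists).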
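Following the two suggestions above (longer lists, timeit instead of cProfile), a minimal Python 3 sketch of the comparison. One reason to prefer timeit here: cProfile adds overhead to every Python-level call it records, so functions that make many small calls (the lambda and the generator expression in match_iter) can look relatively worse under the profiler. The `globals=` argument needs Python 3.5+ and the f-strings 3.6+; no timings are shown because they depend on machine and interpreter.

```python
import timeit
from itertools import takewhile

def match_iter(self, other):
    # zip() replaces izip() in Python 3
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1],
                                    zip(self, other)))

def match_loop(self, other):
    element = -1
    for element in range(min(len(self), len(other))):
        if self[element] != other[element]:
            element -= 1
            break
    return element + 1

# Longer inputs: the lists agree on their first 1000 items.
a = list(range(1000))
b = list(range(1000)) + [0]

for name in ("match_loop", "match_iter"):
    # globals= lets timeit see this module's names without the
    # "from __main__ import ..." setup string.
    seconds = timeit.timeit(f"{name}(a, b)", globals=globals(), number=1000)
    print(f"{name}: {seconds:.3f} s for 1000 calls")
```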

Tags: python performance itertools cprofile

[Answer 1]:

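I think the problem here is that your test lists are tiny, which means any difference is likely to be small, and the cost of creating the iterators outweighs the gains they bring.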

In larger tests (where performance matters more), the version using sum() is likely to outperform the other one.

Also, there is the matter of style: the manual version is longer and relies on iterating by index, which also makes it less flexible.
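To make the flexibility point concrete, a sketch with two hypothetical helpers, `match_zip` and `match_index` (Python 3 assumed): the `zip`-based one accepts any iterables, including generators, while the index-based one needs `len()` and `[]`, so it raises `TypeError` when given a generator.

```python
def match_zip(seq, other):
    # Works for any iterables: no len() or indexing needed.
    count = 0
    for this, that in zip(seq, other):
        if this != that:
            break
        count += 1
    return count

def match_index(seq, other):
    # Needs len() and [] on both arguments, so only sequences work.
    count = 0
    for i in range(min(len(seq), len(other))):
        if seq[i] != other[i]:
            break
        count += 1
    return count

squares = (n * n for n in range(10))   # a generator: no len(), no indexing
print(match_zip(squares, [0, 1, 4, 9, 15]))  # prints: 4

try:
    match_index((n * n for n in range(10)), [0, 1, 4, 9, 15])
except TypeError as exc:
    print("index-based version fails:", exc)
```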

I think the most readable solution is something like this:

def while_equal(seq, other):
    for this, that in zip(seq, other):
        if this != that:
            return
        yield this

def match(seq, other):
    return sum(1 for _ in while_equal(seq, other))

Interestingly, on my system a slightly modified version:

def while_equal(seq, other):
    for this, that in zip(seq, other):
        if this != that:
            return
        yield 1

def match(seq, other):
    return sum(while_equal(seq, other))

performs better than the pure loop version:

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

import timeit

print(timeit.timeit('match_loop(a,b)', 'from __main__ import a, b, match_loop'))
print(timeit.timeit('match(a,b)', 'from __main__ import match, a, b'))

Giving:

1.3171300539979711
1.291257290984504

That said, if we improve the pure loop version to be more Pythonic:

def match_loop(seq, other):
    count = 0
    for this, that in zip(seq, other):
        if this != that:
            return count
        count += 1
    return count

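This one times (using the same method as above) at 0.8548871780512854 for me, significantly faster than any of the other methods, while still being readable. This is probably due to the looping by index in the original version, which is generally very slow. I would, however, still go with the first version in this post, as I find it the most readable.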

[Comments]:

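@Richard You are reading them correctly. I'd suggest running them in 3.x and timing them with timeit, then ignoring the results completely unless performance is a real problem (e.g. testing with your data shows it to be too slow), and using the most readable and flexible solution instead.
@Richard I have also posted them as answers to that question.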

[Answer 2]:

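First, kudos for actually timing it.
Second, readability is usually more important than writing fast code. If your code runs 3x faster but you spend 2 out of every 3 weeks debugging it, it's not worth your time.
Third, you can also use timeit to time small snippets of code. I find that approach a bit easier than using profile. (profile is good for finding bottlenecks, though.)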
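For the bottleneck-hunting use of the profiler, a sketch using `cProfile.Profile` together with `pstats` to rank functions by cumulative time. `run()` is a hypothetical driver that mirrors the question's loop (Python 3, with `zip` in place of `izip`); the point is to see where time goes, not to get absolute numbers.

```python
import cProfile
import pstats
from itertools import takewhile

def match_iter(self, other):
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1],
                                    zip(self, other)))

def run(n=1000):
    # Hypothetical driver: repeat the question's call n times.
    a = [0, 1, 2, 3, 4]
    b = [0, 1, 2, 3, 4, 0]
    for _ in range(n):
        match_iter(a, b)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Sort by cumulative time and print only the 5 most expensive entries.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```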

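itertools is, in general, pretty fast. In this case, however, your takewhile slows things down, because itertools has to call a function for every element along the way. Every function call in Python carries a fair amount of overhead, so that may be what slows you down (and there is also the cost of creating the lambda function in the first place). Note that sum with a generator expression adds a bit of overhead as well. Ultimately, though, the basic loop seems to win every time in this situation.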

# Python 2 code: izip and the print statement do not exist in Python 3
from itertools import takewhile, izip

def match_iter(self, other):
    return sum(1 for x in takewhile(lambda x: x[0] == x[1],
                                        izip(self, other)))

def match_loop(self, other):
    cmp = lambda x1,x2: x1 == x2  # unused in this function

    element = 0  # avoids UnboundLocalError when either list is empty
    for element in range(min(len(self), len(other))):
        if self[element] == other[element]:
            element += 1
        else:
            break

    return element

def match_loop_lambda(self, other):
    cmp = lambda x1,x2: x1 == x2

    element = 0  # avoids UnboundLocalError when either list is empty
    for element in range(min(len(self), len(other))):
        if cmp(self[element],other[element]):
            element += 1
        else:
            break

    return element

def match_iter_nosum(self,other):
    element = 0
    for _ in takewhile(lambda x: x[0] == x[1],
                       izip(self, other)):
        element += 1
    return element

def match_iter_izip(self,other):
    element = 0
    for x1,x2 in izip(self,other):
        if x1 == x2:
            element += 1
        else:
            break
    return element



a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]

import timeit

print timeit.timeit('match_iter(a,b)','from __main__ import a,b,match_iter')
print timeit.timeit('match_loop(a,b)','from __main__ import a,b,match_loop')
print timeit.timeit('match_loop_lambda(a,b)','from __main__ import a,b,match_loop_lambda')
print timeit.timeit('match_iter_nosum(a,b)','from __main__ import a,b,match_iter_nosum')
print timeit.timeit('match_iter_izip(a,b)','from __main__ import a,b,match_iter_izip')

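Note, however, that the fastest version is a hybrid of a loop and itertools. This (explicit) loop over izip also happens to be easier to read (in my opinion). So we can conclude that takewhile is the slow part here, not necessarily itertools as such.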
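If the per-element lambda call is what makes takewhile slow here, one way to keep itertools but drop the Python-level callback is to use the C-implemented functions from the `operator` module. A sketch, Python 3 only (in Python 2, `map()` is eager and pads the shorter input with `None`); `match_iter_c` is a hypothetical name, and the version has not been timed here, so whether it beats the explicit loop over izip/zip is something to measure on your own data.

```python
import operator
from itertools import takewhile

def match_iter_c(seq, other):
    # map(operator.eq, ...) compares pairs without a Python-level lambda and
    # stops at the shorter input; takewhile(operator.truth, ...) stops at the
    # first mismatch. Assumes == returns a real bool (true for built-in
    # types), so summing the True values gives the count.
    return sum(takewhile(operator.truth, map(operator.eq, seq, other)))

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4, 0]
print(match_iter_c(a, b))  # prints: 5
```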

[Comments]: