【Posted】: 2021-12-03 14:49:06
【Question】:
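I saw a video about the speed of loops in Python, which explained that sum(range(N)) is much faster than manually looping over the range and adding the values together: the former runs in C because only built-in functions are involved, while in the latter the summation is done in (slow) Python. I was curious what happens when numpy is added to the mix. As I expected, np.sum(np.arange(N)) is the fastest, but sum(np.arange(N)) and np.sum(range(N)) are even slower than the naive for loop.
Why is that?
Here is the script I used for testing, with some comments on the presumed causes of the slowdown where I know them (mostly taken from the video), and the results I got on my machine (python 3.10.0, numpy 1.21.2):
Updated script: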
import numpy as np
from timeit import timeit

N = 10_000_000
repetition = 10

def sum0(N=N):
    s = 0
    i = 0
    while i < N:  # condition is checked in python
        s += i
        i += 1  # both additions are done in python
    return s

def sum1(N=N):
    s = 0
    for i in range(N):  # increment in C
        s += i  # addition in python
    return s

def sum2(N=N):
    return sum(range(N))  # everything in C

def sum3(N=N):
    return sum(list(range(N)))

def sum4(N=N):
    return np.sum(range(N))  # very slow np.array conversion

def sum5(N=N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N), dtype=int))

def sum5v2_(N=N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N), dtype=np.int_))

def sum6(N=N):
    # possibly slow conversion to Py_long from np.int
    return sum(np.arange(N))

def sum7(N=N):
    # list returns a list of np.int-s
    return sum(list(np.arange(N)))

def sum7v2(N=N):
    # tolist conversion to python int seems faster than the implicit conversion
    # in sum(list()) (tolist returns a list of python int-s)
    return sum(np.arange(N).tolist())

def sum8(N=N):
    return np.sum(np.arange(N))  # everything in numpy (fortran libblas?)

def sum9(N=N):
    return np.arange(N).sum()  # remove dispatch overhead

def array_basic(N=N):
    return np.array(range(N))

def array_dtype(N=N):
    return np.array(range(N), dtype=np.int_)

def array_iter(N=N):
    # np.sum's source code mentions to use fromiter to convert from generators
    return np.fromiter(range(N), dtype=np.int_)

print(f"while loop: {timeit(sum0, number = repetition)}")
print(f"for loop: {timeit(sum1, number = repetition)}")
print(f"sum_range: {timeit(sum2, number = repetition)}")
print(f"sum_rangelist: {timeit(sum3, number = repetition)}")
print(f"npsum_range: {timeit(sum4, number = repetition)}")
print(f"npsum_iterrange: {timeit(sum5, number = repetition)}")
# time the np.int_ variant here (the original line timed sum5 a second time)
print(f"npsum_iterrangev2: {timeit(sum5v2_, number = repetition)}")
print(f"sum_arange: {timeit(sum6, number = repetition)}")
print(f"sum_list_arange: {timeit(sum7, number = repetition)}")
print(f"sum_arange_tolist: {timeit(sum7v2, number = repetition)}")
print(f"npsum_arange: {timeit(sum8, number = repetition)}")
print(f"nparangenpsum: {timeit(sum9, number = repetition)}")
print(f"array_basic: {timeit(array_basic, number = repetition)}")
print(f"array_dtype: {timeit(array_dtype, number = repetition)}")
print(f"array_iter: {timeit(array_iter, number = repetition)}")
# many repetitions on a small array, to expose per-call overhead;
# note that N/1000 is a float, so np.arange returns a float64 array here
print(f"npsumarangeREP: {timeit(lambda : sum8(N/1000), number = 100000*repetition)}")
# same measurement for the method-call variant sum9
print(f"npsumarangeREP: {timeit(lambda : sum9(N/1000), number = 100000*repetition)}")
# Example output:
#
# while loop: 11.493371912998555
# for loop: 7.385945574002108
# sum_range: 2.4605720699983067
# sum_rangelist: 4.509678105998319
# npsum_range: 11.85120212900074
# npsum_iterrange: 4.464334709002287
# npsum_iterrangev2: 4.498494338993623
# sum_arange: 9.537815956995473
# sum_list_arange: 13.290120724996086
# sum_arange_tolist: 5.231948580003518
# npsum_arange: 0.241889145996538
# nparangenpsum: 0.21876695199898677
# array_basic: 11.736577274998126
# array_dtype: 8.71628468400013
# array_iter: 4.303306431000237
# npsumarangeREP: 21.240833958996518
# npsumarangeREP: 16.690092379001726
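
The comments on sum6, sum7 and sum7v2 in the script hinge on what kind of objects the built-in sum actually receives. The following is a minimal sketch of that difference on a tiny array; the variable names are illustrative and it does not measure anything, it only shows the element types involved.

```python
import numpy as np

arr = np.arange(5)

# Iterating an ndarray (which is what sum(arr) and list(arr) do)
# hands out NumPy scalar objects, one per element.
via_list = list(arr)

# ndarray.tolist() converts the whole array to built-in Python ints.
via_tolist = arr.tolist()

print(type(via_list[0]))    # a NumPy integer scalar type
print(type(via_tolist[0]))  # the built-in int type

# Both routes give the same total.
total_list = sum(via_list)
total_tolist = sum(via_tolist)
print(total_list == total_tolist)
```

If the per-element NumPy scalars are indeed the expensive part, as the script's comments suppose, this would be consistent with sum_arange_tolist coming out faster than sum_arange and sum_list_arange in the timings above.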
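【Comments】:
- Probably numpy is optimized for working with numpy objects rather than with built-in Python functions, since that is how it was designed. For example, in the case of sum(np.arange(N)) the numpy range first has to be converted to a Python data structure and then summed. Similarly with np.sum, perhaps the range has to be converted to something numpy understands, but IDK.
- You can look at the CPython sum implementation and the numpy function (although the latter is a wrapper function). You can also see the dis output for all your functions on godbolt. Other than CPython (sum and range) running entirely in C, I can't see a concrete reason.
- Based on your comments and a comment in the np.sum source code, I added some more tests. I guess that calling np.sum on a range implicitly involves a conversion to np.array, which seems to be a very inefficient conversion unless numpy is explicitly told that it is consuming an iterable. Look at the conversion timings (the array_basic, array_dtype and array_iter rows) and at how using fromiter changes the run time; that explains why np.sum(range(N)) is slow. Now the only thing I don't understand is why sum(np.arange(N)) is so slow.
- I'd guess sum(np.arange(N)) is slow because you are creating an array of numpy integers, and sum has to convert each one from the numpy representation to Py_Long.
- Add sum(np.arange(N).tolist()). I'd guess it comes out around 4.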
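
The conversion routes discussed in the comments can be compared side by side. This is a small sketch, assuming (as the comments guess) that np.sum on a range first builds an array much like np.array(range(n)) does. The size n and the variable names are illustrative, and the sketch only checks that the routes agree, not how fast they are.

```python
import numpy as np

n = 1_000  # small illustrative size; the question uses 10_000_000

# Generic conversion of a Python iterable into an array.
from_array = np.array(range(n))

# np.fromiter fills a 1-D array directly from an iterable of known dtype.
from_iter = np.fromiter(range(n), dtype=np.int_)

# np.arange builds the array without creating Python ints at all.
from_arange = np.arange(n)

# Closed form for 0 + 1 + ... + (n - 1), used as a reference value.
expected = n * (n - 1) // 2

print(np.array_equal(from_array, from_iter))
print(np.array_equal(from_iter, from_arange))
print(int(np.sum(from_arange)) == expected)

# The built-in sum over an ndarray returns a NumPy scalar, not a Python int.
builtin_total = sum(from_arange)
print(isinstance(builtin_total, np.integer))
```

All three routes produce the same array, so the differences in the timings come from how the array is built and from which sum walks over it, not from the result.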
Tags: python numpy performance