【Question title】: Speeding up list comprehension
【Posted】: 2019-08-22 13:28:53
【Question】:

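I am trying to write a function that loops over an array and builds a new array. Using timeit, I found that the slowest part is looping over the numpy array. Since the arrays I use as input tend to be very long, I want to speed this up as much as possible.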

Is there a way to make the list comprehension loop faster? Here is a function that reproduces my problem:

def get_days(year, month):
    months=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
    if month==2:
        if (year%4==0 and year%100!=0) or (year%400==0):
            return 29
    return months[month-1]

This is the array computation that needs better performance:

res=np.arange(20788, 20940)
np.array([np.min([x+datetime.fromtimestamp(20809*24*60*60).day-1, x+get_days(datetime.fromtimestamp(20809*24*60*60).year, datetime.fromtimestamp(20809*24*60*60).month)]) for x in res])

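【Comments】:

Note that every call to get_days allocates a new months array. Move it outside the get_days function. Also check whether using a regular list for months instead of a numpy array makes a difference.
Yes, that sped up the loop a little. But I was wondering whether there is some kind of map or apply function that can apply np.min over the whole list at once.
Calling datetime.fromtimestamp once instead of three times would also help.
If the only thing you do is index into a list, there is no reason to use numpy.
The biggest optimization would be to get rid of get_days and inline the code; calling user-defined functions is relatively expensive in Python.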
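
The suggestions in these comments can be combined into a minimal sketch, assuming the same timestamp and range as in the question. The names `MONTH_LENGTHS`, `dt`, `lower`, `upper` and `result` are introduced here for illustration only.

```python
import numpy as np
from datetime import datetime

# Plain tuple allocated once at module level, instead of a new numpy array per call.
MONTH_LENGTHS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

def get_days(year, month):
    """Return the number of days in a month (Gregorian leap-year rule)."""
    if month == 2 and ((year % 4 == 0 and year % 100 != 0) or year % 400 == 0):
        return 29
    return MONTH_LENGTHS[month - 1]

# Call fromtimestamp once; the resulting date depends on the local time zone.
dt = datetime.fromtimestamp(20809 * 24 * 60 * 60)
lower = dt.day - 1
upper = get_days(dt.year, dt.month)

res = np.arange(20788, 20940)
# Built-in min on two scalars avoids building a small numpy array per element.
result = np.array([min(x + lower, x + upper) for x in res])
```

This still loops in Python, so it mainly removes repeated work per element rather than the loop itself.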

Tags: python performance numpy

【Solution 1】:

Use numpy functions and vectorization instead of a list comprehension with a loop.

b = np.array([np.min([x+datetime.fromtimestamp(20809*24*60*60).day-1, 
                      x+get_days(datetime.fromtimestamp(20809*24*60*60).year,
                                 datetime.fromtimestamp(20809*24*60*60).month)]) 
             for x in res])

c = np.minimum(res+datetime.fromtimestamp(20809*24*60*60).day-1,
               res+get_days(datetime.fromtimestamp(20809*24*60*60).year,
                            datetime.fromtimestamp(20809*24*60*60).month))

b == c

Output:

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True])

Timings

%timeit b = np.array([np.min([x+datetime.fromtimestamp(20809*24*60*60).day-1, x+get_days(datetime.fromtimestamp(20809*24*60*60).year, datetime.fromtimestamp(20809*24*60*60).month)]) for x in res])

1.99 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit c = np.minimum(res+datetime.fromtimestamp(20809*24*60*60).day-1, res+get_days(datetime.fromtimestamp(20809*24*60*60).year, datetime.fromtimestamp(20809*24*60*60).month))

10.5 µs ± 310 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
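
Because both arguments of np.minimum add the same `res`, the comparison can also be made once on scalars, since min(x + a, x + b) = x + min(a, b). The following is a minimal sketch under that observation, not part of the answer itself: it substitutes the standard library's `calendar.monthrange` for `get_days` and uses the `timeit` module so it can run outside IPython. The names `days_in_month`, `offset` and `elapsed` are illustrative.

```python
import calendar
import timeit
from datetime import datetime

import numpy as np

res = np.arange(20788, 20940)

# One conversion only; the resulting date depends on the local time zone.
dt = datetime.fromtimestamp(20809 * 24 * 60 * 60)
# monthrange returns (weekday of the 1st, number of days in the month).
days_in_month = calendar.monthrange(dt.year, dt.month)[1]

# Pick the smaller offset once, then add it to the whole array.
offset = min(dt.day - 1, days_in_month)
c = res + offset

# Same result as the element-wise np.minimum form.
assert np.array_equal(c, np.minimum(res + dt.day - 1, res + days_in_month))

# Timing without %timeit; absolute numbers vary by machine.
elapsed = timeit.timeit(lambda: res + offset, number=100_000)
print(f"{elapsed / 100_000 * 1e6:.2f} µs per loop")
```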

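【Comments】:

【Solution 2】:

As @botje commented, note that every time you call the function inside the list comprehension, some variables are allocated again. When I declared those variables outside the function, I managed to make it faster. My code looks like this: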

import numpy as np
from datetime import datetime
from helpers.time_dec import calc_execution_time

months=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
dt = datetime.fromtimestamp(20809 * 24 * 60 * 60)
dt_day = dt.day

def get_days(year, month):

    if month==2:
        if (year%4==0 and year%100!=0) or (year%400==0):
            return 29
    return months[month-1]

d =  get_days(dt.year, dt.month)


@calc_execution_time
def calc():
    res = np.arange(20788, 20940)

    r = np.array([np.min([x + dt_day - 1,
                      x +d]) for x in res])
    return r


print(calc()) # 0.0011 seconds; your code took 0.0026 seconds, so the performance is clearly better now

################### this is the execution-time test function (helpers/time_dec.py) ###############
from timeit import default_timer



def calc_execution_time(func):

    """calculate execution Time of a function"""



    def wrapper(*args, **kwargs):

        before = default_timer()
        res = func(*args, **kwargs)
        after = default_timer()
        execution_time = after - before
        print(f"execution time of the Function {func.__qualname__} is :=> {execution_time} seconds")
        return res

    return wrapper

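You can also use the map function. I'm not sure what your goal is, but I think you can change your function to use map instead of a list comprehension. map returns a lazy iterator (a map object), so the values are only computed when it is consumed. The code would look like this: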

@calc_execution_time
def calc():
    res = np.arange(20788, 20940)

    #r = np.array([np.min([x + dt_day - 1, x +d]) for x in res])

    r = map(lambda x: np.min([x + dt_day - 1, x +d]), res)
    return r


print(list(calc()))   # 1.65e-05 seconds (covers only creating the lazy map object; list() does the actual work afterwards)
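
A caveat when timing the map version: map is lazy, so a decorator wrapped around `calc` measures only the creation of the map object, while the np.min calls run later inside `list(...)`. The sketch below times the two steps separately. `offset_a` and `offset_b` are placeholder values standing in for `dt_day - 1` and `d`, and the other names are illustrative.

```python
from timeit import default_timer

import numpy as np

res = np.arange(20788, 20940)
offset_a, offset_b = 21, 31  # placeholders for dt_day - 1 and d

start = default_timer()
lazy = map(lambda x: np.min([x + offset_a, x + offset_b]), res)
created = default_timer() - start  # no element has been computed yet

start = default_timer()
values = list(lazy)  # the np.min calls actually run here
consumed = default_timer() - start

print(f"creating the map object:  {created:.2e} s")
print(f"consuming it with list(): {consumed:.2e} s")
```

For a fair comparison with the list comprehension, the measured region should include the step that consumes the iterator.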

【Comments】: