How to speed up execution of a for loop in Python? Answers

【Question title】: How to speed up execution of for loop in Python?
【Posted】: 2019-08-11 01:07:35
【Question description】:

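I wrote a for loop that evaluates a conditional and appends to a list on every iteration, which is probably what makes it so slow. Is there a way to speed this up and still get the same result as this code snippet?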

fault_array = []
for i in x_range_original:
    for j in range(0, 16):
        lower_threshold = min(df_records[:, j+1])
        upper_threshold = max(df_records[:, j+1])

        if (df_log[i, j] < lower_threshold) or (df_log[i, j] > upper_threshold):
            print("Fault detected at timestep: ", df_records['Time'][i])
            fault_array.append(1)
        else:
            print("Normal operation at timestep: ", df_records['Time'][i])
            fault_array.append(0)

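【Comments on the question】:

A lot of things are unclear, for example df_log and df_records. Please edit your question and add more information.
@Samha' I don't think the variables need clarifying, do they? My question is about checking values inside the for loop, which takes quite a long time. If this can be done in a faster way (say, a list comprehension, though I'm not sure it works here), that would probably be a suitable solution.
I can't tell how much work the loop body actually does. Understanding the code definitely helps. Otherwise, you should reduce the code to a leaner example and strip out the incidental details.
Try converting x_range_original into a generator function or an iterator to avoid consuming memory: wiki.python.org/moin/Generators
You use df_ as a variable prefix here. Are these DataFrames from Pandas? If not, can you use numpy or pandas? You can express these ideas much more easily in Pandas/numpy, and it will be faster than anything you write in pure Python.

Tags: python list for-loop if-statement time-complexity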

【Solution 1】:

A mini code review:

fault_array = []
for i in x_range_original:
    for j in range(0, 16):
        # recomputed on every i; perhaps you wanted j to be an outer loop
        # use vectorized versions of min and max
        lower_threshold = min(df_log[:, j])
        upper_threshold = max(df_log[:, j])

        # this condition is never true:
        # df_log[i, j] cannot be less than min(df_log[:, j])
        # same about upper threshold
        if (df_log[i, j] < lower_threshold) or (df_log[i, j] > upper_threshold):
            print("Fault detected at timestep: ", df_records['Time'][i])
            fault_array.append(1)
        else:
            # perhaps you need to use a vectorized operation here instead of for loop:
            # fault_array = df.apply(lambda row: ...)
            print("Normal operation at timestep: ", df_records['Time'][i])
            fault_array.append(0)
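The first review comment, that the thresholds are recomputed on every i, can be illustrated with a plain-Python sketch. The function name detect_faults, the list-of-rows inputs and the toy data are assumptions made for illustration only; the thresholds are taken from a separate reference table, and the j+1 column offset from the question is left out.

```python
def detect_faults(log_rows, record_rows, n_cols=16):
    """Flag each (row, column) cell of log_rows that lies outside the
    min/max range of the matching column of record_rows."""
    # The thresholds depend only on the column, so compute them once up front
    lower = [min(row[j] for row in record_rows) for j in range(n_cols)]
    upper = [max(row[j] for row in record_rows) for j in range(n_cols)]
    # Same row-by-row, column-by-column order as the nested loop
    return [
        1 if (row[j] < lower[j] or row[j] > upper[j]) else 0
        for row in log_rows
        for j in range(n_cols)
    ]


# Hypothetical toy data with 2 columns instead of 16
records = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
log = [[2.5, 15.0], [0.5, 15.0], [2.0, 35.0]]
flags = detect_faults(log, records, n_cols=2)
print(flags)
```

This still loops in Python, so it only removes the repeated min/max work; it is not a vectorized solution.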

Apart from the condition that is always false, I think you are looking for something like this:

columns = list(range(16))
# I guess the thresholds logic should be different
upper_thresholds = df[columns].max(axis=0)
lower_thresholds = df[columns].min(axis=0)
# faults is a series of bools
faults = df[columns].apply(lambda row: any(row < lower_thresholds) or any(row > upper_thresholds), axis=1)
# rows where faults is True are the faulty timesteps; index with ~faults for the normal ones
fault_timesteps = df_records.loc[faults, 'Time']
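The row-wise apply above still calls a Python lambda once per row. A possible variant, sketched here on made-up data, lets pandas do the comparison for the whole frame at once. The frames reference and to_check and their column names are hypothetical, and taking the thresholds from a separate frame is an assumption, made so that faults can actually occur.

```python
import pandas as pd

# Hypothetical reference data (defines the normal range) and data to check
reference = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
to_check = pd.DataFrame({"a": [2.5, 0.5, 2.0], "b": [15.0, 15.0, 35.0]})

# Per-column thresholds: each is a Series indexed by column name
lower = reference.min(axis=0)
upper = reference.max(axis=0)

# Compare every cell against the thresholds of its own column
out_of_range = to_check.lt(lower, axis="columns") | to_check.gt(upper, axis="columns")

# True where at least one column of the row is out of range
faults = out_of_range.any(axis=1)
print(faults.tolist())
```

The result is one boolean per row, like the faults series in the answer, so it can be used the same way to select timesteps with .loc.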

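【Comments】:

Thanks for your answer. Sorry, there was a typo in my question: the thresholds are computed from the min and max of df_records, not df_log (I have updated the question). Given your answer, does the same logic still apply? Thanks.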
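One possible reading of the corrected question, sketched with NumPy: the thresholds come from columns 1 to 16 of the records array (column 0 is assumed to hold the time), and each cell of the log array is compared with the thresholds of its column. The arrays records and log are random stand-ins, not the asker's data, and their shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cols = 16

# Hypothetical stand-ins: column 0 of records is the time, columns 1..16 are signals
records = np.column_stack([np.arange(50.0), rng.normal(size=(50, n_cols))])
log = rng.normal(scale=2.0, size=(50, n_cols))

# Per-column thresholds, computed once; each has shape (16,)
lower = records[:, 1:n_cols + 1].min(axis=0)
upper = records[:, 1:n_cols + 1].max(axis=0)

# Broadcasting compares every row of log with the per-column thresholds
fault_mask = (log < lower) | (log > upper)  # shape (50, 16)

# Flatten row by row to get the same ordering as the nested loop
fault_array = fault_mask.astype(int).ravel().tolist()

# Time values of the rows that contain at least one fault
fault_times = records[fault_mask.any(axis=1), 0]
```

With thresholds taken from a different array than the one being checked, the condition is no longer always false, so the vectorized comparison keeps the logic of the loop while dropping the per-cell Python overhead.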