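【Posted】: 2020-12-10 18:45:19
【Question】:
My goal is to significantly speed up my code. I think this could be done with np.select, although I don't know how.
This is the current output when my code runs: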
date starting_temp average_high average_low limit_temp observation_date Date_Limit_reached
2019-12-03 22:30:00 NaN 13.0 14.8 NaN nan
2019-12-03 23:00:00 NaN 14.7 14.9 NaN nan
2019-12-03 23:30:00 NaN 13.0 13.9 NaN nan
2019-12-04 00:00:00 13.2 13.0 14.7 NaN 2019-12-04 10:00:00
2019-12-04 00:30:00 NaN 14.0 13.8 NaN nan
2019-12-04 01:00:00 NaN 13.9 13.8 NaN nan
2019-12-04 01:30:00 NaN 13.6 14.8 NaN nan
2019-12-04 02:00:00 NaN 13.1 14.5 NaN nan
2019-12-04 02:30:00 NaN 14.9 13.7 NaN nan
2019-12-04 03:00:00 NaN 14.2 14.1 NaN nan
2019-12-04 03:30:00 NaN 13.4 14.1 NaN nan
2019-12-04 04:00:00 NaN 14.3 13.0 NaN nan
2019-12-04 04:30:00 NaN 13.5 14.1 NaN nan
2019-12-04 05:00:00 NaN 13.6 13.4 NaN nan
2019-12-04 05:30:00 NaN 14.5 13.9 NaN nan
2019-12-04 06:00:00 NaN 14.4 14.5 NaN nan
2019-12-04 06:30:00 NaN 13.7 14.2 NaN nan
2019-12-04 07:00:00 NaN 13.7 14.2 NaN nan
2019-12-04 07:30:00 NaN 13.2 14.4 NaN nan
2019-12-04 08:00:00 NaN 13.9 13.1 NaN nan
2019-12-04 08:30:00 NaN 13.9 14.4 NaN nan
2019-12-04 09:00:00 NaN 14.4 13.9 NaN nan
2019-12-04 09:30:00 NaN 14.4 13.8 NaN nan
2019-12-04 10:00:00 NaN 15.0 14.0 NaN nan
2019-12-04 10:30:00 NaN 13.2 13.2 NaN nan
2019-12-04 11:00:00 NaN 14.0 13.3 NaN nan
2019-12-04 11:30:00 NaN 14.2 13.4 NaN nan
2019-12-04 12:00:00 NaN 14.2 13.4 NaN nan
2019-12-04 12:30:00 NaN 13.7 13.6 NaN nan
2019-12-04 13:00:00 NaN 14.1 13.3 NaN nan
2019-12-04 13:30:00 NaN 13.1 14.1 NaN nan
2019-12-04 14:00:00 NaN 13.2 14.3 NaN nan
2019-12-04 14:30:00 NaN 13.7 13.8 NaN nan
The code that generates the final df['Date_Limit_reached'] column, shown below, is too slow. If possible, I would like to restructure it to use np.select:
import math

new_col = []
df_size = len(df)
# Loop over every row of the dataframe
for ind in df.index:
    if not math.isnan(df['starting_temp'][ind]):
        entry_price_val = df['starting_temp'][ind]
        count = 0
        hasValue = False
        # Scan all rows for the first one where the limit is reached
        while count < df_size:
            if entry_price_val > df['limit_temp'][ind] and df['limit_temp'][ind] >= df['average_low'][count] and df['date'][count] >= df['observation_date'][ind]:
                new_col.append(df['date'][count])
                hasValue = True
                break  # Stop at the first matching row
            elif entry_price_val < df['limit_temp'][ind] and df['limit_temp'][ind] <= df['average_high'][count] and df['date'][count] >= df['observation_date'][ind]:
                new_col.append(df['date'][count])
                hasValue = True
                break  # Stop at the first matching row
            count += 1
        # If no row matched, append NaN to the column
        if not hasValue:
            new_col.append(float('nan'))
    else:
        new_col.append(float('nan'))
df['Date_Limit_reached'] = new_col
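For reference, here is a minimal sketch of how the same search might be vectorized. np.select picks values element by element, so on its own it does not find "the first later row that meets a condition". The sketch therefore swaps in NumPy broadcasting plus argmax instead. It rests on assumptions that are not confirmed in the post: that date and observation_date are timezone-naive datetime64 columns, that the low-side comparison uses the average_low column shown in the table, and that the frame has at least one row. The function name first_limit_dates is made up for illustration.

```python
import numpy as np
import pandas as pd


def first_limit_dates(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: first date on which each row's limit is reached."""
    dates = df["date"].to_numpy()
    high = df["average_high"].to_numpy()
    low = df["average_low"].to_numpy()

    # Only rows that have a starting_temp need a search.
    rows = df["starting_temp"].notna().to_numpy()
    # Column vectors of shape (k, 1) so they broadcast against all n rows.
    start = df.loc[rows, "starting_temp"].to_numpy()[:, None]
    limit = df.loc[rows, "limit_temp"].to_numpy()[:, None]
    obs = df.loc[rows, "observation_date"].to_numpy()[:, None]

    # (k, n) boolean matrices; comparisons against NaN or NaT are False.
    after = dates[None, :] >= obs
    down = (start > limit) & (limit >= low[None, :]) & after
    up = (start < limit) & (limit <= high[None, :]) & after
    hit = down | up

    # argmax returns the position of the first True in each row of `hit`.
    first = hit.argmax(axis=1)
    found = hit.any(axis=1)

    picked = dates[first]                     # fancy indexing makes a copy
    picked[~found] = np.datetime64("NaT")     # no match: leave as missing

    out = np.full(len(df), np.datetime64("NaT"), dtype=dates.dtype)
    out[rows] = picked
    return pd.Series(out, index=df.index)
```

Under those assumptions it would be used as df['Date_Limit_reached'] = first_limit_dates(df). Missing results come back as NaT rather than float NaN. The intermediate matrix holds k × n booleans (k rows with a starting_temp, n rows in total), so for very large frames it may need to be processed in chunks.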
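【Comments】:
- What is df? It is not defined in the code.
- Could you provide code that creates a sample dataframe df, so that we can easily agree on the column types, test your code, and discuss performance on similar data?
Tags: python performance numpy for-loop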