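【Posted】: 2021-09-23 19:23:06
【Question】:
I'm trying to compute the rolling mean, max, min, and standard deviation of specific columns in a time-series pandas DataFrame, but the new lag columns keep coming out as NaN and I can't figure out how to fix it. My MWE is: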
import numpy as np
import pandas as pd
# original data
df = pd.DataFrame()
np.random.seed(0)
days = pd.date_range(start='2015-01-01', end='2015-05-01', freq='1D')
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days)), 'col2': 20+np.random.randn(len(days)), 'col3': 50+np.random.randn(len(days))})
df = df.set_index('Date')
print(df.head(10))
def add_lag(dfObj, window):
    cols = ['col2', 'col3']
    for col in cols:
        rolled = dfObj[col].rolling(window)
        lag_mean = rolled.mean().reset_index()#.astype(np.float16)
        lag_max = rolled.max().reset_index()#.astype(np.float16)
        lag_min = rolled.min().reset_index()#.astype(np.float16)
        lag_std = rolled.std().reset_index()#.astype(np.float16)
        dfObj[f'{col}_mean_lag{window}'] = lag_mean[col]
        dfObj[f'{col}_max_lag{window}'] = lag_max[col]
        dfObj[f'{col}_min_lag{window}'] = lag_min[col]
        dfObj[f'{col}_std_lag{window}'] = lag_std[col]
# add lag feature for 1 day, 3 days
add_lag(df, window=1)
add_lag(df, window=3)
print(df.head(10))
print(df.tail(10))
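A likely cause of the NaN columns is index alignment. `reset_index()` replaces the `DatetimeIndex` with a plain 0..n-1 integer index. When you assign a Series to a DataFrame column, pandas aligns it on index labels. None of those integer labels exist in `df`'s date index, so every row receives NaN.

The following is a minimal sketch, assuming the same synthetic data as the MWE. It assigns the rolling results directly, without `reset_index()`, so their date index lines up row for row. It also reproduces the original misalignment for contrast.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question
np.random.seed(0)
days = pd.date_range(start='2015-01-01', end='2015-05-01', freq='1D')
df = pd.DataFrame({'Date': days,
                   'col1': np.random.randn(len(days)),
                   'col2': 20 + np.random.randn(len(days)),
                   'col3': 50 + np.random.randn(len(days))}).set_index('Date')


def add_lag(dfObj, window):
    """Add rolling mean/max/min/std columns for col2 and col3, in place."""
    for col in ['col2', 'col3']:
        rolled = dfObj[col].rolling(window)
        # No reset_index(): the results keep the DatetimeIndex,
        # so label-based assignment matches each date to itself
        dfObj[f'{col}_mean_lag{window}'] = rolled.mean()
        dfObj[f'{col}_max_lag{window}'] = rolled.max()
        dfObj[f'{col}_min_lag{window}'] = rolled.min()
        dfObj[f'{col}_std_lag{window}'] = rolled.std()


add_lag(df, window=1)
add_lag(df, window=3)
print(df.head(10))

# For contrast, the question's pattern: after reset_index() the labels are 0..n-1,
# which match none of df's dates, so aligning onto df's index yields only NaN
misaligned = df['col2'].rolling(3).mean().reset_index()['col2']
print(misaligned.reindex(df.index).isna().all())
```

With `window=3`, the first two rows of each `_lag3` column are still NaN. That is expected, because a full three-value window does not exist yet. The `_std_lag1` columns also stay NaN for a separate reason, which the discussion below touches on.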
【Discussion】:
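- Try adding .values:
  dfObj[f'{col}_mean_lag{window}'] = lag_mean[col].values
- Thanks, that works. But, just as with FredMaster's solution below, the df['col2_std_lag1'] and df['col3_std_lag1'] columns are all filled with NaN.
- When you take the standard deviation of a single value, it can't be anything other than zero... You set the rolling window to 1 with .rolling(window=1). What did you expect?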
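Both comments can be checked with a small, made-up Series. The four values and the `col2` name are assumptions chosen purely for illustration.

- `.values` works because a NumPy array is assigned by position rather than by index label. This relies on the rows staying in the same order, which rolling operations preserve.
- pandas' `Rolling.std` defaults to `ddof=1` (the sample standard deviation). For a one-value window, that estimate is undefined, so pandas returns NaN rather than 0. Passing `ddof=0` (the population standard deviation) gives 0.0 instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: four daily values
s = pd.Series([1.0, 4.0, 2.0, 8.0], name='col2',
              index=pd.date_range('2015-01-01', periods=4, freq='D'))
frame = s.to_frame()

# Commenter's workaround: .values strips the 0..n-1 index left by reset_index(),
# so the assignment is positional instead of label-aligned
lag_mean = s.rolling(2).mean().reset_index()
frame['col2_mean_lag2'] = lag_mean['col2'].values

# Default ddof=1 (sample std): one observation per window -> NaN
std_sample = s.rolling(1).std()
# ddof=0 (population std): one observation per window -> 0.0
std_pop = s.rolling(1).std(ddof=0)

print(frame)
print(std_sample.tolist())
print(std_pop.tolist())
```

Dropping `reset_index()` altogether is arguably simpler than `.values`, because the index then never goes out of sync in the first place.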
Tags: python pandas dataframe datetime