Set your timestamp column as the index and use the rolling() function. The window can be given as a time offset. For example:
# creating the dataset
import numpy as np
import pandas as pd
np.random.seed(113)
# one row per second for a full day (86401 timestamps, both endpoints included)
df = pd.DataFrame({'timestamp': pd.date_range('2021-01-01', '2021-01-02', freq='1s'),
                   'value': np.random.randint(1, 10, 86401)})
# This randomly drops rows to make the dataset fragmented
df = df.drop(np.random.choice(range(1, 86401), 85000, replace=False))
df.head(10)
timestamp value
0 2021-01-01 00:00:00 6
58 2021-01-01 00:00:58 7
237 2021-01-01 00:03:57 9
390 2021-01-01 00:06:30 7
481 2021-01-01 00:08:01 8
575 2021-01-01 00:09:35 4
580 2021-01-01 00:09:40 9
735 2021-01-01 00:12:15 7
894 2021-01-01 00:14:54 2
927 2021-01-01 00:15:27 5
Now use the rolling() function with a 5-minute window.
# '5min' is the 5-minute offset alias ('5T' is deprecated in recent pandas)
df['rolling_mean'] = df.set_index('timestamp') \
    .rolling('5min', closed='left')['value'].mean().values
df.head(10)
df.head(10)
timestamp value rolling_mean
0 2021-01-01 00:00:00 6 NaN
58 2021-01-01 00:00:58 7 6.000000
237 2021-01-01 00:03:57 9 6.500000
390 2021-01-01 00:06:30 7 9.000000
481 2021-01-01 00:08:01 8 8.000000
575 2021-01-01 00:09:35 4 7.500000
580 2021-01-01 00:09:40 9 6.333333
735 2021-01-01 00:12:15 7 7.000000
894 2021-01-01 00:14:54 2 7.000000
927 2021-01-01 00:15:27 5 4.500000
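This should be the mean over the previous 5 minutes, excluding the current row: closed='left' makes each window the half-open interval [t - 5 minutes, t). The first row is NaN because nothing precedes it.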
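If you want to convince yourself of the window boundaries, here is a small sketch that re-enters the ten rows shown above by hand and compares rolling() against a manual filter over [t - 5 minutes, t). The names s, manual_mean, rolled_left and rolled_right are just illustrative, not part of the original answer.

```python
import pandas as pd

# The ten rows from the head(10) output, re-entered by hand
ts = pd.to_datetime([
    '2021-01-01 00:00:00', '2021-01-01 00:00:58', '2021-01-01 00:03:57',
    '2021-01-01 00:06:30', '2021-01-01 00:08:01', '2021-01-01 00:09:35',
    '2021-01-01 00:09:40', '2021-01-01 00:12:15', '2021-01-01 00:14:54',
    '2021-01-01 00:15:27',
])
s = pd.Series([6, 7, 9, 7, 8, 4, 9, 7, 2, 5], index=ts, dtype='float64')

# closed='left': window is [t - 5min, t), so the current row is excluded
rolled_left = s.rolling('5min', closed='left').mean()

# default for offset windows is closed='right': (t - 5min, t], current row included
rolled_right = s.rolling('5min').mean()

def manual_mean(t):
    # mean of all values whose timestamp falls in [t - 5min, t)
    in_window = (s.index >= t - pd.Timedelta(minutes=5)) & (s.index < t)
    return s[in_window].mean()  # NaN when the window is empty

manual = pd.Series([manual_mean(t) for t in s.index], index=s.index)

print(pd.DataFrame({'value': s, 'left': rolled_left,
                    'right': rolled_right, 'manual': manual}))
```

The left and manual columns should agree row by row, while right differs because it also counts the current value.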