【Posted】: 2020-12-19 13:05:52
【Question】:
I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'ID': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3],
'val': [1,2,3,1,2,3,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3],
'time': [pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 13), pd.Timestamp(2017, 1, 1, 14), pd.Timestamp(2017, 1, 2, 16), pd.Timestamp(2017, 1, 2, 17), pd.Timestamp(2017, 1, 2, 18), pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 13), pd.Timestamp(2017, 1, 1, 14), pd.Timestamp(2017, 1, 1, 15), pd.Timestamp(2017, 1, 1, 16), pd.Timestamp(2017, 1, 2, 15), pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 13), pd.Timestamp(2017, 1, 1, 14), pd.Timestamp(2017, 1, 1, 15), pd.Timestamp(2017, 1, 1, 16), pd.Timestamp(2017, 1, 1, 17), pd.Timestamp(2017, 1, 2, 18), pd.Timestamp(2017, 1, 2, 19), pd.Timestamp(2017, 1, 2, 20)]})
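For each row, I want to create a new column that gives the mean of val over all rows with the same ID that fall within the 24-hour window before that row's time.
How can I do this in a Pythonic way, rather than iterating over every row?
Expected output: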
    ID  val                time  24hr_avg
0    1    1 2017-01-01 12:00:00       1.0
1    1    2 2017-01-01 13:00:00       1.5
2    1    3 2017-01-01 14:00:00       2.0
3    1    1 2017-01-02 16:00:00       1.0
4    1    2 2017-01-02 17:00:00       1.5
5    1    3 2017-01-02 18:00:00       2.0
6    2    1 2017-01-01 12:00:00       1.0
7    2    2 2017-01-01 13:00:00       1.5
8    2    3 2017-01-01 14:00:00       2.0
9    2    4 2017-01-01 15:00:00       2.5
10   2    5 2017-01-01 16:00:00       3.0
11   2    6 2017-01-02 15:00:00       8.0
12   3    1 2017-01-01 12:00:00       1.0
13   3    2 2017-01-01 13:00:00       1.5
14   3    3 2017-01-01 14:00:00       2.0
15   3    4 2017-01-01 15:00:00       2.5
16   3    5 2017-01-01 16:00:00       3.0
17   3    6 2017-01-01 17:00:00       3.5
18   3    1 2017-01-02 18:00:00       1.0
19   3    2 2017-01-02 19:00:00       1.5
20   3    3 2017-01-02 20:00:00       2.0
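One possible way to get this without looping, sketched under an assumption: the window is read as trailing and including the row itself, i.e. (time - 24h, time], which is what the first rows of the expected output suggest. The idea is to index by time, group by ID, and use a time-based rolling mean. The data is rebuilt here with pd.to_datetime only to keep the snippet short; the column name 24hr_avg is taken from the expected output.

```python
import pandas as pd

# Same data as in the question, with the timestamps written as strings.
times = pd.to_datetime([
    "2017-01-01 12:00", "2017-01-01 13:00", "2017-01-01 14:00",
    "2017-01-02 16:00", "2017-01-02 17:00", "2017-01-02 18:00",
    "2017-01-01 12:00", "2017-01-01 13:00", "2017-01-01 14:00",
    "2017-01-01 15:00", "2017-01-01 16:00", "2017-01-02 15:00",
    "2017-01-01 12:00", "2017-01-01 13:00", "2017-01-01 14:00",
    "2017-01-01 15:00", "2017-01-01 16:00", "2017-01-01 17:00",
    "2017-01-02 18:00", "2017-01-02 19:00", "2017-01-02 20:00",
])
df = pd.DataFrame({
    "ID": [1] * 6 + [2] * 6 + [3] * 9,
    "val": [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3],
    "time": times,
})

# A time-based window needs increasing times inside each group, and sorting
# by ID first makes the grouped result line up row for row with df.
df = df.sort_values(["ID", "time"]).reset_index(drop=True)

# Rolling mean of val per ID over the trailing 24 hours (current row included).
df["24hr_avg"] = (
    df.set_index("time")
      .groupby("ID")["val"]
      .rolling("24h")
      .mean()
      .to_numpy()  # drop the (ID, time) index and assign by position
)

print(df)
```

With this reading of the window, every row reproduces the expected output except row 11, where the result is 5.5 (the mean of 5 and 6) rather than the listed 8.0. Passing closed="both" to rolling would also count a row exactly 24 hours earlier, which gives 5.0 for that row.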
【Discussion】:
Tags: python pandas datetime rolling-computation