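【Posted】: 2020-11-18 23:26:14
【Question】:
Pandas has a rolling() function that performs computations over windows of Series and DataFrame objects. If the index is a datetime (or you point to a datetime column with the on parameter), rolling() can be run over an offset, such as 2 seconds or 7 days.
I want to compute the actual duration of each window, not the offset. The best approach I could think of is to duplicate the timestamp column, use one copy as the index, and then use rolling() to get the min and max. However, the new Timestamp column is dropped after calling rolling().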
import pandas as pd

df = pd.DataFrame({'B': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Tm': [pd.Timestamp('20130101 09:00:00'),
                          pd.Timestamp('20130101 09:00:02'),
                          pd.Timestamp('20130101 09:00:03'),
                          pd.Timestamp('20130101 09:00:05'),
                          pd.Timestamp('20130101 09:00:06'),
                          pd.Timestamp('20130101 09:00:10'),
                          pd.Timestamp('20130101 09:00:12'),
                          pd.Timestamp('20130101 09:00:16'),
                          pd.Timestamp('20130101 09:00:19'),
                          pd.Timestamp('20130101 09:00:20')]})
# Duplicate the timestamp column so one copy can be aggregated
df['t'] = df['Tm']
print(df)

# Roll over 5-second windows keyed on 'Tm'; the datetime column 't' is dropped here
max_times = df.rolling('5s', on='Tm').max()
min_times = df.rolling('5s', on='Tm').min()
print(max_times)
print((max_times - min_times).astype('timedelta64[s]'))
Output:
   B                  Tm                   t
0  0 2013-01-01 09:00:00 2013-01-01 09:00:00
1  1 2013-01-01 09:00:02 2013-01-01 09:00:02
2  2 2013-01-01 09:00:03 2013-01-01 09:00:03
3  3 2013-01-01 09:00:05 2013-01-01 09:00:05
4  4 2013-01-01 09:00:06 2013-01-01 09:00:06
5  5 2013-01-01 09:00:10 2013-01-01 09:00:10
6  6 2013-01-01 09:00:12 2013-01-01 09:00:12
7  7 2013-01-01 09:00:16 2013-01-01 09:00:16
8  8 2013-01-01 09:00:19 2013-01-01 09:00:19
9  9 2013-01-01 09:00:20 2013-01-01 09:00:20

     B                  Tm
0  0.0 2013-01-01 09:00:00
1  1.0 2013-01-01 09:00:02
2  2.0 2013-01-01 09:00:03
3  3.0 2013-01-01 09:00:05
4  4.0 2013-01-01 09:00:06
5  5.0 2013-01-01 09:00:10
6  6.0 2013-01-01 09:00:12
7  7.0 2013-01-01 09:00:16
8  8.0 2013-01-01 09:00:19
9  9.0 2013-01-01 09:00:20

         B   Tm
0 00:00:00  0.0
1 00:00:01  0.0
2 00:00:02  0.0
3 00:00:02  0.0
4 00:00:03  0.0
5 00:00:01  0.0
6 00:00:01  0.0
7 00:00:01  0.0
8 00:00:01  0.0
9 00:00:02  0.0
Surely there is a more elegant (and working) technique?
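For reference, here is a minimal sketch of one way the min/max idea described above might be made to work. It assumes that rolling() only aggregates numeric data, so the timestamps are first converted to elapsed seconds. The names elapsed, span_sec, and window_span are made up for this illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'B': list(range(10)),
    'Tm': pd.to_datetime([
        '2013-01-01 09:00:00', '2013-01-01 09:00:02',
        '2013-01-01 09:00:03', '2013-01-01 09:00:05',
        '2013-01-01 09:00:06', '2013-01-01 09:00:10',
        '2013-01-01 09:00:12', '2013-01-01 09:00:16',
        '2013-01-01 09:00:19', '2013-01-01 09:00:20',
    ]),
})

# Numeric copy of the timestamps (seconds since the first row),
# indexed by the timestamps themselves so an offset window can be used.
ts = df['Tm']
elapsed = pd.Series(
    (ts - ts.iloc[0]).dt.total_seconds().to_numpy(),
    index=pd.DatetimeIndex(ts),
)

# Within each 5-second window, the newest minus the oldest timestamp
# is the span the window actually covers.
win = elapsed.rolling('5s')
span_sec = (win.max() - win.min()).to_numpy()

# Convert back to timedeltas and attach to the original frame by position.
df['window_span'] = pd.to_timedelta(span_sec, unit='s')
print(df)
```

With the default window closure, each window covers the interval (t - 5s, t], so a row exactly 5 seconds older than the current one is not counted. A single-row window has a span of zero.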
【Comments】: