【Posted】: 2021-04-10 03:33:09
【Question】:
The following DataFrame represents events received from users, with a user id and an event timestamp:
    id            timestamp
0    1  2020-09-01 18:14:35
1    1  2020-09-01 18:14:39
2    1  2020-09-01 18:14:40
3    1  2020-09-01 02:09:22
4    1  2020-09-01 02:09:35
5    1  2020-09-01 02:09:53
6    1  2020-09-01 02:09:57
7    2  2020-09-01 18:14:35
8    2  2020-09-01 18:14:39
9    2  2020-09-01 18:14:40
10   2  2020-09-01 02:09:22
11   2  2020-09-01 02:09:35
12   2  2020-09-01 02:09:53
13   2  2020-09-01 02:09:57
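I want to get the expanding average session time. A session is a sequence of events for one user; a break of more than 5 minutes between events ends the session.
I grouped the events into sessions as follows: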
df.groupby(['id', pd.Grouper(key="timestamp", freq='5min', origin='start')])
and got the correct groups:
    id            timestamp
3    1  2020-09-01 02:09:22
4    1  2020-09-01 02:09:35
5    1  2020-09-01 02:09:53
6    1  2020-09-01 02:09:57

    id            timestamp
0    1  2020-09-01 18:14:35
1    1  2020-09-01 18:14:39
2    1  2020-09-01 18:14:40

    id            timestamp
10   2  2020-09-01 02:09:22
11   2  2020-09-01 02:09:35
12   2  2020-09-01 02:09:53
13   2  2020-09-01 02:09:57

    id            timestamp
7    2  2020-09-01 18:14:35
8    2  2020-09-01 18:14:39
9    2  2020-09-01 18:14:40
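
One thing that may be worth double-checking here: as far as I understand it, `pd.Grouper(freq='5min', origin='start')` cuts the timeline into fixed 5-minute bins counted from the first timestamp, which is not quite the same thing as "a break of more than 5 minutes". The sketch below uses three made-up timestamps (not taken from the data above) and computes the bin number by hand, as a stand-in for that binning, next to a gap-based session label. The names `fixed_bin` and `session` are only for illustration.

```python
import pandas as pd

# Hypothetical timestamps, chosen so that no gap exceeds 5 minutes.
ts = pd.Series(pd.to_datetime([
    "2020-09-01 02:09:22",
    "2020-09-01 02:13:00",
    "2020-09-01 02:15:00",
]))

# Fixed 5-minute bins counted from the first timestamp (a hand-rolled
# stand-in for freq='5min', origin='start').
fixed_bin = (ts - ts.min()) // pd.Timedelta("5min")

# Gap-based label: increment whenever the break to the previous event
# is longer than 5 minutes (the first row has no previous event).
session = (ts.diff() > pd.Timedelta("5min")).cumsum()

print(fixed_bin.tolist())  # bin number per event
print(session.tolist())    # session number per event
```

In this toy case the third event lands in a second fixed bin even though it comes only two minutes after the second one, while the gap-based label keeps all three events in one session. On the data in the question both approaches happen to give the same groups.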
Now I want to compute, at any given row, the average session time so far for each user (in seconds), so the output would be:
    id            timestamp  avg_session_time
0    1  2020-09-01 18:14:35                 0  <-- first event
1    1  2020-09-01 18:14:39                 4  <-- 2nd event after 4 seconds
2    1  2020-09-01 18:14:40                 5  <-- 3rd event after 5 seconds
--- session end
3    1  2020-09-01 02:09:22                 5  <-- first event of second session
4    1  2020-09-01 02:09:35                 9  <-- 2nd event after 13 seconds (13 seconds in the 2nd session + 5 in the first session, divided by the number of sessions, 2)
5    1  2020-09-01 02:09:53                18  <-- 3rd event after 31 seconds ((31 + 5) / 2 = 18)
6    1  2020-09-01 02:09:57                20  <-- 4th event after 35 seconds ((35 + 5) / 2 = 20)
---
7    2  2020-09-01 18:14:35                 0
8    2  2020-09-01 18:14:39                 4
9    2  2020-09-01 18:14:40                 5
---
10   2  2020-09-01 02:09:22                 5
11   2  2020-09-01 02:09:35                 9
12   2  2020-09-01 02:09:53                18
13   2  2020-09-01 02:09:57                20
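
A minimal sketch of one way to produce the column above, not a definitive answer. It rests on two assumptions. First, sessions are detected from gaps of more than 5 minutes between consecutive rows of the same user, in the row order shown, using the absolute gap because the second block of timestamps is earlier in the day than the first. Second, the first event of a new session does not yet count that session in the denominator, which is what the expected value 5 on rows 3 and 10 seems to imply. The helper columns `session`, `elapsed` and `completed` are made-up names for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
times = ["18:14:35", "18:14:39", "18:14:40",
         "02:09:22", "02:09:35", "02:09:53", "02:09:57"]
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 7,
    "timestamp": pd.to_datetime(["2020-09-01 " + t for t in times] * 2),
})

# Assumption: a new session starts on a user's first row, or when the
# absolute gap to the previous row of the same user exceeds 5 minutes.
gap = df.groupby("id")["timestamp"].diff().abs()
new_session = gap.isna() | (gap > pd.Timedelta("5min"))
df["session"] = new_session.astype(int).groupby(df["id"]).cumsum()

# Seconds elapsed since the first event of the current session.
start = df.groupby(["id", "session"])["timestamp"].transform("first")
df["elapsed"] = (df["timestamp"] - start).dt.total_seconds()

# Total duration of the user's earlier, already finished sessions.
is_last = ~df.duplicated(["id", "session"], keep="last")
final = df["elapsed"].where(is_last, 0)
df["completed"] = final.groupby(df["id"]).cumsum() - final

# Assumption: a session that has only just started (elapsed == 0) is not
# counted yet, so row 3 gives 5 / 1 rather than (5 + 0) / 2.
n_sessions = (df["session"] - (df["elapsed"] == 0).astype(int)).clip(lower=1)
df["avg_session_time"] = (df["completed"] + df["elapsed"]) / n_sessions

print(df[["id", "timestamp", "avg_session_time"]])
```

If the new session should be counted from its first event, replacing `n_sessions` with `df["session"]` would give 2.5 instead of 5 on rows 3 and 10 and leave the other rows unchanged.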
Any help would be great :)
【Discussion】:
Tags: pandas group-by mean timedelta