【Posted】:2022-01-10 02:09:33
【Question】:
We have log/time-series data, for example:
Input
Session_id user_id session_timestamp
S1 U1 2019-10-01 22:00:00
S1 U1 2019-10-01 22:00:01
S1 U1 2019-10-01 22:00:02
S1 U1 2019-10-01 22:00:03
S1 U2 2019-10-01 22:00:04
S1 U2 2019-10-01 22:00:05
S1 U2 2019-10-01 22:00:06
S1 U2 2019-10-01 22:00:07
S1 U3 2019-10-01 22:00:08
S1 U3 2019-10-01 22:00:09
S1 U3 2019-10-01 22:00:10
S1 U3 2019-10-01 22:00:11
S1 U3 2019-10-01 22:00:12
S1 U1 2019-10-01 22:00:13
S1 U1 2019-10-01 22:00:14
S1 U1 2019-10-01 22:00:15
S1 U1 2019-10-01 22:00:16
Output
Session_id user_id Session_start_time Session_end_time
S1 U1 2019-10-01 22:00:00 2019-10-01 22:00:03
S1 U2 2019-10-01 22:00:04 2019-10-01 22:00:07
S1 U3 2019-10-01 22:00:08 2019-10-01 22:00:12
S1 U1 2019-10-01 22:00:13 2019-10-01 22:00:16
Explanation
A heartbeat is logged every second.
The first four rows should be treated as one session (user U1).
The last four rows belong to a separate session for the same user (U1).
I tried window functions with lag/lead, but I could not tell U1's second session apart from the first. Any SQL dialect works for me.
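One way to read the rule described above: within a Session_id, a new block starts whenever user_id differs from the previous heartbeat. The following is a minimal sketch of that idea, not a definitive solution. It assumes a DBMS with window functions (for example MySQL 8+, PostgreSQL, or a recent Hive), that session_timestamp keeps the time of day (a timestamp type rather than a plain date), and that timestamps are unique within a Session_id. The names is_new_block, block_no, flagged, and numbered are illustrative and do not come from the original schema.

```sql
-- Sketch only: flag block starts with lag(), number the blocks with a running sum,
-- then aggregate each block to its first and last heartbeat.
select Session_id,
       user_id,
       min(session_timestamp) as Session_start_time,
       max(session_timestamp) as Session_end_time
from (
    select Session_id, user_id, session_timestamp,
           -- running count of block starts = block number within the Session_id
           sum(is_new_block) over (partition by Session_id
                                   order by session_timestamp) as block_no
    from (
        select Session_id, user_id, session_timestamp,
               -- 0 when the previous heartbeat has the same user_id;
               -- 1 otherwise, including the first row, where lag() returns NULL
               case when lag(user_id) over (partition by Session_id
                                            order by session_timestamp) = user_id
                    then 0 else 1 end as is_new_block
        from logs
    ) flagged
) numbered
group by Session_id, user_id, block_no
order by Session_start_time;
```

Grouping by block_no in addition to user_id is what keeps U1's two runs apart: the first run gets block number 1 and the second gets block number 4, so they aggregate into separate rows even though the user is the same.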
Data script
create table logs(
Session_id varchar(10),
user_id varchar(10),
session_timestamp timestamp -- timestamp, so the one-second heartbeats keep their time of day
);
insert into logs (Session_id, user_id, session_timestamp) values
('S1', 'U1', '2019-10-01 22:00:00'),
('S1', 'U1', '2019-10-01 22:00:01'),
('S1', 'U1', '2019-10-01 22:00:02'),
('S1', 'U1', '2019-10-01 22:00:03'),
('S1', 'U2', '2019-10-01 22:00:04'),
('S1', 'U2', '2019-10-01 22:00:05'),
('S1', 'U2', '2019-10-01 22:00:06'),
('S1', 'U2', '2019-10-01 22:00:07'),
('S1', 'U3', '2019-10-01 22:00:08'),
('S1', 'U3', '2019-10-01 22:00:09'),
('S1', 'U3', '2019-10-01 22:00:10'),
('S1', 'U3', '2019-10-01 22:00:11'),
('S1', 'U3', '2019-10-01 22:00:12'),
('S1', 'U1', '2019-10-01 22:00:13'),
('S1', 'U1', '2019-10-01 22:00:14'),
('S1', 'U1', '2019-10-01 22:00:15'),
('S1', 'U1', '2019-10-01 22:00:16');
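【Comments】:
- What do you mean by "Please ignore asking silly question"? There is no such thing as a silly question. Which DBMS are you using (MySQL != PostgreSQL), and which version? What have you tried?
- Ergest, I tried window functions with lag/lead, but I could not tell U1's second session apart from the first. Any SQL version works for me.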
Tags: mysql sql postgresql hiveql