【Posted】: 2020-09-16 22:16:13
【Question】:
My search session log looks like this:
+----------+-------------------------+----------+
| dt       | search_time             | searches |
+----------+-------------------------+----------+
| 20200601 | 2020-06-01 00:36:38.000 | 1        |
| 20200601 | 2020-06-01 00:37:38.000 | 1        |
| 20200601 | 2020-06-01 00:39:18.000 | 1        |
| 20200601 | 2020-06-01 01:16:18.000 | 1        |
| 20200601 | 2020-06-01 03:56:38.000 | 1        |
| 20200601 | 2020-06-01 05:36:38.000 | 1        |
| 20200601 | 2020-06-01 05:37:38.000 | 1        |
| 20200601 | 2020-06-01 05:39:38.000 | 1        |
| 20200601 | 2020-06-01 05:41:38.000 | 1        |
| 20200601 | 2020-06-01 07:26:38.000 | 1        |
+----------+-------------------------+----------+
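My task is to assign each row to a session group. A session group spans at most five minutes.
For example:
The first 3 rows form session group 1: accumulating the minutes between consecutive rows gives 3 minutes. The 4th row would push the accumulated total past 5 minutes, so it starts a different session group.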
+----------+-------------------------+----------+---------------+
| dt       | search_time             | searches | group_session |
+----------+-------------------------+----------+---------------+
| 20200601 | 2020-06-01 00:36:38.000 | 1        | 1             |
| 20200601 | 2020-06-01 00:37:38.000 | 1        | 1             |
| 20200601 | 2020-06-01 00:39:18.000 | 1        | 1             |
| 20200601 | 2020-06-01 01:16:18.000 | 1        | 2             |
+----------+-------------------------+----------+---------------+
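To make the rule concrete, here is a small T-SQL sketch that measures how far each of the first four rows lies from the start of group 1. The variable `@session_start`, the column aliases, and the inline `VALUES` list are illustrative and not part of the real table; the `DATETIME2(3)` type is an assumption about how `search_time` is stored.

```sql
-- Illustration only: the first four timestamps from the sample log.
-- @session_start is a hypothetical variable holding the first row of group 1.
DECLARE @session_start DATETIME2(3) = '2020-06-01 00:36:38.000';

SELECT v.search_time,
       -- Exact elapsed seconds since the group started.
       DATEDIFF(SECOND, @session_start, v.search_time) AS seconds_since_start,
       -- A row stays in the group while it is at most 5 minutes after the start.
       CASE WHEN v.search_time > DATEADD(MINUTE, 5, @session_start)
            THEN 'starts a new group'
            ELSE 'stays in group 1'
       END AS verdict
FROM (VALUES
        (CAST('2020-06-01 00:36:38.000' AS DATETIME2(3))),
        (CAST('2020-06-01 00:37:38.000' AS DATETIME2(3))),
        (CAST('2020-06-01 00:39:18.000' AS DATETIME2(3))),
        (CAST('2020-06-01 01:16:18.000' AS DATETIME2(3)))
     ) AS v(search_time)
ORDER BY v.search_time;
```

The first three rows come out 0, 60 and 160 seconds after the start, so they stay in group 1. The fourth is 2380 seconds after it and opens group 2. Note that `DATEDIFF(MINUTE, ...)` counts minute boundaries crossed rather than elapsed minutes, which is why summing per-row minute differences gives 3 minutes here while the real elapsed time is 2 minutes 40 seconds.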
I transformed the table like this to prepare it for partitioning:
WITH [Sub Table] AS
(
    SELECT [dt]
          ,[search_time]
          -- timestamp of the preceding row, NULL for the first row
          ,[pervious search time] = LAG(search_time) OVER (ORDER BY search_time)
          -- minute boundaries crossed since the preceding row, 0 for the first row
          ,[min diff] = ISNULL(DATEDIFF(MINUTE, LAG(search_time) OVER (ORDER BY search_time), search_time), 0)
          ,[searches]
    FROM [search_session]
)
SELECT
    [dt],
    [search_time],
    [pervious search time],
    [min diff],
    [searches]
FROM [Sub Table];
This gave me:
+----------+-------------------------+-------------------------+----------+----------+
| dt       | search_time             | pervious search time    | min diff | searches |
+----------+-------------------------+-------------------------+----------+----------+
| 20200601 | 2020-06-01 00:36:38.000 | NULL                    | 0        | 1        |
| 20200601 | 2020-06-01 00:37:38.000 | 2020-06-01 00:36:38.000 | 1        | 1        |
| 20200601 | 2020-06-01 00:39:18.000 | 2020-06-01 00:37:38.000 | 2        | 1        |
| 20200601 | 2020-06-01 01:16:18.000 | 2020-06-01 00:39:18.000 | 37       | 1        |
| 20200601 | 2020-06-01 03:56:38.000 | 2020-06-01 01:16:18.000 | 160      | 1        |
| 20200601 | 2020-06-01 05:36:38.000 | 2020-06-01 03:56:38.000 | 100      | 1        |
| 20200601 | 2020-06-01 05:37:38.000 | 2020-06-01 05:36:38.000 | 1        | 1        |
| 20200601 | 2020-06-01 05:39:38.000 | 2020-06-01 05:37:38.000 | 2        | 1        |
| 20200601 | 2020-06-01 05:41:38.000 | 2020-06-01 05:39:38.000 | 2        | 1        |
| 20200601 | 2020-06-01 07:26:38.000 | 2020-06-01 05:41:38.000 | 105      | 1        |
+----------+-------------------------+-------------------------+----------+----------+
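I can think of two ways to proceed:
- Use a window function such as RANK() to partition the rows, but I don't know how to express a condition in the PARTITION BY clause.
- Iterate over the table with a WHILE loop, but again I find it hard to formulate this.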
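Because each row's group depends on where the current group started, one possible direction (suggested by the recursive-query tag, and not confirmed as the intended solution) is a recursive CTE that walks the rows in time order and carries the group's start time forward. The sketch below is a minimal version under stated assumptions: the temp table `#search_session`, its column types, and the names `numbered`, `walk`, `rn` and `session_start` are all hypothetical, and `search_time` values are assumed to be unique.

```sql
-- Hypothetical temp table holding the sample rows; column types are assumed.
CREATE TABLE #search_session (
    dt          CHAR(8)      NOT NULL,
    search_time DATETIME2(3) NOT NULL,
    searches    INT          NOT NULL
);

INSERT INTO #search_session (dt, search_time, searches) VALUES
    ('20200601', '2020-06-01 00:36:38.000', 1),
    ('20200601', '2020-06-01 00:37:38.000', 1),
    ('20200601', '2020-06-01 00:39:18.000', 1),
    ('20200601', '2020-06-01 01:16:18.000', 1),
    ('20200601', '2020-06-01 03:56:38.000', 1),
    ('20200601', '2020-06-01 05:36:38.000', 1),
    ('20200601', '2020-06-01 05:37:38.000', 1),
    ('20200601', '2020-06-01 05:39:38.000', 1),
    ('20200601', '2020-06-01 05:41:38.000', 1),
    ('20200601', '2020-06-01 07:26:38.000', 1);

WITH numbered AS (
    -- Number the rows in time order so the recursion can step row by row.
    SELECT dt, search_time, searches,
           ROW_NUMBER() OVER (ORDER BY search_time) AS rn
    FROM #search_session
),
walk AS (
    -- Anchor: the earliest row opens the first group.
    SELECT dt, search_time, searches, rn,
           search_time AS session_start
    FROM numbered
    WHERE rn = 1
    UNION ALL
    -- Step: a row more than 5 minutes after the current group's start
    -- opens a new group; otherwise it inherits the current start.
    SELECT n.dt, n.search_time, n.searches, n.rn,
           CASE WHEN n.search_time > DATEADD(MINUTE, 5, w.session_start)
                THEN n.search_time
                ELSE w.session_start
           END
    FROM walk AS w
    JOIN numbered AS n ON n.rn = w.rn + 1
)
SELECT dt, search_time, searches,
       -- Distinct start times, in order, become group numbers 1, 2, 3, ...
       DENSE_RANK() OVER (ORDER BY session_start) AS group_session
FROM walk
ORDER BY search_time
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels

DROP TABLE #search_session;
```

On the ten sample rows this should assign groups 1, 1, 1, 2, 3, 4, 4, 4, 4, 5. The row at 05:41:38 falls exactly five minutes after 05:36:38 and stays in group 4 because the comparison is strict; use `>=` if a gap of exactly five minutes should open a new group. The recursion processes one row per level, so it may be slow on large logs.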
【Comments】:
Tags: sql sql-server datetime window-functions recursive-query