Answers: optimizing SQL that uses GROUP BY and HAVING

【Question title】: Optimization sql about group by and having
【Posted】: 2021-04-01 12:31:47
【Question】:

I have a general question about how to write an efficient query.

id	time
d048533c-92d2-11eb-8dbb-fa163e962e00	1617272028623
6b5b455e-92d3-11eb-8dbb-fa163e962e00	1617272279382
024d0a5e-92d3-11eb-8dbb-fa163e962e00	1617272106615

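We have a table like the one above (time is a Unix timestamp in milliseconds). We want to filter the IDs by the following conditions:

If two or more IDs have times within 3 minutes of each other, we call them a group of two or more.
We have 10000 IDs, and we want to find all groups of more than 10.

Here is my attempt: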

SELECT B.ID
FROM TEMP B, TEMP A
WHERE B.ID != A.ID
  AND B.TIME <= A.TIME + 180000
  AND B.TIME >= A.TIME - 180000
GROUP BY B.ID
HAVING COUNT(*) >= 9;

Is there a more efficient way?

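【Question comments】:

Please explain what you mean by "filter out ids ... more than 10 IDs corresponding to times within 3 minutes". Your data has no column named id, but it does have two columns with an _id suffix.
Sorry, I have fixed the mistake.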

Tags: sql optimization group-by common-table-expression having

【Solution 1】:

If you want the 10th id that occurs within 3 minutes, you can use lag():

select t.*
from (select t.*,
             lag(time, 9) over (order by time) as time_9
      from t
     ) t
where time < time_9 + 3 * 60 * 1000;

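I'm not sure whether this is exactly what you want. But the key idea is to use window functions in general, and lag() in particular, rather than a self-join.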

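Performance should be much better: the window function needs one pass over the rows sorted by time, whereas the self-join compares every row with every other row.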
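To make the lag() idea concrete, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python. The sample rows, the burst-NN and lone-N ids, and the choice of SQLite are assumptions made purely for illustration (window functions need SQLite 3.25 or later); the question does not say which database is in use.

```python
import sqlite3

# Hypothetical sample data (not from the question):
# 12 rows spaced 10 s apart form one dense burst, then 3 isolated rows an hour apart.
BASE = 1_617_272_000_000  # arbitrary epoch timestamp in milliseconds
rows = [(f"burst-{i:02d}", BASE + i * 10_000) for i in range(12)]
rows += [(f"lone-{i}", BASE + 3_600_000 * (i + 1)) for i in range(3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY, time INTEGER NOT NULL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Same shape as the lag() query in the answer: time_9 is the timestamp of the
# row 9 positions earlier in time order (NULL for the first 9 rows).
LAG_QUERY = """
SELECT id
FROM (SELECT t.*,
             lag(time, 9) OVER (ORDER BY time) AS time_9
      FROM t) AS t
WHERE time < time_9 + 3 * 60 * 1000
ORDER BY time
"""

# Rows whose 9th predecessor is less than 3 minutes older.
tenth_or_later = [r[0] for r in conn.execute(LAG_QUERY)]
print(tenth_or_later)
```

With this sample data only the 10th, 11th and 12th rows of the burst come back, because those are the only rows whose 9th predecessor is less than 3 minutes older. Rows with no 9th predecessor get a NULL time_9 and drop out of the WHERE clause.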

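EDIT:

If you want to find all rows that belong to a group with at least 10 rows in a 3-minute period, first find the rows that start such a group, and then check whether each row lies within 9 rows after one of those "first" rows: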

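with t9 as (
      -- group_start = 1 when the row 9 positions later is less than 3 minutes away,
      -- i.e. this row begins a run of 10 rows inside a 3-minute window
      select t.*,
             (case when time_9 < time + 3 * 60 * 1000 then 1 else 0 end) as group_start
      from (select t.*,
                   lead(time, 9) over (order by time) as time_9
            from t
           ) t
     )
select t9.*
from (select t9.*,
             -- a row is in a group if it, or any of the 9 rows before it, starts one
             sum(group_start) over (order by time rows between 9 preceding and current row) as in_group_flag
      from t9
     ) t9
where in_group_flag > 0;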
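As a rough check of the two-step lead()/sum() approach, the sketch below runs the same logic on a small in-memory SQLite table from Python. The data is hypothetical: burst A has exactly 10 rows inside 3 minutes and burst B has only 9, so only A should qualify. The ids, the inner alias names and the use of SQLite (3.25 or later for window functions) are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Hypothetical sample data: burst A has 10 rows, burst B (an hour later) has 9.
BASE = 1_617_272_000_000  # arbitrary epoch timestamp in milliseconds
rows = [(f"A-{i}", BASE + i * 10_000) for i in range(10)]
rows += [(f"B-{i}", BASE + 3_600_000 + i * 10_000) for i in range(9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY, time INTEGER NOT NULL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

GROUP_QUERY = """
WITH t9 AS (
    -- Step 1: flag rows that start a run of 10 rows spanning less than 3 minutes.
    SELECT ahead.*,
           CASE WHEN time_9 < time + 3 * 60 * 1000 THEN 1 ELSE 0 END AS group_start
    FROM (SELECT t.*,
                 lead(time, 9) OVER (ORDER BY time) AS time_9
          FROM t) AS ahead
)
SELECT id
FROM (-- Step 2: a row is in a group if it or one of the 9 rows before it is a start.
      SELECT t9.*,
             sum(group_start) OVER (ORDER BY time
                                    ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS in_group_flag
      FROM t9) AS flagged
WHERE in_group_flag > 0
ORDER BY time
"""

in_group = [r[0] for r in conn.execute(GROUP_QUERY)]
print(in_group)
```

All ten A rows are returned and no B rows are, since the 9-row burst never has a row whose 9th successor falls within 3 minutes. Note that this checks runs of 10 consecutive rows, which is not quite the same condition as the self-join in the question (that one counts neighbours within 3 minutes on either side of each row), so the two queries can disagree on some inputs.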

【Comments】:

This is a very enlightening answer. The key point is that we need a way to find all the ids that meet the condition.