SQL: Return rows after the nth occurrence of an event per user

【Question title】: SQL- Return rows after nth occurrence of event per user
【Posted】: 2020-04-02 21:40:04
【Question description】:

I am using PostgreSQL 8.0, and I have a table with the columns user_id, timestamp, and event_id.

How can I return the row (or rows) that come after the 4th occurrence of event_id = someID, for each user?

|---------------------|--------------------|------------------|
|      user_id        |     timestamp      |     event_id     |
|---------------------|--------------------|------------------|
|          1          |  2020-04-02 12:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 13:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 14:00  |        99        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 15:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 16:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 17:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 17:00  |        11        |
|---------------------|--------------------|------------------|

That is, for event_id = 11, I would want only the last row of the table above.
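
To reproduce the example, the sample rows could be loaded with a script along these lines. This is only a sketch: the table name t and the column types are assumptions (the question gives column names only), with t chosen to match the queries in Solution 2. Single-row inserts are used so the script does not depend on multi-row VALUES support.

```sql
-- Hypothetical table name and column types; only the column names come from the question.
create table t (
    user_id   integer,
    timestamp timestamp,
    event_id  integer
);

-- The seven sample rows from the table above, in the same order.
insert into t (user_id, timestamp, event_id) values (1, '2020-04-02 12:00', 11);
insert into t (user_id, timestamp, event_id) values (2, '2020-04-02 13:00', 11);
insert into t (user_id, timestamp, event_id) values (2, '2020-04-02 14:00', 99);
insert into t (user_id, timestamp, event_id) values (2, '2020-04-02 15:00', 11);
insert into t (user_id, timestamp, event_id) values (2, '2020-04-02 16:00', 11);
insert into t (user_id, timestamp, event_id) values (2, '2020-04-02 17:00', 11);
insert into t (user_id, timestamp, event_id) values (2, '2020-04-02 17:00', 11);
```

Note that the last two rows share the same timestamp, so ordering by timestamp alone does not determine which of them counts as the 4th occurrence and which as the 5th.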

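【Discussion】:

Postgres 8.0 has been unmaintained for nearly 10 years. Why are you using such an old version? Or are you perhaps using a fork based on that ancient version? What does select version(); give you?

Tags: sql postgresql greatest-n-per-group postgresql-8.0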

【Solution 1】:

You can use a window function:

select *
from (
    select t.*, row_number() over(partition by user_id, event_id order by timestamp) rn
    from mytable t
) t
where rn > 4

Here is a small trick to drop the row number from the result:

select (t).*
from (
    select t, row_number() over(partition by user_id, event_id order by timestamp) rn
    from mytable t
) x
where rn > 4

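【Discussion】:

I'm not sure where the problem lies, but this doesn't seem to return the correct results. It seems to miss a lot of rows.
@Indy: As far as I can tell, this query does work for your sample data (i.e. it returns only the "last" row). Is your real data different in some way?
I can go into the details of the data, but I need all rows after a particular event_id has occurred 4 times. The code above doesn't even take that as an input (i.e. all rows after event_id=11 has occurred 4 times).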

【Solution 2】:

You can use a cumulative count. This version includes the 4th occurrence:

select t.*
from (select t.*,
             count(*) filter (where event_id = 11) over (partition by user_id order by timestamp) as event_11_cnt
      from t
     ) t
where event_11_cnt >= 4;

filter has been valid Postgres syntax for a long time, but you can instead use:

select t.*
from (select t.*,
             sum( (event_id = 11)::int ) over (partition by user_id order by timestamp) as event_11_cnt
      from t
     ) t
where event_11_cnt >= 4;

This version does not include the 4th occurrence:

where event_11_cnt > 4 or (event_11_cnt = 4 and event_id <> 11)
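
Written out in full, the variant that excludes the 4th occurrence might look like the sketch below. It simply combines the where clause above with the sum(...) subquery shown earlier; the table name t and the alias event_11_cnt are taken from this answer, and nothing here has been tested on the asker's server.

```sql
-- Sketch: same cumulative count as above, but the 4th occurrence itself is filtered out.
select t.*
from (select t.*,
             sum( (event_id = 11)::int ) over (partition by user_id order by timestamp) as event_11_cnt
      from t
     ) t
where event_11_cnt > 4                          -- the 5th occurrence of event 11 and every row after it
   or (event_11_cnt = 4 and event_id <> 11);    -- other events seen while the count is still 4
```

Because the default window frame for an ordered window includes peer rows, rows that share a timestamp within a user receive the same cumulative count, so tied timestamps are treated as a group rather than one row at a time.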

Another approach:

select t.*
from t
where t.timestamp > (select t2.timestamp
                     from t t2
                     where t2.user_id = t.user_id and
                           t2.event_id = 11
                     order by t2.timestamp
                     limit 1 offset 3
                    );
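
For servers that reject both window functions and this kind of correlated subquery, the same idea can in principle be expressed with plain joins and GROUP BY. The following is a rough sketch under that assumption, not something verified against Postgres 8.0 or any fork of it; the aliases a, b, c, f and the column name fourth_ts are made up for illustration.

```sql
-- Sketch only: no window functions, CTEs, or correlated subqueries.
select t.*
from t
join (
    -- per user: the earliest timestamp by which event 11 has occurred at least 4 times
    select c.user_id, min(c.timestamp) as fourth_ts
    from (
        select a.user_id, a.timestamp
        from (select distinct user_id, timestamp
              from t
              where event_id = 11) a
        join t b
          on b.user_id = a.user_id
         and b.event_id = 11
         and b.timestamp <= a.timestamp      -- count event-11 rows up to and including a.timestamp
        group by a.user_id, a.timestamp
        having count(*) >= 4
    ) c
    group by c.user_id
) f
  on f.user_id = t.user_id
 and t.timestamp > f.fourth_ts;              -- strictly later rows only
```

Like the subquery version above, this compares timestamps strictly, so rows that share the exact timestamp of the 4th occurrence are not returned. The self-join also grows quadratically with the number of event-11 rows per user, which may matter on large tables.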

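【Discussion】:

The "filter (where" line gave me a syntax error (I'm not very familiar with that syntax myself). The alternative approach throws "ERROR: This type of correlated subquery pattern is not supported yet". Maybe you are using a newer version of Postgres?
@Indy . . . You must be using an outdated version of Postgres. filter has been supported for a while.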

【Solution 3】:

Apologies for asking about such an old version of Postgres. Here is an answer that works:

WITH EventOrdered AS(
  SELECT 
    EventTypeId
    , UserId
    , Timestamp
    , ROW_NUMBER() OVER (PARTITION BY EventTypeId, UserId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ROW_NO
  FROM Event),
FourthEvent AS (
  SELECT DISTINCT
    UserID
  , FIRST_VALUE(TimeStamp) OVER (PARTITION BY UserId ORDER BY Timestamp) FirstFourthEventTimestamp
  FROM EventOrdered
  WHERE ROW_NO = 4)
SELECT e.*
FROM Event e
JOIN FourthEvent ffe
  ON e.UserId = ffe.UserId
  AND e.Timestamp > ffe.FirstFourthEventTimestamp
ORDER BY e.UserId, e.Timestamp

【Discussion】:

This will definitely not work on Postgres 8.0; that version has neither CTEs nor window functions. What exactly does select version(); show you?