【Question title】: Window function in Redshift
【Posted】: 2016-02-12 19:53:05
【Question】:

I have some data that looks like this:

CustID  EventID     TimeStamp
1       17          1/1/15 13:23
1       17          1/1/15 14:32
1       13          1/1/15 14:54
1       13          1/3/15 1:34
1       17          1/5/15 2:54
1       1           1/5/15 3:00
2       17          2/5/15 9:12
2       17          2/5/15 9:18
2       1           2/5/15 10:02
2       13          2/8/15 7:43
2       13          2/8/15 7:50
2       1           2/8/15 8:00

I am trying to use the row_number function to make it look like this:

CustID  EventID     TimeStamp      SeqNum
1       17          1/1/15 13:23    1
1       17          1/1/15 14:32    1
1       13          1/1/15 14:54    2
1       13          1/3/15 1:34     2
1       17          1/5/15 2:54     3
1       1           1/5/15 3:00     4
2       17          2/5/15 9:12     1
2       17          2/5/15 9:18     1
2       1           2/5/15 10:02    2   
2       13          2/8/15 7:43     3
2       13          2/8/15 7:50     3
2       1           2/8/15 8:00     4

I tried this:

row_number () over 
          (partition by custID, EventID
           order by custID, TimeStamp asc) SeqNum

But I got this:

CustID  EventID     TimeStamp      SeqNum
1       17          1/1/15 13:23    1
1       17          1/1/15 14:32    2
1       13          1/1/15 14:54    3
1       13          1/3/15 1:34     4
1       17          1/5/15 2:54     5
1       1           1/5/15 3:00     6
2       17          2/5/15 9:12     1
2       17          2/5/15 9:18     2
2       1           2/5/15 10:02    3   
2       13          2/8/15 7:43     4
2       13          2/8/15 7:50     5
2       1           2/8/15 8:00     6

How can I number the rows so that the sequence advances only when EventID changes?

【Question comments】:

    Tags: sql amazon-redshift window-functions


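    【Solution 1】:

    This is tricky and needs a multi-step process. First identify the groups of consecutive rows with the same event (a difference of two row_number() values works for this). Then assign each group a constant that increases over time, here the group's minimum timestamp. Finally, apply dense_rank() to that constant: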

    select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
    from (select sd.*,
                 min(timestamp) over (partition by custid, eventid, grp) as mints
          from (select sd.*,
                       (row_number() over (partition by custid order by timestamp) -
                        row_number() over (partition by custid, eventid order by timestamp)
                       ) as grp
                from somedata sd
               ) sd
         ) sd;
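
To see why the difference of the two row_number() values identifies the groups, here is a minimal sketch that runs the same query with the intermediate columns exposed. It makes several assumptions: SQLite (via Python's sqlite3, which needs SQLite 3.25 or later for window functions) stands in for Redshift, the timestamp column is renamed to the hypothetical name `ts`, the timestamps are rewritten in ISO format so they sort as text, and the `1/1/25` value in the question is taken to be a typo for `1/1/15`.

```python
import sqlite3

# Sample rows from the question as (custid, eventid, ts).
rows = [
    (1, 17, "2015-01-01 13:23"), (1, 17, "2015-01-01 14:32"),
    (1, 13, "2015-01-01 14:54"), (1, 13, "2015-01-03 01:34"),
    (1, 17, "2015-01-05 02:54"), (1, 1,  "2015-01-05 03:00"),
    (2, 17, "2015-02-05 09:12"), (2, 17, "2015-02-05 09:18"),
    (2, 1,  "2015-02-05 10:02"), (2, 13, "2015-02-08 07:43"),
    (2, 13, "2015-02-08 07:50"), (2, 1,  "2015-02-08 08:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table somedata (custid int, eventid int, ts text)")
conn.executemany("insert into somedata values (?, ?, ?)", rows)

query = """
select custid, eventid, ts, rn_all, rn_event, grp,
       dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
             min(ts) over (partition by custid, eventid, grp) as mints
      from (select sd.*,
                   row_number() over (partition by custid order by ts) as rn_all,
                   row_number() over (partition by custid, eventid order by ts) as rn_event,
                   row_number() over (partition by custid order by ts) -
                   row_number() over (partition by custid, eventid order by ts) as grp
            from somedata sd
           ) sd
     ) sd
order by custid, ts
"""
result = conn.execute(query).fetchall()

# Each tuple is (custid, eventid, ts, rn_all, rn_event, grp, seqnum).
for row in result:
    print(row)
```

For customer 1 the grp values come out as 0, 0, 2, 2, 2, 5. The two rows for event 13 and the later row for event 17 share grp = 2, which is why the min() partitions by eventid as well as grp: the difference is only unique within one event ID.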
    

    Another approach is to use lag() and a cumulative sum:

    select sd.*,
           sum(case when prev_eventid is null or prev_eventid <> eventid
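                    then 1 else 0 end) over (partition by custid order by timestamp
                                             -- Redshift requires an explicit frame for an aggregate window function with ORDER BY
                                             rows unbounded preceding
                                            ) as seqnum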
    from (select sd.*,
                 lag(eventid) over (partition by custid order by timestamp) as prev_eventid
          from somedata sd
         ) sd;
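
The logic of the lag() plus cumulative-sum query can be sketched in plain Python, which may make the two steps easier to follow. This is only an illustration, not Redshift code: the function name `seqnums` and the tuple layout `(custid, eventid, ts)` are hypothetical, and the timestamps are rewritten in ISO format (with `1/1/25` assumed to be a typo for `1/1/15`) so they sort correctly as strings.

```python
from itertools import groupby

# Sample rows from the question as (custid, eventid, ts).
rows = [
    (1, 17, "2015-01-01 13:23"), (1, 17, "2015-01-01 14:32"),
    (1, 13, "2015-01-01 14:54"), (1, 13, "2015-01-03 01:34"),
    (1, 17, "2015-01-05 02:54"), (1, 1,  "2015-01-05 03:00"),
    (2, 17, "2015-02-05 09:12"), (2, 17, "2015-02-05 09:18"),
    (2, 1,  "2015-02-05 10:02"), (2, 13, "2015-02-08 07:43"),
    (2, 13, "2015-02-08 07:50"), (2, 1,  "2015-02-08 08:00"),
]

def seqnums(rows):
    """Number each run of equal eventid per customer, in timestamp order."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for custid, group in groupby(ordered, key=lambda r: r[0]):
        prev_eventid = None  # plays the role of lag(eventid): no previous row yet
        running = 0          # plays the role of the cumulative sum
        for _, eventid, ts in group:
            # Add 1 whenever the event differs from the previous row's event.
            if prev_eventid is None or prev_eventid != eventid:
                running += 1
            out.append((custid, eventid, ts, running))
            prev_eventid = eventid
    return out

for row in seqnums(rows):
    print(row)
```

The running total restarts for each customer, matching `partition by custid`, and it increases only on rows where the event changes, so every row in a run of identical events gets the same number.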
    

    EDIT:

    The last time I used Amazon Redshift, it did not have row_number(). In that case you can do this instead:

    select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
    from (select sd.*,
                 min(timestamp) over (partition by custid, eventid, grp) as mints
          from (select sd.*,
                       -- sum(1) with an explicit frame is a running count, standing in for row_number()
                       (sum(1) over (partition by custid order by timestamp rows between unbounded preceding and current row) -
                        sum(1) over (partition by custid, eventid order by timestamp rows between unbounded preceding and current row)
                       ) as grp
                from somedata sd
               ) sd
         ) sd;
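
When row_number() is not available, a running count built from sum(1) with an explicit frame can play the same role. The sketch below checks that the two produce the same numbers. It assumes SQLite (via Python's sqlite3, SQLite 3.25 or later) as a stand-in for Redshift, a hypothetical column name `ts`, and a small made-up sample with no tied timestamps inside a partition.

```python
import sqlite3

# Small hypothetical sample as (custid, eventid, ts); timestamps are unique per customer.
rows = [
    (1, 17, "2015-01-01 13:23"),
    (1, 17, "2015-01-01 14:32"),
    (1, 13, "2015-01-01 14:54"),
    (2, 17, "2015-02-05 09:12"),
    (2, 1,  "2015-02-05 10:02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table somedata (custid int, eventid int, ts text)")
conn.executemany("insert into somedata values (?, ?, ?)", rows)

query = """
select custid, ts,
       row_number() over (partition by custid order by ts) as rn,
       sum(1) over (partition by custid order by ts
                    rows between unbounded preceding and current row) as running_count
from somedata
order by custid, ts
"""
result = conn.execute(query).fetchall()

# Rows where the running count disagrees with row_number(); expected to be empty.
mismatches = [r for r in result if r[2] != r[3]]
print(result)
print(mismatches)
```

The `rows between unbounded preceding and current row` frame is what turns sum(1) into a per-row count.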
    

    【Comments】:

    【Solution 2】:

    Try this code block:

    WITH by_day
    AS (SELECT
      *,
      ts::date AS login_day
    FROM table_name)
    SELECT
      *,
      login_day,
      FIRST_VALUE(login_day) OVER (PARTITION BY userid ORDER BY login_day , userid rows unbounded preceding) AS first_day
    FROM by_day
    

    【Comments】:
