【Question title】: How to apply windows for each specific column in SQL? [duplicate]
【Posted】: 2023-07-31 19:05:01
【Question】:

For each user, I want to get the last event for each specific unit_of_measure:

I have this table:

person_id   event_time       event_derscription   unit_of_measure 
-----------------------------------------------------------------
1           20200801120101  "some description"     "unit1"
1           20200801120501  "some description 2"   "unit1"
1           20200801120501  "some description 2"   "unit9"
2           20200801120301  "some description 3"   "unit1"
2           20200801120501  "some description 4"   "unit1"

The expected output is:

person_id   event_time       event_derscription   unit_of_measure 
-----------------------------------------------------------------
1           20200801120501  "some description 2"   "unit1"
2           20200801120501  "some description 4"   "unit1"
1           20200801120501  "some description 2"   "unit9"
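To pin down the requirement, here is a minimal plain-Python sketch (an illustration, not part of the original post) of "last event per (person_id, unit_of_measure)": for each pair it keeps the row with the greatest event_time. It assumes event_time is a fixed-width yyyymmddhhmmss number, so comparing values numerically matches chronological order.

```python
# Sample rows from the question: (person_id, event_time, event_derscription, unit_of_measure)
rows = [
    (1, 20200801120101, "some description", "unit1"),
    (1, 20200801120501, "some description 2", "unit1"),
    (1, 20200801120501, "some description 2", "unit9"),
    (2, 20200801120301, "some description 3", "unit1"),
    (2, 20200801120501, "some description 4", "unit1"),
]


def last_event_per_person_and_unit(rows):
    """Return the latest row for every (person_id, unit_of_measure) pair, sorted."""
    latest = {}
    for row in rows:
        person_id, event_time, _, unit = row
        key = (person_id, unit)
        # Replace the stored row only if this one is strictly newer for the same pair.
        if key not in latest or event_time > latest[key][1]:
            latest[key] = row
    return sorted(latest.values())


for row in last_event_per_person_and_unit(rows):
    print(row)
```

The grouping key is the pair (person_id, unit_of_measure); in SQL terms, that pair is what the window's partition by has to contain.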

What I tried:

select * 
from 
    (select 
         person_id, event_time, event_derscription, unit_of_measure, 
         rank() over (partition by unit_of_measure order by event_time desc) as RN 
     from 
         test.person_events 
     where 
         partition_name = 20200801 
     group by 
         person_id, event_time, event_derscription, unit_of_measure) 
where 
    RN = 1;  -- I tried using group by person_id to get a result for each person_id, but it did not work

The output of my code above is:

person_id   event_time       event_derscription   unit_of_measure 
-----------------------------------------------------------------
2           20200801120301  "some description 2"   "unit1"
1           20200801120501  "some description 2"   "unit9"

Am I doing something wrong?

【Question comments】:

    Tags: sql oracle greatest-n-per-group presto trino


    【Solution 1】:

    I think the query you want is:

    select person_id, event_time, event_derscription, unit_of_measure
    from (select pe.*,
                 row_number() over (partition by unit_of_measure, person_id order by event_time desc) as seqnum
          from test.person_events pe
          where partition_name = 20200801 
         ) pe
    where seqnum = 1; 
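The corrected query can be checked locally. This sketch is an assumption-laden stand-in for Oracle/Trino: it loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module (window functions need SQLite 3.25 or newer), attaches a second in-memory database named test so that test.person_events resolves, and fills a hypothetical partition_name column with 20200801 so the WHERE clause still applies. The final order by is added only to make the output order stable.

```python
import sqlite3

# Sample rows from the question, plus a hypothetical partition_name value.
ROWS = [
    (1, 20200801120101, "some description", "unit1", 20200801),
    (1, 20200801120501, "some description 2", "unit1", 20200801),
    (1, 20200801120501, "some description 2", "unit9", 20200801),
    (2, 20200801120301, "some description 3", "unit1", 20200801),
    (2, 20200801120501, "some description 4", "unit1", 20200801),
]

QUERY = """
select person_id, event_time, event_derscription, unit_of_measure
from (select pe.*,
             row_number() over (partition by unit_of_measure, person_id
                                order by event_time desc) as seqnum
      from test.person_events pe
      where partition_name = 20200801
     ) pe
where seqnum = 1
order by person_id, unit_of_measure  -- added only for a stable output order
"""


def run_query():
    conn = sqlite3.connect(":memory:")
    # A second in-memory database attached as "test" lets "test.person_events" resolve.
    conn.execute("ATTACH DATABASE ':memory:' AS test")
    conn.execute(
        "create table test.person_events ("
        "person_id integer, event_time integer, event_derscription text, "
        "unit_of_measure text, partition_name integer)"
    )
    conn.executemany("insert into test.person_events values (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_query():
    print(row)
```

Each (person_id, unit_of_measure) pair gets its own numbering, so seqnum = 1 keeps one row per person and unit.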
    

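    Notes:

    • The main fix for your problem is to include person_id in the partition by.
    • I don't think the group by is needed. Nothing in your question explains why it would be desirable.
    • To get exactly one row, use row_number() rather than rank(). Even if you have no duplicates, it conveys your intent of returning a single row.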
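A small illustration of the rank() versus row_number() note (a sketch, not part of the original answer). It again uses Python's sqlite3 module as a stand-in for Oracle/Trino, with a simplified copy of the sample table that omits partition_name. Both windows partition by unit_of_measure only, as in the question. Persons 1 and 2 both have a unit1 event at 20200801120501, so rank() keeps both tied rows while row_number() keeps exactly one. The person_id tiebreaker in the row_number() window is an added assumption to make that pick deterministic.

```python
import sqlite3

# Sample rows from the question: (person_id, event_time, event_derscription, unit_of_measure)
ROWS = [
    (1, 20200801120101, "some description", "unit1"),
    (1, 20200801120501, "some description 2", "unit1"),
    (1, 20200801120501, "some description 2", "unit9"),
    (2, 20200801120301, "some description 3", "unit1"),
    (2, 20200801120501, "some description 4", "unit1"),
]

# Partition by unit_of_measure only, as in the question's attempt.
RANK_WINDOW = "rank() over (partition by unit_of_measure order by event_time desc)"
# Hypothetical person_id tiebreaker so row_number() picks a deterministic row.
ROW_NUMBER_WINDOW = (
    "row_number() over (partition by unit_of_measure order by event_time desc, person_id)"
)


def first_ranked_rows(window_expr):
    """Return (person_id, event_time, unit_of_measure) for rows whose window value is 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table person_events (person_id integer, event_time integer, "
        "event_derscription text, unit_of_measure text)"
    )
    conn.executemany("insert into person_events values (?, ?, ?, ?)", ROWS)
    query = (
        "select person_id, event_time, unit_of_measure "
        f"from (select pe.*, {window_expr} as rn from person_events pe) t "
        "where rn = 1 order by unit_of_measure, person_id"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(first_ranked_rows(RANK_WINDOW))        # two unit1 rows tie at rank 1
print(first_ranked_rows(ROW_NUMBER_WINDOW))  # exactly one row per unit_of_measure
```

Here rank() happens to cover both persons for unit1 only because of the tie. In general, partitioning by unit_of_measure alone answers "latest event per unit", which is why person_id belongs in the partition by.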

    【Comments】:

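    • @code . . . I now realize you deliberately used rank() to get the different person_ids and then used group by to remove them. A clever attempt to fit the pieces together, but this is the way to go.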