【Question title】: Optimizing windowing query in Presto
【Posted】: 2021-05-26 11:46:18
【Question】:

I have a table with fields such as user_id, col1, col2, col3, updated_at, is_deleted, and day.

The current query looks like this:

 SELECT DISTINCT
    user_id,
    first_value(col1) IGNORE NULLS OVER (
        PARTITION BY user_id ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col1,
    first_value(col2) IGNORE NULLS OVER (
        PARTITION BY user_id ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col2,
    first_value(col3) IGNORE NULLS OVER (
        PARTITION BY user_id ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col3,
    bool_or(is_deleted) IGNORE NULLS OVER (
        PARTITION BY user_id ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS is_deleted
 FROM
    my_table
 WHERE
    day >= '2021-05-25'

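Basically, I want the latest (first) non-null value of each column for every user ID. Since each value column can be null, I have to run the same window query multiple times (once per column). Currently, 66% of the time is spent on windowing. Is there any way to optimize this?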

【Comments】:

    Tags: sql bigdata presto


    【Solution 1】:

    It seems like you want this:

    select * from (
      select * , row_number() over (partition by user_id ORDER BY updated_at DESC) rn 
      from my_table
      where day >= '2021-05-25'
    ) t 
    where rn = 1
    

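    【Comments】:

    • This can return NULLs, which the OP wants to avoid: it takes every column from the single most recent row, even where that row's value is NULL.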
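
One possible way to address the NULL issue raised above is to drop the window functions and use a single GROUP BY instead. The following is only a sketch, not something benchmarked against the original query. It assumes the Presto version in use supports the `max_by` aggregate and the `FILTER (WHERE ...)` clause on aggregates; the table and column names are the ones from the question.

```sql
-- Sketch: latest non-null value per column, one aggregation pass per user.
-- Assumes max_by() and the FILTER clause are available in this Presto version.
SELECT
    user_id,
    -- value of colN on the row with the greatest updated_at,
    -- considering only rows where colN is not NULL
    max_by(col1, updated_at) FILTER (WHERE col1 IS NOT NULL) AS col1,
    max_by(col2, updated_at) FILTER (WHERE col2 IS NOT NULL) AS col2,
    max_by(col3, updated_at) FILTER (WHERE col3 IS NOT NULL) AS col3,
    -- as an aggregate, bool_or skips NULL inputs
    bool_or(is_deleted) AS is_deleted
FROM
    my_table
WHERE
    day >= '2021-05-25'
GROUP BY
    user_id
```

Because the result has one row per user_id by construction, the SELECT DISTINCT step is no longer needed. If several rows for a user share the same updated_at, which of them supplies the value is not defined, which is also the case for the first_value version in the question.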