【Posted】:2021-05-26 11:46:18
【Problem description】:
I have a table with the fields user_id, col1, col2, col3, updated_at, is_deleted, day, and so on.
My current query looks like this:
SELECT DISTINCT
    user_id,
    first_value(col1) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS col1,
    first_value(col2) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS col2,
    first_value(col3) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS col3,
    bool_or(is_deleted) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS is_deleted
FROM
    my_table
WHERE
    day >= '2021-05-25'
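Basically, I want the latest (first) non-null value of each column for every user_id. Because every value column is nullable, I have to run the same window specification once per column. Currently, 66% of the time is spent on windowing. Is there any way to optimize this?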
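To make the intended result concrete, here is a minimal Python sketch of the semantics described above: one output record per user_id, holding the newest non-null value of each column and the logical OR of the non-null is_deleted flags. It is not a replacement for the SQL and says nothing about how a given engine executes the query. It may help when checking that a rewritten query returns the same rows. The function name `latest_non_null`, the sample rows, and the assumption that rows are already filtered by `day` are all hypothetical.

```python
from datetime import datetime

VALUE_COLUMNS = ("col1", "col2", "col3")


def latest_non_null(rows):
    """Collapse rows (dicts) to one record per user_id.

    For each value column, keep the non-null value with the newest
    updated_at. is_deleted is the OR of all non-null flags for the user,
    or None if every flag is null.
    """
    result = {}
    # Visit rows newest first, so the first non-null value seen per column wins.
    for row in sorted(rows, key=lambda r: r["updated_at"], reverse=True):
        if row["user_id"] not in result:
            record = {col: None for col in VALUE_COLUMNS}
            record["is_deleted"] = None
            result[row["user_id"]] = record
        record = result[row["user_id"]]
        for col in VALUE_COLUMNS:
            if record[col] is None and row[col] is not None:
                record[col] = row[col]
        if row["is_deleted"] is not None:
            record["is_deleted"] = bool(record["is_deleted"]) or row["is_deleted"]
    return result


# Hypothetical sample data, assumed to be already restricted to day >= '2021-05-25'.
rows = [
    {"user_id": 1, "col1": "a", "col2": None, "col3": None,
     "updated_at": datetime(2021, 5, 25, 9), "is_deleted": False},
    {"user_id": 1, "col1": None, "col2": "b", "col3": None,
     "updated_at": datetime(2021, 5, 25, 10), "is_deleted": True},
    {"user_id": 1, "col1": "c", "col2": None, "col3": None,
     "updated_at": datetime(2021, 5, 25, 11), "is_deleted": None},
    {"user_id": 2, "col1": None, "col2": None, "col3": "z",
     "updated_at": datetime(2021, 5, 25, 8), "is_deleted": False},
]

result = latest_non_null(rows)
```

Note that the sketch visits each row once after a single sort, whereas the query above declares the same window four times. Whether an engine shares the work between identical windows depends on the engine.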
【Discussion】: