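【Posted】:2021-03-11 09:44:53
【Question】:
I have a fairly long table: 2.47 GB in size, with 112.6 million rows. It has 8 columns; the first is of type TIMESTAMP and the other 7 are of type FLOAT. The table is partitioned (by day) and clustered by datetime.
The table looks like this: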
datetime                     col1     col2     col3     col4     col5     col6     col7
2020-07-29 07:19:24.920 UTC  null     null     null     null     null     null     0.01
2020-07-29 16:41:53.094 UTC  1.3344   null     null     0.6975   null     null     0.01
2020-07-29 08:05:23.705 UTC  null     1.698    null     null     null     null     0.01
2020-07-29 18:12:21.396 UTC  1.33435  null     null     null     null     null     0.01
2020-07-29 19:49:26.073 UTC  null     1.12345  null     null     1.33435  null     0.01
2020-07-29 19:33:21.540 UTC  null     null     null     null     1.33377  null     0.01
2020-07-29 04:11:24.596 UTC  1.256    null     null     null     1.33694  null     0.01
2020-07-29 09:27:05.052 UTC  null     null     null     0.6868   null     104.889  0.01
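What I want to do is populate another table with the same values as this one, but with the null values forward-filled. From the documentation I have been reading, this can be done with the LAST_VALUE function.
The query I tried uses LAST_VALUE with OVER for every column except the last one.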
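For reference, the forward-fill behaviour of LAST_VALUE can be sketched on a tiny inline sample. This is only an illustration: the CTE name `sample` and the output column `col1_filled` are made up here, and the five rows reuse col1 values from the table above. The explicit ROWS frame is written out to make clear that each row only looks at itself and earlier rows.

```sql
-- Hypothetical inline sample; only datetime and col1 are reproduced.
WITH sample AS (
  SELECT TIMESTAMP '2020-07-29 04:11:24.596 UTC' AS datetime, 1.256 AS col1 UNION ALL
  SELECT TIMESTAMP '2020-07-29 07:19:24.920 UTC', CAST(NULL AS FLOAT64) UNION ALL
  SELECT TIMESTAMP '2020-07-29 08:05:23.705 UTC', CAST(NULL AS FLOAT64) UNION ALL
  SELECT TIMESTAMP '2020-07-29 16:41:53.094 UTC', 1.3344 UNION ALL
  SELECT TIMESTAMP '2020-07-29 18:12:21.396 UTC', 1.33435
)
SELECT
  datetime,
  col1,
  -- Latest non-null col1 among this row and all earlier rows.
  LAST_VALUE(col1 IGNORE NULLS) OVER (
    ORDER BY datetime
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS col1_filled
FROM sample
ORDER BY datetime;
-- col1_filled, in datetime order: 1.256, 1.256, 1.256, 1.3344, 1.33435
```

Note that the fill follows datetime order, not the order in which rows happen to be displayed.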
INSERT project.dataset.table (datetime, col1, col2, col3, col4, col5, col6)
WITH current AS (
  SELECT
    datetime,
    LAST_VALUE(col1 IGNORE NULLS) OVER (ORDER BY datetime) AS col1,
    LAST_VALUE(col2 IGNORE NULLS) OVER (ORDER BY datetime) AS col2,
    LAST_VALUE(col3 IGNORE NULLS) OVER (ORDER BY datetime) AS col3,
    LAST_VALUE(col4 IGNORE NULLS) OVER (ORDER BY datetime) AS col4,
    LAST_VALUE(col5 IGNORE NULLS) OVER (ORDER BY datetime) AS col5,
    LAST_VALUE(col6 IGNORE NULLS) OVER (ORDER BY datetime) AS col6,
  FROM
    project.dataset.origin_table
  ORDER BY
    datetime)
select * from current
However, running this query returns this error:
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 100% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 97% other/unattributed: 3%
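So, reading the error message, it seems clear that OVER is the problem, unless I am missing or misunderstanding something.
How can I achieve this? The desired output is this: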
datetime                     col1     col2     col3     col4     col5     col6     col7
2020-07-29 07:19:24.920 UTC  null     null     null     null     null     null     0.01
2020-07-29 16:41:53.094 UTC  1.3344   null     null     0.6975   null     null     0.01
2020-07-29 08:05:23.705 UTC  1.3344   1.698    null     0.6975   null     null     0.01
2020-07-29 18:12:21.396 UTC  1.33435  1.698    null     0.6975   null     null     0.01
2020-07-29 19:49:26.073 UTC  1.33435  1.12345  null     0.6975   1.33435  null     0.01
2020-07-29 19:33:21.540 UTC  1.33435  1.12345  null     0.6975   1.33377  null     0.01
2020-07-29 04:11:24.596 UTC  1.256    1.12345  null     0.6975   1.33694  null     0.01
2020-07-29 09:27:05.052 UTC  1.256    1.12345  null     0.6868   1.33694  104.889  0.01
That is, each null is filled with the last value encountered in its column.
Thanks!
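One possible direction for the memory error, offered as an untested sketch rather than a confirmed fix: a window with ORDER BY but no PARTITION BY treats all 112.6 million rows as one partition, which would be consistent with the "sort operations used for analytic OVER() clauses" message. The sketch below fills within each day first and then carries the last value across day boundaries through a much smaller one-row-per-day table. It covers col1 only; the names `per_day`, `day_last`, `carry`, `col1_last` and `col1_carry` are hypothetical, and it assumes a single UTC day of data is small enough to sort.

```sql
WITH per_day AS (
  -- Step 1: forward-fill inside each day, so each window sort covers one day.
  SELECT
    datetime,
    DATE(datetime) AS day,
    LAST_VALUE(col1 IGNORE NULLS) OVER (
      PARTITION BY DATE(datetime)
      ORDER BY datetime
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS col1
  FROM `project.dataset.origin_table`
),
day_last AS (
  -- Step 2: one row per day with that day's latest non-null col1 (NULL if none).
  SELECT
    DATE(datetime) AS day,
    ARRAY_AGG(col1 IGNORE NULLS ORDER BY datetime DESC LIMIT 1)[SAFE_OFFSET(0)] AS col1_last
  FROM `project.dataset.origin_table`
  GROUP BY day
),
carry AS (
  -- Step 3: for each day, the latest non-null col1 from any earlier day.
  SELECT
    day,
    LAST_VALUE(col1_last IGNORE NULLS) OVER (
      ORDER BY day
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS col1_carry
  FROM day_last
)
-- Step 4: rows still null after the in-day fill take the value carried over.
SELECT
  p.datetime,
  COALESCE(p.col1, c.col1_carry) AS col1
FROM per_day AS p
JOIN carry AS c USING (day);
```

The same three steps would have to be repeated for col2 through col6, and the final SELECT could then feed the INSERT. Whether this stays within the memory limit depends on how many rows a single day holds.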
【Discussion】:
Tags: sql google-cloud-platform google-bigquery