【问题标题】:BigQuery/SQL: Filling in gaps as duplicate rows between yearsBigQuery/SQL:填补空白作为年份之间的重复行
【发布时间】:2020-01-20 17:47:48
【问题描述】:

我有一个非常相似的场景,如该线程所示:Duplicate groups of records to fill multiple date gaps in Google BigQuery

我使用的查询如下:

WITH history AS (
  SELECT 2012 AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
  SELECT 2010 AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
  SELECT 2014 AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
  SELECT 2012 AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
  SELECT 2015 AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
  SELECT 2017 AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
  SELECT 2017 AS d, 'a' AS product, 'x' AS partner, 15 AS value 
),
daterange AS (
  SELECT EXTRACT(YEAR FROM fiscalYear) as date_in_range
  FROM UNNEST(
    GENERATE_DATE_ARRAY(DATE('2010-01-01'), CURRENT_DATE(), INTERVAL 1 YEAR)
    ) AS fiscalYear),
temp AS (
  SELECT d, product, partner, value, LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d
  FROM history
  ORDER BY product, partner, d
)
SELECT date_in_range, product, partner, value
FROM daterange
JOIN temp
ON daterange.date_in_range >= temp.d 
AND (daterange.date_in_range < temp.next_d OR temp.next_d IS NULL)
ORDER BY product, partner, date_in_range

我注意到的一件事情并不适合我的方案,那就是每个产品、合作伙伴组合的日期并不总是从 2010 年开始,这正是我想要的。

所以每个产品合作伙伴的输出应该有 date_in_range [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]

这个查询返回的输出是:

+---------------+---------+---------+-------+
| date_in_range | product | partner | value |
+---------------+---------+---------+-------+
| 2012          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2013          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2014          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2015          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2016          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2017          | a       | x       | 15    |
+---------------+---------+---------+-------+
| 2018          | a       | x       | 15    |
+---------------+---------+---------+-------+
| 2019          | a       | x       | 15    |
+---------------+---------+---------+-------+
| 2014          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2015          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2016          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2017          | a       | y       | 15    |
+---------------+---------+---------+-------+
| 2018          | a       | y       | 15    |
+---------------+---------+---------+-------+
| 2019          | a       | y       | 15    |
+---------------+---------+---------+-------+
| 2010          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2011          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2012          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2013          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2014          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2015          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2016          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2017          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2018          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2019          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2012          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2013          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2014          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2015          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2016          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2017          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2018          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2019          | b       | y       | 16    |
+---------------+---------+---------+-------+

想要的输出是:

+---------------+---------+---------+-------+
| date_in_range | product | partner | value |
+---------------+---------+---------+-------+
| 2010          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2011          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2012          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2013          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2014          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2015          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2016          | a       | x       | 10    |
+---------------+---------+---------+-------+
| 2017          | a       | x       | 15    |
+---------------+---------+---------+-------+
| 2018          | a       | x       | 15    |
+---------------+---------+---------+-------+
| 2019          | a       | x       | 15    |
+---------------+---------+---------+-------+
| 2010          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2011          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2012          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2013          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2014          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2015          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2016          | a       | y       | 11    |
+---------------+---------+---------+-------+
| 2017          | a       | y       | 15    |
+---------------+---------+---------+-------+
| 2018          | a       | y       | 15    |
+---------------+---------+---------+-------+
| 2019          | a       | y       | 15    |
+---------------+---------+---------+-------+
| 2010          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2011          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2012          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2013          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2014          | b       | x       | 15    |
+---------------+---------+---------+-------+
| 2015          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2016          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2017          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2018          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2019          | b       | x       | 13    |
+---------------+---------+---------+-------+
| 2010          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2011          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2012          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2013          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2014          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2015          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2016          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2017          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2018          | b       | y       | 16    |
+---------------+---------+---------+-------+
| 2019          | b       | y       | 16    |
+---------------+---------+---------+-------+

【问题讨论】:

  • 样本数据和期望的结果真的很有帮助。我发现很难理解你的问题到底是什么。
  • 样本数据嵌入到查询中

标签: sql date google-bigquery


【解决方案1】:

以下是 BigQuery 标准 SQL

#standardSQL
WITH history AS (
  SELECT 2012 AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
  SELECT 2010 AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
  SELECT 2014 AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
  SELECT 2012 AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
  SELECT 2015 AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
  SELECT 2017 AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
  SELECT 2017 AS d, 'a' AS product, 'x' AS partner, 15 AS value 
),
daterange AS (
  SELECT EXTRACT(YEAR FROM fiscalYear) AS date_in_range
  FROM UNNEST(
    GENERATE_DATE_ARRAY(DATE('2010-01-01'), CURRENT_DATE(), INTERVAL 1 YEAR)
    ) AS fiscalYear),
history_ext AS (
  SELECT date_in_range, x.product, x.partner, value
  FROM daterange dr
  CROSS JOIN (SELECT DISTINCT product, partner FROM history) x
  LEFT JOIN history h
  ON dr.date_in_range = h.d
  AND STRUCT(h.product, h.partner) = STRUCT(x.product, x.partner)
)
SELECT date_in_range, product, partner, 
COALESCE(
  value, 
  LAST_VALUE(value IGNORE NULLS) OVER(PARTITION BY product, partner ORDER BY date_in_range ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
  FIRST_VALUE(value IGNORE NULLS) OVER(PARTITION BY product, partner ORDER BY date_in_range ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) 
) AS value
FROM history_ext
ORDER BY product, partner, date_in_range   

然后返回

Row date_in_range   product partner value    
1   2010    a   x   10   
2   2011    a   x   10   
3   2012    a   x   10   
4   2013    a   x   10   
5   2014    a   x   10   
6   2015    a   x   10   
7   2016    a   x   10   
8   2017    a   x   15   
9   2018    a   x   15   
10  2019    a   x   15   
11  2010    a   y   11   
12  2011    a   y   11   
13  2012    a   y   11   
14  2013    a   y   11   
15  2014    a   y   11   
16  2015    a   y   11   
17  2016    a   y   11   
18  2017    a   y   15   
19  2018    a   y   15   
20  2019    a   y   15   
21  2010    b   x   15   
22  2011    b   x   15   
23  2012    b   x   15   
24  2013    b   x   15   
25  2014    b   x   15   
26  2015    b   x   13   
27  2016    b   x   13   
28  2017    b   x   13   
29  2018    b   x   13   
30  2019    b   x   13   
31  2010    b   y   16   
32  2011    b   y   16   
33  2012    b   y   16   
34  2013    b   y   16   
35  2014    b   y   16   
36  2015    b   y   16   
37  2016    b   y   16   
38  2017    b   y   16   
39  2018    b   y   16   
40  2019    b   y   16   

【讨论】:

  • 非常感谢米哈伊尔