[Posted]: 2021-02-19 10:06:57
[Question]:
TL;DR:
Given this table:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
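How can I get a table that also contains the missing date/product combination (2020-11-02 - premium), with a fallback value of 0 for diff?
Ideally this should work for any number of products. The list of all products can be obtained like this: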
SELECT ARRAY_AGG(DISTINCT product) FROM subscriptions
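I want to be able to get the daily number of subscriptions, either for all products or only for some of them.
I think the easiest way to achieve this is to prepare a table that looks like this: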
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 50 |
|---------------------|------------------|------------------|
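With this table I can easily group by date and product, or by date only, and sum the totals.
Before arriving at this result table, I generated a table in which I computed the subscription difference per day and product: how many new subscribers each product gained and how many stopped subscribing.
That table looks like this: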
|---------------------|------------------|------------------|
| date | product | diff |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | -20 |
|---------------------|------------------|------------------|
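This means that on November 1 the total number of premium users increased by 50 and the total number of basic users decreased by 20.
The problem is that this intermediate table has no row for a date on which a product did not change; see the example below.
When I started out there was no product column; I only had the date and diff columns.
To get from the second table to the first, I used this query, which worked perfectly: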
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, 150 as diff
UNION ALL SELECT TIMESTAMP("2020-11-02"), -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), 60
)
SELECT
*,
SUM(diff) OVER (ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
But when I add the product column and try to compute the running total per day and product, some data points are missing.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
SELECT
*,
SUM(diff) OVER (PARTITION BY product ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
--
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-02 | basic | 90 |
|---------------------|------------------|------------------|
| 2020-11-03 | basic | 130 |
|---------------------|------------------|------------------|
| 2020-11-03 | premium | 70 |
|---------------------|------------------|------------------|
If I now display the total number of subscriptions per day, I get:
150 -> 90 -> 200
But I expect:
150 -> 140 -> 200
The same goes for the total number of premium subscriptions per day:
50 -> 0 -> 70
But I expect:
50 -> 50 -> 70
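To make the gap concrete, here is a minimal sketch of how the per-day totals above can be reproduced from the query output. The CTE name running and the column name total_per_day are made up for illustration; everything else comes from the sample data above.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
-- Running total per product, exactly as in the query above.
running AS (
  SELECT
    date,
    product,
    SUM(diff) OVER (PARTITION BY product ORDER BY date) AS total_subscriptions
  FROM subscriptions
)
-- Summing the running totals per day only sees the rows that exist.
SELECT
  date,
  SUM(total_subscriptions) AS total_per_day
FROM running
GROUP BY date
ORDER BY date
```

On 2020-11-02 only the basic row (90) exists, so the premium running total of 50 is never added. That is where 150 -> 90 -> 200 comes from.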
I think the best way to solve this is to add the missing date/product combinations.
How can I do that?
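For reference, this is a rough sketch of the kind of thing I have in mind, not a verified solution. It assumes BigQuery Standard SQL, that every timestamp falls exactly on midnight (so the equality join matches), and that the date range of interest runs from the first to the last date in the data. The names bounds, all_dates, all_products, filled and day_ts are made up for illustration.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
-- One row per day between the first and the last date in the data.
all_dates AS (
  SELECT day_ts
  FROM (
    SELECT MIN(date) AS first_day, MAX(date) AS last_day
    FROM subscriptions
  ) AS bounds,
  UNNEST(
    GENERATE_TIMESTAMP_ARRAY(bounds.first_day, bounds.last_day, INTERVAL 1 DAY)
  ) AS day_ts
),
-- One row per product that appears anywhere in the data.
all_products AS (
  SELECT DISTINCT product FROM subscriptions
),
-- Every date/product combination; missing ones fall back to diff = 0.
filled AS (
  SELECT
    d.day_ts AS date,
    p.product,
    COALESCE(s.diff, 0) AS diff
  FROM all_dates AS d
  CROSS JOIN all_products AS p
  LEFT JOIN subscriptions AS s
    ON s.date = d.day_ts AND s.product = p.product
)
SELECT
  date,
  product,
  diff,
  SUM(diff) OVER (PARTITION BY product ORDER BY date) AS total_subscriptions
FROM filled
ORDER BY date, product
```

With the sample data this should add the row (2020-11-02, premium, 0), which would give premium running totals of 50 -> 50 -> 70 and per-day totals of 150 -> 140 -> 200. I am not sure whether the cross join is reasonable once there are many products and a long date range.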
[Comments]:
-
Please edit your question and show the result you want.
-
Expected output - please clarify!
Tags: sql datetime google-bigquery sum recursive-query