【Question title】: Google BigQuery determine historical "active" status by date based on start/end dates
【Posted】: 2021-08-27 05:22:12
【Question】:

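Basically, I have a large table of subscription data.

  • Each subscription has a start date and an end date.
  • A subscription is considered "active" if the current date falls between the start and end dates (inclusive).

My goal is to get a historical count of how many subscriptions were active on each day in a date range.

I have the following query, which does what I'm trying to do. I'm just wondering whether there is a more elegant way than creating a fake ID column filled with the same integer in both datasets and joining on it.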

WITH dummy AS
(
    SELECT DATE('2021-08-17') AS start_dt, DATE('2021-08-19') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-18') AS start_dt, DATE('2021-08-20') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-19') AS start_dt, DATE('2021-08-21') AS end_dt
)
SELECT  a.cur_date,
        start_dt,
        end_dt,
        IF(cur_date BETWEEN start_dt AND end_dt, 1, 0) AS active
FROM    (
            SELECT  0 AS id,
                    d AS cur_date
            FROM  ( SELECT GENERATE_DATE_ARRAY('2021-08-16', '2021-08-22', INTERVAL 1 DAY) AS dates ),
            UNNEST(dates) d
        ) a
    JOIN    (
                SELECT  0 as id,
                        *
                FROM dummy
            ) d
        ON a.id = d.id

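I can then determine the number of active records for each date in my range by grouping on cur_date and taking SUM(active), i.e. (where x stands for the result of the query above):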

SELECT cur_date, SUM(active) AS count
FROM x 
GROUP BY cur_date

Result of the first query:

cur_date start_dt end_dt active
2021-08-16 2021-08-17 2021-08-19 0
2021-08-16 2021-08-18 2021-08-20 0
2021-08-16 2021-08-19 2021-08-21 0
2021-08-17 2021-08-17 2021-08-19 1
2021-08-17 2021-08-18 2021-08-20 0
2021-08-17 2021-08-19 2021-08-21 0
2021-08-18 2021-08-17 2021-08-19 1
2021-08-18 2021-08-18 2021-08-20 1
2021-08-18 2021-08-19 2021-08-21 0
2021-08-19 2021-08-17 2021-08-19 1
2021-08-19 2021-08-18 2021-08-20 1
2021-08-19 2021-08-19 2021-08-21 1
2021-08-20 2021-08-17 2021-08-19 0
2021-08-20 2021-08-18 2021-08-20 1
2021-08-20 2021-08-19 2021-08-21 1
2021-08-21 2021-08-17 2021-08-19 0
2021-08-21 2021-08-18 2021-08-20 0
2021-08-21 2021-08-19 2021-08-21 1
2021-08-22 2021-08-17 2021-08-19 0
2021-08-22 2021-08-18 2021-08-20 0
2021-08-22 2021-08-19 2021-08-21 0

Result of the second query:

cur_date count
2021-08-16 0
2021-08-17 1
2021-08-18 2
2021-08-19 3
2021-08-20 2
2021-08-21 1
2021-08-22 0

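【Comments】:

  • It would be much easier to answer if you asked a question about what you are trying to do. Sample data is good, but it is unclear what your data looks like and what you actually want to accomplish.
  • Hi Gordon, I've edited my question to clarify what I'm trying to achieve.

Tags: sql google-bigquery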


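【Solution 1】:

If you want the count of active records by day for a set of records, just expand each record into one row per day and aggregate: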

WITH dummy AS (
    SELECT DATE('2021-08-17') AS start_dt, DATE('2021-08-19') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-18') AS start_dt, DATE('2021-08-20') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-19') AS start_dt, DATE('2021-08-21') AS end_dt
   )
SELECT dt, COUNT(*)
FROM dummy d CROSS JOIN
     UNNEST(GENERATE_DATE_ARRAY(start_dt, end_dt, INTERVAL 1 DAY)) dt
GROUP BY dt
ORDER BY dt;
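
When subscriptions run for a long time, expanding every record over its whole lifetime can produce far more rows than the report needs. Here is a minimal sketch of one way to limit that, assuming a reporting window of 2021-08-18 through 2021-08-20 (bounds chosen purely for illustration) and the alias active_count (not from the original post). Each record's range is clamped to the window before it is expanded. GENERATE_DATE_ARRAY returns an empty array when the start date is after the end date, so records entirely outside the window contribute no rows.

```sql
WITH dummy AS (
    SELECT DATE('2021-08-17') AS start_dt, DATE('2021-08-19') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-18') AS start_dt, DATE('2021-08-20') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-19') AS start_dt, DATE('2021-08-21') AS end_dt
   )
SELECT dt, COUNT(*) AS active_count
FROM dummy d CROSS JOIN
     UNNEST(GENERATE_DATE_ARRAY(
         -- Assumed window start: never expand earlier than this date
         GREATEST(d.start_dt, DATE '2021-08-18'),
         -- Assumed window end: never expand later than this date
         LEAST(d.end_dt, DATE '2021-08-20'),
         INTERVAL 1 DAY)) dt
GROUP BY dt
ORDER BY dt;
```

Note that GREATEST and LEAST return NULL if any argument is NULL, so a record with a missing start_dt or end_dt would produce no rows here. Open-ended subscriptions would need a default end date first, for example via IFNULL.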

If you want to generate rows with 0 values for every date in a range, combine this with a LEFT JOIN from the date range:

WITH dummy AS (
    SELECT DATE('2021-08-17') AS start_dt, DATE('2021-08-19') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-18') AS start_dt, DATE('2021-08-20') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-19') AS start_dt, DATE('2021-08-21') AS end_dt
   )
SELECT dt, COUNT(ddt)
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2021-08-16', DATE '2021-08-22', INTERVAL 1 DAY)) dt LEFT JOIN
     (dummy d CROSS JOIN
      UNNEST(GENERATE_DATE_ARRAY(start_dt, end_dt, INTERVAL 1 DAY)) ddt
     )
     ON dt = ddt
GROUP BY dt
ORDER BY dt
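
For comparison with the fake-ID join in the question, the following is a sketch of a variant that keeps the question's "flag each date/record pair" logic but replaces the constant-ID join with a plain CROSS JOIN and folds the flag into COUNTIF. The alias active_count is an illustrative name, not from the original post.

```sql
WITH dummy AS (
    SELECT DATE('2021-08-17') AS start_dt, DATE('2021-08-19') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-18') AS start_dt, DATE('2021-08-20') AS end_dt
    UNION ALL
    SELECT DATE('2021-08-19') AS start_dt, DATE('2021-08-21') AS end_dt
   )
SELECT cur_date,
       -- Count only the records whose range covers this date (inclusive)
       COUNTIF(cur_date BETWEEN d.start_dt AND d.end_dt) AS active_count
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2021-08-16', DATE '2021-08-22', INTERVAL 1 DAY)) cur_date
     -- Pair every date with every record; no fake ID column is needed
     CROSS JOIN dummy d
GROUP BY cur_date
ORDER BY cur_date;
```

This pairs every date with every record, so the intermediate result has (number of dates) × (number of records) rows, which may be costly for a very large subscriptions table. Dates with no active records still appear with a count of 0, but if dummy is empty no rows are returned at all, unlike the LEFT JOIN version above.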
