【Question title】: How to fill in missing dates in Google BigQuery
【Posted】: 2019-06-22 15:35:23
【Question】:

I want to build a chart that shows active users from Firebase.

I wrote this query:

SELECT event_date, COUNT(DISTINCT user_pseudo_id) AS user_count
FROM `mark-3314e.analytics_197261162.events_*`  
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND event_name = 'session_start'
GROUP BY event_date
ORDER BY event_date ASC

This is the result:

Row event_date  user_count  
1   20190617        1
2   20190621        3

Is there a way to fill the missing dates between the 17th and the 21st with the previous value? Like this:

event_date  user_count  
20190617        1
20190618        1
20190619        1
20190620        1
20190621        3

【Comments】:

    Tags: firebase google-bigquery firebase-analytics


    【Solution 1】:

    You can join against a calendar table that contains the full date range of interest:

    WITH dates AS (
        SELECT '20190617' AS dt UNION ALL
        SELECT '20190618' UNION ALL
        SELECT '20190619' UNION ALL
        SELECT '20190620' UNION ALL
        SELECT '20190621'
    )
    
    SELECT
        t1.dt AS event_date,
        COUNT(DISTINCT t2.user_pseudo_id) AS user_count
    FROM dates t1
    LEFT JOIN `mark-3314e.analytics_197261162.events_*` t2
        ON t1.dt = t2.event_date AND
           t2._TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
           AND t2.event_name = 'session_start'
    GROUP BY
        t1.dt
    ORDER BY
        t1.dt;
    

    For a more general way to generate date ranges in BigQuery, see this SO question.
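As a rough sketch of that more general approach, the hard-coded `dates` CTE could be built with `GENERATE_DATE_ARRAY` instead. The 7-day window here is an assumption chosen to match the `_TABLE_SUFFIX` filter in the question, and `cal_date` is only an illustrative alias.

```sql
-- Sketch: build the calendar from a date range instead of hard-coded rows.
-- Assumes the same 7-day window as the _TABLE_SUFFIX filter in the question.
WITH dates AS (
    SELECT FORMAT_DATE('%Y%m%d', cal_date) AS dt  -- same 'YYYYMMDD' string format as event_date
    FROM UNNEST(GENERATE_DATE_ARRAY(
        DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY),
        CURRENT_DATE()
    )) AS cal_date
)
SELECT dt
FROM dates
ORDER BY dt;
```

This CTE should be usable as a drop-in replacement for the `dates` CTE in the query above, since `dt` keeps the same string format. Days with no events would still come back with a count of 0 rather than the previous value.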

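    【Comments】:

    • I had to change t2 to t1 in GROUP BY t2.dt because it didn't work... After that, it still shows the same results: (20190617 1) and (20190621 3).
    • @AymenFezai Try moving all the logic from the WHERE clause into the ON clause of the join. That was my mistake; it should have been written that way from the start.
    • It fills the missing dates with zeros, but it should fill them with the previous result. I mean, like this: (20190617 1), (20190618 1), (20190619 1) and (20190621 3).
    • That requirement was not in your original question.
    • Yes, sorry.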
    【Solution 2】:

    Here is a possible solution that uses the GENERATE_DATE_ARRAY function in BigQuery.

    with data as (
       SELECT parse_date('%Y%m%d', event_date) AS event_date, COUNT(DISTINCT user_pseudo_id) AS user_count
       FROM `mark-3314e.analytics_197261162.events_*`  
       WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
       AND event_name = 'session_start'
       GROUP BY event_date
       ORDER BY event_date ASC
    )
    
    select dt as event_date, user_count from (
      select user_count,
          -- Each row covers its own date up to the day before the next row.
          -- The last row (nextdate is null) covers only its own date, which
          -- also handles a result that contains a single row.
          generate_date_array(
            date,
            ifnull(date_sub(nextdate, interval 1 day), date),
            interval 1 day
          ) as dates
      from (
              select
                event_date as date,
                lead(event_date) over(order by event_date) as nextdate,
                user_count
              from data
          )
    ), unnest(dates) dt
    order by event_date
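An alternative way to carry the previous count forward, sketched here under some assumptions, is to join a generated calendar to the aggregated rows and take `LAST_VALUE(... IGNORE NULLS)` over the dates seen so far. The `data` CTE below is filled with the two sample rows from the question so the query runs on its own; in practice it would be the aggregated `data` CTE from the solution above. The names `calendar` and `cal_date` are illustrative only.

```sql
WITH data AS (
    -- Sample rows copied from the question's output (stand-in for the real aggregation).
    SELECT DATE '2019-06-17' AS event_date, 1 AS user_count UNION ALL
    SELECT DATE '2019-06-21', 3
),
calendar AS (
    -- One row per day between the first and last date present in data.
    SELECT cal_date AS event_date
    FROM UNNEST(GENERATE_DATE_ARRAY(
        (SELECT MIN(event_date) FROM data),
        (SELECT MAX(event_date) FROM data)
    )) AS cal_date
)
SELECT
    FORMAT_DATE('%Y%m%d', c.event_date) AS event_date,
    -- Most recent non-null count at or before the current day.
    LAST_VALUE(d.user_count IGNORE NULLS) OVER (
        ORDER BY c.event_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS user_count
FROM calendar c
LEFT JOIN data d
    ON c.event_date = d.event_date
ORDER BY c.event_date;
```

With the sample rows, this should return 1 for 20190617 through 20190620 and 3 for 20190621, matching the output requested in the question. The calendar here stops at the last date that has data; extending it to `CURRENT_DATE()` would also carry the final count forward to today.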
    
