【Question title】: BigQuery: how to write a multiple-fact, multiple-grain query on conformed dimensions
【Posted】: 2019-11-02 03:42:17
【Question】:

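I have three fact tables:

Budget: category, commodity, budgethours

Actual: category, commodity, date, actualhours

Baseline: category, commodity, date, forecast

I want to write a query that returns the sums of budget hours, actual hours, and forecast hours, grouped by category and commodity and filtered by date.

Note that the three facts have different levels of detail; for simplicity I removed the other dimensions that are not common to all of them. Currently I am using this query in Data Studio on top of BigQuery: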

with t0 as ( select category, commodity FROM `testing-bi-engine.starschema.budget`
             union distinct
             select category, commodity FROM `testing-bi-engine.starschema.actual`
             union distinct
             select category, commodity FROM `testing-bi-engine.starschema.baseline`)
SELECT t0.category, t0.commodity , sum(t2.actualhours) as actualhours , sum(t3.budgethours) as budgethours , sum(t4.forecast) as forecasthours FROM t0
left outer join
(SELECT category, commodity , sum(actualhours) as actualhours FROM `testing-bi-engine.starschema.actual`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
group by category, commodity) t2
on t0.category= t2.category and t0.commodity= t2.commodity
left outer join
(SELECT category, commodity , sum(budgethours) as budgethours FROM `testing-bi-engine.starschema.budget`
group by category, commodity) t3
on t0.category= t3.category and t0.commodity= t3.commodity
left outer join
(SELECT category, commodity , sum(forecast) as forecast FROM `testing-bi-engine.starschema.baseline`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
group by category, commodity) t4
on t0.category= t4.category and t0.commodity= t4.commodity
group by t0.category, t0.commodity

This is a typical star schema with multiple fact tables.

My question: is there a better way to write this query?

【Comments】:

    Tags: google-bigquery google-data-studio


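    【Solution 1】:

    Is there a better way to write this query?

    Try the following:

    Refactoring - Round 1

    Removed the unnecessary SUMs together with the (outermost) GROUP BY, and replaced the verbose ON with the more compact USING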

    #standardSQL
    WITH t0 AS ( 
      SELECT category, commodity FROM `testing-bi-engine.starschema.budget` UNION DISTINCT
      SELECT category, commodity FROM `testing-bi-engine.starschema.actual` UNION DISTINCT
      SELECT category, commodity FROM `testing-bi-engine.starschema.baseline`
    )
    SELECT category, commodity, 
      actualhours , 
      budgethours , 
      forecast 
    FROM t0 LEFT OUTER JOIN (
      SELECT category, commodity , SUM(actualhours) AS actualhours 
      FROM `testing-bi-engine.starschema.actual`
      WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
      GROUP BY category, commodity
    ) t2 USING(category, commodity)
    LEFT OUTER JOIN (
      SELECT category, commodity , SUM(budgethours) AS budgethours 
      FROM `testing-bi-engine.starschema.budget`
      GROUP BY category, commodity
    ) t3 USING(category, commodity)
    LEFT OUTER JOIN (
      SELECT category, commodity , SUM(forecast) AS forecast 
      FROM `testing-bi-engine.starschema.baseline`
      WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
      GROUP BY category, commodity
    ) t4 USING(category, commodity)
    

    Refactoring - Round 2

    Removed t0, as it is not really needed; as a consequence, LEFT OUTER is replaced with FULL OUTER

    #standardSQL
    SELECT category, commodity, 
      actualhours , 
      budgethours , 
      forecast 
    FROM (
      SELECT category, commodity , SUM(actualhours) AS actualhours 
      FROM `testing-bi-engine.starschema.actual`
      WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
      GROUP BY category, commodity
    ) t2 
    FULL OUTER JOIN (
      SELECT category, commodity , SUM(budgethours) AS budgethours 
      FROM `testing-bi-engine.starschema.budget`
      GROUP BY category, commodity
    ) t3 USING(category, commodity)
    FULL OUTER JOIN (
      SELECT category, commodity , SUM(forecast) AS forecast 
      FROM `testing-bi-engine.starschema.baseline`
      WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
      GROUP BY category, commodity
    ) t4 USING(category, commodity)
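
A different way to combine facts of different grain, not part of the answer above, is to stack the three tables with UNION ALL and aggregate once. The sketch below assumes the same table and column names as the question, and that the three measure columns are numeric so the bare NULL placeholders can be coerced to their types; it has not been run against the real tables.

```sql
#standardSQL
-- Alternative sketch: pad the measures a table lacks with NULL,
-- stack all rows, then aggregate once on the conformed dimensions.
SELECT category, commodity,
  SUM(actualhours) AS actualhours,
  SUM(budgethours) AS budgethours,
  SUM(forecast) AS forecast
FROM (
  SELECT category, commodity, actualhours, NULL AS budgethours, NULL AS forecast
  FROM `testing-bi-engine.starschema.actual`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  UNION ALL
  -- budget has no date column, so it is not filtered
  SELECT category, commodity, NULL, budgethours, NULL
  FROM `testing-bi-engine.starschema.budget`
  UNION ALL
  SELECT category, commodity, NULL, NULL, forecast
  FROM `testing-bi-engine.starschema.baseline`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
)
GROUP BY category, commodity
```

SUM ignores NULLs, so a pair that is missing from one table still gets NULL for that measure, as with the FULL OUTER JOIN version. One difference to be aware of: GROUP BY puts rows with a NULL category or commodity into a single group, whereas a join with USING never matches NULL keys across tables.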
    

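    【Discussion】:

    • Would you mind explaining what you did? And was my original query not optimized?
    • Sure. I updated my answer with comments on what was done
    • Sorry, I don't use SQL much. In the first #standardSQL query, which selects category, commodity, how does SQL get the distinct values of category and commodity, given that the values are not always the same?
    • Because all the joined parts are already distinct by category, commodity. So you don't need to de-duplicate output that is already distinct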
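
The point of the last comment can be checked with a small self-contained query. This is only an illustrative sketch: the inline `actual` and `budget` data below are made up and stand in for the real tables.

```sql
#standardSQL
-- Hypothetical inline data, for illustration only.
WITH actual AS (
  SELECT * FROM UNNEST([
    STRUCT('A' AS category, 'x' AS commodity, 2 AS actualhours),
    STRUCT('A', 'x', 3),
    STRUCT('B', 'y', 5)
  ])
),
budget AS (
  SELECT * FROM UNNEST([
    STRUCT('A' AS category, 'x' AS commodity, 10 AS budgethours),
    STRUCT('C', 'z', 7)
  ])
)
-- Each side is grouped first, so it holds at most one row per
-- (category, commodity); the join therefore cannot duplicate rows
-- and no outer GROUP BY or DISTINCT is needed.
SELECT category, commodity, actualhours, budgethours
FROM (
  SELECT category, commodity, SUM(actualhours) AS actualhours
  FROM actual
  GROUP BY category, commodity
)
FULL OUTER JOIN (
  SELECT category, commodity, SUM(budgethours) AS budgethours
  FROM budget
  GROUP BY category, commodity
) USING(category, commodity)
ORDER BY category, commodity
-- Expected rows: (A, x, 5, 10), (B, y, 5, NULL), (C, z, NULL, 7)
```

With FULL OUTER JOIN ... USING, the key columns in the output are taken from whichever side has the row, which is why pairs that exist in only one table still appear with their category and commodity filled in.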