【Question title】: Aggregate query: 3-month average per person for every month
【Posted】: 2017-11-01 13:42:25
【Question description】:

Suppose I have the following data in a BigQuery view:

Person | Amount | yearMonth
---------------------------
AA     |   100  |   201701
AA     |   200  |   201702
AA     |   300  |   201703
AA     |   70   |   201704
AB     |   10   |   201701
AB     |   50   |   201702
AB     |   60   |   201703
AB     |   70   |   201704
AC     |   70   |   201701
AC     |   80   |   201702
AC     |   30   |   201703
AC     |   10   |   201704

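Now I need, for each person and each month, the average over the past 3 months (the current month and the two before it).

Expected result: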

Person | Amount | yearMonth
---------------------------
AA     |   200  |   201703(avg of 201701-201703)
AA     |   190  |   201704(avg of 201702-201704)
AB     |   40   |   201703(avg of 201701-201703)
AB     |   60   |   201704(avg of 201702-201704)
AC     |   60   |   201703(avg of 201701-201703)
AC     |   40   |   201704(avg of 201702-201704)

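How is this calculated? Row by row:

  • AA, 201703: (100 (201701) + 200 (201702) + 300 (201703)) / 3 = 200
  • AA, 201704: (200 (201702) + 300 (201703) + 70 (201704)) / 3 = 190
  • AB, 201703: (10 (201701) + 50 (201702) + 60 (201703)) / 3 = 40
  • and so on

I am not sure how to group for this. I don't mind if your answer is just a link to an existing solution for this problem.

Thanks, everyone.

Is this also possible in legacy SQL? I have not migrated to standard SQL yet, and my view is in legacy SQL.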

【Discussion】:

  • If you also need help migrating the view to standard SQL, consider posting a separate question.

Tags: google-bigquery aggregate


【Solution 1】:

Below is a version for BigQuery Standard SQL (it should at least show you the logic for grouping correctly):

#standardSQL
SELECT
  person, yearMonth, CAST(amount AS INT64) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
  FROM (
    SELECT 
      person, amount, yearMonth, 
      UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
    FROM `project.dataset.table`
  )
)
WHERE months = 3
-- ORDER BY person, yearMonth

You can test / play with it using dummy data as below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'AA' person, 100 amount, 201701 yearMonth UNION ALL
  SELECT 'AA', 200, 201702 UNION ALL
  SELECT 'AA', 300, 201703 UNION ALL
  SELECT 'AA', 70, 201704 UNION ALL
  SELECT 'AB', 10, 201701 UNION ALL
  SELECT 'AB', 50, 201702 UNION ALL
  SELECT 'AB', 60, 201703 UNION ALL
  SELECT 'AB', 70, 201704 UNION ALL
  SELECT 'AC', 70, 201701 UNION ALL
  SELECT 'AC', 80, 201702 UNION ALL
  SELECT 'AC', 30, 201703 UNION ALL
  SELECT 'AC', 10, 201704 
)
SELECT
  person, yearMonth, CAST(amount AS INT64) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
  FROM (
    SELECT 
      person, amount, yearMonth, 
      UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
    FROM `project.dataset.table`
  )
)
WHERE months = 3
ORDER BY person, yearMonth

The output is as expected:

person  yearMonth   amount   
AA      201703      200  
AA      201704      190  
AB      201703      40   
AB      201704      60   
AC      201703      60   
AC      201704      40    
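
The day-based window above works because consecutive month starts are never more than 31 days apart. A possible variant, not part of the original answer, is to order by a running month number instead of a day count, so the frame can be written directly as "2 months back". The column name `monthIndex` and the alias `avgAmount` are made up for this sketch; the sample rows are the ones from the question.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'AA' person, 100 amount, 201701 yearMonth UNION ALL
  SELECT 'AA', 200, 201702 UNION ALL
  SELECT 'AA', 300, 201703 UNION ALL
  SELECT 'AA', 70, 201704 UNION ALL
  SELECT 'AB', 10, 201701 UNION ALL
  SELECT 'AB', 50, 201702 UNION ALL
  SELECT 'AB', 60, 201703 UNION ALL
  SELECT 'AB', 70, 201704 UNION ALL
  SELECT 'AC', 70, 201701 UNION ALL
  SELECT 'AC', 80, 201702 UNION ALL
  SELECT 'AC', 30, 201703 UNION ALL
  SELECT 'AC', 10, 201704
)
SELECT
  person, yearMonth, CAST(avgAmount AS INT64) AS amount
FROM (
  SELECT
    person, yearMonth,
    -- Frame = rows whose monthIndex is within 2 of the current row's, i.e. this month and the two before it
    AVG(amount) OVER(PARTITION BY person ORDER BY monthIndex RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS avgAmount,
    COUNT(1) OVER(PARTITION BY person ORDER BY monthIndex RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS months
  FROM (
    SELECT
      person, amount, yearMonth,
      -- Hypothetical helper column: year * 12 + month, so consecutive months differ by exactly 1
      DIV(yearMonth, 100) * 12 + MOD(yearMonth, 100) AS monthIndex
    FROM `project.dataset.table`
  )
)
WHERE months = 3  -- keep only rows that have a full 3-month window
ORDER BY person, yearMonth
```

On the sample data this should return the same six rows as the output listed above. As in the original query, a person with a missing month has fewer than 3 rows in the affected windows, and those months are filtered out by `months = 3`.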

Added version for BigQuery Legacy SQL:

#legacySQL
SELECT
  person, yearMonth, INTEGER(amount) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
  FROM (
    SELECT 
      person, amount, yearMonth, 
      TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
    FROM [project:dataset.table]
  )
)
WHERE months = 3
-- ORDER BY person, yearMonth

You can test / play with it using the example below with dummy data:

#legacySQL
SELECT
  person, yearMonth, INTEGER(amount) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
  FROM (
    SELECT 
      person, amount, yearMonth, 
      TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
    FROM -- [project:dataset.table]
      (SELECT 'AA' person, 100 amount, 201701 yearMonth),
      (SELECT 'AA' person, 200 amount, 201702 yearMonth),
      (SELECT 'AA' person, 300 amount, 201703 yearMonth),
      (SELECT 'AA' person, 70 amount, 201704 yearMonth),
      (SELECT 'AB' person, 10 amount, 201701 yearMonth),
      (SELECT 'AB' person, 50 amount, 201702 yearMonth),
      (SELECT 'AB' person, 60 amount, 201703 yearMonth),
      (SELECT 'AB' person, 70 amount, 201704 yearMonth),
      (SELECT 'AC' person, 70 amount, 201701 yearMonth),
      (SELECT 'AC' person, 80 amount, 201702 yearMonth),
      (SELECT 'AC' person, 30 amount, 201703 yearMonth),
      (SELECT 'AC' person, 10 amount, 201704 yearMonth)
  )
)
WHERE months = 3
ORDER BY person, yearMonth

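【Discussion】:

  • Thank you for the effort, and for providing the standard SQL version as well. That made it very clear to me.
  • If you don't mind my asking, what is "63*60*60*24"? I assume 60*60*24 is one day in seconds, but I am not sure what the 63 is. In any case, there is a requirement change: I need to exclude the current month, e.g. 201704 should be the average of 201701 to 201703, so I am looking for a way to understand the query and modify it.
  • Using 63 days as the range ensures that we average over the past 3 months. The number can be anything between 62 and 80+.
  • I see, thank you all.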
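
To spell out the arithmetic behind the 63: every row is dated on the first of its month, and the first of the month two months earlier is at most 31 + 31 = 62 days back, while the first of the month three months earlier is at least 28 + 31 + 30 = 89 days back. Any range from 62 to 88 days therefore picks up exactly the current month and the two before it. For the changed requirement raised in the comments (exclude the current month), one possible adaptation, offered as a sketch and not as the answerer's own solution, is to end the frame at `1 PRECEDING` and widen its start to cover three earlier months. Three consecutive months span at most 92 days and four span at least 120, so a start between 92 and 119 days should work; 93 is chosen here. The alias `avgAmount` is made up for this sketch.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'AA' person, 100 amount, 201701 yearMonth UNION ALL
  SELECT 'AA', 200, 201702 UNION ALL
  SELECT 'AA', 300, 201703 UNION ALL
  SELECT 'AA', 70, 201704 UNION ALL
  SELECT 'AB', 10, 201701 UNION ALL
  SELECT 'AB', 50, 201702 UNION ALL
  SELECT 'AB', 60, 201703 UNION ALL
  SELECT 'AB', 70, 201704 UNION ALL
  SELECT 'AC', 70, 201701 UNION ALL
  SELECT 'AC', 80, 201702 UNION ALL
  SELECT 'AC', 30, 201703 UNION ALL
  SELECT 'AC', 10, 201704
)
SELECT
  person, yearMonth, CAST(avgAmount AS INT64) AS amount
FROM (
  SELECT
    person, yearMonth,
    -- Frame = rows dated 1 to 93 days before the current row, so the current month itself is excluded
    AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 93 PRECEDING AND 1 PRECEDING) AS avgAmount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 93 PRECEDING AND 1 PRECEDING) AS months
  FROM (
    SELECT
      person, amount, yearMonth,
      -- First day of the month as a day number, same conversion as in the answer
      UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
    FROM `project.dataset.table`
  )
)
WHERE months = 3  -- keep only rows with three full preceding months
ORDER BY person, yearMonth
```

With the sample data only 201704 has three preceding months, so this should return one row per person for 201704: AA 200, AB 40, AC 60. In the legacy SQL version, where `dt` is in seconds, the same frame would presumably be written with both bounds multiplied by 60*60*24.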