【Question title】: SQL BigQuery: convert timestamps into 5-minute intervals
【Posted】: 2018-05-15 12:54:31
【Question】:

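In a Google BigQuery database, I want to convert minute-level timestamps into 5-minute intervals. The 5-minute intervals are standard time intervals. The following is just an example of how I would like the data to be presented: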

test                    hd_count
2013-12-20 10:40:30     1
2013-12-20 10:41:30     3
2013-12-20 10:42:30     2
2013-12-20 10:43:30     1
2013-12-20 10:44:30     1

I would like it to be represented as:

test_1                  test_2                  hd_count
2013-12-20 10:40:30     2013-12-20 10:44:30     8

I have looked at similar requests in other answers, but none of them seem to work in BigQuery. Any help would be appreciated.

【Comments】:

    Tags: database google-bigquery


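    【Answer 1】:

    Below is for BigQuery Standard SQL. The intervals start at the earliest timestamp in the table and are 5 minutes long: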

    #standardSQL
    WITH minmax AS (
      SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
      FROM `project.dataset.table`
    ), intervals AS (
      SELECT 
        TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
        TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
      FROM minmax, 
      UNNEST(GENERATE_ARRAY(0,  DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
    )
    SELECT test1, test2, SUM(hd_count) AS hd_count
    FROM intervals JOIN `project.dataset.table`
    ON test BETWEEN test1 AND test2
    GROUP BY test1, test2
    

    You can test / play with the above using the dummy data below:

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT TIMESTAMP '2013-12-20 10:40:30' test, 1 hd_count UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
      SELECT TIMESTAMP '2013-12-20 10:47:30', 1  
    ), minmax AS (
      SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
      FROM `project.dataset.table`
    ), intervals AS (
      SELECT 
        TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
        TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
      FROM minmax, 
      UNNEST(GENERATE_ARRAY(0,  DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
    )
    SELECT test1, test2, SUM(hd_count) AS hd_count
    FROM intervals JOIN `project.dataset.table`
    ON test BETWEEN test1 AND test2
    GROUP BY test1, test2
    ORDER BY test1   
    

    The output is as follows:

    test1                       test2                       hd_count     
    2013-12-20 10:40:30 UTC     2013-12-20 10:45:29 UTC     8    
    2013-12-20 10:45:30 UTC     2013-12-20 10:50:29 UTC     6    
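
The query above anchors its intervals at the earliest timestamp in the table (10:40:30 in the sample). If "standard" 5-minute intervals means buckets aligned to the clock (10:40:00, 10:45:00, ...), a minimal sketch of an alternative is to round each timestamp down to a multiple of 300 seconds. The names `bucketed` and `bucket_start` are illustrative and not part of the original answer, and the rounding assumes timestamps on or after 1970-01-01, because `DIV` truncates toward zero.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2013-12-20 10:40:30' AS test, 1 AS hd_count UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:47:30', 1
), bucketed AS (
  SELECT
    -- Round down to the nearest multiple of 300 seconds (5 minutes).
    -- Assumes test >= 1970-01-01, since DIV truncates toward zero.
    TIMESTAMP_SECONDS(300 * DIV(UNIX_SECONDS(test), 300)) AS bucket_start,
    hd_count
  FROM `project.dataset.table`
)
SELECT bucket_start, SUM(hd_count) AS hd_count
FROM bucketed
GROUP BY bucket_start
ORDER BY bucket_start
```

With the sample rows this should return two buckets, 2013-12-20 10:40:00 UTC with hd_count 8 and 2013-12-20 10:45:00 UTC with hd_count 6. Unlike the join against generated intervals, this form does not emit rows for empty buckets.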
    

    【Comments】:

    • Based on this and past questions, it seems that GENERATE_TIMESTAMP_ARRAY would be useful. You could also consider filing a feature request. Thanks!
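
For reference, BigQuery Standard SQL does provide `GENERATE_TIMESTAMP_ARRAY(start, end, INTERVAL n date_part)`. The following is a sketch, not part of the original answer, of how the `intervals` step from Answer 1 might be written with it. The column name `next_start` is illustrative, and the step is hard-coded to 5 minutes rather than taken from a `step` column. The join uses a half-open range so that timestamps with fractional seconds in the last second of an interval are not dropped.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2013-12-20 10:40:30' AS test, 1 AS hd_count UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:47:30', 1
), minmax AS (
  SELECT MIN(test) AS mintest, MAX(test) AS maxtest
  FROM `project.dataset.table`
), intervals AS (
  SELECT
    test1,
    -- Exclusive upper bound of the interval (illustrative name).
    TIMESTAMP_ADD(test1, INTERVAL 5 MINUTE) AS next_start
  FROM minmax,
  UNNEST(GENERATE_TIMESTAMP_ARRAY(mintest, maxtest, INTERVAL 5 MINUTE)) AS test1
)
SELECT test1, next_start, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test >= test1 AND test < next_start
GROUP BY test1, next_start
ORDER BY test1
```

On the dummy data this should give the same sums as Answer 1 (8 and 6), with interval starts at 10:40:30 and 10:45:30.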
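    【Answer 2】:

    Here is a Standard SQL UDF-based approach that allows arbitrary alignment with up to millisecond precision. The function rounds a timestamp down to the start of its bracket, counted from midnight UTC of the same day. I use it when working with finer time intervals: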

    CREATE TEMPORARY FUNCTION bracketTimestampByMillis(ts TIMESTAMP, bracketMillis INT64) RETURNS TIMESTAMP AS (
      TIMESTAMP_MILLIS(CAST(FLOOR(
           (UNIX_MILLIS(ts) -  UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))) / bracketMillis) AS INT64) 
      * bracketMillis + UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))));
    

    To demonstrate, here is another UDF that uses the first one to build an array of timestamps aligned to different intervals:

    CREATE TEMPORARY FUNCTION emitTimeBrackets(ts TIMESTAMP) RETURNS ARRAY<STRUCT<bracket STRING, tsVal TIMESTAMP>> AS (
        [STRUCT("exact" as bracket, ts as tsVal),
         STRUCT("minute", bracketTimestampByMillis(ts, 60 * 1000)),
         STRUCT("5 minute", bracketTimestampByMillis(ts, 5 * 60 * 1000)),
         STRUCT("15 minute", bracketTimestampByMillis(ts, 15 * 60 * 1000)),
         STRUCT("hour", bracketTimestampByMillis(ts, 60 * 60 * 1000)),
         STRUCT("quarter day", bracketTimestampByMillis(ts, 6 * 3600 * 1000))
        ]
    );
    
    SELECT emitTimeBrackets(CURRENT_TIMESTAMP()) as b
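
To connect this answer back to the question, here is a minimal sketch of how `bracketTimestampByMillis` could be used as a grouping key over the question's `test` and `hd_count` columns. The sample rows are the dummy data from Answer 1 and the alias `bracket_start` is illustrative. A temporary function has to be declared in the same query that uses it, so the definition is repeated here.

```sql
#standardSQL
CREATE TEMPORARY FUNCTION bracketTimestampByMillis(ts TIMESTAMP, bracketMillis INT64) RETURNS TIMESTAMP AS (
  TIMESTAMP_MILLIS(CAST(FLOOR(
       (UNIX_MILLIS(ts) - UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))) / bracketMillis) AS INT64)
  * bracketMillis + UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))));

WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2013-12-20 10:40:30' AS test, 1 AS hd_count UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:47:30', 1
)
SELECT
  -- 5 * 60 * 1000 ms = one 5-minute bracket
  bracketTimestampByMillis(test, 5 * 60 * 1000) AS bracket_start,
  SUM(hd_count) AS hd_count
FROM `project.dataset.table`
GROUP BY bracket_start
ORDER BY bracket_start
```

For the sample rows the brackets should start at 2013-12-20 10:40:00 UTC (hd_count 8) and 2013-12-20 10:45:00 UTC (hd_count 6).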
    

    【Comments】:
