【Question title】:Group by time with interval in SQL BigQuery
【Posted】:2021-01-03 09:04:04
【Question】:

I have data that I need to group by time, with an interval of 2 minutes. My data looks like this:

id            time             action_name            url
111      2020-09-01-09:19:00     First           www.stackoverflow/a12345
111      2020-09-01-09:19:04     Midpoint        www.stackoverflow/a12345
111      2020-09-01-09:19:08     Third           www.stackoverflow/a12345
112      2020-09-01-10:12:05     First           www.someotherurl/a111111
111      2020-09-01-12:36:54     First           www.stackoverflow/a12345
111      2020-09-01-12:36:58     Midpoint        www.stackoverflow/a12345
111      2020-09-01-12:37:03     Third           www.stackoverflow/a12345
111      2020-09-01-12:37:09     Complete        www.stackoverflow/a12345
222      2020-09-01-15:17:44     First           www.stackoverflow/a2222
222      2020-09-01-15:17:48     Midpoint        www.stackoverflow/a2222
222      2020-09-01-15:18:05     Third           www.stackoverflow/a2222

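I need to get data under the following condition: for each id and url combination, if the action_name column has a Complete value, take that row. If there is no Complete, take Third, and so on. The code I currently have returns only one row per id and url combination. So I need to group the data not only by id and url but also by time, with an interval of 2 minutes. Here is the code: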

SELECT AS VALUE 
  ARRAY_AGG(current_query_result 
    ORDER BY CASE action_name
      WHEN 'Complete' THEN 1
      WHEN 'Third' THEN 2
      WHEN 'Midpoint' THEN 3
      WHEN 'First' THEN 4
    END
    LIMIT 1
  )[OFFSET(0)] 
FROM (
  SELECT
    c.time,
    c.id,
    c.action_name, 
    c.url
  FROM `bq_table` c
  WHERE c.action_name in ('First', 'Midpoint', 'Third', 'Complete')
) current_query_result
GROUP BY id, url

The desired output is:

id            time             action_name            url
111      2020-09-01-09:19:08     Third           www.stackoverflow/a12345
112      2020-09-01-10:12:05     First           www.someotherurl/a111111
111      2020-09-01-12:37:09     Complete        www.stackoverflow/a12345
222      2020-09-01-15:18:05     Third           www.stackoverflow/a2222

I have tried this: TIMESTAMP_SECONDS(2*60 * DIV(UNIX_SECONDS(c.time), 2*60)) timekey, but I get an error: No matching signature for function UNIX_SECONDS for argument types: STRING. Supported signature: UNIX_SECONDS(TIMESTAMP)

【Discussion】:

    Tags: sql time group-by google-bigquery timestamp


    【Solution 1】:

    Below is for BigQuery Standard SQL

    #standardSQL
    SELECT 
      AS VALUE ARRAY_AGG(t 
        ORDER BY STRPOS('First,Midpoint,Third,Complete',action_name) DESC 
        LIMIT 1
      )[OFFSET(0)]
    FROM `project.dataset.bq_table` t
    WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
    GROUP BY id, url, 
      TIMESTAMP_SUB(
        PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time), 
        INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) 
        SECOND
      )   
    

    You can test this with the sample data from your question, as in the example below

    #standardSQL
    WITH `project.dataset.bq_table` AS (
      SELECT 111 id, '2020-09-01-09:19:00' time, 'First' action_name, 'www.stackoverflow/a12345' url UNION ALL
      SELECT 111, '2020-09-01-09:19:04', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
      SELECT 111, '2020-09-01-09:19:08', 'Third', 'www.stackoverflow/a12345' UNION ALL
      SELECT 112, '2020-09-01-10:12:05', 'First', 'www.someotherurl/a111111' UNION ALL
      SELECT 111, '2020-09-01-12:36:54', 'First', 'www.stackoverflow/a12345' UNION ALL
      SELECT 111, '2020-09-01-12:36:58', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
      SELECT 111, '2020-09-01-12:37:03', 'Third', 'www.stackoverflow/a12345' UNION ALL
      SELECT 111, '2020-09-01-12:37:09', 'Complete', 'www.stackoverflow/a12345' 
    )
    SELECT 
      AS VALUE ARRAY_AGG(t 
        ORDER BY STRPOS('First,Midpoint,Third,Complete',action_name) DESC 
        LIMIT 1
      )[OFFSET(0)]
    FROM `project.dataset.bq_table` t
    WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
    GROUP BY id, url, 
      TIMESTAMP_SUB(
        PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time), 
        INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) 
        SECOND
      )   
    

    with output

    Row     id      time                    action_name     url  
    1       111     2020-09-01-09:19:08     Third           www.stackoverflow/a12345     
    2       112     2020-09-01-10:12:05     First           www.someotherurl/a111111     
    3       111     2020-09-01-12:37:09     Complete        www.stackoverflow/a12345    
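Fixed 2-minute buckets are aligned to the clock, so rows only a few seconds apart can land on opposite sides of a boundary. The sketch below applies the same TIMESTAMP_SUB / MOD expression to the id 222 rows from the question to show which bucket each row falls into. The CTE name `rows_222` and the column alias `bucket_start` are made up for illustration.

```sql
#standardSQL
-- Illustration only: rows_222 and bucket_start are hypothetical names.
-- The bucket expression is the one used in the GROUP BY above.
WITH rows_222 AS (
  SELECT '2020-09-01-15:17:44' time, 'First' action_name UNION ALL
  SELECT '2020-09-01-15:17:48', 'Midpoint' UNION ALL
  SELECT '2020-09-01-15:18:05', 'Third'
)
SELECT
  time,
  action_name,
  -- Subtract the remainder of the epoch seconds modulo 120 to get the bucket start
  TIMESTAMP_SUB(
    PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time),
    INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) SECOND
  ) AS bucket_start
FROM rows_222
ORDER BY time
```

The first two rows share the bucket starting at 15:16:00, while the 15:18:05 row falls into the bucket starting at 15:18:00, so grouping by this key yields two groups for id 222 rather than one.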
    

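    【Discussion】:

    • I've just edited the question. I've added id 222 to the data and to the desired output. The code now returns Midpoint (time: 2020-09-01-15:17:48) and Third (time: 2020-09-01-15:18:05), when I actually need only Third returned. So the 2-minute interval doesn't seem to work here, or rather, this approach isn't what I need for my specific problem. How can I filter the chain First, Midpoint, etc. by a 1 or 2 minute interval? Thanks a lot for your help!
    • Based on what you showed in the "I have tried this" part of your question, I assumed you simply wanted to group by every 2 minutes. I see now that this is obviously not what you meant. I'd recommend posting a new question and defining your use case very clearly, so there is no misunderstanding.
    • Yes, I was thinking the same thing, this is a whole new question. I'll post one. Although a 2-minute interval is indeed what I meant in my post, I've just found out it's not what my data needs, you know how it goes :) Thanks for your time.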
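One possible direction for the "chain" requirement, which is not taken from this thread, is to group by gaps between consecutive rows instead of by fixed clock buckets. The sketch below starts a new chain whenever the gap to the previous row of the same id and url exceeds 120 seconds. That threshold, and the names `parsed`, `flagged`, `chained`, `ts`, `is_new_chain` and `chain_id`, are assumptions made for illustration.

```sql
#standardSQL
-- Sketch under assumptions: the 120-second gap threshold and all CTE/column
-- names other than id, time, action_name and url are hypothetical.
WITH `project.dataset.bq_table` AS (
  SELECT 111 id, '2020-09-01-09:19:00' time, 'First' action_name, 'www.stackoverflow/a12345' url UNION ALL
  SELECT 111, '2020-09-01-09:19:04', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-09:19:08', 'Third', 'www.stackoverflow/a12345' UNION ALL
  SELECT 112, '2020-09-01-10:12:05', 'First', 'www.someotherurl/a111111' UNION ALL
  SELECT 111, '2020-09-01-12:36:54', 'First', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:36:58', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:03', 'Third', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:09', 'Complete', 'www.stackoverflow/a12345' UNION ALL
  SELECT 222, '2020-09-01-15:17:44', 'First', 'www.stackoverflow/a2222' UNION ALL
  SELECT 222, '2020-09-01-15:17:48', 'Midpoint', 'www.stackoverflow/a2222' UNION ALL
  SELECT 222, '2020-09-01-15:18:05', 'Third', 'www.stackoverflow/a2222'
),
parsed AS (
  -- Convert the STRING column to TIMESTAMP once, up front
  SELECT *, PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time) AS ts
  FROM `project.dataset.bq_table`
  WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
),
flagged AS (
  SELECT *,
    -- 1 when the row opens a new chain: no previous row, or a gap over 120 s
    IF(
      IFNULL(
        TIMESTAMP_DIFF(ts, LAG(ts) OVER (PARTITION BY id, url ORDER BY ts), SECOND) > 120,
        TRUE
      ),
      1, 0
    ) AS is_new_chain
  FROM parsed
),
chained AS (
  SELECT *,
    -- Running count of chain starts gives each chain its own number
    SUM(is_new_chain) OVER (PARTITION BY id, url ORDER BY ts) AS chain_id
  FROM flagged
)
SELECT AS VALUE ARRAY_AGG(
  STRUCT(id, time, action_name, url)
  ORDER BY STRPOS('First,Midpoint,Third,Complete', action_name) DESC
  LIMIT 1
)[OFFSET(0)]
FROM chained
GROUP BY id, url, chain_id
```

On the sample data this should return the four rows listed in the desired output, in no guaranteed order, because the three id 222 rows are at most 17 seconds apart and stay in one chain. Whether a gap threshold matches the real data is something only the asker can confirm.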
    【Solution 2】:

    I think you're very close to solving it. You just need to use PARSE_TIMESTAMP to convert the string to a TIMESTAMP type, for example:

    SELECT PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', '2020-09-01-09:19:00')
    

    Output:

    +---------------------+
    |         f0_         |
    +---------------------+
    | 2020-09-01 09:19:00 |
    +---------------------+
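Putting this together with the expression from the question, a minimal sketch of the 2-minute key could look like the following. The string literal stands in for the `time` column, and `timekey` is the alias used in the question.

```sql
#standardSQL
-- Sketch: parse the STRING first, then floor the epoch seconds to a multiple of 120.
-- The literal stands in for c.time from the question.
SELECT
  TIMESTAMP_SECONDS(
    2 * 60 * DIV(
      UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', '2020-09-01-09:19:08')),
      2 * 60
    )
  ) AS timekey
```

For this input the key is 2020-09-01 09:18:00 UTC, since 2-minute buckets counted from the Unix epoch start on even minutes.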
    

    【Discussion】:
