[Question title]: ClickHouse moving average
[Posted]: 2019-04-24 06:51:56
[Question description]:

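Input: ClickHouse

Table A: business_dttm (DateTime), amount (Float)

I need to calculate a moving sum over a 15-minute window (or over the last 3 records) at each business_dttm.

For example: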

amount  business_dttm        moving sum
0.3     2018-11-19 13:00:00
0.3     2018-11-19 13:05:00
0.4     2018-11-19 13:10:00  1
0.5     2018-11-19 13:15:00  1.2
0.6     2018-11-19 13:15:00  1.5
0.7     2018-11-19 13:20:00  1.8
0.8     2018-11-19 13:25:00  2.1
0.9     2018-11-19 13:25:00  2.4
0.5     2018-11-19 13:30:00  2.2

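Unfortunately, ClickHouse has no window functions and no joins without equality conditions.

How can I do this without a cross join and a WHERE condition?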

[Question comments]:

    Tags: moving-average clickhouse


    [Solution 1]:

    If the window size is very small, you can do something like this:

    SELECT
        sum(window.2) AS amount,
        max(dttm) AS business_dttm,
        sum(amt) AS moving_sum
    FROM
    (
        SELECT
            -- Fan each row out to positions n, n + 1 and n + 2; only position n carries the row's own amount
            arrayJoin([(rowNumberInAllBlocks(), amount), (rowNumberInAllBlocks() + 1, 0), (rowNumberInAllBlocks() + 2, 0)]) AS window,
            amount AS amt,
            business_dttm AS dttm
        FROM
        (
            SELECT
                amount,
                business_dttm
            FROM A
            ORDER BY business_dttm
        )
    )
    GROUP BY window.1
    HAVING count() = 3
    ORDER BY window.1;
    

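    The first two rows are dropped (their groups fail the HAVING count() = 3 check), because ClickHouse does not collapse aggregates to NULL. You can add them back afterwards.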
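The fan-out idea behind the query above can be sketched outside SQL. The Python below is only an illustration under assumptions: the helper name `moving_sum_by_fanout` and the in-memory list of amounts are hypothetical, and the code mimics the arrayJoin / GROUP BY / HAVING pattern rather than reproducing what ClickHouse does internally. Each row contributes its amount to its own position and to the next `window_size - 1` positions, and only positions that received a full window are kept.

```python
from collections import defaultdict


def moving_sum_by_fanout(amounts, window_size=3):
    """Hypothetical helper: mimic the fan-out + GROUP BY + HAVING pattern."""
    totals = defaultdict(float)  # position -> sum of contributed amounts
    counts = defaultdict(int)    # position -> number of contributing rows
    for n, amount in enumerate(amounts):
        # Row n contributes to positions n, n + 1, ..., n + window_size - 1.
        for offset in range(window_size):
            totals[n + offset] += amount
            counts[n + offset] += 1
    # Keep only positions with a complete window (the HAVING count() = 3 step).
    return [
        (pos, totals[pos])
        for pos in sorted(totals)
        if counts[pos] == window_size
    ]


# Sample amounts from the question, already ordered by business_dttm.
amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
result = moving_sum_by_fanout(amounts)
```

In this sketch the incomplete windows at both ends are filtered out, which is why the first two input rows have no entry in `result`. The work grows with the number of rows times the window size, so the pattern only suits small windows.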

    Update:

    It is still possible to compute a moving sum for an arbitrary window size. Adjust window_size as needed (3 in this example).

    -- Note, rowNumberInAllBlocks is incorrect if declared inside with block due to being stateful
    WITH
        (
            SELECT arrayCumSum(groupArray(amount))
            FROM
            (
                SELECT
                    amount
                FROM A
                ORDER BY business_dttm
            )
        ) AS arr,
        3 AS window_size
    SELECT
        amount,
        business_dttm,
        if(rowNumberInAllBlocks() + 1 < window_size, NULL, arr[rowNumberInAllBlocks() + 1] - arr[rowNumberInAllBlocks() + 1 - window_size]) AS moving_sum
    FROM
    (
        SELECT
            amount,
            business_dttm
        FROM A
        ORDER BY business_dttm
    )
    

    Or this variant:

    SELECT
        amount,
        business_dttm,
        moving_sum
    FROM
    (
        WITH 3 AS window_size
        SELECT
            groupArray(amount) AS amount_arr,
            groupArray(business_dttm) AS business_dttm_arr,
            arrayCumSum(amount_arr) AS amount_cum_arr,
            arrayMap(i -> if(i < window_size, NULL, amount_cum_arr[i] - amount_cum_arr[(i - window_size)]), arrayEnumerate(amount_cum_arr)) AS moving_sum_arr
        FROM
        (
            SELECT *
            FROM A
            ORDER BY business_dttm ASC
        )
    )
    ARRAY JOIN
        amount_arr AS amount,
        business_dttm_arr AS business_dttm,
        moving_sum_arr AS moving_sum
    

    Fair warning: both approaches are far from optimal, but they showcase ClickHouse's unique capabilities beyond standard SQL.
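Both queries in the update rely on the same identity: a moving sum over the last `window_size` rows equals the cumulative sum at the current row minus the cumulative sum `window_size` rows earlier. A minimal Python sketch of that identity follows; the helper name `moving_sum_by_prefix` is hypothetical and the data is the sample from the question.

```python
from itertools import accumulate


def moving_sum_by_prefix(amounts, window_size=3):
    """Hypothetical helper: moving sum as a difference of cumulative sums."""
    # prefix[i] is the sum of the first i amounts; prefix[0] is 0.
    prefix = [0.0] + list(accumulate(amounts))
    return [
        # Rows without a full window get None, like the NULL branch in the queries.
        None if i < window_size else prefix[i] - prefix[i - window_size]
        for i in range(1, len(amounts) + 1)
    ]


# Sample amounts from the question, already ordered by business_dttm.
amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
sums = moving_sum_by_prefix(amounts)
```

So the result is a moving sum, not a cumulative one: the cumulative array is only an intermediate step. One caveat of this sketch is that subtracting two large floating-point cumulative sums can lose some precision on long series.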

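    [Comments]:

    • Unfortunately, the window size is ~10000 rows.
    • Thanks for the answer, but wait. I mean a moving sum, not a cumulative sum. Is a moving sum actually possible?

    [Solution 2]:

    Starting with version 21.4, ClickHouse added full support for window functions. At that time the feature was marked as experimental.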

    SELECT
        amount,
        business_dttm,
        sum(amount) OVER (ORDER BY business_dttm ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum
    FROM (
        SELECT data.1 AS amount, toDateTime(data.2) AS business_dttm
        FROM (
            SELECT arrayJoin([
                (0.3, '2018-11-19 13:00:00'),  
                (0.3, '2018-11-19 13:05:00'),
                (0.4, '2018-11-19 13:10:00'),
                (0.5, '2018-11-19 13:15:00'),
                (0.6, '2018-11-19 13:15:00'),
                (0.7, '2018-11-19 13:20:00'),
                (0.8, '2018-11-19 13:25:00'),
                (0.9, '2018-11-19 13:25:00'),
                (0.5, '2018-11-19 13:30:00')]) data)
        )
    SETTINGS allow_experimental_window_functions = 1
    
    /*
    ┌─amount─┬───────business_dttm─┬────────────────sum─┐
    │    0.3 │ 2018-11-19 13:00:00 │                0.3 │
    │    0.3 │ 2018-11-19 13:05:00 │                0.6 │
    │    0.4 │ 2018-11-19 13:10:00 │                  1 │
    │    0.5 │ 2018-11-19 13:15:00 │                1.2 │
    │    0.6 │ 2018-11-19 13:15:00 │                1.5 │
    │    0.7 │ 2018-11-19 13:20:00 │                1.8 │
    │    0.8 │ 2018-11-19 13:25:00 │ 2.0999999999999996 │
    │    0.9 │ 2018-11-19 13:25:00 │                2.4 │
    │    0.5 │ 2018-11-19 13:30:00 │                2.2 │
    └────────┴─────────────────────┴────────────────────┘
    */
    

    https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art
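The question also asks for a 15-minute window instead of a fixed row count. As a rough illustration of that variant, here is a Python sketch using a two-pointer sliding window over rows sorted by timestamp. The helper name `moving_sum_by_time` and the window rule (rows up to and including the current one whose timestamp is less than 15 minutes older) are assumptions made for this example; they are not taken from the answers and are not claimed to match the frame semantics of any ClickHouse clause.

```python
from datetime import datetime, timedelta


def moving_sum_by_time(rows, window=timedelta(minutes=15)):
    """Hypothetical helper: rows is a list of (amount, timestamp) sorted by timestamp."""
    result = []
    start = 0      # index of the oldest row still inside the window
    running = 0.0  # sum of amounts for rows[start..i]
    for amount, ts in rows:
        running += amount
        # Drop rows that are a full window (or more) older than the current row.
        while ts - rows[start][1] >= window:
            running -= rows[start][0]
            start += 1
        result.append(running)
    return result


def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Sample data from the question.
rows = [
    (0.3, parse("2018-11-19 13:00:00")),
    (0.3, parse("2018-11-19 13:05:00")),
    (0.4, parse("2018-11-19 13:10:00")),
    (0.5, parse("2018-11-19 13:15:00")),
    (0.6, parse("2018-11-19 13:15:00")),
    (0.7, parse("2018-11-19 13:20:00")),
    (0.8, parse("2018-11-19 13:25:00")),
    (0.9, parse("2018-11-19 13:25:00")),
    (0.5, parse("2018-11-19 13:30:00")),
]
sums = moving_sum_by_time(rows)
```

With duplicate timestamps, as at 13:15 and 13:25 in the sample, a time-based window no longer matches a 3-row window, and under this rule rows sharing a timestamp get different sums depending on their order.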
