【Question title】: How to calculate a cumulative product in ClickHouse
【Posted】: 2020-11-21 11:04:54
【Question】:

In Python I can compute a cumulative product with numpy.cumprod:

>>> import numpy
>>> a = [2, 3, 4, 5]
>>> numpy.cumprod(a)
array([  2,   6,  24, 120])
# This is the result I want: [2, 3, 4, 5] => [2, 2*3, 2*3*4, 2*3*4*5] => [2, 6, 24, 120]

But I don't know how to write this in ClickHouse SQL. Table A:

row rate
1    2
2    3
3    4
4    5

The rate column below is the result I want. How can I produce it with a ClickHouse SQL statement?

row rate
1    2
2    6
3    24
4    120

【Comments】:

    Tags: sql clickhouse


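    【Solution 1】:

    There is no easy way. I would add (implement) a new function, arrayCumProd, in CH (ClickHouse). As a workaround, sum the logarithms and exponentiate: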

    SELECT
        i, pow(2,arraySum(z->log2(z),n))
    FROM
    (
        SELECT
            ig,
            arrayMap( i -> arraySlice(ng, 1, i), arrayEnumerate(groupArray(x) AS ng) as ig) xx
        FROM ( SELECT arrayJoin([2, 3, 4, 5]) AS x )
    )
    ARRAY JOIN
        ig AS i,
        xx AS n
    
    ┌─i─┬─pow(2, arraySum(lambda(tuple(z), log2(z)), n))─┐
    │ 1 │                                              2 │
    │ 2 │                                              6 │
    │ 3 │                                             24 │
    │ 4 │                             119.99999999999994 │
    └───┴────────────────────────────────────────────────┘
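
To make the shape of that query easier to follow, here is a minimal Python sketch of the same idea, assuming strictly positive inputs. The function name `cumprod_by_prefix_logs` is hypothetical and only illustrates the steps: build one prefix per index, sum the base-2 logarithms of that prefix, then raise 2 to the sum.

```python
import math


def cumprod_by_prefix_logs(values):
    """Cumulative product via 2 ** sum(log2(v)) over each prefix.

    Hypothetical helper that mirrors the query's structure.
    Only valid for strictly positive inputs.
    """
    results = []
    for i in range(1, len(values) + 1):
        prefix = values[:i]                          # like arraySlice(ng, 1, i)
        log_sum = sum(math.log2(v) for v in prefix)  # like arraySum(z -> log2(z), n)
        results.append(2.0 ** log_sum)               # like pow(2, ...)
    return results


approx = cumprod_by_prefix_logs([2, 3, 4, 5])
print(approx)                      # floats close to 2, 6, 24, 120
print([round(v) for v in approx])  # [2, 6, 24, 120]
```

Because the results go through floating-point logarithms, they are approximations (as the 119.99999999999994 in the output above shows), so rounding is needed when integer results are expected.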
    

    Hmm, it looks like I overcomplicated it:

    SELECT x FROM
    (
        SELECT
            arrayMap(i -> pow(2,i), arrayCumSum(groupArray(log2(x)))) z
        FROM ( SELECT arrayJoin([2, 3, 4, 5]) AS x )
    )
    ARRAY JOIN z as x
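
The simplified query replaces the per-prefix sums with a single running sum. A rough Python equivalent, again assuming strictly positive inputs and using the hypothetical name `cumprod_by_running_log_sum`, might look like this:

```python
import math
import operator
from itertools import accumulate


def cumprod_by_running_log_sum(values):
    """Cumulative product from one running sum of base-2 logarithms.

    Hypothetical helper; only valid for strictly positive inputs.
    """
    running = accumulate(math.log2(v) for v in values)  # like arrayCumSum(groupArray(log2(x)))
    return [2.0 ** s for s in running]                  # like arrayMap(i -> pow(2, i), ...)


values = [2, 3, 4, 5]
approx = cumprod_by_running_log_sum(values)
exact = list(accumulate(values, operator.mul))  # exact integer products, for comparison

print([round(v) for v in approx])  # [2, 6, 24, 120]
print(exact)                       # [2, 6, 24, 120]
```

The running sum visits each element once, whereas the first version recomputes the sum for every prefix.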
    

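    【Comments】:

      【Solution 2】:

      I am just expanding on @Denis Zhuravlev's answer.

      CH has no dedicated function for cumulative multiplication (or for any arithmetic operator other than addition). Moreover, because of the "cumulative" nature of this calculation, the existing functions cannot be applied directly to get the desired result.

      So logarithms are needed to turn the multiplication into an addition: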

      log_a(x*y) = log_a(x) + log_a(y)

      x*y = a^(log_a(x) + log_a(y))
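
Applied repeatedly, the identity extends from two factors to a whole prefix, which is the form the queries rely on. As a sketch, assuming every x_k is strictly positive and the base satisfies a > 0, a ≠ 1:

```latex
\prod_{k=1}^{i} x_k \;=\; a^{\sum_{k=1}^{i} \log_a x_k},
\qquad x_k > 0,\quad a > 0,\quad a \neq 1

% Worked example with a = 10 and x = (2, 3, 4):
10^{\log_{10} 2 + \log_{10} 3 + \log_{10} 4}
  \;=\; 10^{\log_{10}(2 \cdot 3 \cdot 4)}
  \;=\; 24
```

The choice of base does not change the result; the first answer uses base 2 and the query below uses base 10.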

      SELECT r.1.1 row, r.1.2 rate, r.2 value, round(r.2, 2) rounded_value
      FROM (
        SELECT 
          groupArray((row, rate, rate_log)) data,
          arrayMap(log -> exp10(log), arrayCumSum(data_item -> data_item.3, data)) rate_cumulative_values,
          arrayJoin(arrayZip(data, rate_cumulative_values)) r  
        FROM (
          SELECT row, rate, log10(rate) AS rate_log
          FROM (
            /* emulate the origin dataset */
            SELECT data.1 row, data.2 rate
            FROM (SELECT arrayJoin([
              (1, 2), (2, 3), (3, 4), (4, 5),
              (5, 1), (6, 0), (7, -1)]) AS data))
          ORDER BY row));
      /*
      ┌─row─┬─rate─┬──────────────value─┬─rounded_value─┐
      │   1 │    2 │                  2 │             2 │
      │   2 │    3 │                  6 │             6 │
      │   3 │    4 │ 23.999999999999993 │            24 │
      │   4 │    5 │ 119.99999999999996 │           120 │
      │   5 │    1 │ 119.99999999999996 │           120 │
      │   6 │    0 │                  0 │             0 │
      │   7 │   -1 │                nan │           nan │
      └─────┴──────┴────────────────────┴───────────────┘
      */
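
The last two rows of that output show where the logarithm trick stops being reliable: the zero rate happens to yield 0, but the negative rate yields nan because the logarithm of a negative number is undefined. One possible workaround, which is not part of the original answer, is to take logarithms of absolute values only and track the sign and any zero separately. The Python sketch below illustrates that assumption; `cumprod_signed` is a hypothetical name.

```python
import math


def cumprod_signed(values):
    """Log-based cumulative product that tolerates zeros and negatives.

    Hypothetical helper: sums log10(|v|) and keeps sign and zero
    information outside the logarithm.
    """
    results = []
    log_sum = 0.0      # running sum of log10(|v|) over nonzero factors
    negatives = 0      # number of negative factors seen so far
    seen_zero = False  # once a zero appears, every later product is 0
    for v in values:
        if v == 0:
            seen_zero = True
        elif not seen_zero:
            log_sum += math.log10(abs(v))
            if v < 0:
                negatives += 1
        if seen_zero:
            results.append(0.0)
        else:
            sign = -1.0 if negatives % 2 else 1.0
            results.append(sign * 10.0 ** log_sum)
    return results


print([round(v) for v in cumprod_signed([2, 3, 4, 5, 1, 0, -1])])  # [2, 6, 24, 120, 120, 0, 0]
print([round(v) for v in cumprod_signed([2, -3, 4])])              # [2, -6, -24]
```

Unlike the query above, this sketch returns 0 rather than nan for the last row, since any product that includes a zero factor is 0.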
      

      The same logic can be applied to compute a cumulative division:

      x/y = a^(log_a(x) - log_a(y))

      Cumulative division:

      SELECT r.1.1 row, r.1.2 rate, r.2 value, round(r.2, 2) rounded_value
      FROM (
        SELECT 
          groupArray((row, rate, rate_log)) data,
          arrayMap(log -> exp10(log), arrayCumSum((data_item, index) -> index = 1 ? data_item.3 : - data_item.3, data, arrayEnumerate(data))) rate_cumulative_values,
          arrayJoin(arrayZip(data, rate_cumulative_values)) r  
        FROM (
          SELECT row, rate, log10(rate) AS rate_log
          FROM (
            /* emulate the origin dataset */
            SELECT data.1 row, data.2 rate
            FROM (SELECT arrayJoin([
              (1, 100), (2, 2), (3, 10), (4, 2)]) AS data))
          ORDER BY row));
      /*
      ┌─row─┬─rate─┬──────────────value─┬─rounded_value─┐
      │   1 │  100 │                100 │           100 │
      │   2 │    2 │  49.99999999999999 │            50 │
      │   3 │   10 │  4.999999999999999 │             5 │
      │   4 │    2 │ 2.4999999999999996 │           2.5 │
      └─────┴──────┴────────────────────┴───────────────┘
      */
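
The only change from the multiplication case is the sign of each logarithm after the first. A minimal Python sketch of that step, assuming strictly positive inputs and using the hypothetical name `cumdiv_by_logs`:

```python
import math
from itertools import accumulate


def cumdiv_by_logs(values):
    """Cumulative division: v0, v0/v1, v0/v1/v2, ... via base-10 logarithms.

    Hypothetical helper; only valid for strictly positive inputs.
    """
    # Keep the first logarithm positive and negate the rest,
    # like `index = 1 ? data_item.3 : - data_item.3` in the query.
    signed_logs = [
        math.log10(v) if index == 0 else -math.log10(v)
        for index, v in enumerate(values)
    ]
    return [10.0 ** s for s in accumulate(signed_logs)]  # like exp10 over arrayCumSum


approx = cumdiv_by_logs([100, 2, 10, 2])
print([round(v, 2) for v in approx])  # [100.0, 50.0, 5.0, 2.5]
```

Note that Python indexes from 0 while arrayEnumerate starts at 1, which is why the sketch tests `index == 0`.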
      

      【Comments】:
