[Question title]: How to sum arrays and other data?
[Posted]: 2019-05-04 08:25:24
[Question]:

How can I get the same result with simpler SQL?

I have two tables like this:

create table t1_before
(
  k1 String,
  ts DateTime,
  span  Int32,
  iserror  Int32
)
ENGINE = MergeTree()
ORDER BY (k1, ts)
;
insert into t1_before values('key1','2019-05-04 10:00:00',1,0);
insert into t1_before values('key1','2019-05-04 10:00:00',1,0);
insert into t1_before values('key1','2019-05-04 10:00:00',1,1);
insert into t1_before values('key1','2019-05-04 10:00:00',2,0);
insert into t1_before values('key1','2019-05-04 10:00:00',2,0);
insert into t1_before values('key1','2019-05-04 10:00:00',2,1);
insert into t1_before values('key1','2019-05-04 10:00:00',2,1);
insert into t1_before values('key1','2019-05-04 10:00:00',2,1);
create table t1
(
  k1 String,
  ts DateTime,
  totalspan  Int32,
  maxspan  Int32,
  totalcount  Int32,
  errorcount Int32,
  goal Nested
  (
      m UInt32,
      n UInt32
  )
)
ENGINE = MergeTree()
ORDER BY (k1, ts)
;

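Table t1 is aggregated from t1_before: goal.m is a span value and goal.n is its count (the number of t1_before rows with that span). The data in t1_before is converted into t1 like this: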

insert into t1 values('key1','2019-05-04 10:00:00', 13, 2, 7, 2, [1,2],[3,5]);
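The question does not show the conversion query itself. The following is a minimal sketch of how it could be written, under the assumption that t1_before is grouped by (k1, ts) and sumMap counts rows per span. The aliases goal_mn, goal_m and goal_n and the toUInt32 casts (chosen to match the goal.m / goal.n column types) are illustrative choices, not taken from the question.

```sql
-- Sketch only: produce t1-shaped rows from the raw rows in t1_before.
-- Each raw row contributes a one-element key array [span] and a
-- one-element value array [1], so sumMap yields
-- ([distinct spans, sorted ascending], [number of rows per span]).
SELECT
    k1,
    ts,
    sum(span)    AS totalspan,
    max(span)    AS maxspan,
    count()      AS totalcount,
    sum(iserror) AS errorcount,
    sumMap([toUInt32(span)], [toUInt32(1)]) AS goal_mn,
    tupleElement(goal_mn, 1) AS goal_m,  -- spans, e.g. [1,2] for the sample rows
    tupleElement(goal_mn, 2) AS goal_n   -- row counts, e.g. [3,5] for the sample rows
FROM t1_before
GROUP BY k1, ts;
```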

t1_before has too many rows, so in practice I only have table t1.

Suppose the data is:

insert into t1 values('key1','2019-05-04 10:00:00', 13, 2, 7, 2, [1,2],[3,5]);
insert into t1 values('key1','2019-05-04 10:00:20', 25, 4, 8, 3, [1,2,4],[1,2,5]);
insert into t1 values('key1','2019-05-04 11:02:30', 13, 2, 8, 1, [1,2],[3,5]);
insert into t1 values('key2','2019-05-04 10:00:00', 13, 2, 8, 3, [1,2],[3,5]);
insert into t1 values('key2','2019-05-04 10:02:00', 13, 2, 8, 0, [1,2],[3,5]);

I know how to get the result, but the query is complicated:

SELECT 
    d1.k1, d1.ts2, d1.a1, 
    d2.sumtotalspan, d2.maxtotalspan, d2.sumtotalcount, d2.sumerrorcount
FROM 
(
    SELECT 
        k1, ts2, quantilesExactWeighted(0.5, 0.9, 0.99)(m1, n1) AS a1
    FROM 
    (
        SELECT 
            k1, 
            toStartOfHour(ts) AS ts2, 
            goal.m AS m1, 
            sum(goal.n) AS n1
        FROM t1 
        ARRAY JOIN goal
        GROUP BY  k1, toStartOfHour(ts), goal.m
    ) 
    GROUP BY k1, ts2
) AS d1 
INNER JOIN 
(
    SELECT 
        k1, 
        toStartOfHour(ts) AS ts2, 
        sum(totalspan) AS sumtotalspan, 
        max(totalspan) AS maxtotalspan, 
        sum(totalcount) AS sumtotalcount, 
        sum(errorcount) AS sumerrorcount
    FROM t1 
    GROUP BY k1, toStartOfHour(ts)
) AS d2 ON (d1.k1 = d2.k1) AND (d1.ts2 = d2.ts2)

┌─k1───┬─ts2─────────────────┬─a1──────┬─sumtotalspan─┬─maxtotalspan─┬─sumtotalcount─┬─sumerrorcount─┐
│ key1 │ 2019-05-04 10:00:00 │ [2,4,4] │           38 │           25 │            15 │             5 │
│ key2 │ 2019-05-04 10:00:00 │ [2,2,2] │           26 │           13 │            16 │             3 │
│ key1 │ 2019-05-04 11:00:00 │ [2,2,2] │           13 │           13 │             8 │             1 │
└──────┴─────────────────────┴─────────┴──────────────┴──────────────┴───────────────┴───────────────┘

3 rows in set.
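To see where a1 comes from, take key1 at 10:00. Merging the goal pairs of its two rows ([1,2]/[3,5] and [1,2,4]/[1,2,5]) gives the per-span weights below. This is a sketch that assumes the weighted quantile at level q is the smallest m whose cumulative weight C(m) reaches q times the total weight W. The exact rounding rule ClickHouse uses is not stated in the question.

```latex
\begin{aligned}
w(1) &= 3 + 1 = 4, \quad w(2) = 5 + 2 = 7, \quad w(4) = 5, \quad W = 16 \\
C(1) &= 4, \quad C(2) = 11, \quad C(4) = 16 \\
q = 0.5 &: \; 0.5\,W = 8, \quad C(1) < 8 \le C(2) \;\Rightarrow\; 2 \\
q = 0.9 &: \; 0.9\,W = 14.4, \quad C(2) < 14.4 \le C(4) \;\Rightarrow\; 4 \\
q = 0.99 &: \; 0.99\,W = 15.84, \quad C(2) < 15.84 \le C(4) \;\Rightarrow\; 4
\end{aligned}
```

This agrees with [2,4,4] in the first row of the result.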

Is there a simpler query (without the JOIN) that returns the same result? Something like the following, which is wrong:

SELECT
    k1,
    toStartOfHour(ts) AS ts2,
    sum(totalspan) AS sumtotalspan,
    max(totalspan) AS maxtotalspan,
    sum(totalcount) AS sumtotalcount,
    sum(errorcount) AS sumerrorcount,
    quantilesExactWeighted(0.5, 0.9, 0.99)(sumMap(goal.m, goal.n))
FROM t1
GROUP BY k1, toStartOfHour(ts)

[Tags]: sql clickhouse


    [Answer 1]:

    You can try something similar with goal.m/goal.n and arrayReduce:

    SELECT arrayReduce('sumMap', [[1, 2, 3, 3]], [[4, 5, 6, 7]])
    FORMAT TSV
    
    ([1,2,3],[4,5,13])
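Applied to the question, this suggestion could look like the sketch below. The attempt in the question fails because it nests one aggregate function (sumMap) inside another (quantilesExactWeighted). Here, sumMap merges the goal pairs of each group, and arrayReduce then applies the weighted quantile to the merged arrays as an ordinary function call, so no JOIN is needed. This assumes your ClickHouse version's arrayReduce accepts a parametric aggregate function written with its parameters inside the name string. The alias goal_sum is illustrative.

```sql
SELECT
    k1,
    toStartOfHour(ts) AS ts2,
    -- Merge the (m, n) pairs of all rows in the group:
    -- ([distinct m, sorted], [sum of n per m]).
    sumMap(goal.m, goal.n) AS goal_sum,
    -- Weighted quantiles over the merged arrays: m values weighted by summed n.
    arrayReduce('quantilesExactWeighted(0.5, 0.9, 0.99)',
                tupleElement(goal_sum, 1),
                tupleElement(goal_sum, 2)) AS a1,
    sum(totalspan)  AS sumtotalspan,
    max(totalspan)  AS maxtotalspan,
    sum(totalcount) AS sumtotalcount,
    sum(errorcount) AS sumerrorcount
FROM t1
GROUP BY k1, toStartOfHour(ts);
```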
    

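      [Answer 2]:

      ClickHouse has a lot of functionality, and functions can be combined. In "Aggregate function combinators" (https://clickhouse.yandex/docs/en/query_language/agg_functions/combinators/) I found the -Array combinator, and this function seems to give the same result: quantilesExactWeightedArray(0.5, 0.9, 0.99)(goal.m, goal.n)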
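Built into a full query, this suggestion could look like the following sketch. With the -Array suffix, quantilesExactWeighted takes array arguments and aggregates over their elements. goal.m and goal.n are paired by index, and ClickHouse checks at insert time that the arrays of a Nested column have equal length. Pairs from every row in the group are therefore weighted together. This is equivalent to first summing n per m, as the JOIN version does.

```sql
SELECT
    k1,
    toStartOfHour(ts) AS ts2,
    -- -Array combinator: iterate over goal.m / goal.n element by element
    -- across all rows of the group, using goal.n as the weight.
    quantilesExactWeightedArray(0.5, 0.9, 0.99)(goal.m, goal.n) AS a1,
    sum(totalspan)  AS sumtotalspan,
    max(totalspan)  AS maxtotalspan,
    sum(totalcount) AS sumtotalcount,
    sum(errorcount) AS sumerrorcount
FROM t1
GROUP BY k1, toStartOfHour(ts);
```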
