【Question title】: SQL / Presto SQL: sum by group in the same column
【Posted】: 2021-10-20 21:35:07
【Question】:

I'm trying to solve the following problem.

I have a table like this:

logtime name seconds flag
1629302433 a 30 1-1
1629302463 a 30 1-1
1629302483 a 20 0-1
1629302513 a 30 1-1
1629302533 a 20 0-1
1629302553 a 30 1-1

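Rows with flag = 0-1 act as separators: ordered by logtime (a timestamp), they split the data into 3 parts. I want to sum the seconds column within each part, without counting the 0-1 rows themselves. The expected result is: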

name seconds
a 60
a 30
a 30
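The splitting rule can be illustrated outside SQL with a short plain-Python sketch. It is only an illustration of the requested logic, not part of either SQL answer: rows are walked per name in logtime order, each '0-1' row closes the current part without being counted, and `seconds` is summed within each part. The helper name `sum_segments` and its `separator` parameter are hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (logtime, name, seconds, flag)
ROWS = [
    (1629302433, "a", 30, "1-1"),
    (1629302463, "a", 30, "1-1"),
    (1629302483, "a", 20, "0-1"),
    (1629302513, "a", 30, "1-1"),
    (1629302533, "a", 20, "0-1"),
    (1629302553, "a", 30, "1-1"),
]


def sum_segments(rows, separator="0-1"):
    """Return (name, total_seconds) per part, in name/logtime order.

    Hypothetical helper: each `separator` row ends the current part and is
    not summed; parts that contain no other rows produce no output.
    """
    result = []
    # groupby needs its input sorted by the grouping key (name).
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for name, rows_for_name in groupby(ordered, key=lambda r: r[1]):
        totals = {}  # part number -> summed seconds
        part = 0     # number of separators seen so far for this name
        for _logtime, _name, seconds, flag in rows_for_name:
            if flag == separator:
                part += 1
                continue
            totals[part] = totals.get(part, 0) + seconds
        for part_no in sorted(totals):
            result.append((name, totals[part_no]))
    return result


if __name__ == "__main__":
    for name, seconds in sum_segments(ROWS):
        print(name, seconds)  # prints "a 60", "a 30", "a 30" on separate lines
```

Counting separators seen so far is the same idea Solution 1 expresses in SQL with a running `sum(...) over (...)`.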

【Question comments】:

  • Are you using MySQL or Hive? Also, you may want to explain your logic more clearly here.
  • I'm actually using Presto SQL.

Tags: sql hive presto


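【Solution 1】:

Number the groups with a running count of rows whose flag is '0-1' (per name, ordered by logtime), so that each '0-1' row starts a new group. Then exclude the '0-1' rows and aggregate by name and group number.

Demo: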

with mytable as (
SELECT * FROM (
    VALUES
(1629302433, 'a', 30, '1-1'),
(1629302463, 'a', 30, '1-1'),
(1629302483, 'a', 20, '0-1'),
(1629302513, 'a', 30, '1-1'),
(1629302533, 'a', 20, '0-1'),
(1629302553, 'a', 30, '1-1')
) AS t (logtime, name, seconds, flag)
)

select name, 
       sum(seconds) seconds
from
(--calculate group number as running sum of 0-1 occurrences
select logtime, name, seconds, flag,
       sum(case when flag='0-1' then 1 else 0 end) over(partition by name order by logtime) as group_nbr
  from mytable
)s
where flag='1-1' --do not sum '0-1' records
group by name, group_nbr 
order by name, group_nbr --remove ordering if not necessary

Result:

name    seconds 
a       60
a       30
a       30
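To try Solution 1 without a Presto cluster, the query can be run against an in-memory SQLite database through Python's standard `sqlite3` module. This is a sketch under two assumptions: the local SQLite library is 3.25 or newer (needed for window functions), and the sample data is declared with a CTE column list, because SQLite does not accept Presto's `VALUES ... AS t (cols)` alias form. The grouping logic is the same as in the answer; the constant names `SAMPLE`, `NUMBERED`, `FINAL` and the helper `run` are illustrative only.

```python
import sqlite3

# Sample rows from the question. A CTE column list is used because SQLite
# does not support Presto's "VALUES ... AS t (cols)" alias syntax.
SAMPLE = """
WITH mytable(logtime, name, seconds, flag) AS (
    VALUES
    (1629302433, 'a', 30, '1-1'),
    (1629302463, 'a', 30, '1-1'),
    (1629302483, 'a', 20, '0-1'),
    (1629302513, 'a', 30, '1-1'),
    (1629302533, 'a', 20, '0-1'),
    (1629302553, 'a', 30, '1-1')
)
"""

# Group number = running count of '0-1' rows up to and including this row.
NUMBERED = """
SELECT logtime, name, seconds, flag,
       SUM(CASE WHEN flag = '0-1' THEN 1 ELSE 0 END)
           OVER (PARTITION BY name ORDER BY logtime) AS group_nbr
FROM mytable
"""

# Solution 1: drop the '0-1' rows, then sum seconds per (name, group_nbr).
FINAL = SAMPLE + """
SELECT name, SUM(seconds) AS seconds
FROM (""" + NUMBERED + """) s
WHERE flag = '1-1'
GROUP BY name, group_nbr
ORDER BY name, group_nbr
"""


def run(sql):
    """Run a query against a fresh in-memory SQLite database (assumes 3.25+)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Intermediate step: logtime, flag and the assigned group number.
    for logtime, _name, _seconds, flag, group_nbr in run(SAMPLE + NUMBERED + " ORDER BY logtime"):
        print(logtime, flag, group_nbr)
    # Final result.
    for name, seconds in run(FINAL):
        print(name, seconds)
```

The intermediate rows show why the '0-1' rows must be filtered out: each of them already carries the number of the group it opens.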


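    【Solution 2】:

    You can use the lag() function to find where the flag value changes, take a cumulative sum over those change points to assign groups, and then sum within each group: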

    WITH dataset AS (
      SELECT *
      FROM
        (
          VALUES
            (1629302433, 'a', 30, '1-1'),
            (1629302463, 'a', 30, '1-1'),
            (1629302483, 'a', 20, '0-1'),
            (1629302513, 'a', 30, '1-1'),
            (1629302533, 'a', 20, '0-1'),
            (1629302553, 'a', 30, '1-1')
        ) AS t (logtime, name, seconds, flag)
    )

    select name, sum(seconds) seconds
    from (
             -- grp: running count of flag changes (the first row, where prev_flag is NULL, also counts)
             select *,
                    sum(case when flag = prev_flag then 0 else 1 end) over (partition by name order by logtime) as grp
             from (
                      -- prev_flag: flag of the previous row for the same name
                      select logtime,
                             name,
                             seconds,
                             flag,
                             lag(flag) over (partition by name order by logtime) as prev_flag
                      from dataset
                  ) t1
         ) t2
    where flag = '1-1' -- do not sum '0-1' rows
    group by name, grp
    order by name, grp -- without ORDER BY the output row order is not guaranteed
    

    Output:

    name seconds
    a 60
    a 30
    a 30
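Solution 2 can be checked the same way. This sketch again assumes Python's `sqlite3` module backed by SQLite 3.25 or newer, and uses a CTE column list in place of Presto's `VALUES ... AS t (...)`. It also exposes the intermediate `prev_flag` and `grp` columns: on the first row `prev_flag` is NULL, so `flag = prev_flag` is not true and the row opens group 1, and every later change of flag opens a new group. The names `SAMPLE`, `GROUPED`, `FINAL` and `run` are illustrative only.

```python
import sqlite3

# Sample rows from the question (CTE column list instead of Presto's
# "VALUES ... AS t (cols)", which SQLite does not accept).
SAMPLE = """
WITH dataset(logtime, name, seconds, flag) AS (
    VALUES
    (1629302433, 'a', 30, '1-1'),
    (1629302463, 'a', 30, '1-1'),
    (1629302483, 'a', 20, '0-1'),
    (1629302513, 'a', 30, '1-1'),
    (1629302533, 'a', 20, '0-1'),
    (1629302553, 'a', 30, '1-1')
)
"""

# lag() fetches the previous row's flag; the running sum adds 1 whenever the
# flag differs from it, or when prev_flag is NULL (first row of a name).
GROUPED = """
SELECT logtime, name, seconds, flag, prev_flag,
       SUM(CASE WHEN flag = prev_flag THEN 0 ELSE 1 END)
           OVER (PARTITION BY name ORDER BY logtime) AS grp
FROM (
    SELECT logtime, name, seconds, flag,
           LAG(flag) OVER (PARTITION BY name ORDER BY logtime) AS prev_flag
    FROM dataset
) t
"""

# Solution 2: keep only the '1-1' rows and sum seconds per (name, grp).
FINAL = SAMPLE + """
SELECT name, SUM(seconds) AS seconds
FROM (""" + GROUPED + """) g
WHERE flag = '1-1'
GROUP BY name, grp
ORDER BY name, grp
"""


def run(sql):
    """Run a query against a fresh in-memory SQLite database (assumes 3.25+)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Intermediate step: logtime, flag, prev_flag and the assigned group.
    for logtime, _name, _seconds, flag, prev_flag, grp in run(SAMPLE + GROUPED + " ORDER BY logtime"):
        print(logtime, flag, prev_flag, grp)
    # Final result.
    for name, seconds in run(FINAL):
        print(name, seconds)
```

Here the groups are numbered 1, 1, 2, 3, 4, 5, so the surviving '1-1' rows land in groups 1, 3 and 5; the gaps do not matter because only the grouping, not the numbering, affects the sums.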

