【Question Title】: SQL Query - Sum of Aggregate above a Threshold
【Posted】: 2021-02-19 05:59:05
【Question】:

Suppose I have this table:

Name Month Date Value
n1 12 2020-12-01 7
n1 12 2020-12-05 9
n1 12 2020-12-09 17
n1 12 2020-12-14 8
n2 8 2020-08-02 12
n2 8 2020-08-08 7
n2 8 2020-08-09 14
n3 9 2020-09-01 5
n3 9 2020-09-03 11
n3 9 2020-09-07 10
n3 9 2020-09-21 7

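What SQL query, grouping by Name and Month (ordered by Date), selects one result row as soon as the accumulated sum of the Value field exceeds 15? The Date values are unique within each Name. For example, all rows with Name='n1' have distinct Date values. The same holds for n2, n3, ...

Based on the example above, the result should be: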

Name Month Value
n1 12 16
n1 12 17
n2 8 19
n3 9 16
n3 9 17
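
To pin down the intended semantics before looking at SQL, here is a minimal Python sketch of one reading of the requirement: within each (Name, Month) group, ordered by Date, values are accumulated until the sum exceeds 15, that sum becomes one result row, and accumulation restarts from zero. Trailing rows that never cross the threshold are dropped. The function name `sums_above_threshold` and the restart-from-zero rule are assumptions inferred from the expected output, not something the question states explicitly.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (name, month, date, value)
ROWS = [
    ("n1", 12, date(2020, 12, 1), 7),
    ("n1", 12, date(2020, 12, 5), 9),
    ("n1", 12, date(2020, 12, 9), 17),
    ("n1", 12, date(2020, 12, 14), 8),
    ("n2", 8, date(2020, 8, 2), 12),
    ("n2", 8, date(2020, 8, 8), 7),
    ("n2", 8, date(2020, 8, 9), 14),
    ("n3", 9, date(2020, 9, 1), 5),
    ("n3", 9, date(2020, 9, 3), 11),
    ("n3", 9, date(2020, 9, 7), 10),
    ("n3", 9, date(2020, 9, 21), 7),
]


def sums_above_threshold(rows, threshold=15):
    """Return (name, month, total) for every run whose sum exceeds threshold."""
    groups = defaultdict(list)
    for name, month, dt, val in rows:
        groups[(name, month)].append((dt, val))

    result = []
    for (name, month), items in sorted(groups.items()):
        running = 0
        for _dt, val in sorted(items):  # order by date within the group
            running += val
            if running > threshold:
                result.append((name, month, running))
                running = 0  # assumed: restart without carrying anything over
        # Any remainder that stayed at or below the threshold is discarded.
    return result


for row in sums_above_threshold(ROWS):
    print(*row)
```

Run against the sample data, this yields the five rows listed in the expected result, which suggests the reading is consistent with what the asker wants.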

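Thanks

【Question Comments】:

  • How are the rows ordered? Also, what is your database and version?
  • It is not clear how this output is obtained: for n1 and 12 the group total should be 41, but you have 33. For n2 and 8, you have 19 instead of 33.
  • @astentx It looks like a running sum with reset, which is then grouped and summed at each reset, ignoring any carry-over.
  • @all My apologies, I forgot to add the date column. I wanted to keep it simple, but evidently it was not clear at all. Again, sorry for the mistake.
  • Another obvious mistake is that you did not tell us which database you are using. You tagged both SQL Server and Oracle. These are very different databases with significant differences between their SQL dialects. Knowing your database version is also very important. I asked about this 12 hours ago; did you miss it?

Tags: sql sql-server oracle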


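【Solution 1】:

One possible approach, which works in both SQL Server and Oracle, is a recursive subquery (recursive CTE). At each step you decide whether the calculation needs to be reset.

I still cannot see why you drop n1 12 2020-12-14 and n2 8 2020-08-09 from the output, but you can get that result by removing the isleaf = 1 condition from the final where clause of the example below.

Here is the query with comments. (UPD: removed the select ... from (select ... wrapper and rewrote the reset_flag calculation so that the recursion uses standard SQL.)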

with a as (
  select 'n1' as name, 12 as month, convert(date, '2020-12-01', 23) as dt, 7 as val union all
  select 'n1' as name, 12 as month, convert(date, '2020-12-05', 23) as dt, 9 as val union all
  select 'n1' as name, 12 as month, convert(date, '2020-12-09', 23) as dt, 17 as val union all
  select 'n1' as name, 12 as month, convert(date, '2020-12-14', 23) as dt, 8 as val union all
  select 'n2' as name, 8  as month, convert(date, '2020-08-02', 23) as dt, 12 as val union all
  select 'n2' as name, 8  as month, convert(date, '2020-08-08', 23) as dt, 7 as val union all
  select 'n2' as name, 8  as month, convert(date, '2020-08-09', 23) as dt, 14 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-01', 23) as dt, 5 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-03', 23) as dt, 11 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-07', 23) as dt, 10 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-21', 23) as dt, 7 as val
)
, rn as (
  /*Build calculation hierarchy from 1st to last*/
  select
    name,
    month,
    dt,
    val,
    dense_rank() over(partition by name, month order by dt asc) as rn,
    /*To identify last item in group*/
    dense_rank() over(partition by name, month order by dt desc) as rn_desc
  from a
)
/*Simulate running sum with reset*/
, rec (
  name,
  month,
  running_sum,
  dt_until,
  rn,
  isleaf,
  reset_flag
) as (
  /*Start from 1st item*/
  select
    name,
    month,
    val as running_sum,
    dt,
    rn,
    case when rn_desc = 1 then 1 else 0 end as isleaf,
    case when val > 15 then 1 else 0 end as reset_flag
  from rn
  where rn = 1
  
  union all
  
  /*Calculate current value*/
  select
    rec.name,
    rec.month,
    /*
      If we need to reset calculation,
      then use original value,
      else - add value to running total
    */
    case
      when rec.running_sum > 15
      then rn.val
      else rec.running_sum + rn.val
    end as running_sum,
    dt,
    rec.rn + 1 as rn,
    case when rn.rn_desc = 1 then 1 else 0 end as isleaf,
    case
      /*Reset on threshold violation after addition*/
      when rec.running_sum <= 15
        and rec.running_sum + rn.val > 15
      then 1
      /*Or when there was a reset before and the current value alone also violates the threshold*/
      when rn.val > 15
      then 1
      else 0
    end as reset_flag
  from rec
    join rn
      on rec.name = rn.name
        and rec.month = rn.month
        and rec.rn + 1 = rn.rn
)
select *
from rec
where isleaf = 1
  or reset_flag = 1
order by 1, 2, rn asc
GO
name | month | running_sum | dt_until | rn | isleaf | reset_flag
:--- | ----: | ----------: | :--------- | -: | -----: | ---------:
n1 | 12 | 16 | 2020-12-05 | 2 | 0 | 1
n1 | 12 | 17 | 2020-12-09 | 3 | 0 | 1
n1 | 12 | 8 | 2020-12-14 | 4 | 1 | 0
n2 | 8 | 19 | 2020-08-08 | 2 | 0 | 1
n2 | 8 | 14 | 2020-08-09 | 3 | 1 | 0
n3 | 9 | 16 | 2020-09-03 | 2 | 0 | 1
n3 | 9 | 17 | 2020-09-21 | 4 | 1 | 1

db<>fiddle here for SQL Server, and here for Oracle.

UPD: the previous query is in this db<>fiddle
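
For readers who want to follow the recursion step by step, the following Python sketch mirrors the recursive query's logic row by row. The column names (running_sum, dt_until, rn, isleaf, reset_flag) are taken from the query; the function name `trace_running_sum` and the tuple layout are illustrative assumptions, and this is only a procedural restatement, not how either database executes the CTE.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (name, month, dt, val)
ROWS = [
    ("n1", 12, date(2020, 12, 1), 7),
    ("n1", 12, date(2020, 12, 5), 9),
    ("n1", 12, date(2020, 12, 9), 17),
    ("n1", 12, date(2020, 12, 14), 8),
    ("n2", 8, date(2020, 8, 2), 12),
    ("n2", 8, date(2020, 8, 8), 7),
    ("n2", 8, date(2020, 8, 9), 14),
    ("n3", 9, date(2020, 9, 1), 5),
    ("n3", 9, date(2020, 9, 3), 11),
    ("n3", 9, date(2020, 9, 7), 10),
    ("n3", 9, date(2020, 9, 21), 7),
]


def trace_running_sum(rows, threshold=15):
    """Yield one tuple per input row:
    (name, month, running_sum, dt_until, rn, isleaf, reset_flag)."""
    groups = defaultdict(list)
    for name, month, dt, val in rows:
        groups[(name, month)].append((dt, val))

    out = []
    for (name, month), items in sorted(groups.items()):
        items.sort()  # dates are unique per name, so this gives rn = 1, 2, ...
        prev = None   # running_sum of the previous step; None for the anchor row
        for rn, (dt, val) in enumerate(items, start=1):
            # Restart from the current value if the previous sum crossed the threshold
            if prev is None or prev > threshold:
                running = val
            else:
                running = prev + val
            isleaf = 1 if rn == len(items) else 0
            # Same two conditions as the CASE expression for reset_flag
            crossed = prev is not None and prev <= threshold and prev + val > threshold
            reset_flag = 1 if crossed or val > threshold else 0
            out.append((name, month, running, dt, rn, isleaf, reset_flag))
            prev = running
    return out


trace = trace_running_sum(ROWS)
# Equivalent of: where isleaf = 1 or reset_flag = 1
with_leaves = [r for r in trace if r[5] == 1 or r[6] == 1]
# Equivalent of: where reset_flag = 1
resets_only = [r for r in trace if r[6] == 1]
print(len(with_leaves), len(resets_only))
```

Filtering on isleaf = 1 or reset_flag = 1 gives the seven rows shown in the result table, while filtering on reset_flag = 1 alone leaves only the five rows the question asks for.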

Another approach for Oracle is the MODEL clause. Here we use the CV function, rather than recursion, to reset the calculation.

with a as (
  select 'n1' as name, 12 as month, date '2020-12-01' as dt, 7 as val   from dual union all
  select 'n1' as name, 12 as month, date '2020-12-05' as dt, 9 as val   from dual union all
  select 'n1' as name, 12 as month, date '2020-12-09' as dt, 17 as val  from dual union all
  select 'n1' as name, 12 as month, date '2020-12-14' as dt, 8 as val   from dual union all
  select 'n2' as name, 8  as month, date '2020-08-02' as dt, 12 as val  from dual union all
  select 'n2' as name, 8  as month, date '2020-08-08' as dt, 7 as val   from dual union all
  select 'n2' as name, 8  as month, date '2020-08-09' as dt, 14 as val  from dual union all
  select 'n3' as name, 9  as month, date '2020-09-01' as dt, 5 as val   from dual union all
  select 'n3' as name, 9  as month, date '2020-09-03' as dt, 11 as val  from dual union all
  select 'n3' as name, 9  as month, date '2020-09-07' as dt, 10 as val  from dual union all
  select 'n3' as name, 9  as month, date '2020-09-21' as dt, 7 as val   from dual 
)
, rn as (
  /*Build calculation hierarchy from 1st to last*/
  select
    name,
    month,
    dt,
    val,
    0 as rsum,
    0 as keep_flag,
    dense_rank() over(partition by name, month order by dt asc) as rn
  from a
)
, rsum as (
  /*Running sum with reset*/
  select *
  from rn
  model
    /*When to break calculation*/
    partition by (name, month)
    /*Dimension to iterate with model*/
    dimension by (rn)
    /*Value, running sum and the flag where we reset calculations*/
    measures (val, rsum, keep_flag, dt)
    /*Keep null values out of calculation range to identify last row per group*/
    keep nav
    rules update
    (
      /*For all sequentially numbered RNs in ascending order*/
      rsum[rn > 0] order by rn asc
        /*When we still have place till 15 (e.g previous calculation of RSUM
        is not greater than 15), we add current value of VAL to previous RSUM.
        Else we need to restart
        */
        = case
            when rsum[cv() - 1] <= 15
            then rsum[cv() - 1] + val[cv()]
            else val[cv()]
          end,
      
      /*Again, when there's a place and we are not at the end of partition,
      we mark the row as participating in another aggregated row
      */
      keep_flag[rn > 0] order by rn asc
        = case
            when rsum[cv()] <= 15 and rsum[cv() + 1] is not null
            then 0
            else 1
          end
    )
)
select
  name,
  month,
  rsum,
  dt
from rsum
/*Keep only aggregated rows or last row per group*/
where keep_flag = 1
order by name, month, rn asc
NAME | MONTH | RSUM | DT
:--- | ----: | ---: | :--------
n1 | 12 | 16 | 2020-12-05
n1 | 12 | 17 | 2020-12-09
n1 | 12 | 8 | 2020-12-14
n2 | 8 | 19 | 2020-08-08
n2 | 8 | 14 | 2020-08-09
n3 | 9 | 16 | 2020-09-03
n3 | 9 | 17 | 2020-09-21

db<>fiddle here

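【Comments】:

  • I think this problem requires a recursive subquery, so this is probably the best solution.
  • Thanks @astentx. The reason I removed those rows from the result is that the query should not select rows whose sum(val)
  • @GordonLinoff - Evidently, you thought wrong. I just posted a solution that doesn't need recursive anything.
  • @Bob I have updated my answer: I removed the select ... from (select ...) that Oracle cannot use with recursive queries, and added the corresponding dbfiddle link for Oracle. It now works on both DBMSs.
  • Thanks @astentx. The Oracle version using MODEL is very fast: about 5 seconds for roughly 40 million records. I went with another solution that is just as fast and takes only a few lines of code. The recursive solution was very slow and had not finished even after 20 minutes. Thank you very much for taking the time to write such a detailed reply.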
【Solution 2】:

In Oracle 12.1 and later, this task is easy with the match_recognize clause. You can easily add more columns to the output (for example, the first and last date of each group).

Setup

create table this_table (name, mth, dt, val) as
  select 'n1', 12, date '2020-12-01',  7 from dual union all
  select 'n1', 12, date '2020-12-05',  9 from dual union all
  select 'n1', 12, date '2020-12-09', 17 from dual union all
  select 'n1', 12, date '2020-12-14',  8 from dual union all
  select 'n2',  8, date '2020-08-02', 12 from dual union all
  select 'n2',  8, date '2020-08-08',  7 from dual union all
  select 'n2',  8, date '2020-08-09', 14 from dual union all
  select 'n3',  9, date '2020-09-01',  5 from dual union all
  select 'n3',  9, date '2020-09-03', 11 from dual union all
  select 'n3',  9, date '2020-09-07', 10 from dual union all
  select 'n3',  9, date '2020-09-21',  7 from dual
;

Query and output

select name, mth, sum_val
from   this_table
match_recognize(
  partition by name, mth
  order     by dt
  measures  sum(val) as sum_val
  pattern   ( a* b )
  define    a as sum(val) <= 15, b as sum(val) > 15
);

NAME   MTH SUM_VAL
----  ---- -------
n1      12      16
n1      12      17
n2       8      19
n3       9      16
n3       9      17

【Comments】:

  • Thanks @mathguy, a few lines of code and a perfect result. It took only 5 seconds for about 40 million records.