SQL中的有界累积和答案

【问题标题】：Bounded cumulative sum in SQLSQL中的有界累积和
【发布时间】：2019-10-02 19:12:07
【问题描述】：

如何使用 SQL 计算列的累积和，以便累积和始终保持在上限/下限内。下限 -2 和上限 10 的示例，显示常规累积和和有界累积和。

id      input 
-------------
 1       5   
 2       7   
 3     -10   
 4     -10   
 5       5   
 6      10

结果：

id    cum_sum    bounded_cum_sum  
---------------------------------
  1       5          5     
  2      12         10    
  3       2          0
  4      -8         -2
  5      -3          3     
  6       7         10

有关有限累积和的一些（非 SQL）示例，请参阅 https://codegolf.stackexchange.com/questions/61684/calculate-the-bounded-cumulative-sum-of-a-vector。

【问题讨论】：

stackoverflow.com/questions/2120544/how-to-get-cumulative-sum/…
你的意思是上限/下限？请解释逻辑
见给出的例子。 cum_sum 列是正常的累积总和（如果需要，也可以是运行总计）。 bounded_cum_sum 是我所追求的。例如。在第二行中，累积和等于 12，但由于该值高于上限，因此 bounded_cum_sum 的值等于其上限，即 10。
这与stackoverflow.com/questions/2120544/how-to-get-cumulative-sum 不同，因为该问题考虑了“常规”累积总和。
@JuanCarlosOropeza：我添加了一个指向另一个 Stack Exchange 问题的链接，其中包含关于有界累积和和上限/下限（与 SQL 无关）的更多解释

标签： sql-server cumulative-sum

【解决方案1】：

您可以（几乎）始终使用cursor 来实现您拥有的任何累积逻辑。该技术非常常规，因此一旦掌握，就可以轻松解决各种问题。

需要特别注意的一点：这里我就地更新了表，所以[id] 列必须是唯一索引的。

（在 SQL Server 2017 最新的 linux docker 镜像上测试）

测试数据集

use [testdb];
if OBJECT_ID('testdb..test') is not null
    drop table testdb..test;

create table test (
    [id] int,
    [input] int,
);

insert into test (id, input)
values (1,5), (2,7), (3,-10), (4,-10), (5,5), (6,10);

解决方案

/* A generic row-by-row cursor solution */

-- First of all, make [id] uniquely indexed to enable "where current of" 
create unique index idx_id on test(id);

-- append answer columns
alter table test 
    add [cum_sum] int,
        [bounded_cum_sum] int;

-- storage for each row
declare @id int,
        @input int,
        @cum_sum int, 
        @bounded_cum_sum int;
-- record accumulated values
declare @prev_cum_sum int = 0,
        @prev_bounded_cum_sum int = 0;

-- open a cursor ordered by [id] and updatable for assigned columns
declare cur CURSOR local
for select [id], [input], [cum_sum], [bounded_cum_sum]
    from test
    order by id
for update of [cum_sum], [bounded_cum_sum];
open cur;

while 1=1 BEGIN

    /* fetch next row and check termination condition */
    fetch next from cur 
        into @id, @input, @cum_sum, @bounded_cum_sum;

    if @@FETCH_STATUS <> 0
        break;

    /* program body */

    -- main logic
    set @cum_sum = @prev_cum_sum + @input;
    set @bounded_cum_sum = @prev_bounded_cum_sum + @input;
    if @bounded_cum_sum > 10 set @bounded_cum_sum=10
    else if @bounded_cum_sum < -2 set @bounded_cum_sum=-2;

    -- write the result back
    update test 
        set [cum_sum] = @cum_sum,
            [bounded_cum_sum] = @bounded_cum_sum
        where current of cur;

    -- setup for next row
    set @prev_cum_sum = @cum_sum;
    set @prev_bounded_cum_sum = @bounded_cum_sum;
END

-- cleanup
close cur;
deallocate cur;

-- show
select * from test;

结果

|   | id | input | cum_sum | bounded_cum_sum |
|---|----|-------|---------|-----------------|
| 1 | 1  | 5     | 5       | 5               |
| 2 | 2  | 7     | 12      | 10              |
| 3 | 3  | -10   | 2       | 0               |
| 4 | 4  | -10   | -8      | -2              |
| 5 | 5  | 5     | -3      | 3               |
| 6 | 6  | 10    | 7       | 10              |

【讨论】：

是的，大多数人宁愿在她/他自己的工作站上使用通用语言来执行此操作。但是，如果您的公司过于慷慨而无法提供对 pandas DataFrame 的访问，而您又太胆小不敢换工作，那么生活很可能会找到自己的出路——在 SQL Server 上.....
你会惊讶于在 pandas DataFrame 中大规模地这样做是多么烦人 ;-)