【Question title】: Cumulative sum over index in MATLAB
【Posted】: 2014-11-20 14:25:25
【Question description】:

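Consider the following matrix, where the first column is an index, the second column holds values, and the third column is the cumulative sum of the values, reset every time the index changes: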

1     1     1     % 1   
1     2     3     % 1+2
1     3     6     % 3+3
2     4     4     % 4
2     5     9     % 4+5
3     6     6     % 6
3     7    13    % 6+7
3     8    21    % 13+8
3     9    30    % 21+9
4    10    10    % 10
4    11    21    % 10+11

How can I get the third column without using a loop?
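For reference, this is roughly the loop the question wants to avoid. It is only a baseline sketch (not part of the original question) for checking the vectorized answers against; `ref` and `running` are illustrative names.

```matlab
% Loop-based reference (illustrative baseline): restart the running sum
% whenever the index in column 1 changes.
A = [1 1; 1 2; 1 3; 2 4; 2 5; 3 6; 3 7; 3 8; 3 9; 4 10; 4 11];
ref = zeros(size(A,1), 1);
running = 0;
for k = 1:size(A,1)
    if k > 1 && A(k,1) ~= A(k-1,1)
        running = 0;               % index changed: start a new partial sum
    end
    running = running + A(k,2);
    ref(k) = running;
end
disp([A ref])
```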

I tried the following:

  A = [1 1;...                 % Input
       1 2;...
       1 3;...
       2 4;...
       2 5;...
       3 6;...
       3 7;...
       3 8;...
       3 9;...
       4 10;...
       4 11];
  CS = cumsum(A(:,2));         % cumulative sum over the second column

  I = [diff(A(:,1));0];        % nonzero at the last row before the index (the first
                               % column) changes
  offset=CS.*I;                % extract the last value of the cumulative sum for a given
                               % index

  offset(end)=[]; offset=[0; offset] %roll offset 1 step forward

  [A, CS, offset]

The result is:

ans =

 1     1     1     0
 1     2     3     0
 1     3     6     0
 2     4    10     6
 2     5    15     0
 3     6    21    15
 3     7    28     0
 3     8    36     0
 3     9    45     0
 4    10    55    45
 4    11    66     0

Is there a simple way to convert the fourth column of the matrix above into the following?

O =

 0
 0
 0
 6
 6
15
15
15
15
45
45

That would be enough, since CS-O gives the desired output.

Any suggestions would be appreciated.

【Question discussion】:

  • Interesting question, and it shows effort: +1

Tags: matlab cumulative-sum


【Solution 1】:

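Your strategy is actually probably what I would do. Your last step can be implemented as follows. (Keep in mind, though, that your approach assumes consecutive indices. You can of course change that with offset=[0; CS(1:end-1).*(diff(A(:,1))~=0)];, but the indices still need to be sorted.)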

I = find(offset);                  %// rows where a new index value starts
idxLastI = cumsum(offset~=0);      %// how many of those starts precede (or are) each row
hasLastI = idxLastI~=0; %// For the zeros at the beginning
%// Combine the above into the output
O = zeros(size(offset));
O(hasLastI) = offset(I(idxLastI(hasLastI)));
out = CS-O;

This should be similar to Divakar's cumsum-diff approach.
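A self-contained sketch that combines the question's first steps with this answer's last step, using the generalized offset from the parenthetical note so that index jumps larger than 1 are handled. The input is hypothetical. One assumption of the find(offset) step is worth stating: it treats a zero offset as "no index change", so it could miss a boundary if the cumulative sum at the end of a group were exactly 0 (impossible here, since all values are positive).

```matlab
% Hypothetical input: sorted indices that jump by more than 1 (1 -> 4 -> 9).
A = [1 1; 1 2; 4 3; 4 4; 9 5; 9 6; 9 7];
CS = cumsum(A(:,2));

% Generalized offset: flag index changes with ~=0 instead of multiplying
% by the raw diff, then shift one row down.
offset = [0; CS(1:end-1).*(diff(A(:,1))~=0)];

I = find(offset);                  % rows where a new index value starts
idxLastI = cumsum(offset~=0);      % which group start each row belongs to
hasLastI = idxLastI~=0;            % rows of the first group get no offset
O = zeros(size(offset));
O(hasLastI) = offset(I(idxLastI(hasLastI)));
out = CS-O;
disp([A out])
```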

【Discussion】:

    【Solution 2】:

    Use accumarray with a custom function:

    result = accumarray(A(:,1), A(:,2), [], @(x) {cumsum(x)});
    result = vertcat(result{:});
    

    This works whether or not the index changes in steps of 1 (as in your example).
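The same idea can be pushed a little further. accumarray requires positive integer subscripts, so one option (swapped in here, not part of the answer) is to relabel the index column with the third output of unique first. The groups are then numbered 1..K with no gaps, and zero, negative or non-integer index values also work. This sketch assumes the rows are already sorted by index, and relies on accumarray passing each group's values in their original row order, which is only safe to rely on when the subscripts are sorted.

```matlab
% Hypothetical data: rows sorted by index; the index column has gaps,
% a zero and a negative value, which accumarray would reject as subscripts.
A = [-2 1; -2 2; 0 3; 0 4; 5 5; 5 6; 5 7];

% Relabel the indices as consecutive group numbers 1..K (assumes sorted rows).
[~, ~, g] = unique(A(:,1));
g = g(:);                          % make sure the subscripts form a column

% One cumulative sum per group, collected in a cell array, then stacked.
parts = accumarray(g, A(:,2), [], @(x) {cumsum(x)});
result = vertcat(parts{:});
disp([A result])
```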


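    The following approaches are faster because they avoid cell arrays. See @Divakar's excellent benchmarks in his answer (and take a look at his solution, which is the fastest):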

    1. If each index change is always an increase of 1 (as in your example):

      last = find(diff(A(:,1)))+1; %// first row of each index value (except the first one)
      result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
      correction = accumarray(A(:,1), A(:,2)); %// correction to be applied for cumsum
      result(last) = result(last)-correction(1:end-1); %// apply correction
      result = cumsum(result); %// compute result
      
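    2. If the index values can change by more than 1 (i.e., there may be "skipped" values), a slight modification is needed, which makes it a bit slower: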

      last = find(diff(A(:,1)))+1; %// first row of each index value (except the first one)
      result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
      correction = accumarray(A(:,1), A(:,2), [], @sum, NaN); %// correction
      correction = correction(~isnan(correction)); %// remove unused values
      result(last) = result(last)-correction(1:end-1); %// apply correction
      result = cumsum(result);
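To see why subtracting each group's total at the start of the next group works, here is the general-case version traced on a small hypothetical input with skipped index values. The comments show the values each step produces for this input.

```matlab
% Hypothetical input: sorted indices 1, 4, 9 (values 2-3 and 5-8 are skipped).
A = [1 1; 1 2; 4 3; 4 4; 9 5; 9 6; 9 7];

last = find(diff(A(:,1)))+1;       % first row of every index value but the first: [3; 5]
result = A(:,2);                   % [1; 2; 3; 4; 5; 6; 7]
correction = accumarray(A(:,1), A(:,2), [], @sum, NaN);
                                   % 9x1: [3; NaN; NaN; 7; NaN; NaN; NaN; NaN; 18]
correction = correction(~isnan(correction));   % per-index totals: [3; 7; 18]
result(last) = result(last)-correction(1:end-1);
                                   % previous index's total removed: [1; 2; 0; 4; -2; 6; 7]
result = cumsum(result);           % [1; 3; 3; 7; 5; 11; 18]
disp([A result])
```

Each correction cancels exactly the running total carried over from the previous index, so the final cumsum restarts at every index change.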
      

    【Discussion】:

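    • I tried accumarray, but didn't know I should put curly braces around it. +1
    • Ah, I see: a cell array from an anonymous function? Neat!
    • @Divakar Yes, the notation accumarray uses for this is a bit odd. One would expect something like (..., 'uniformoutput', 0)
    • Let me ask you something, since I haven't used anonymous functions that much and I think you know more about them. They can be used in many places, such as accumarray and bsxfun, but my guess/hunch is that MATLAB is not performance-oriented when anonymous functions are used. What do you think? Or do you think it depends on the particular anonymous function, so that no such general statement can be made?
    • @LuisMendo Everything you said makes sense! Thanks for sharing your thoughts! And yes, I also remember seeing somewhere that isempty is faster than @isempty. In any case, MATLAB function calls are expensive.
    【Solution 3】:

    An approach based on cumsum and diff might be good for performance -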

    %// cumsum values for the entire column-2
    cumsum_vals = cumsum(A(:,2));
    
    %// diff for column-1
    diffA1 = diff(A(:,1));
    
    %// cumsum value at the last row of each index value (all but the last one)
    cumsum_after_each_idx = cumsum_vals([diffA1 ;0]~=0);
    
    %// Get cumsum for each "group" and place each of its elements at the right place
    %// to be subtracted from cumsum_vals for getting the final output
    diffA1(diffA1~=0) = [cumsum_after_each_idx(1) ; diff(cumsum_after_each_idx)];
    
    out = cumsum_vals-[0;cumsum(diffA1)];
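Because the nonzero diff values are overwritten with group totals, this approach should not need consecutive indices. A quick way to check that is to compare it with a plain loop on random data. This is a sketch with hypothetical test data (sorted random indices with gaps, integer values so the comparison can be exact). It assumes the index column changes at least once: with a single group, cumsum_after_each_idx is empty and cumsum_after_each_idx(1) would raise an error.

```matlab
% Hypothetical random test data: sorted indices with gaps, integer values.
n = 50;
A = [sort(randi(20, n, 1)), randi(10, n, 1)];

% cumsum/diff approach from the answer above.
cumsum_vals = cumsum(A(:,2));
diffA1 = diff(A(:,1));
cumsum_after_each_idx = cumsum_vals([diffA1; 0]~=0);
diffA1(diffA1~=0) = [cumsum_after_each_idx(1); diff(cumsum_after_each_idx)];
out = cumsum_vals - [0; cumsum(diffA1)];

% Plain loop used only as a reference for the comparison.
ref = zeros(n, 1);
running = 0;
for k = 1:n
    if k > 1 && A(k,1) ~= A(k-1,1)
        running = 0;               % index changed: restart the partial sum
    end
    running = running + A(k,2);
    ref(k) = running;
end
disp(isequal(out, ref))
```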
    

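    Benchmarking

    If you care about performance, here are some benchmarks against the other accumarray-based solutions.

    Benchmark code (comments removed for compactness) -

    A = [1 1; 1 2; 1 3; 2 4; 2 5; 3 6; 3 7; 3 8; 3 9; 4 10; 4 11]; %// same as in the question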
    
    num_runs = 100000; %// number of runs
    
    disp('---------------------- With cumsum and diff')
    tic
    for k1=1:num_runs
        cumsum_vals = cumsum(A(:,2));
        diffA1 = diff(A(:,1));
        cumsum_after_each_idx = cumsum_vals([diffA1 ;0]~=0);
        diffA1(diffA1~=0) = [cumsum_after_each_idx(1) ; diff(cumsum_after_each_idx)];
        out = cumsum_vals-[0;cumsum(diffA1)];
    end
    toc,clear cumsum_vals  diffA1 cumsum_after_each_idx out
    
    disp('---------------------- With accumarray - version 1')
    tic
    for k1=1:num_runs
        result = accumarray(A(:,1), A(:,2), [], @(x) {cumsum(x)});
        result = vertcat(result{:});
    end
    toc, clear result
    
    disp('--- With accumarray - version 2 (assuming consecutive indices only)')
    tic
    for k1=1:num_runs
        last = find(diff(A(:,1)))+1; %// first row of each index value (except the first one)
        result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
        correction = accumarray(A(:,1), A(:,2)); %// correction to be applied for cumsum
        result(last) = result(last)-correction(1:end-1); %// apply correction
        result = cumsum(result); %// compute result
    end
    toc, clear last result correction
    
    disp('--- With accumarray - version 2 ( general case)')
    tic
    for k1=1:num_runs
        last = find(diff(A(:,1)))+1; %// first row of each index value (except the first one)
        result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
        correction = accumarray(A(:,1), A(:,2), [], @sum, NaN); %// correction
        correction = correction(~isnan(correction)); %// remove unused values
        result(last) = result(last)-correction(1:end-1); %// apply correction
        result = cumsum(result);
    end
    toc
    

    Results -

    ---------------------- With cumsum and diff
    Elapsed time is 1.688460 seconds.
    ---------------------- With accumarray - version 1
    Elapsed time is 28.630823 seconds.
    --- With accumarray - version 2 (assuming consecutive indices only)
    Elapsed time is 2.416905 seconds.
    --- With accumarray - version 2 ( general case)
    Elapsed time is 4.839310 seconds.
    

    【Discussion】:

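    • I added a different accumarray-based approach, without cells. Could you include it in your tests?
    • @LuisMendo A big improvement over the previous one! Proof once again that cells are not good for performance!
    • Following your good observation, I've split approach 2 into two. Sorry for the mess!
    • @LuisMendo No problem. As you can see, it's still quite a nice improvement!
    • Thanks again, and sorry for making you do more benchmarking work!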