【Title】: Vectorizing a nested for loop which fills a dynamic programming table
【Posted】: 2014-11-22 02:14:49
【Question】:

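I am wondering whether there is a way to vectorize the nested for loops in this function, which fills in the entries of a 2D dynamic programming table, DP. I believe at least the inner loop can be vectorized, since each row depends only on the previous row, but I am not sure how to go about it. Note that this function is called on large 2D arrays (images), so the nested for loops really don't cut it.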

function [cols] = compute_seam(energy)
    [r, c, ~] = size(energy);

    cols = zeros(r);

    DP = padarray(energy, [0, 1], Inf);    
    BP = zeros(r, c);

    for i = 2 : r        
        for j = 1 : c
            [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
            DP(i, j + 1) = DP(i, j + 1) + x;
            BP(i, j) = j + (l - 2);
        end
    end

    [~, j] = min(DP(r, :));
    j = j - 1;

    for i = r : -1 : 1
        cols(i) = j;
        j = BP(i, j);
    end
end

【Comments】:

    Tags: matlab image-processing optimization vectorization


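    【Solution 1】:

    Vectorizing the innermost nested loop

    Your assumption is correct: at least the inner loop is vectorizable. Here is the modified code for the nested-loop part: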

    rows_DP = size(DP,1); %// rows in DP
    
    %// Get first row linear indices for a group of neighboring three columns, 
    %// which would be incremented as we move between rows with the row iterator
    start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
    for i = 2 : r
        ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
        [x,l] = min(DP(ind1),[],1); %// get x and l values in one go
        DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
        BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
    end
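To make the linear-indexing step more concrete, here is a small sketch on toy data. The sizes and the use of `magic` are illustrative assumptions, not part of the original answer, and the padding is built by plain concatenation so the snippet does not need `padarray`. It checks that column k of `start_ind1` addresses the three neighbors `DP(1,k)`, `DP(1,k+1)`, `DP(1,k+2)`, and that adding `i-2` moves the same window down to row `i-1`.

```matlab
% Toy sizes chosen only for illustration (not from the original post)
energy = magic(4);
energy = energy(1:3, :);            % r = 3 rows, c = 4 columns
[r, c] = size(energy);

% Same layout as padarray(energy, [0, 1], Inf), built by concatenation
DP = [Inf(r, 1), energy, Inf(r, 1)];
rows_DP = size(DP, 1);

% Linear indices of the three-column window for every column, in row 1
start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);

% Column k of start_ind1 addresses DP(1,k), DP(1,k+1), DP(1,k+2)
window_row1 = DP(start_ind1);
same_row1 = isequal(window_row1, [DP(1,1:c); DP(1,2:c+1); DP(1,3:c+2)]);

% Adding (i-2) shifts every index down to row i-1
i = 3;
window_prev = DP(start_ind1 + i - 2);
same_prev = isequal(window_prev, ...
    [DP(i-1,1:c); DP(i-1,2:c+1); DP(i-1,3:c+2)]);
```

Because each column of the indexed 3-by-c matrix holds one neighborhood, `min(..., [], 1)` returns the minimum and its position (1, 2 or 3) for all c columns at once.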
    

    Benchmarking

    Benchmark code:

    N = 3000; %// Datasize
    energy = rand(N);
    [r, c, ~] = size(energy);
    
    disp('------------------------------------- With Original Code')
    DP = padarray(energy, [0, 1], Inf);
    BP = zeros(r, c);
    tic
    for i = 2 : r
        for j = 1 : c
            [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
            DP(i, j + 1) = DP(i, j + 1) + x;
            BP(i, j) = j + (l - 2);
        end
    end
    toc,clear DP BP x l
    
    disp('------------------------------------- With Vectorized Code')
    DP = padarray(energy, [0, 1], Inf);
    BP = zeros(r, c);
    tic
    rows_DP = size(DP,1); %// rows in DP
    start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
    for i = 2 : r
        ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
        [x,l] = min(DP(ind1),[],1); %// get x and l values in one go
        DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
        BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
    end
    toc
    

    Results:

    ------------------------------------- With Original Code
    Elapsed time is 44.200746 seconds.
    ------------------------------------- With Vectorized Code
    Elapsed time is 1.694288 seconds.
    

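    So on this 3000-by-3000 input the run time drops from about 44.2 s to about 1.7 s, roughly a 26x speedup, with just a small vectorization tweak.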
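The timings alone do not show that both versions fill the tables identically. The following is a minimal sketch of such a check, assuming a small random input; the size, the seed, and the `_loop`/`_vec` variable names are hypothetical, and the padding again uses concatenation instead of `padarray`.

```matlab
rng(0);                              % fixed seed, chosen arbitrarily
energy = rand(50, 40);               % small illustrative input
[r, c] = size(energy);
DP0 = [Inf(r, 1), energy, Inf(r, 1)];

% Original nested-loop version
DP_loop = DP0;
BP_loop = zeros(r, c);
for i = 2 : r
    for j = 1 : c
        [x, l] = min([DP_loop(i-1, j), DP_loop(i-1, j+1), DP_loop(i-1, j+2)]);
        DP_loop(i, j+1) = DP_loop(i, j+1) + x;
        BP_loop(i, j) = j + (l - 2);
    end
end

% Vectorized inner loop
DP_vec = DP0;
BP_vec = zeros(r, c);
rows_DP = size(DP_vec, 1);
start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);
for i = 2 : r
    ind1 = start_ind1 + i - 2;
    [x, l] = min(DP_vec(ind1), [], 1);
    DP_vec(i, 2:c+1) = DP_vec(i, 2:c+1) + x;
    BP_vec(i, 1:c) = (1:c) + l - 2;
end

% Both tables should agree exactly, since the same additions are performed
tables_match = isequal(DP_loop, DP_vec) && isequal(BP_loop, BP_vec);
```

Both versions pass the three neighbors to `min` in the same order, so ties are broken the same way and the backpointers agree as well.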


    More tweaks

    A few more optimization tweaks can be tried in the code to improve performance:

    • cols = zeros(r) can be replaced with cols(r,r) = 0

    • DP = padarray(energy, [0, 1], Inf) can be replaced with DP(1:size(energy,1),1:size(energy,2)+2)=Inf; DP(:,2:end-1) = energy;

    • BP = zeros(r, c) can be replaced with BP(r, c) = 0

    The preallocation tweaks used here are inspired by this blog post.
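As a rough sketch of why these replacements are drop-in, the snippet below compares each tweak with the call it replaces. The sizes and the `_a`/`_b` names are hypothetical. One assumption worth stating loudly: assigning to the last element only allocates a fresh zero-filled array when the variable does not exist yet, so the variables are cleared first.

```matlab
r = 5; c = 7;                        % illustrative sizes, not from the post
energy = rand(r, c);

% The assign-to-last-element trick relies on the variables not existing yet
clear BP_a DP_a

BP_a(r, c) = 0;                      % grows BP_a to r-by-c, zero-filled
BP_b = zeros(r, c);

DP_a(1:r, 1:c+2) = Inf;              % r-by-(c+2) array filled with Inf
DP_a(:, 2:end-1) = energy;           % interior columns hold the energy map
DP_b = [Inf(r, 1), energy, Inf(r, 1)];   % same layout as padarray(energy,[0,1],Inf)

bp_same = isequal(BP_a, BP_b);
dp_same = isequal(DP_a, DP_b);
```

Whether these forms are actually faster than `zeros` or `padarray` depends on the MATLAB release, so it is worth timing them on the target machine before adopting them.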

    【Comments】:
