Answers: Vectorizing a nested for loop which fills a dynamic programming table

[Question title]: Vectorizing a nested for loop which fills a dynamic programming table
[Posted]: 2014-11-22 02:14:49
[Question]:

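I would like to know whether there is a way to vectorize the nested for loops in this function, which fills in the entries of a 2D dynamic programming table, DP. I believe at least the inner loop can be vectorized, since each row depends only on the previous row, but I am not sure how to do it. Note that this function is called on large 2D arrays (images), so the nested for loops really don't cut it.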

function [cols] = compute_seam(energy)
    [r, c, ~] = size(energy);

    cols = zeros(r, 1); % r-by-1 vector; zeros(r) would allocate an r-by-r matrix

    DP = padarray(energy, [0, 1], Inf);    
    BP = zeros(r, c);

    for i = 2 : r        
        for j = 1 : c
            [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
            DP(i, j + 1) = DP(i, j + 1) + x;
            BP(i, j) = j + (l - 2);
        end
    end

    [~, j] = min(DP(r, :));
    j = j - 1;

    for i = r : -1 : 1
        cols(i) = j;
        j = BP(i, j);
    end
end

[Comments]:

Tags: matlab image-processing optimization vectorization

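[Answer 1]:

Vectorizing the innermost nested loop

Your assumption is correct: at least the inner loop can be vectorized. Here is the modified code for the nested-loop part: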

rows_DP = size(DP,1); %// rows in DP

%// Get first row linear indices for a group of neighboring three columns, 
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
    ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
    [x,l] = min(DP(ind1),[],1); %// get x and l values in one go
    DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
    BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
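
To see why the linear indexing above picks out the right three neighbors, it may help to run it on a tiny table. The sketch below is only an illustration: the sizes r = 3 and c = 2 are made up, and DP is filled so that each entry equals its own linear index, which makes the index arithmetic visible in the values.

```matlab
% Hypothetical tiny sizes, chosen only for illustration.
r = 3; c = 2;
DP = reshape(1:r*(c+2), r, c+2);   % DP(k) == k, so values reveal linear indices
rows_DP = size(DP, 1);

% Column j holds the linear indices of DP(1,j), DP(1,j+1), DP(1,j+2),
% i.e. the three first-row neighbors that feed output column j.
start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);
disp(start_ind1)                   % displays [1 4; 4 7; 7 10]

% Adding i-2 shifts every index from row 1 down to row i-1.
i = 3;
ind1 = start_ind1 + i - 2;
expected = [DP(i-1, 1:c); DP(i-1, 2:c+1); DP(i-1, 3:c+2)];
disp(isequal(DP(ind1), expected))  % displays 1
```

Because MATLAB stores matrices column by column, moving one column to the right adds rows_DP to a linear index and moving one row down adds 1, which is all the index matrix relies on.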

Benchmarking

Benchmark code:

N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);

disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
    for j = 1 : c
        [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
        DP(i, j + 1) = DP(i, j + 1) + x;
        BP(i, j) = j + (l - 2);
    end
end
toc,clear DP BP x l

disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
    ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
    [x,l] = min(DP(ind1),[],1); %// get x and l values in one go
    DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
    BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc

Results:

------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.

So a small vectorization tweak gives roughly a 26x speedup in this benchmark (about 44.2 s down to 1.69 s).
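
The benchmark measures time only, so it can be worth confirming on a small input that both versions fill DP and BP identically. The following is a minimal sketch of such a check, not part of the original answer: the test matrix magic(5) is an arbitrary choice, and the padding is built by hand (assumed equivalent to padarray(energy, [0, 1], Inf)) so that the script does not need the Image Processing Toolbox.

```matlab
% Arbitrary small test input; any real-valued matrix should work.
energy = magic(5);
[r, c] = size(energy);
pad = Inf(r, 1);

% Reference: the original double loop.
DP1 = [pad, energy, pad];          % one Inf column on each side
BP1 = zeros(r, c);
for i = 2 : r
    for j = 1 : c
        [x, l] = min([DP1(i-1, j), DP1(i-1, j+1), DP1(i-1, j+2)]);
        DP1(i, j+1) = DP1(i, j+1) + x;
        BP1(i, j) = j + (l - 2);
    end
end

% Vectorized inner loop, as in the answer.
DP2 = [pad, energy, pad];
BP2 = zeros(r, c);
rows_DP = size(DP2, 1);
start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);
for i = 2 : r
    ind1 = start_ind1 + i - 2;             % indices into row i-1
    [x, l] = min(DP2(ind1), [], 1);        % column-wise min over 3 neighbors
    DP2(i, 2:c+1) = DP2(i, 2:c+1) + x;
    BP2(i, 1:c) = (1:c) + l - 2;
end

same_DP = isequal(DP1, DP2);
same_BP = isequal(BP1, BP2);
fprintf('DP identical: %d, BP identical: %d\n', same_DP, same_BP);
```

Both forms of min return the first minimum when there are ties, so the back-pointers should agree even when neighboring entries are equal.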
