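Vectorization of the innermost nested loop
Your assumption is correct: at least the inner loop is vectorizable. Here's the modified code for the nested-loops part -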
rows_DP = size(DP,1); %// rows in DP
%// Get first row linear indices for a group of neighboring three columns,
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
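To make the linear-indexing step concrete, here is a small sketch with arbitrarily chosen sizes (a 4-row DP and c = 2, picked only for illustration) that builds start_ind1 exactly as above and checks each column against sub2ind. Column j holds the linear indices of DP(1,j), DP(1,j+1) and DP(1,j+2), so adding i-2 moves all of them down to row i-1, which is the row the original inner loop reads.

```matlab
% Illustrative sizes only: DP with 4 rows and c = 2 interior columns
rows_DP = 4;
c = 2;

% Same construction as in the answer: a 3 x c matrix of linear indices
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP);
disp(start_ind1)   % prints the 3x2 matrix [1 5; 5 9; 9 13]

% Column j should list the linear indices of DP(1,j), DP(1,j+1), DP(1,j+2)
expected = zeros(3, c);
for j = 1:c
    expected(:, j) = sub2ind([rows_DP, c+2], [1; 1; 1], j + (0:2)');
end
assert(isequal(start_ind1, expected))

% For iteration i, adding i-2 shifts every index to row i-1
i = 3;
assert(isequal(start_ind1 + i - 2, expected + 1))
```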
Benchmarking
Benchmarking code -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc
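Vectorizing the inner loop is valid because each j iteration reads only row i-1 of DP and writes only row i, so the iterations within one row do not depend on each other. A quick way to confirm that both versions agree is to run them on a small input and compare DP and BP. This sketch assumes an arbitrary 6x5 random energy matrix and builds the Inf padding with Inf(...) instead of padarray, so it does not need the Image Processing Toolbox.

```matlab
% Small random test case; the size is arbitrary and only keeps the check fast
energy = rand(6, 5);
[r, c] = size(energy);

% Inf-padded DP built without padarray
DP0 = Inf(r, c + 2);
DP0(:, 2:end-1) = energy;

% Original double loop
DP1 = DP0;  BP1 = zeros(r, c);
for i = 2 : r
    for j = 1 : c
        [x, l] = min([DP1(i-1, j), DP1(i-1, j+1), DP1(i-1, j+2)]);
        DP1(i, j+1) = DP1(i, j+1) + x;
        BP1(i, j) = j + (l - 2);
    end
end

% Vectorized inner loop, as in the answer
DP2 = DP0;  BP2 = zeros(r, c);
rows_DP = size(DP2, 1);
start_ind1 = bsxfun(@plus, [1:rows_DP:2*rows_DP+1]', [0:c-1]*rows_DP);
for i = 2 : r
    ind1 = start_ind1 + i - 2;              % linear indices into row i-1
    [x, l] = min(DP2(ind1), [], 1);         % column-wise min over the 3 neighbours
    DP2(i, 2:c+1) = DP2(i, 2:c+1) + x;
    BP2(i, 1:c) = [1:c] + l - 2;
end

% Both versions should produce identical DP and BP
assert(isequal(DP1, DP2))
assert(isequal(BP1, BP2))
disp('Loop and vectorized results match')
```

Because the Inf padding never wins the min when the energies are finite, every BP entry from row 2 onward stays within 1..c.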
Results -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
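So, with just a little vectorization you might enjoy roughly a 26x speedup (44.2 s down to 1.69 s for N = 3000).
Further tweaks
A few more optimization tweaks could be tried in the code to boost performance -
cols = zeros(r) could be replaced with cols(r,r) = 0.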
DP = padarray(energy, [0, 1], Inf) could be replaced with
DP(1:size(energy,1),1:size(energy,2)+2)=Inf;
DP(:,2:end-1) = energy;
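BP = zeros(r, c) could be replaced with BP(r, c) = 0.
The pre-allocation tweaks used here are inspired by this blog post. Note that assigning to the last element only pre-allocates a fresh array when the variable does not already exist; if it does, its old contents are kept.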
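The sketch below checks, on a small arbitrary 4x3 input, that the replacement statements from the list above build the same arrays as zeros and padarray would; it does not measure speed. It also shows what happens when the last-element trick is applied to a variable that already exists.

```matlab
% Arbitrary small sizes and data, for illustration only
r = 4;
c = 3;
energy = rand(r, c);

% Last-element pre-allocation on variables that do not exist yet
clear cols BP DP
cols(r, r) = 0;   % same result as cols = zeros(r)
BP(r, c) = 0;     % same result as BP = zeros(r, c)
assert(isequal(cols, zeros(r)))
assert(isequal(BP, zeros(r, c)))

% Inf padding without padarray, equivalent to padarray(energy, [0, 1], Inf)
DP(1:size(energy,1), 1:size(energy,2)+2) = Inf;
DP(:, 2:end-1) = energy;
assert(isequal(DP, [Inf(r, 1), energy, Inf(r, 1)]))

% Pitfall: if the variable already exists, only the last element is set
BP = ones(r, c);
BP(r, c) = 0;
assert(BP(1, 1) == 1)   % old contents survive, so BP is not all zeros
```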