【问题标题】:Vectorize MATLAB for loop向量化 MATLAB for 循环
【发布时间】:2015-06-01 22:53:26
【问题描述】:

我有以下几行代码

y = zeros(n, 1);

for i=1:n 
    b = L * [u(i:-1:max(1,i-M+1));zeros((-i+M)*(i-M<0),1)];
    y(i) = b' * gamma;
end

u 是 nx1,gamma 是 Mx1,L 是 MxM

n 的值非常大,那么对于如何向量化 for 循环有什么想法吗?

【问题讨论】:

  • 请提供更多详细信息,u 的尺寸是多少? gamma 是标量吗?
  • 'n' 有多大,在你的情况下'M' 有多大?
  • n 是 4000000,M 是 1000
  • 那么当尝试展开循环时,内存消耗可能是一个问题...... for 循环需要多长时间?
  • 我编写了一个基于矩阵乘法的版本,没有任何循环,但是由于您的大 n 的内存消耗,这根本没有帮助。您可以尝试parfor 或将计算转移到 GPU 上

标签: matlab vectorization


【解决方案1】:

讨论及解答代码

初步方法

基于矩阵乘法的方法 -

u_pad = [zeros(M-1,1) ; u];         %//   Pad u with zeros
idx = bsxfun(@plus,[M:-1:1]',0:n-1);%//'# Calculate sliding indices for u
u_pad_indexed = u_pad(idx);         %//   Index into padded u 
y_vectzed = gamma.'*L*u_pad_indexed;%//'# Matrix-multiplications for final o/p

修改方法

现在,您可以处理大量数据。因此,为了针对这种情况进行优化,可以将数据分解为可运行的较小部分,并且可以迭代地完成操作。然后,每次迭代都会计算输出数组的一部分。

使用这种新策略,初始设置可以完成一次,并在每次迭代中重复使用。修改后的方法看起来像这样 -

div_factor = [...]   %// Make sure it is a divisor of n
nrows = n/div_factor;
start_idx =  bsxfun(@plus,[M:-1:1]',0:nrows-1);  %//'
u_pad = [zeros(M-1,1) ; u];

y_vectorized = zeros(div_factor,n/div_factor);
for iter = 1:div_factor
    u_pad_indexed = u_pad((iter-1)*nrows + start_idx);
    y_vectorized(iter,:) = gamma.'*L*u_pad_indexed;  %//'
end
y_vectorized = reshape(y_vectorized.',[],1);

基准测试

%// Size parameters and input arrays
n = 4000000;
M = 1000;
u = rand(n,1);
gamma = rand(M,1);
L = rand(M,M);

%// Warm up tic/toc.
for k = 1:50000
    tic(); elapsed = toc();
end

disp('----------- With Original loopy code');
tic
y = zeros(n, 1);
for i=1:n
    b = L * [u(i:-1:max(1,i-M+1));zeros((-i+M)*(i-M<0),1)];
    y(i) = b' * gamma;  %//'
end
toc
clear b y

disp('----------- With Proposed solution code');
tic
..... Proposed Modified Code with div_factor = 200
toc

运行时

----------- With Original loopy code
Elapsed time is 498.563049 seconds.
----------- With Proposed solution code
Elapsed time is 44.273299 seconds.

【讨论】:

    【解决方案2】:

    编辑:我正在重做解决方案,因为我发现 Matlab 不能很好地处理匿名函数。所以我将调用从匿名函数更改为普通函数。进行此更改:

    测试 1

    Comparison(40E3, 3E3)
    Elapsed time is 21.731176 seconds.
    Elapsed time is 251.327347 seconds.
    |y2-y1| = 3.1519e-06
    

    测试 2

    Comparison(40E3, 1E3)
    Elapsed time is 6.407259 seconds.
    Elapsed time is 25.551116 seconds.
    |y2-y1| = 2.8402e-07
    

    测试 3

    Comparison(20E3, 3E3)
    Elapsed time is 10.484422 seconds.
    Elapsed time is 125.033313 seconds.
    |y2-y1| = 1.5462e-06
    

    测试 4

    Comparison(20E3, 1E3)
    Elapsed time is 3.153404 seconds.
    Elapsed time is 13.200649 seconds.
    |y2-y1| = 1.5627e-07
    

    函数是:

    function Comparison(n, M)
        u = rand(n, 1);
        L = rand(M);
        gamma = rand(M, 1);
    
        tic
            y1 = vectorized(u, L, gamma);
        toc
    
        tic
            y2 = looped(u, L, gamma);
        toc
    
        disp(['|y2-y1| = ', num2str(norm(y2 - y1, 1))])
    end
    
    function y = vectorized(u, l, gamma)
    global a Column
        M = length(gamma);
        Column = l' * gamma;
    
        x = bsxfun(@plus, -(1:M)', (1:length(u)) + 1);
        x(x < 1) = 1;
        a = u(x);
        a(1:M, 1:M) = a(1:M, 1:M) .* triu(ones(M));
        a = a';
        m = 1:size(a,1);
        y = arrayfun(@VectorY , m)';
    end
    
    function yi = VectorY(j)
    global a Column
        yi = a(j,:) * Column;
    end
    
    function y = looped(U, l, gamma)
        n = length(U);
        M = length(gamma);
    
        u = U';
        L = l';
        y = zeros(n, 1);
    
        for i=1:n
            y(i) = [u(i:-1:max(1,i-M+1)), zeros(1,(-i+M)*(i-M<0))] * L * gamma;
        end
    end
    

    【讨论】:

    猜你喜欢
    • 2013-12-22
    • 2014-05-16
    • 1970-01-01
    • 1970-01-01
    • 2012-10-24
    • 2013-12-06
    • 2023-03-25
    • 2015-07-06
    • 2013-02-18
    相关资源
    最近更新 更多