【问题标题】:How can I vectorize this for loop in numpy?如何在 numpy 中对这个 for 循环进行矢量化?
【发布时间】:2016-05-26 13:42:01
【问题描述】:

代码如下:

import numpy as np
X = np.array(range(15)).reshape(5,3)  # X's element value is meaningless
flag = np.random.randn(5,4)
y = np.array([0, 1, 2, 3, 0])  # Y's element value in range(flag.shape[1]) and Y.shape[0] equals X.shape[0]
dW = np.zeros((3, 4))  # dW.shape equals (X.shape[1], flag.shape[1])
for i in xrange(5):
    for j in xrange(4):
        if flag[i,j] > 0:
            dW[:,j] += X[i,:].T
            dW[:,y[i]] -= X[i,:].T

为了更有效地计算 dW,如何对这个 for 循环进行矢量化?

【问题讨论】:

    标签: python numpy vectorization


    【解决方案1】:

    你可以这样做:

    ff = (flag > 0) * 1
    ff = ff.reshape((5, 4, 1, 1))
    XX = ff * X
    [ii, jj] = np.meshgrid(np.arange(5), np.arange(4))
    dW[:, jj] += XX[ii, jj, ii, :].transpose((2, 0, 1))
    dW[:, y[ii]] -= XX[ii, jj, ii, :].transpose((2, 0, 1))
    

    您可以进一步合并和折叠这些表达式以获得单线,但不会增加任何性能。

    更新 #1:是的,很抱歉这没有给出正确的结果,我的支票有错字

    【讨论】:

    • 既然 X 只在 [x, y, x, z] 处被索引,它不能做成 3 维的吗?
    • 我认为你的前三行拼写更好XX = np.where(flag[...,np.newaxis,np.newaxis], X, 0)
    【解决方案2】:

    我会这样做:

    # has shape (x.shape[1],) + flag.shape
    masked = np.where(flag > 0, X.T[...,np.newaxis], 0)
    
    # sum over the i index
    dW = masked.sum(axis=1)
    
    # sum over the j index
    np.subtract.at(dW, np.s_[:,y], masked.sum(axis=2))
    
    # dW[:,y] -= masked.sum(axis=2) does not work here
    

    请参阅ufunc.at 的文档以了解对最后一条评论的解释

    【讨论】:

    • 如果y 索引中有重复值,-= 可能会出现缓冲问题。这就是ufunc .at 可以提供帮助的地方。
    • 所以第一行masked = np.where(flag > 0, X.T[...,np.newaxis], 0)使用广播,厉害!那么矢量化的一种方法是增加维度?
    【解决方案3】:

    这是基于np.add.reduceat 的矢量化方法-

    # --------------------- Setup output array ----------------------------------
    dWOut = np.zeros((X.shape[1], flag.shape[1]))
    
    # ------ STAGE #1 : Vectorize calculations for "dW[:,j] += X[i,:].T" --------
    # Get indices where flag's transposed version has > 0
    idx1 = np.argwhere(flag.T > 0)
    
    # Row-extended version of X using idx1's col2 that corresponds to i-iterator
    X_ext1 = X[idx1[:,1]]
    
    # Get the indices at which we need to columns change
    shift_idx1 = np.append(0,np.where(np.diff(idx1[:,0])>0)[0]+1)
    
    # Use the changing indices as boundaries for add.reduceat to add 
    # groups of rows from extended version of X
    dWOut[:,np.unique(idx1[:,0])] += np.add.reduceat(X_ext1,shift_idx1,axis=0).T
    
    # ------ STAGE #2 : Vectorize calculations for "dW[:,y[i]] -= X[i,:].T" -------
    # Repeat same philsophy for this second stage, except we need to index into y.
    # So, that would involve sorting and also the iterator involved is just "i".
    idx2 = idx1[idx1[:,1].argsort()]
    cols_idx1 = y[idx2[:,1]]
    X_ext2 = X[idx2[:,1]]
    sort_idx = (y[idx2[:,1]]).argsort()
    X_ext2 = X_ext2[sort_idx]
    shift_idx2 = np.append(0,np.where(np.diff(cols_idx1[sort_idx])>0)[0]+1)
    dWOut[:,np.unique(cols_idx1)] -= np.add.reduceat(X_ext2,shift_idx2,axis=0).T
    

    【讨论】:

    • 谢谢。虽然有几行代码,但更容易理解。
    猜你喜欢
    • 1970-01-01
    • 2012-10-09
    • 2015-06-25
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2015-09-04
    • 1970-01-01
    相关资源
    最近更新 更多