Vectorize numpy indexing and apply a function to build a matrix

[Question title]: Vectorize numpy indexing and apply a function to build a matrix
[Posted]: 2017-06-07 15:26:57
[Question]:

I have a matrix X of shape (d,N). In other words, there are N vectors, each with d dimensions. For example,

X = [[1,2,3,4],[5,6,7,8]]

has N=4 vectors of d=2 dimensions.

I also have a ragged array (a list of lists) I. Its entries are column indices into the matrix X. For example,

I = [ [0,1], [1,2,3] ]

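I[0]=[0,1] indexes columns 0 and 1 of matrix X. Similarly, I[1] indexes columns 1, 2 and 3. Note that the elements of I do not all have the same length!

What I want to do is use each element of I to index columns of X, sum those column vectors, and obtain a single vector. Repeating this for every element of I builds a new matrix Y. Y should have as many d-dimensional vectors as I has elements. In my example, Y will have 2 two-dimensional vectors.

In my example, I[0] says to take columns 0 and 1 of X, add these two 2-dimensional vectors, and put the result into Y (column 0). Then I[1] says to sum columns 1, 2 and 3 of X and put this new vector into Y (column 1).

I can easily do this with a loop, but I would like to vectorize the operation if possible. My matrix X has hundreds of thousands of columns, and the index structure I has tens of thousands of elements (each one a short list of indices).

My loop code: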

Y = np.zeros( (d,len(I)) )
for i,idx in enumerate(I):
    Y[:,i] = np.sum( X[:,idx], axis=1 )
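
For reference, the loop above can be run end to end on the example from the question. This is a minimal sketch: it only adds the import, the example X and I given earlier, and a definition of d taken from X's shape (the integer dtype of Y is a choice made here, not something the question specifies).

```python
import numpy as np

# Example data from the question: N=4 column vectors of dimension d=2.
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
# Ragged list of column indices (plain Python lists of different lengths).
I = [[0, 1], [1, 2, 3]]

d = X.shape[0]
Y = np.zeros((d, len(I)), dtype=X.dtype)
for i, idx in enumerate(I):
    # Select the columns listed in idx and sum them into one d-dimensional vector.
    Y[:, i] = X[:, idx].sum(axis=1)

print(Y)
# [[ 3  9]
#  [11 21]]
```

Column 0 of Y is column 0 plus column 1 of X, and column 1 of Y is the sum of columns 1, 2 and 3, as described in the question.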

[Comments on the question]:

Could you share your loop code, if you have implemented one?
@Divakar I have added my loop code.

Tags: python performance numpy vectorization

[Solution 1]:

Here is one approach:

import numpy as np

# Get a flattened version of indices
idx0 = np.concatenate(I)

# Get indices at which we need to do "intervaled-summation" along axis=1
# (list(...) is needed on Python 3, where map returns an iterator)
cut_idx = np.append(0,list(map(len,I)))[:-1].cumsum()

# Finally index into cols of array with flattened indices & perform summation
out = np.add.reduceat(X[:,idx0], cut_idx,axis=1)

Step-by-step run:

In [67]: X
Out[67]: 
array([[ 1,  2,  3,  4],
       [15,  6, 17,  8]])

In [68]: I
Out[68]: array([[0, 2, 3, 1], [2, 3, 1], [2, 3]], dtype=object)

In [69]: idx0 = np.concatenate(I)

In [70]: idx0 # Flattened indices
Out[70]: array([0, 2, 3, 1, 2, 3, 1, 2, 3])

In [71]: cut_idx = np.append(0,list(map(len,I)))[:-1].cumsum()

In [72]: cut_idx # We need to do addition in intervals limited by these indices
Out[72]: array([0, 4, 7])

In [74]: X[:,idx0]  # Select all of the indexed columns
Out[74]: 
array([[ 1,  3,  4,  2,  3,  4,  2,  3,  4],
       [15, 17,  8,  6, 17,  8,  6, 17,  8]])

In [75]: np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Out[75]: 
array([[10,  9,  7],
       [46, 31, 25]])
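
The steps above can be collected into one function and checked against the loop from the question. This is a sketch rather than the answerer's own code: the name `sum_column_groups` is hypothetical, the list comprehension stands in for `map(len, I)` so that it runs on Python 3, and every index list is assumed to be non-empty (for an empty group `np.add.reduceat` does not produce zero, so such input would need separate handling).

```python
import numpy as np

def sum_column_groups(X, I):
    """Sum the columns of X selected by each index list in I (hypothetical helper).

    Assumes every list in I is non-empty.
    """
    idx0 = np.concatenate(I)                     # flattened column indices
    lengths = np.array([len(idx) for idx in I])  # size of each group
    # Start offset of each group within idx0.
    cut_idx = np.concatenate(([0], lengths[:-1])).cumsum()
    return np.add.reduceat(X[:, idx0], cut_idx, axis=1)

# Data from the step-by-step run.
X = np.array([[1, 2, 3, 4],
              [15, 6, 17, 8]])
I = [[0, 2, 3, 1], [2, 3, 1], [2, 3]]

out = sum_column_groups(X, I)

# Reference result computed with an explicit loop, as in the question.
expected = np.stack([X[:, idx].sum(axis=1) for idx in I], axis=1)
assert np.array_equal(out, expected)
print(out)
# [[10  9  7]
#  [46 31 25]]
```

Note that `X[:, idx0]` materializes one column per entry of the flattened index list, so the temporary array has shape (d, total number of indices).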

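[Discussion]:

Thanks! Would you mind briefly explaining what each line does (before I look up the functions)?
@Divakar, if I only wanted to return the values from the last step instead of summing them, which function would I use in place of np.add.reduceat? So for the OP's example X and I, I would want the output: [[1, 2], [6, 7, 8]]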
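
The thread does not record a reply to the last comment. As a hedged sketch of two possible readings: `np.split` can cut the selected columns into per-group blocks without summing them, while the exact output quoted in the comment matches row i of X indexed by I[i], which is a different operation and only makes sense when I has no more entries than X has rows. Which of the two the commenter meant is an assumption.

```python
import numpy as np

X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]

idx0 = np.concatenate(I)
# Split points between consecutive groups within idx0.
split_points = np.cumsum([len(idx) for idx in I])[:-1]

# Reading 1: keep each group's selected columns as a separate (d, len(idx)) block.
groups = np.split(X[:, idx0], split_points, axis=1)
# groups[0] is [[1, 2], [5, 6]]; groups[1] is [[2, 3, 4], [6, 7, 8]]

# Reading 2: row i of X indexed by I[i], which reproduces the comment's output.
rows = [X[i, idx].tolist() for i, idx in enumerate(I)]
print(rows)  # [[1, 2], [6, 7, 8]]
```

Because the groups have different lengths, both readings return a Python list of pieces rather than a single rectangular array.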