Map arrays with duplicate indexes? Answers

【Question title】: Map arrays with duplicate indexes?
【Posted】: 2012-03-06 03:09:28
【Question】:

Suppose there are three NumPy arrays:

import numpy as np
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])

b can now be used as an index into both a and c. For example:

   In [142]: c[b]
   Out[142]: array([50, 50, 50,  1,  1])

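Is there a way to add up the values that map to duplicate indexes with this kind of indexing? With

a[b] = c

only the last value for each index is stored: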

 array([ 100.,    0.,    0.,   10.,    0.])

I would like something like this:

a[b] += c

which would give

 array([ 150.,    0.,    0.,   16.,    0.])

I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...

【Comments on the question】:

Tags: python matrix numpy scipy slice

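【Solution 1】:

The += operator for NumPy arrays simply doesn't work the way you would like here, and I don't know of a way to make it work that way. As a workaround, I suggest using numpy.bincount():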

>>> numpy.bincount(b, c)
array([ 150.,    0.,    0.,   16.])

The result only extends up to the largest index in b, so append zeros as needed to match the length of a.
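As a minimal sketch of the padding step, and assuming a NumPy version whose bincount() accepts the minlength argument, the zeros can be appended by bincount() itself. The name summed is only illustrative and does not come from the answer.

```python
import numpy as np

a = np.zeros(5)
b = np.array([3, 3, 3, 0, 0])
c = np.array([1, 5, 10, 50, 100])

# minlength pads the result with zeros so it has as many entries as a.
summed = np.bincount(b, weights=c, minlength=len(a))

# Adding the per-index sums to a gives the values 150, 0, 0, 16, 0.
a += summed
```

Passing minlength keeps the result aligned with a even when the largest index in b is smaller than len(a) - 1.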

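【Discussion】:

Thanks for the answer! Now I know that bincount exists; it will be very useful for other implementations. Can this approach also be used for 2D arrays? My real-world problem consists of three 10^7-element vectors (x-pos, y-pos, value) that I map onto a 2D array.
@brorfred: You can reinterpret the array as a one-dimensional array without copying by using its reshape() method, and then apply bincount().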
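The suggestion in this comment could look roughly like the sketch below. The arrays x, y and values are small hypothetical stand-ins for the three large vectors, and the sketch assumes a C-contiguous target array and non-negative integer positions that fall inside it.

```python
import numpy as np

# Hypothetical stand-ins for the x-pos, y-pos and value vectors.
x = np.array([0, 0, 2, 1, 1])
y = np.array([2, 2, 1, 2, 0])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

grid = np.zeros((3, 3))

# For a contiguous array, reshape(-1) returns a 1D view, not a copy.
flat = grid.reshape(-1)

# Row-major position of each (x, y) pair inside the flat view.
flat_index = x * grid.shape[1] + y

# Sum values per flat position and add them in place through the view.
flat += np.bincount(flat_index, weights=values, minlength=flat.size)
```

Because flat shares memory with grid, the in-place addition updates the 2D array directly.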

【Solution 2】:

You could do this:

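def sum_unique(label, weight):
    # label: (n, k) integer array of coordinates; weight: (n,) array.
    # Sort the rows so that identical labels become adjacent.
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    # Mark the last row of each run of identical labels.
    unique = np.ones(len(label), dtype=bool)
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    # Sample the running total at the end of each run; the differences
    # between consecutive samples are the per-label sums.
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals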

and use it like this:

In [110]: coord = np.random.randint(0, 3, (10, 2))

In [111]: coord
Out[111]: 
array([[0, 2],
       [0, 2],
       [2, 1],
       [1, 2],
       [1, 0],
       [0, 2],
       [0, 0],
       [2, 1],
       [1, 2],
       [1, 2]])

In [112]: weights = np.ones(10)

In [113]: uniq_coord, sums = sum_unique(coord, weights)

In [114]: uniq_coord
Out[114]: 
array([[0, 0],
       [1, 0],
       [2, 1],
       [0, 2],
       [1, 2]])

In [115]: sums
Out[115]: array([ 1.,  1.,  2.,  3.,  3.])

In [116]: a = np.zeros((3,3))

In [117]: x, y = uniq_coord.T

In [118]: a[x, y] = sums

In [119]: a
Out[119]: 
array([[ 1.,  0.,  3.],
       [ 1.,  0.,  3.],
       [ 0.,  2.,  0.]])
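As a cross-check of sum_unique, the same per-coordinate sums can be sketched with np.unique and np.bincount. This is not the answer's method: sum_unique_alt is a hypothetical name, it assumes a NumPy version where np.unique accepts axis, and it returns the rows sorted by first column first, so the order differs from the lexsort-based version.

```python
import numpy as np

def sum_unique_alt(label, weight):
    # Unique rows, plus for each input row the position of its unique row.
    uniq, inverse = np.unique(label, axis=0, return_inverse=True)
    # ravel() guards against NumPy versions that return a 2D inverse.
    totals = np.bincount(inverse.ravel(), weights=weight, minlength=len(uniq))
    return uniq, totals

# Same coordinates as in the session above.
coord = np.array([[0, 2], [0, 2], [2, 1], [1, 2], [1, 0],
                  [0, 2], [0, 0], [2, 1], [1, 2], [1, 2]])
uniq_coord, sums = sum_unique_alt(coord, np.ones(10))
```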

I just thought of this, which might be easier:

In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))

In [121]: sums = np.bincount(flat_coord, weights)

In [122]: a = np.zeros((3,3))

In [123]: a.flat[:len(sums)] = sums

In [124]: a
Out[124]: 
array([[ 1.,  0.,  3.],
       [ 1.,  0.,  3.],
       [ 0.,  2.,  0.]])
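Neither answer mentions it, but NumPy versions that provide np.add.at offer an unbuffered in-place addition that accumulates repeated indexes, which is close to what the question asked of +=. A minimal sketch, reusing the coordinates from the session above:

```python
import numpy as np

coord = np.array([[0, 2], [0, 2], [2, 1], [1, 2], [1, 0],
                  [0, 2], [0, 0], [2, 1], [1, 2], [1, 2]])
weights = np.ones(10)

a = np.zeros((3, 3))
x, y = coord.T

# Unlike a[x, y] += weights, every repeated (x, y) pair contributes here.
np.add.at(a, (x, y), weights)
```

On very large inputs this may be slower than the bincount() route, so it is worth timing both.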

【Discussion】: