Assign multiple values to multiple slices of a numpy array at once

【Question title】: Assign multiple values to multiple slices of a numpy array at once
【Posted】: 2016-12-19 19:26:39
【Question】:

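I have a numpy array, a list of start/end indices that define ranges within the array, and a list of values, with as many values as there are ranges. Doing this assignment in a loop is currently very slow, so I would like to assign the values to the corresponding ranges in the array in a vectorized way. Is this possible?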

Here is a concrete, simplified example:

a = np.zeros([10])

Here are the list of start indices and the list of end indices that define the ranges within a:

starts = [0, 2, 4, 6]
ends = [2, 4, 6, 8]

Here is the list of values I want to assign, one per range:

values = [1, 2, 3, 4]

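I have two problems. The first is that I can't figure out how to index into the array with multiple slices at once, since the list of ranges is built dynamically in the actual code. The second is that, once I can extract the ranges, I'm not sure how to assign multiple values at once, one value per range.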

Here is how I tried to create a list of slices, and the problems I ran into when using that list to index the array:

slices = [slice(start, end) for start, end in zip(starts, ends)]


In [97]: a[slices]
...
IndexError: too many indices for array

In [98]: a[np.r_[slices]]
...
IndexError: arrays used as indices must be of integer (or boolean) type
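
Editorial sketch, not part of the original post: one way around the second error is to hand `np.r_` a tuple instead of a list, so that each `slice` object is expanded just like the literal `0:2, 2:4, ...` form. The variable `lengths` is a name introduced here for illustration. Note that `np.r_` still walks over the slices in Python internally, so this is a convenience rather than a guaranteed speed-up.

```python
import numpy as np

a = np.zeros(10)
starts = [0, 2, 4, 6]
ends = [2, 4, 6, 8]
values = [1, 2, 3, 4]

slices = [slice(s, e) for s, e in zip(starts, ends)]

# A tuple key is treated like the comma-separated form np.r_[0:2, 2:4, ...]
idx = np.r_[tuple(slices)]

# Repeat each value once per element of its range (lengths is illustrative)
lengths = np.subtract(ends, starts)
a[idx] = np.repeat(values, lengths)

print(a)
```

The value array now has exactly one entry per selected index, which avoids the shape mismatch behind the deprecation warning in the static example.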

If I use a static list, I can extract multiple slices at once, but the assignment doesn't work the way I want:

In [106]: a[np.r_[0:2, 2:4, 4:6, 6:8]] = [1, 2, 3]
/usr/local/bin/ipython:1: DeprecationWarning: assignment will raise an error in the future, most likely because your index result shape does not match the value array shape. You can use `arr.flat[index] = values` to keep the old behaviour.
  #!/usr/local/opt/python/bin/python2.7

In [107]: a
Out[107]: array([ 1.,  2.,  3.,  1.,  2.,  3.,  1.,  2.,  0.,  0.])

What I actually want is this:

np.array([1., 1., 2., 2., 3., 3., 4., 4., 0., 0.])

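【Comments】:

Is it guaranteed that each slice starts where the previous one ends?
No, there may be gaps between slices. The only guarantee is that they don't overlap.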

Tags: python numpy vectorization

【Answer 1】:

This will do the trick in a fully vectorized manner:

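# convert the lists to arrays so they can be subtracted element-wise
starts = np.asarray(starts)
ends = np.asarray(ends)
counts = ends - starts  # length of each range (assumed non-empty)
# build the offsets 0, 1, ..., count-1 within each range using a cumulative sum
idx = np.ones(counts.sum(), dtype=int)
idx[np.cumsum(counts)[:-1]] -= counts[:-1]
# add each range's start to its offsets to get absolute indices
idx = np.cumsum(idx) - 1 + np.repeat(starts, counts)

a[idx] = np.repeat(values, counts)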
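
Editorial sketch: to see how the cumulative-sum trick in this answer behaves when the ranges have gaps and unequal lengths, the following wraps it in a function and runs it on a small input. The function name `assign_ranges` and the sample data are illustrative choices, not from the original answer, and the sketch assumes every range is non-empty and the ranges do not overlap.

```python
import numpy as np

def assign_ranges(a, starts, ends, values):
    """Assign values[i] to a[starts[i]:ends[i]] without a Python loop.

    Assumes non-empty, non-overlapping ranges (illustrative helper).
    """
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    counts = ends - starts                      # length of each range
    step = np.ones(counts.sum(), dtype=int)     # +1 step inside a range
    # At the first element of each later range, step back by the previous
    # range's length so the running sum restarts at 1.
    step[np.cumsum(counts)[:-1]] -= counts[:-1]
    offsets = np.cumsum(step) - 1               # 0, 1, ..., count-1 per range
    idx = offsets + np.repeat(starts, counts)   # absolute indices
    a[idx] = np.repeat(values, counts)
    return idx

a = np.zeros(10)
idx = assign_ranges(a, starts=[0, 3, 7], ends=[2, 6, 8], values=[1, 2, 3])
print(idx)  # [0 1 3 4 5 7]
print(a)
```

Positions 2, 6, 8 and 9 fall outside every range and keep their initial value of zero.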

【Comments】:

【Answer 2】:

One possibility is to zip the start and end indices with the values and manually broadcast the indices and values:

import numpy as np

starts = [0, 2, 4, 6]
ends = [2, 4, 6, 8]
values = [1, 2, 3, 4]
a = np.zeros(10)

# build the index and value lists by zipping starts, ends and values, expanding each range
idx, val = zip(*[(list(range(s, e)), [v] * (e-s)) for s, e, v in zip(starts, ends, values)])

# assign values (flattening this way relies on all ranges having the same length)
a[np.array(idx).flatten()] = np.array(val).flatten()

a
# array([ 1.,  1.,  2.,  2.,  3.,  3.,  4.,  4.,  0.,  0.])

Or write a for loop that assigns the values one range at a time:

for s, e, v in zip(starts, ends, values):
    a[slice(s, e)] = v

a
# array([ 1.,  1.,  2.,  2.,  3.,  3.,  4.,  4.,  0.,  0.])
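
Editorial sketch: the first snippet in this answer flattens a list of per-range lists, which only yields a flat integer array when all ranges have the same length. A possible variant for ranges of different lengths, assuming `np.concatenate` over per-range arrays is acceptable, is shown below. The sample data is illustrative.

```python
import numpy as np

starts = [0, 3, 7]
ends = [2, 6, 8]
values = [1, 2, 3]
a = np.zeros(10)

# One integer index array per range, joined into a single flat array
idx = np.concatenate([np.arange(s, e) for s, e in zip(starts, ends)])
# One constant-value array per range, joined the same way
val = np.concatenate([np.full(e - s, v) for s, e, v in zip(starts, ends, values)])

a[idx] = val
print(a)
```

This still builds the pieces in a Python-level comprehension, so it is closer in spirit to the loop than to a fully vectorized approach.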

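【Comments】:

For simplicity, it's hard to beat your last loop. And I suspect it will be as fast as any alternative, especially when starting from these 3 lists.
If there are many short ranges, as in the example, I suspect my answer will be faster, even though it makes 6 passes over the data where this approach makes only one. But this is certainly simpler and more readable.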
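
Editorial sketch: the two comments above disagree about speed, and the answer likely depends on the number and length of the ranges. A rough way to check on your own data is to time both approaches with `timeit`. The function names, the range layout (10,000 ranges of length 2, each separated by a gap of 1) and the repeat count are all assumptions made for this illustration, and no timings are claimed here.

```python
import timeit
import numpy as np

def assign_loop(a, starts, ends, values):
    # One slice assignment per range
    for s, e, v in zip(starts, ends, values):
        a[s:e] = v
    return a

def assign_vectorized(a, starts, ends, values):
    # Cumulative-sum index construction (assumes non-empty ranges)
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    counts = ends - starts
    idx = np.ones(counts.sum(), dtype=int)
    idx[np.cumsum(counts)[:-1]] -= counts[:-1]
    idx = np.cumsum(idx) - 1 + np.repeat(starts, counts)
    a[idx] = np.repeat(values, counts)
    return a

n_ranges = 10_000
starts = np.arange(n_ranges) * 3       # 0, 3, 6, ...
ends = starts + 2                      # each range has length 2
values = np.arange(1, n_ranges + 1)
size = 3 * n_ranges

# Check that both approaches agree before timing them
out_loop = assign_loop(np.zeros(size), starts, ends, values)
out_vec = assign_vectorized(np.zeros(size), starts, ends, values)

t_loop = timeit.timeit(
    lambda: assign_loop(np.zeros(size), starts, ends, values), number=5)
t_vec = timeit.timeit(
    lambda: assign_vectorized(np.zeros(size), starts, ends, values), number=5)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

Varying `n_ranges` and the range length should show where, if anywhere, the crossover between the two approaches lies on a given machine.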