Vectorize calling numpy function on third dimension of array: answers

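【Question title】: Vectorize calling numpy function on third dimension of array
【Posted】: 2021-12-14 23:06:30
【Question description】:

I have a 3D numpy array data, where the dimensions a and b are the image resolution and c is the image/frame number. I want to call np.histogram along the c dimension for every pixel (every combination of a and b), so that the output array has shape (a, b, BINS). I have done this with a nested loop, but how can I vectorize the operation?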

hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

I trust the solution is trivial; nevertheless, all help is appreciated :)

【Question comments】:

Does this answer your question? Calculate histograms along axis

Tags: python numpy

【Solution 1】:

np.histogram computes over the flattened array. However, you can use np.apply_along_axis.

np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
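As a quick sanity check, the sketch below compares the one-liner with the nested loop from the question. The array shape (2, 3, 50), the seed and BINS = 4 are arbitrary values picked for illustration; they are not from the original post.

```python
import numpy as np

BINS = 4  # arbitrary bin count, chosen for illustration
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(2, 3, 50))  # (a, b, c) test array
a, b, _ = data.shape

# Reference result: the nested loop from the question
hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# apply_along_axis calls the lambda once for every 1-D slice along axis 2
result = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

print(result.shape)                   # (2, 3, 4)
print(np.array_equal(result, hists))  # True
```

The axis argument (2) selects the frame dimension, so the last dimension of the result has length BINS instead of c.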

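【Comments】:

Note that although np.apply_along_axis is simple to use, it is usually not much faster than the loop, because the pure Python lambda is still called once per slice and cannot really be vectorized internally. On my machine it is in fact slightly slower.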
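To check this on your own machine, a rough timing sketch could look like the following. The array size, BINS = 16, the repeat count and the helper names with_loop and with_apply are assumptions made for illustration; the numbers will vary between machines and NumPy versions.

```python
import timeit
import numpy as np

BINS = 16  # arbitrary, for illustration
rng = np.random.default_rng(0)
data = rng.random((40, 40, 100))  # 1600 pixels, 100 frames each
a, b, _ = data.shape

def with_loop():
    # nested loop from the question
    out = np.zeros((a, b, BINS))
    for row in range(a):
        for column in range(b):
            out[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]
    return out

def with_apply():
    # same work, but the Python-level loop happens inside apply_along_axis
    return np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

t_loop = timeit.timeit(with_loop, number=3)
t_apply = timeit.timeit(with_apply, number=3)
print(f"loop:  {t_loop:.3f} s")
print(f"apply: {t_apply:.3f} s")
```

Both variants call np.histogram once per pixel, which is why neither should be expected to win by a large margin.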

【Solution 2】:

This is an interesting question.

Make a minimal working example (MWE)

This should be the main habit when asking questions on SO.

import numpy as np

a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))  # random integers in [0, 10)
hists = np.zeros((a, b, c), dtype=int)        # the MWE uses c bins per pixel
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]

data
>>> array([[[6, 4, 3, 3],
            [7, 3, 8, 0],
            [1, 5, 8, 0]],

           [[5, 5, 7, 8],
            [3, 2, 7, 8],
            [6, 8, 8, 0]]])
hists
>>> array([[[2, 1, 0, 1],
            [1, 1, 0, 2],
            [2, 0, 1, 1]],

           [[2, 0, 1, 1],
            [2, 0, 0, 2],
            [1, 0, 0, 3]]])

Make it as simple as possible (but still working)

You can eliminate one loop and simplify it:

new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)

for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]

new_hists
>>> array([[2, 1, 0, 1],
           [1, 1, 0, 2],
           [2, 0, 1, 1],
           [2, 0, 1, 1],
           [2, 0, 0, 2],
           [1, 0, 0, 3]])

new_data
>>> array([[6, 4, 3, 3],
           [7, 3, 8, 0],
           [1, 5, 8, 0],
           [5, 5, 7, 8],
           [3, 2, 7, 8],
           [6, 8, 8, 0]])

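Can you find similar problems and use the key points of their solutions?

In general, you cannot vectorize an operation that is applied row by row in a loop like this:

for row in array:
    some_operation(row)

The exception is when you can call another vectorized operation on the flattened array and then move the result back to the initial shape:

arr = array.ravel()
res = another_operation(arr)
out = res.reshape(array.shape)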
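One concrete instance of this flatten, operate, reshape pattern is counting values per row with a single np.bincount call. This is only a sketch and not the method used in this answer: it assumes non-negative integer data with a known upper bound K (a name introduced here), and it counts each distinct value rather than using the per-row min/max bins of np.histogram.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10                                    # values lie in [0, K); assumed known
array = rng.integers(0, K, size=(6, 4))   # 6 rows to be counted independently

# Shift row i by i*K so that every row occupies its own range of integers
offsets = np.arange(array.shape[0])[:, None] * K
arr = (array + offsets).ravel()

# One vectorized call on the flattened data, then back to one row per input row
counts = np.bincount(arr, minlength=array.shape[0] * K)
out = counts.reshape(array.shape[0], K)

# Same result as the row-by-row loop
expected = np.array([np.bincount(row, minlength=K) for row in array])
print(np.array_equal(out, expected))      # True
```

The offsets are what make the flattened call safe: without them, values from different rows would be counted together.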

It looks like you are in luck with np.histogram, because I am pretty sure similar things have been done before.

Final solution

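new_data = data.reshape(a*b, c)
# per-pixel minimum and maximum define the range of each histogram
m, M = new_data.min(axis=1), new_data.max(axis=1)
# integer bin index of every value; the row maximum maps to index c
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c+1), dtype=int)
# (row index, bin index) pair for every value
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
# unbuffered add, so repeated index pairs are all counted
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))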
>>> array([[[2, 1, 0, 0, 1],
            [1, 1, 0, 1, 1],
            [2, 0, 1, 0, 1]],

           [[2, 0, 1, 0, 1],
            [2, 0, 0, 1, 1],
            [1, 0, 0, 1, 2]]])

Note that this adds an extra bin to each histogram and puts the maximum values into it, but I hope that is not hard to fix if you need to.
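One possible fix, sketched here under the same assumptions as the MWE (integer data, number of bins equal to c): clip the bin index to c - 1 so the row maximum falls into the last bin, which is what np.histogram does with its closed right edge. The span guard for constant rows is an addition that is not part of the original answer; it puts such rows into bin 0, whereas np.histogram widens the range in that case, so the bin position can differ there.

```python
import numpy as np

a, b, c = 2, 3, 4
# the example data shown earlier in this answer
data = np.array([[[6, 4, 3, 3], [7, 3, 8, 0], [1, 5, 8, 0]],
                 [[5, 5, 7, 8], [3, 2, 7, 8], [6, 8, 8, 0]]])

new_data = data.reshape(a * b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
span = np.where(M > m, M - m, 1)   # avoid dividing by zero for constant rows
bins = c * (new_data - m[:, None]) // span[:, None]
bins = np.minimum(bins, c - 1)     # the row maximum belongs to the last bin

out = np.zeros((a * b, c), dtype=int)
rows = np.repeat(np.arange(a * b), c)
np.add.at(out, (rows, bins.ravel()), 1)
out = out.reshape(a, b, c)

# Reference: the nested loop with np.histogram
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]

print(np.array_equal(out, hists))  # True
```

For floating-point data the bin indices would have to be cast to an integer dtype before indexing, and values that sit exactly on a bin edge may round differently from np.histogram.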

【Comments】: