numpy数组的边界框答案

【问题标题】：bounding box of numpy arraynumpy数组的边界框
【发布时间】：2015-10-02 17:52:10
【问题描述】：

假设您有一个 2D numpy 数组，其中包含一些随机值和周围的零。

“倾斜矩形”示例：

import numpy as np
from skimage import transform

img1 = np.zeros((100,100))
img1[25:75,25:75] = 1.
img2 = transform.rotate(img1, 45)

现在我想为所有非零数据找到最小的边界矩形。例如：

a = np.where(img2 != 0)
bbox = img2[np.min(a[0]):np.max(a[0])+1, np.min(a[1]):np.max(a[1])+1]

实现此结果的最快方法是什么？我确信有更好的方法，因为如果我是例如 np.where 函数需要相当长的时间。使用 1000x1000 数据集。

编辑：也应该在 3D 中工作...

【问题讨论】：

标签： python arrays numpy transformation

【解决方案1】：

您可以使用np.any 将包含非零值的行和列减少为一维向量，而不是使用np.where 查找所有非零值的索引，从而将执行时间大致减半：

def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox

def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]

    return rmin, rmax, cmin, cmax

一些基准测试：

%timeit bbox1(img2)
10000 loops, best of 3: 63.5 µs per loop

%timeit bbox2(img2)
10000 loops, best of 3: 37.1 µs per loop

将此方法扩展到 3D 案例只涉及沿每对轴执行缩减：

def bbox2_3D(img):

    r = np.any(img, axis=(1, 2))
    c = np.any(img, axis=(0, 2))
    z = np.any(img, axis=(0, 1))

    rmin, rmax = np.where(r)[0][[0, -1]]
    cmin, cmax = np.where(c)[0][[0, -1]]
    zmin, zmax = np.where(z)[0][[0, -1]]

    return rmin, rmax, cmin, cmax, zmin, zmax

通过使用itertools.combinations 迭代每个独特的轴组合以执行缩减，很容易将其推广到 N 维：

import itertools

def bbox2_ND(img):
    N = img.ndim
    out = []
    for ax in itertools.combinations(reversed(range(N)), N - 1):
        nonzero = np.any(img, axis=ax)
        out.extend(np.where(nonzero)[0][[0, -1]])
    return tuple(out)

如果你知道原始边界框的角坐标、旋转角度和旋转中心，则可以通过计算对应的affine transformation matrix并打点直接得到变换后的边界框角坐标输入坐标：

def bbox_rotate(bbox_in, angle, centre):

    rmin, rmax, cmin, cmax = bbox_in

    # bounding box corners in homogeneous coordinates
    xyz_in = np.array(([[cmin, cmin, cmax, cmax],
                        [rmin, rmax, rmin, rmax],
                        [   1,    1,    1,    1]]))

    # translate centre to origin
    cr, cc = centre
    cent2ori = np.eye(3)
    cent2ori[:2, 2] = -cr, -cc

    # rotate about the origin
    theta = np.deg2rad(angle)
    rmat = np.eye(3)
    rmat[:2, :2] = np.array([[ np.cos(theta),-np.sin(theta)],
                             [ np.sin(theta), np.cos(theta)]])

    # translate from origin back to centre
    ori2cent = np.eye(3)
    ori2cent[:2, 2] = cr, cc

    # combine transformations (rightmost matrix is applied first)
    xyz_out = ori2cent.dot(rmat).dot(cent2ori).dot(xyz_in)

    r, c = xyz_out[:2]

    rmin = int(r.min())
    rmax = int(r.max())
    cmin = int(c.min())
    cmax = int(c.max())

    return rmin, rmax, cmin, cmax

这比将np.any 用于您的小示例数组要快得多：

%timeit bbox_rotate([25, 75, 25, 75], 45, (50, 50))
10000 loops, best of 3: 33 µs per loop

但是，由于此方法的速度与输入数组的大小无关，因此对于较大的数组来说速度会快很多。

将变换方法扩展到 3D 稍微复杂一些，因为旋转现在具有三个不同的分量（一个关于 x 轴，一个关于 y 轴，一个关于 z 轴），但是基本方法是一样的：

def bbox_rotate_3d(bbox_in, angle_x, angle_y, angle_z, centre):

    rmin, rmax, cmin, cmax, zmin, zmax = bbox_in

    # bounding box corners in homogeneous coordinates
    xyzu_in = np.array(([[cmin, cmin, cmin, cmin, cmax, cmax, cmax, cmax],
                         [rmin, rmin, rmax, rmax, rmin, rmin, rmax, rmax],
                         [zmin, zmax, zmin, zmax, zmin, zmax, zmin, zmax],
                         [   1,    1,    1,    1,    1,    1,    1,    1]]))

    # translate centre to origin
    cr, cc, cz = centre
    cent2ori = np.eye(4)
    cent2ori[:3, 3] = -cr, -cc -cz

    # rotation about the x-axis
    theta = np.deg2rad(angle_x)
    rmat_x = np.eye(4)
    rmat_x[1:3, 1:3] = np.array([[ np.cos(theta),-np.sin(theta)],
                                 [ np.sin(theta), np.cos(theta)]])

    # rotation about the y-axis
    theta = np.deg2rad(angle_y)
    rmat_y = np.eye(4)
    rmat_y[[0, 0, 2, 2], [0, 2, 0, 2]] = (
        np.cos(theta), np.sin(theta), -np.sin(theta), np.cos(theta))

    # rotation about the z-axis
    theta = np.deg2rad(angle_z)
    rmat_z = np.eye(4)
    rmat_z[:2, :2] = np.array([[ np.cos(theta),-np.sin(theta)],
                               [ np.sin(theta), np.cos(theta)]])

    # translate from origin back to centre
    ori2cent = np.eye(4)
    ori2cent[:3, 3] = cr, cc, cz

    # combine transformations (rightmost matrix is applied first)
    tform = ori2cent.dot(rmat_z).dot(rmat_y).dot(rmat_x).dot(cent2ori)
    xyzu_out = tform.dot(xyzu_in)

    r, c, z = xyzu_out[:3]

    rmin = int(r.min())
    rmax = int(r.max())
    cmin = int(c.min())
    cmax = int(c.max())
    zmin = int(z.min())
    zmax = int(z.max())

    return rmin, rmax, cmin, cmax, zmin, zmax

我基本上只是使用来自here 的旋转矩阵表达式修改了上面的函数 - 我还没有时间编写测试用例，所以请谨慎使用。

【讨论】：

不错！如何将其扩展到 3D 案例？我还能以某种方式使用 np.any 吗？
@ali_m:bbox2 是一个非常好的解决方案，尤其是在有大量空行/列的情况下，比stackoverflow.com/a/4809040/483620 快大约一个数量级，但我猜在没有非零行/列的极端情况下，性能会相似或更差。
@Benjamin 如果该解决方案能够击败bbox2，我会感到惊讶，即使对于非常大的全密集阵列也是如此。在该解决方案中，np.argwhere 的输入和输出数组随数组大小呈二次方增加，而在bbox2 中，np.where 的输入和输出数组仅线性增加。可以使其更快的一种技巧是使用np.argmax(rows) 和rows.size - 1 - np.argmax(rows[::-1]) 而不是np.where 来获取rows 和cols 中的第一个和最后一个非零值。
我在这段代码中发现了一个可能的错误。 xmin、ymin和zmin应该加-1，xmax、ymax和zmax应该加1。
我认为 ND 解决方案需要一些反转，因为 itertools.combinations 确实产生了所需轴顺序的反转。

【解决方案2】：

这是一个计算 N 维数组边界框的算法，

def get_bounding_box(x):
    """ Calculates the bounding box of a ndarray"""
    mask = x == 0
    bbox = []
    all_axis = np.arange(x.ndim)
    for kdim in all_axis:
        nk_dim = np.delete(all_axis, kdim)
        mask_i = mask.all(axis=tuple(nk_dim))
        dmask_i = np.diff(mask_i)
        idx_i = np.nonzero(dmask_i)[0]
        if len(idx_i) != 2:
            raise ValueError('Algorithm failed, {} does not have 2 elements!'.format(idx_i))
        bbox.append(slice(idx_i[0]+1, idx_i[1]+1))
    return bbox

可用于2D、3D等数组，如下所示，

In [1]: print((img2!=0).astype(int))
   ...: bbox = get_bounding_box(img2)
   ...: print((img2[bbox]!=0).astype(int))
   ...: 
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 1 1 1 1 1 1 1 1 1 1 0 0]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [0 0 1 1 1 1 1 1 1 1 1 1 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 0 0 0 0 0 0]]

虽然用一个np.where 替换np.diff 和np.nonzero 调用可能会更好。

【讨论】：

比ali_m的方法慢但是很一般，我喜欢！

【解决方案3】：

通过将np.where 替换为np.argmax 并处理布尔掩码，我能够挤出更多性能。

定义 bbox(img): img = (img > 0) 行= np.any（img，轴= 1） cols = np.any(img, 轴 = 0) rmin, rmax = np.argmax(rows), img.shape[0] - 1 - np.argmax(np.flipud(rows)) cmin, cmax = np.argmax(cols), img.shape[1] - 1 - np.argmax(np.flipud(cols)) 返回rmin、rmax、cmin、cmax

在相同的基准测试中，这比上面的 bbox2 解决方案快了大约 10µs。还应该有一种方法可以只使用 argmax 的结果来查找非零行和列，避免使用 np.any 完成的额外搜索，但这可能需要一些我无法工作的棘手索引使用简单的矢量化代码高效地完成。

【讨论】：

对我来说效率略低，有许多全零行/列。

【解决方案4】：

我知道这篇文章已经过时并且已经得到答复，但我相信我已经确定了一种优化方法，用于大型数组和加载为 np.memmaps 的数组。

我使用 ali_m 的响应，该响应由 Allen Zelener 针对较小的 ndarray 进行了优化，但这种方法对于 np.memmaps 来说非常慢。

下面是我的实现，它的性能速度与 ali_m 对适合工作内存的数组的处理方法非常相似，但在绑定大型数组或 np.memmaps 时性能要好得多。

import numpy as np
from numba import njit, prange

@njit(parallel=True, nogil=True, cache=True)
def bound(volume):  
    """ 
        Bounding function to bound large arrays and np.memmaps
        volume: A 3D np.array or np.memmap
    """
    mins = np.array(volume.shape)
    maxes = np.zeros(3)
    for z in prange(volume.shape[0]):
        for y in range(volume.shape[1]):
            for x in range(volume.shape[2]):
                if volume[z,y,x]:
                    if z < mins[0]:
                        mins[0] = z
                    elif z > maxes[0]:
                        maxes[0] = z
                    if y < mins[1]:
                        mins[1] = y
                    elif y > maxes[1]:
                        maxes[1] = y
                    if x < mins[2]:
                        mins[2] = x
                    elif x > maxes[2]:
                        maxes[2] = x
    return mins, maxes

我的方法有点低效，因为它只是迭代每个点，而不是在特定维度上展平数组。但是，我发现使用带有维度参数的 np.any() 展平 np.memmaps 非常慢。我尝试使用 numba 来加速展平，但它不支持带参数的 np.any() 。因此，我采用了似乎表现良好的迭代方法。

在我的电脑（2019 16" MacBook Pro，6 核 i7，16 GB 2667 MHz DDR4）上，我可以在 中绑定形状为 (1915, 4948, 3227) 的 np.memmap ~33 秒，而 ali_m 方法大约需要 ~250 秒。

不确定是否有人会看到这个，但希望它在需要绑定 np.memmaps 的小众案例中有所帮助。

【讨论】：