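Python / numpy: Remove empty (zeroes) border of 3D array

[Question title]: Python / numpy: Remove empty (zeroes) border of 3D array
[Posted]: 2019-07-01 06:05:36
[Question]:

I have a 3D numpy array. It can be thought of as an image (to be precise, it holds the values of field points). I want to remove the border (0 values; note that negative values are possible) in all dimensions. The restriction is that the dimensions must remain the same for all molecules: I only want to remove border slices as long as the "largest" entry in that dimension still lies inside the remaining box. So the whole data set has to be taken into account (it is small, so size is not an issue).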

2D example:

0  0  0  0  0
0  1  0  0  0
0  1  1  0  0
0  0  0  0  0
0  0  0  0  0

0  0  0  0  0
0  0  0  0  0
0  0  1  0  0
0  0  0  1  0
0  0  0  1  0

Here the top row and the leftmost and rightmost columns should be removed, because across the whole data set they contain only 0 values.

The result would be:

1  0  0
1  1  0
0  0  0
0  0  0

0  0  0
0  1  0
0  0  1
0  0  1

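Since I'm not a numpy expert, I can't work out an algorithm that does what I need. I need to find, in each dimension, the minimum and maximum index that is not 0 and then use those indices to trim the array.

Similar to this, but in 3D, and the crop has to take the whole data set into account.

How can I achieve this?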
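The approach described above (minimum and maximum non-zero index per axis, taken over the whole data set) can be sketched in NumPy as follows. This is a minimal illustration on the two 5x5 images from the 2D example, assuming the images are stacked along a new first axis that is itself never cropped; the variable names are only for illustration.

```python
import numpy as np

# The two 5x5 images from the 2D example, stacked along a new first axis
images = np.array([
    [[0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 1, 0]],
])

# A pixel is "occupied" if it is non-zero in ANY image
# (!= 0 rather than > 0, because negative values are valid data)
occupied = np.any(images != 0, axis=0)

# Smallest and largest occupied index along each spatial axis
rows, cols = np.nonzero(occupied)
r0, r1 = rows.min(), rows.max() + 1
c0, c1 = cols.min(), cols.max() + 1

# Same crop for every image, so relative positions are preserved
cropped = images[:, r0:r1, c0:c1]
print(cropped.shape)  # (2, 4, 3)
```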

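Update, 13 February 2019:

So I tried 3 answers here: one using zip that seems to have been deleted, Martin's, and norok2's. The output dimensions are the same, so I assume they all work.

I'm choosing Martin's solution because it lets me easily extract the bounding box and apply it to the test set.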
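A minimal sketch of the "extract the bounding box once, apply it to the test set" workflow, assuming both sets are stored as arrays of shape (n_samples, *spatial) with identical spatial shapes. `bounding_box_slices` is a hypothetical helper name, and the random training data merely stands in for real field values (which may be negative, hence the `!= 0` test).

```python
import numpy as np


def bounding_box_slices(data):
    """Hypothetical helper: slices covering every non-zero value in `data`.

    `data` has shape (n_samples, *spatial); the sample axis is not cropped.
    """
    occupied = np.any(data != 0, axis=0)  # collapse the sample axis
    return tuple(
        slice(int(idx.min()), int(idx.max()) + 1) for idx in np.nonzero(occupied)
    )


rng = np.random.default_rng(0)
train = np.zeros((10, 8, 8, 8))
train[:, 2:6, 1:7, 3:5] = rng.normal(size=(10, 4, 6, 2))  # stand-in field values
test = np.zeros((3, 8, 8, 8))
test[:, 3:5, 2:4, 3:4] = 1.0

box = bounding_box_slices(train)  # computed once, from the training set only
train_cropped = train[(slice(None),) + box]
test_cropped = test[(slice(None),) + box]  # the same crop reused on the test set
print(box)
print(train_cropped.shape, test_cropped.shape)  # (10, 4, 6, 2) (3, 4, 6, 2)
```

Any test-set values that fall outside the training box are cut off, so this assumes the training set covers the full extent of the test data.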

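Update, 25 February:

In case anyone is still following this, I'd like to add something. As mentioned, these are not actually images but "field values", i.e. floating-point numbers rather than grayscale images (uint8). That means I need at least float16, and that simply takes too much memory (I have 48 GB available, but it isn't enough for even 50% of the training set).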
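For the memory question, the dense storage cost is simply samples x voxels x bytes per element, which also shows how much a tighter crop saves. The sizes below are hypothetical placeholders, not the asker's real data, and `dataset_gib` is an illustrative helper name.

```python
import numpy as np


def dataset_gib(n_samples, shape, dtype):
    """Hypothetical helper: size in GiB of a dense array of n_samples x shape."""
    n_elements = n_samples * int(np.prod(shape))
    return n_elements * np.dtype(dtype).itemsize / 2**30


# Hypothetical sizes, for illustration only
n_samples = 1000
full_shape = (64, 64, 64)
cropped_shape = (40, 40, 40)

for dtype in (np.float32, np.float16):
    print(np.dtype(dtype).name,
          round(dataset_gib(n_samples, full_shape, dtype), 3),
          round(dataset_gib(n_samples, cropped_shape, dtype), 3))
```

Per sample, the saving from cropping is the ratio of voxel counts (here 64000 / 262144, about 0.24), independent of the dtype.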

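[Question discussion]:

Where should the smaller arrays be placed relative to the largest one? I mean, in 1D, suppose the largest object is e.g. [1, 0, 1, 1] and a smaller (reduced) one is [1, 1]: should it become [0, 0, 1, 1] (end), [0, 1, 1, 0] (middle) or [1, 1, 0, 0] (start)?
Initially everything has the same size. In the final result, the "relative" coordinates of every remaining value/pixel should stay the same.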
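To make the "relative coordinates stay the same" requirement concrete, here is a 1D sketch with illustrative arrays (not taken from the question): all arrays share one set of bounds, so every value is shifted by the same offset `lo`. In this example the second array comes out as `[0, 0, 1, 1]`, because its values keep their position relative to the shared bounds.

```python
import numpy as np

a = np.array([0, 1, 0, 1, 1, 0])
b = np.array([0, 0, 0, 1, 1, 0])
stack = np.stack([a, b])  # shape (2, 6), same size before cropping

# Shared bounds: positions that are non-zero in ANY array
idx = np.nonzero(np.any(stack != 0, axis=0))[0]
lo, hi = idx.min(), idx.max() + 1  # lo = 1, hi = 5

cropped = stack[:, lo:hi]
print(cropped.tolist())  # [[1, 0, 1, 1], [0, 0, 1, 1]]

# A value at original index i is now at index i - lo in every array
assert cropped[1, 3 - lo] == b[3]
```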
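@beginner_ Check my latest edit. It should now work the way you want.
@beginner_ Has your question been answered?
@Martin Busy here. I haven't had a chance yet to verify which answer works best.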

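Tags: python numpy image-processing multidimensional-array

[Solution 1]:

Try this; it's the core algorithm. I don't fully understand which sides you want to remove in your example, but the algorithm below should be easy to adapt to your needs.

Note: this algorithm extracts the CUBE from which all zero-valued borders have been "removed", so every face of the cube touches at least one value != 0.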

import numpy as np

# testing dataset
d = np.zeros(shape = [5,5,5]) 

# fill some values
d[3,2,1]=1
d[3,3,1]=1
d[1,3,1]=1
d[1,3,4]=1

# find the indexes of all non-zero values, one array per axis
xs, ys, zs = np.where(d != 0)
# for a 4D object (`as` is a reserved word in Python, so use another name):
# xs, ys, zs, ws = np.where(d != 0)

# extract the cube spanned by the extreme positions of the values != 0
result = d[min(xs):max(xs)+1, min(ys):max(ys)+1, min(zs):max(zs)+1]
# for a 4D object
# result = d[min(xs):max(xs)+1, min(ys):max(ys)+1, min(zs):max(zs)+1, min(ws):max(ws)+1]

>>> result.shape
(3, 2, 4)

Case 1:

d = np.zeros(shape = [5,5,5])

d[3,2,1]=1
# ... only this one non-zero value

>>> result.shape  # works
(1, 1, 1)

Case 2 (error case): the array contains only zeros, so np.where returns empty index arrays and min() has nothing to work on:

d = np.zeros(shape = [5,5,5]) # no values except zeros
>>> result.shape

Traceback (most recent call last):
  File "C:\Users\zzz\Desktop\py.py", line 7, in <module>
    result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1]
ValueError: min() arg is an empty sequence
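One way to avoid the all-zeros error is to check whether `np.nonzero` found anything before taking min/max. In the sketch below, the name `crop_nonzero` and the choice to return an empty array are assumptions (raising a clearer error or returning the input unchanged would be equally reasonable); it also works for any number of dimensions.

```python
import numpy as np


def crop_nonzero(d):
    """Hypothetical helper: crop `d` to the bounding box of its non-zero values."""
    idx = np.nonzero(d)  # one index array per axis
    if idx[0].size == 0:
        # All zeros: return an empty array with the same number of dimensions
        return d[tuple(slice(0, 0) for _ in range(d.ndim))]
    return d[tuple(slice(i.min(), i.max() + 1) for i in idx)]


d = np.zeros((5, 5, 5))
print(crop_nonzero(d).shape)  # (0, 0, 0)
d[3, 2, 1] = 1
print(crop_nonzero(d).shape)  # (1, 1, 1)
```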

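Edit: since my solution did not get enough love and understanding, here is an example with a 4D body: three dimensions hold each image, and an extra axis (axis 0 in the code below) is where the images are stored.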

import numpy as np


class ImageContainer(object):
    def __init__(self, first_image):
        # store images along a new first axis: shape (n_images, *image_shape)
        self.container = np.uint8(np.expand_dims(np.array(first_image), axis=0))

    def add_image(self, image):
        temp = np.uint8(np.expand_dims(np.array(image), axis=0))
        self.container = np.concatenate((self.container, temp), axis=0)
        print('container shape', self.container.shape)

# Create image container storage

image = np.zeros(shape=[5, 5, 3])  # some image
image[2, 2, 1] = 1  # put something random in it
container = ImageContainer(image)
image = np.zeros(shape=[5, 5, 3])  # some image
image[2, 2, 2] = 1
container.add_image(image)
image = np.zeros(shape=[5, 5, 3])  # some image
image[2, 3, 0] = 1  # with [2,2,0] = 1 instead, every image would be cropped to 1x1 pixel
container.add_image(image)
image = np.zeros(shape=[5, 5, 3])  # some image
image[2, 2, 1] = 1
container.add_image(image)
>>> container.container.shape
(4, 5, 5, 3)  # 4 images, size 5x5, 3 channels


# remove the borders of all images at once
xs, ys, zs, zzs = np.where(container.container != 0)

# extract the cube spanned by the extreme positions of the values != 0;
# axis 0 (the image index) is kept whole so that no image can be dropped
result = container.container[:, min(ys):max(ys)+1, min(zs):max(zs)+1, min(zzs):max(zzs)+1]

>>> print('Final shape:', result.shape)
Final shape: (4, 1, 2, 3)  # 4 images, size 1x2, 3 channels

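[Discussion]:

The bottom border should not be removed, because the second image has a 1 in its bottom row. The crop should take all images into account, and the "relative" coordinates of each pixel should stay the same.
OK, I was confused. It should be correct now. Feed in a 3D body and the array in result is the smallest cube whose faces touch the extreme values != 0.
Either I'm missing something or this doesn't work across the whole data set. If I have 10 3D images, I need the bounding box that contains all non-zero values of all 10 images.
If you have 10 3D images, just put them into one array and pass that to my script; it should work.
I thought the test data set in my script made that clear.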
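Putting all 10 images into one array works, with one caveat worth checking: if the crop is computed over every axis of the stacked array, including the image axis, an all-zero image at the start or end of the stack is silently dropped. The sketch below, on made-up data, contrasts that with collapsing the image axis through `np.any` before computing the bounds.

```python
import numpy as np

stack = np.zeros((3, 5, 5, 5))  # three made-up 3D images along axis 0
stack[1, 2, 2, 2] = 1.0         # images 0 and 2 stay entirely zero
stack[1, 3, 2, 1] = -0.5

# Cropping every axis also trims the image axis
idx = np.nonzero(stack)
all_axes = stack[tuple(slice(i.min(), i.max() + 1) for i in idx)]
print(all_axes.shape)  # (1, 2, 1, 2): images 0 and 2 were dropped

# Collapsing the image axis first crops only the spatial axes
spatial = np.nonzero(np.any(stack != 0, axis=0))
box = tuple(slice(i.min(), i.max() + 1) for i in spatial)
kept = stack[(slice(None),) + box]
print(kept.shape)  # (3, 2, 1, 2)
```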

[Solution 2]:

Update:

Building on Martin's solution with min/max and np.where, but generalizing it to any number of dimensions, you can do:

import numpy as np

def bounds_per_dimension(ndarray):
    # one index range per axis, spanning that axis's non-zero entries
    return map(
        lambda e: range(e.min(), e.max() + 1),
        np.where(ndarray != 0)
    )

def zero_trim_ndarray(ndarray):
    return ndarray[np.ix_(*bounds_per_dimension(ndarray))]

d = np.array([[
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
], [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]])

print(zero_trim_ndarray(d).shape)  # (2, 4, 3)

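[Discussion]:

Using range() and np.ix_() makes this unnecessarily slow. If you compare this code with the slice() / arr[] approach (as used in my answer), you get a ~2x speed difference even for this simple example with d as input.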
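A plausible reason for the speed gap (not measured here): plain slices are basic indexing and return a view without copying any data, while the index arrays built with `np.ix_` trigger advanced indexing, which always allocates a copy. The sketch below, on a made-up array, checks that both crops are equal and that only the slice version shares memory with the input.

```python
import numpy as np

d = np.zeros((2, 5, 5))
d[0, 1, 1] = d[0, 2, 2] = 1
d[1, 4, 3] = -2.0

idx = np.nonzero(d)

# Basic indexing with slices: returns a view, no data copied
by_slice = d[tuple(slice(i.min(), i.max() + 1) for i in idx)]

# Advanced indexing via np.ix_: always returns a new array (a copy)
by_ix = d[np.ix_(*(np.arange(i.min(), i.max() + 1) for i in idx))]

print(by_slice.shape)                   # (2, 4, 3)
print(np.array_equal(by_slice, by_ix))  # True
print(np.shares_memory(by_slice, d))    # True
print(np.shares_memory(by_ix, d))       # False
```

Because the slice version is a view, writing into it also modifies the original array; call `.copy()` if an independent array is needed.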

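[Solution 3]:

You can view your problem as trimming to a specific bounding box on the array formed by putting all the shapes you have into a single array.

So, if you have an n-dimensional trimming function, the solution is just to apply it.

One way to implement this is: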

import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

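FlyingCircus offers a somewhat more flexible solution, in which you can specify which axes to act on (disclaimer: I am the main author of the package).

So, if you have a list of n-dim arrays (in arrs), you can first stack them with np.stack() and then trim the result: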

import numpy as np

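arr = np.stack(arrs, -1)
# collapse the stacking axis in the mask so that it is never trimmed
# (otherwise an all-zero array at either end of arrs would be dropped)
trimmed_arr = trim(arr, np.any(arr != 0, axis=-1))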

The result can then be split apart again with np.split(), e.g.:

trimmed_list = np.split(trimmed_arr, arr.shape[-1], -1)
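A self-contained round trip on the two images from the question, assuming `arrs` is a list of equally shaped arrays. It repeats the `trim` helper from this answer and passes `np.any(arr != 0, axis=-1)` as the mask, which collapses the stacking axis so that only the spatial axes are trimmed and `np.split` always gets back exactly `len(arrs)` pieces; the trailing length-1 axis that `np.split` leaves on each piece is dropped with `[..., 0]`.

```python
import numpy as np


def trim(arr, mask):
    # Crop `arr` to the bounding box of the True entries of `mask`
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]


arrs = [
    np.array([[0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]]),
    np.array([[0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0]]),
]

arr = np.stack(arrs, -1)  # shape (5, 5, 2)
# The mask has one axis fewer than `arr`, so the stacking axis is never trimmed
trimmed_arr = trim(arr, np.any(arr != 0, axis=-1))
print(trimmed_arr.shape)  # (4, 3, 2)

# Split back into a list and drop the length-1 axis left over by np.split
trimmed_list = [a[..., 0] for a in np.split(trimmed_arr, arr.shape[-1], -1)]
print(trimmed_list[1].tolist())  # [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
```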

Edit:

I just realized that this uses essentially the same approach as the other answers; it just looks cleaner to me.

[Discussion]:

This is cool. I prefer it as a one-liner for a single array: return arr[tuple(slice(np.min(idx), np.max(idx) + 1) for idx in np.where(arr != 0))]