Checking if Adjacent Values are in a Numpy Matrix: Answers

【Question title】: Checking if Adjacent Values are in a Numpy Matrix
【Posted】: 2017-01-01 20:24:32
【Question】:

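I am currently trying to find a more optimized solution for determining the connected components in an image. Right now I have an array of coordinates that have certain values, and I want to create groups of these coordinates based on whether they are touching. I am using a numpy array, and currently I have to check whether each neighbouring value (top left, top middle, top right, middle left, middle right, bottom left, bottom middle, bottom right) is in that array. I do this with the following code: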

for x in range(0, groupCoords.shape[0]):
    global tgroup
    xCoord = groupCoords.item((x,0))
    yCoord = groupCoords.item((x,1))
    new = np.array([[xCoord, yCoord]])
    if np.equal(Arr,[xCoord, yCoord+1]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord,yCoord+1]], axis=0)
        new = np.append(new, [[xCoord,yCoord+1]], axis=0)
        index = np.argwhere((Arr == [xCoord,yCoord+1]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord, yCoord-1]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord, yCoord-1]],axis=0)
        new = np.append(new, [[xCoord,yCoord-1]], axis=0)
        index = np.argwhere((Arr == [xCoord,yCoord-1]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord+1, yCoord]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord+1,yCoord]],axis=0)
        new = np.append(new, [[xCoord+1,yCoord]], axis=0)
        index = np.argwhere((Arr == [xCoord+1,yCoord]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord+1, yCoord+1]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord+1,yCoord+1]],axis=0)
        new = np.append(new, [[xCoord+1,yCoord+1]], axis=0)
        index = np.argwhere((Arr == [xCoord+1,yCoord+1]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord+1, yCoord-1]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord+1,yCoord-1]],axis=0)
        new = np.append(new, [[xCoord+1,yCoord-1]], axis=0)
        index = np.argwhere((Arr == [xCoord+1,yCoord-1]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord-1, yCoord]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord-1,yCoord]],axis=0)
        new = np.append(new, [[xCoord-1,yCoord]], axis=0)
        index = np.argwhere((Arr == [xCoord-1,yCoord]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord-1, yCoord+1]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord-1,yCoord+1]],axis=0)
        new = np.append(new, [[xCoord-1,yCoord+1]], axis=0)
        index = np.argwhere((Arr == [xCoord-1,yCoord+1]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

    if np.equal(Arr,[xCoord-1, yCoord-1]).all(1).any():
        tgroup = np.append(tgroup, [[xCoord-1,yCoord-1]],axis=0)
        new = np.append(new, [[xCoord-1,yCoord-1]], axis=0)
        index = np.argwhere((Arr == [xCoord-1,yCoord-1]).all(1))
        Arr = np.delete(Arr, (index), axis=0)

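However, this obviously takes a massive amount of time if the image is large. My idea was to create a boolean matrix with the dimensions of the image's width and height, and then assign the value "true" to the entries of the matrix that correspond to pixels in the image (the image is black and white).

I was wondering: instead of having to check every value like that, is it possible to determine whether there are elements directly surrounding another "true" value?

This is what the input array would look like: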

[
 [0 0]
 [0 1]
 [0 2]
 [10 2]
]

The output would look like:

[
 [0 0]
 [0 1]
 [0 2]
]

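I am hoping to refine a function that checks whether "true" values are touching and creates a "network" of all touching values (it would keep running with the new values it finds).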

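【Comments】:

Does the order of the coordinates matter? I think we are filtering out coordinates there. Do you need them in the order in which they are connected from one end to the other (if applicable), or by some other criterion?
Could you give a small example of an input (say, a 4x4 array) and the expected output?
@Divakar The order doesn't matter, I just need them grouped into one array. I am using this as an OCR method, so what it does is create a list of connected components, which end up being each character in the image.
@WarrenWeckesser I included a sample input. What the function would do is take a coordinate in the array, such as (1, 4). It would then check the values directly next to it and output an array that would be [(0,4), (1,3), (1,5), (2,4), (2,5)]
Thanks. That sounds like it might be an implementation detail. You said you want to "create a 'network' of all touching values", so at a higher level, is it correct to say that the input is a boolean array and the output is a collection (e.g. a list) of sets of coordinates, one (outer) set for each connected component? E.g. if the input is [[T, T, F, F], [F, F, F, T], [F, F, F, T]], the output would be something like [[(0, 0), (0, 1)], [(1, 3), (2, 3)]]?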

Tags: python arrays numpy

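【Solution 1】:

Approach #1

We could compute the Euclidean distances and check whether any of them is within sqrt(2), which covers the up-down and left-right neighbours at distance = 1 and the diagonal neighbours at distance = sqrt(2). This gives us a mask which, when indexed into the group coordinates array, returns the coordinates from it that are connected. Note that a point that appears in both arrays is also selected, since its distance is 0.

Thus, an implementation using Scipy's cdist to get those Euclidean distances would be -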

from scipy.spatial.distance import cdist

out = groupCoords[(cdist(groupCoords,Arr)<1.5).any(1)]

Sample run -

In [401]: Arr
Out[401]: 
array([[ 5,  4],
       [11, 12],
       [ 5,  3],
       [ 1,  3],
       [15,  8],
       [55, 21]])

In [402]: groupCoords
Out[402]: 
array([[2, 3],  # In neighbourhood of (1,3)
       [5, 6],
       [6, 2],  # In neighbourhood of (5,3)
       [5, 3],  # In neighbourhood of (5,4)
       [5, 8]])

In [403]: groupCoords[(cdist(groupCoords,Arr)<1.5).any(1)]
Out[403]: 
array([[2, 3],
       [6, 2],
       [5, 3]])
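
The full distance matrix built by cdist has one entry per pair of points, which can get large for big images. A KD-tree lookup is one possible alternative. This is a swapped-in technique rather than part of the approach above, and the function name `neighbours_via_kdtree` is made up for illustration. A minimal sketch, assuming integer pixel coordinates:

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbours_via_kdtree(group_coords, arr, radius=1.5):
    """Return the rows of group_coords lying within `radius` of any row of arr.

    Hypothetical helper; radius=1.5 mirrors the `< 1.5` threshold used with cdist.
    """
    tree = cKDTree(arr)
    # Query only the nearest neighbour; points with nothing within `radius`
    # are reported with an infinite distance.
    dist, _ = tree.query(group_coords, k=1, distance_upper_bound=radius)
    return group_coords[np.isfinite(dist)]

# Same sample data as in the run above.
Arr = np.array([[5, 4], [11, 12], [5, 3], [1, 3], [15, 8], [55, 21]])
groupCoords = np.array([[2, 3], [5, 6], [6, 2], [5, 3], [5, 8]])
kd_out = neighbours_via_kdtree(groupCoords, Arr)
print(kd_out)
```

On the sample data this should select the same three rows as the cdist version. Because integer coordinates only produce distances of 0, 1, sqrt(2), 2 and so on, the exact handling of the 1.5 boundary does not matter here.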

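Approach #2

Another way is to check the absolute elementwise differences between the first columns of the two arrays, and likewise for the second columns. Finally, combine those two masks into a joint mask with a logical AND, check each row for any match, and again index into the group array to get the filtered coordinates.

Thus, the implementation for this approach would be -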

col0_mask = (np.abs(groupCoords[:,0,None] - Arr[:,0])<=1)
col1_mask = (np.abs(groupCoords[:,1,None] - Arr[:,1])<=1)
out = groupCoords[(col0_mask & col1_mask).any(1)]
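
Taken together, the two masks test whether the Chebyshev (chessboard) distance between a pair of points is at most 1, which is the 8-neighbourhood plus the point itself. A self-contained sketch on the sample data from Approach #1; the function name `neighbours_via_broadcast` is hypothetical, and the cdist call is only there as a cross-check:

```python
import numpy as np
from scipy.spatial.distance import cdist

def neighbours_via_broadcast(group_coords, arr):
    """Rows of group_coords that touch (or coincide with) some row of arr."""
    # Shape (len(group_coords), len(arr)): per-pair row and column differences.
    row_close = np.abs(group_coords[:, 0, None] - arr[:, 0]) <= 1
    col_close = np.abs(group_coords[:, 1, None] - arr[:, 1]) <= 1
    return group_coords[(row_close & col_close).any(1)]

Arr = np.array([[5, 4], [11, 12], [5, 3], [1, 3], [15, 8], [55, 21]])
groupCoords = np.array([[2, 3], [5, 6], [6, 2], [5, 3], [5, 8]])

bc_out = neighbours_via_broadcast(groupCoords, Arr)
# Cross-check: the same selection expressed as a Chebyshev distance threshold.
cheb_out = groupCoords[(cdist(groupCoords, Arr, metric="chebyshev") <= 1).any(1)]
print(bc_out)
```

Like Approach #1, this still builds masks with one entry per pair of points, so memory grows with the product of the two array lengths.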

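Approach #3

Another approach, which might be better if you have Arr as a boolean array instead of a 2-column coordinate array. The idea is to dilate this boolean array Arr and then see which coordinates from groupCoords also lie in the dilated image. For the dilation, we use a 3 x 3 kernel of all ones to cover all of the neighbouring positions. To detect the common points, we need to draw an image from those groupCoords.

Thus, the code would be -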

import numpy as np
from scipy.ndimage import binary_dilation  # the scipy.ndimage.morphology namespace is deprecated

img = np.zeros(Arr.shape,dtype=bool)
img[groupCoords[:,0],groupCoords[:,1]] = 1
out = np.argwhere(binary_dilation(Arr,np.ones((3,3))) & img)

Sample run -

In [444]: # Inputs : groupCoords and let's create a sample array for Arr
     ...: groupCoords = np.array([[2,3],[5,6],[6,2],[5,3],[5,8]])
     ...: 
     ...: Arr_Coords = np.array([[5,4],[11,12],[5,3],[1,3],[15,8],[55,21]])
     ...: Arr = np.zeros(Arr_Coords.max(0)+1,dtype=bool)
     ...: Arr[Arr_Coords[:,0], Arr_Coords[:,1]] = 1
     ...: 

In [445]: img = np.zeros(Arr.shape,dtype=bool)
     ...: img[groupCoords[:,0],groupCoords[:,1]] = 1
     ...: out = np.argwhere(binary_dilation(Arr,np.ones((3,3))) & img)
     ...: 

In [446]: out
Out[446]: 
array([[2, 3],
       [5, 3],
       [6, 2]])
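
To see why the all-ones 3 x 3 kernel is passed explicitly, it may help to dilate a single pixel. The sketch below (variable names are illustrative only) compares that kernel with the default structuring element of `binary_dilation`, which is cross-shaped and therefore leaves out the diagonals:

```python
import numpy as np
from scipy.ndimage import binary_dilation

# A 5 x 5 image with a single "true" pixel in the centre.
single = np.zeros((5, 5), dtype=bool)
single[2, 2] = True

# All-ones 3 x 3 kernel: the pixel grows into its full 8-neighbourhood.
full = binary_dilation(single, structure=np.ones((3, 3), dtype=bool))
# Default structuring element: only the up/down/left/right neighbours are added.
cross = binary_dilation(single)

print(full.astype(int))
print(cross.astype(int))
```

Also note that np.argwhere returns the matches in row-major order, which is why the rows in the sample run above appear in a different order than in Approach #1.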

【Comments】:

【Solution 2】:

Depending on the ultimate goal of your code, you might find scipy.ndimage.label and its relatives useful.

For example,

In [44]: from scipy.ndimage import label

In [45]: x
Out[45]: 
array([[ True,  True, False, False,  True],
       [False, False, False,  True,  True],
       [False,  True, False,  True, False],
       [ True,  True, False, False, False]], dtype=bool)

In [46]: x.astype(int)  # More concise, easier to read
Out[46]: 
array([[1, 1, 0, 0, 1],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0]])

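label returns two values. The first is an array with the same size as the input array. Each distinct connected component in the input is assigned an integer value, starting at 1. The background is 0. The second return value is the number of components found.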

In [47]: labeled_arr, nlabels = label(x)

In [48]: nlabels
Out[48]: 3

In [49]: labeled_arr
Out[49]: 
array([[1, 1, 0, 0, 2],
       [0, 0, 0, 2, 2],
       [0, 3, 0, 2, 0],
       [3, 3, 0, 0, 0]], dtype=int32)

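In the following, where(labeled_arr == i) (where is numpy.where) returns a tuple containing two arrays. These arrays are the row and column indices, respectively, of the connected component: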

In [50]: for i in range(1, nlabels+1):
    ...:     print(where(labeled_arr == i))
    ...:     
(array([0, 0]), array([0, 1]))
(array([0, 1, 1, 2]), array([4, 3, 4, 3]))
(array([2, 3, 3]), array([1, 0, 1]))

You can zip those together to convert them to lists of (row, col) pairs:

In [52]: for i in range(1, nlabels+1):
    ...:     print(list(zip(*where(labeled_arr == i))))
    ...:     
[(0, 0), (0, 1)]
[(0, 4), (1, 3), (1, 4), (2, 3)]
[(2, 1), (3, 0), (3, 1)]
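
One caveat: by default, label treats only up/down/left/right neighbours as connected, whereas the question also counts diagonal neighbours as touching. Passing an all-ones 3 x 3 `structure` switches to 8-connectivity. The sketch below applies this to the coordinate-array input from the question; the helper name `components_from_coords` and its `connectivity8` flag are hypothetical, and it assumes non-negative integer (row, col) coordinates:

```python
import numpy as np
from scipy.ndimage import label

def components_from_coords(coords, connectivity8=True):
    """Group (row, col) coordinates into connected components.

    Returns a list with one (k, 2) coordinate array per component.
    """
    coords = np.asarray(coords)
    # Draw the coordinates into a boolean image just large enough to hold them.
    img = np.zeros(tuple(coords.max(0) + 1), dtype=bool)
    img[coords[:, 0], coords[:, 1]] = True
    # All-ones 3 x 3 structure => diagonal neighbours count as connected.
    structure = np.ones((3, 3), dtype=int) if connectivity8 else None
    labeled, n = label(img, structure=structure)
    return [np.argwhere(labeled == i) for i in range(1, n + 1)]

# Input array from the question: three touching points and one far away.
groups = components_from_coords([[0, 0], [0, 1], [0, 2], [10, 2]])
for g in groups:
    print(g.tolist())

# Two pixels that touch only diagonally.
diag = [[0, 0], [1, 1]]
n8 = len(components_from_coords(diag))                       # 8-connectivity
n4 = len(components_from_coords(diag, connectivity8=False))  # default 4-connectivity
```

For the question's input this gives one group holding (0, 0), (0, 1), (0, 2) and a second group holding only (10, 2). The diagonal pair forms a single component under 8-connectivity and two components under the default.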

【Comments】: