Efficient identification of adjacent elements in a numpy matrix: answers

【Question title】: Efficient identification of adjacent elements in numpy matrix
【Posted】: 2016-12-23 11:57:35
【Question】:

I have a 100 x 100 numpy matrix. The matrix is mostly filled with zeros but also contains some integers. For example:

[0 0 0 0 0 0 0 1]
[0 2 2 0 0 0 0 0]
[0 0 2 0 0 0 0 0]  False
[0 0 0 0 0 0 0 0]
[0 3 3 0 0 0 0 0]

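What is the most efficient way to identify whether the matrix contains any pair of adjacent non-zero integers with different values?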

The example above would return False. Here is a True example, with the row that contains the adjacency marked:

[0 0 0 0 0 0 0 1]
[0 2 2 1 1 0 0 0]   <----  True
[0 0 2 0 0 0 0 0]  
[0 0 0 0 0 0 0 0]
[0 3 3 0 0 0 0 0]

Diagonals do not count as adjacent, so this example would also return False:

[0 0 0 1 1 1 1 1]
[0 2 2 0 1 0 0 0]
[0 0 2 0 0 0 0 0]   False
[0 3 0 0 0 0 0 0]
[3 3 3 0 0 0 0 0]

I do not need to identify the position of the adjacency, only whether one exists.

Currently, the best I can do is find every non-zero element in the matrix and then check its 4 flanking elements.
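The question does not include code for this baseline, so the following is only a minimal sketch of what "check the 4 flanking elements" might look like. The function name `has_adjacent_distinct_bruteforce` is hypothetical and not from the original post.

```python
import numpy as np

def has_adjacent_distinct_bruteforce(a):
    """Hypothetical baseline: inspect the 4 neighbours of every non-zero cell."""
    rows, cols = a.shape
    for i, j in zip(*np.nonzero(a)):
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            # Skip neighbours that fall outside the matrix
            if 0 <= ni < rows and 0 <= nj < cols:
                neighbour = a[ni, nj]
                # Adjacent cells count only if both are non-zero and differ
                if neighbour != 0 and neighbour != a[i, j]:
                    return True
    return False

example = np.array([[0, 0, 0, 0, 0, 0, 0, 1],
                    [0, 2, 2, 1, 1, 0, 0, 0],
                    [0, 0, 2, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 3, 3, 0, 0, 0, 0, 0]])
print(has_adjacent_distinct_bruteforce(example))  # True: the 2 and 1 in row 1 touch
```

A Python-level loop like this is easy to verify, which makes it a useful reference when checking the vectorised answers.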

Thanks for all the great answers.

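【Comments】:

Short of posting a full answer, I would start with numpy.diff with the axis argument.
Thanks, this should cut down the search area substantially.
Could you add the code that checks each element? That would remove some lingering ambiguity about the test.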

Tags: python numpy

【Solution 1】:

If you can use scipy, this is quite easy with ndimage.label and ndimage.labeled_comprehension:

import numpy as np
from scipy import ndimage

def multiple_unique_item(array):
    return len(np.unique(array)) > 1

def adjacent_diff(array):
    labeled_array, num_labels = ndimage.label(array)
    labels = np.arange(1, num_labels+1)
    any_multiple = ndimage.labeled_comprehension(array, labeled_array, labels, 
                                                 multiple_unique_item, bool, 0)
    return any_multiple.any()

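By default, label labels groups of adjacent values that are not 0, without diagonal connectivity. The comprehension then passes all the values associated with each label to the helper function, which checks whether more than one unique value is present. Finally, the function checks whether any label has more than one value and returns the result.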
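To make the labelling step concrete, here is a small sketch on a made-up 3 x 4 array (the array `sample` is an illustration, not data from the question). It shows that the touching 2s and 1 end up in one labelled region, while the isolated 3 gets its own.

```python
import numpy as np
from scipy import ndimage

sample = np.array([[0, 2, 2, 1],
                   [0, 0, 2, 0],
                   [3, 0, 0, 0]])

# Default structure: only horizontal/vertical neighbours are connected
labeled, num = ndimage.label(sample)
print(num)        # 2 regions
print(labeled)    # region 1 covers the 2s and the 1, region 2 covers the 3

# Values found inside each region; more than one value means an adjacency
for lab in range(1, num + 1):
    print(lab, np.unique(sample[labeled == lab]))
```

Region 1 contains both 1 and 2, which is the condition the helper function tests for.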

Testing on your test input arrays:

arr1 = np.array([[0,0,0,0,0,0,0,1],
                 [0,2,2,1,1,0,0,0],  
                 [0,0,2,0,0,0,0,0],
                 [0,0,0,0,0,0,0,0],
                 [0,3,3,0,0,0,0,0]])

arr2 = np.array([[0,0,0,1,1,1,1,1],
                 [0,2,2,0,1,0,0,0],
                 [0,0,2,0,0,0,0,0],  
                 [0,3,0,0,0,0,0,0],
                 [3,3,3,0,0,0,0,0]])

arr3 = np.array([[0,0,0,0,0,0,0,1],
                 [0,2,2,0,0,0,0,0],
                 [0,0,2,0,0,0,0,0],  
                 [0,0,0,0,0,0,0,0],
                 [0,3,3,0,0,0,0,0]])

>>> adjacent_diff(arr1)
True
>>> adjacent_diff(arr2)
False
>>> adjacent_diff(arr3)
False

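【Comments】:

【Solution 2】:

Looking at the description of your problem, it might not take much computational effort to find where each possible non-zero integer value sits in the array and see whether any of them touch. This is usually overkill, but at your scale it might work: you can get the indices of each set of integers and compute the distances between them with scipy.spatial.distance.cdist. I am sure some clever solution based on diff or something similar is more efficient, but I had fun anyway: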

import numpy as np
from scipy.spatial.distance import cdist
from itertools import combinations

M1 = np.array(
[[0,0,0,0,0,0,0,1],
 [0,2,2,1,1,0,0,0],  
 [0,0,2,0,0,0,0,0],
 [0,0,0,0,0,0,0,0],
 [0,3,3,0,0,0,0,0]])

M2 = np.array(
[[0,0,0,1,1,1,1,1],
 [0,2,2,0,1,0,0,0],
 [0,0,2,0,0,0,0,0],  
 [0,3,0,0,0,0,0,0],
 [3,3,3,0,0,0,0,0]])

def overlaps_eh(M):
    uniques = np.delete(np.unique(M),0) # get integers present
    unival_inds = [np.transpose(np.where(M==unival)) for unival in uniques]
    # unival_inds[k] contains the i,j indices of each element with the kth unique value

    for i1,i2 in combinations(range(len(unival_inds)),2):
        if np.any(cdist(unival_inds[i1],unival_inds[i2],'cityblock')==1):
            return True
    # if we're here: no adjacencies
    return False

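First we collect the non-zero unique matrix elements into the array uniques. Then, for each unique value, we find the i,j indices of every element of the input array that has that value. Next we go through each pair of unique values (generated with itertools.combinations) and use scipy.spatial.distance.cdist to measure the pairwise distances between the elements of the two values. Under the Manhattan distance, two elements are adjacent if their distance is 1. So we return True as soon as any of these distances is 1, and False otherwise.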
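The distance test can be illustrated on a tiny made-up matrix (the array `M` and the variable names below are assumptions for illustration only). The 'cityblock' metric is scipy's name for the Manhattan distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

M = np.array([[0, 2, 1],
              [0, 0, 0],
              [3, 0, 0]])

# (row, col) index pairs for each value
idx_1 = np.transpose(np.where(M == 1))   # [[0, 2]]
idx_2 = np.transpose(np.where(M == 2))   # [[0, 1]]
idx_3 = np.transpose(np.where(M == 3))   # [[2, 0]]

d12 = cdist(idx_1, idx_2, 'cityblock')   # |0-0| + |2-1| = 1, so 1 and 2 touch
d13 = cdist(idx_1, idx_3, 'cityblock')   # |0-2| + |2-0| = 4, so 1 and 3 do not

print(np.any(d12 == 1), np.any(d13 == 1))
```

A diagonal neighbour has Manhattan distance 2, so it is excluded automatically.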

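【Comments】:

【Solution 3】:

Here is an approach that makes heavy use of slicing; the slices are simply views, which keeps the focus on performance: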

def distinct_ints(a):
    # Mask of zeros, non-zeros as we would use them frequently
    zm = a==0
    nzm = ~zm

    # Look for distinct ints across rows
    row_thresh = (nzm[:,1:] & zm[:,:-1]).sum(1)
    row_out = ((nzm[:,1:] & (a[:,1:] != a[:,:-1])).sum(1)>row_thresh).any()

    # Look for distinct ints across cols
    col_thresh = (nzm[1:] & zm[:-1]).sum(0)
    col_out = ((nzm[1:] & (a[1:] != a[:-1])).sum(0)>col_thresh).any()

    # Any from rows or cols
    out = row_out | col_out
    return out
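The answer gives no explanation of the threshold comparison, so here is a hedged walk-through of the row-wise step on a single made-up row (the names `row`, `changes` and `from_zero` are illustrative). A non-zero cell that differs from its left neighbour is either a step up from a zero or a genuine adjacency. If the first count exceeds the second, at least one adjacency exists.

```python
import numpy as np

row = np.array([[0, 2, 2, 1, 1, 0, 3]])
zm = row == 0
nzm = ~zm

# Non-zero cells whose left neighbour holds a different value
changes = nzm[:, 1:] & (row[:, 1:] != row[:, :-1])
# Non-zero cells whose left neighbour is zero (harmless transitions)
from_zero = nzm[:, 1:] & zm[:, :-1]

print(changes.sum(1))    # [3]: 0->2, 2->1, 0->3
print(from_zero.sum(1))  # [2]: 0->2, 0->3
print((changes.sum(1) > from_zero.sum(1)).any())  # True, because of 2->1
```

The column-wise step in the function applies the same counting along axis 0.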

【Comments】:

【Solution 4】:

Here is a solution using a masked array:

import numpy as np
import numpy.ma as ma
a = np.array([[0,1,0], [0,1,0], [2,2,2]])    # sample data 
x = ma.masked_equal(a, 0)                    # mask zeros
adjacencies = np.count_nonzero(np.diff(x, axis=0).filled(0)) + np.count_nonzero(np.diff(x, axis=1).filled(0))

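In the last line, diff is applied to the masked array (the zero entries are ignored); non-zero entries in the diff indicate adjacent, distinct non-zero entries in the array a. The variable adjacencies holds the total number of adjacencies (perhaps you only want to know whether it is 0). In the example above, it is 1.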
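Since the question only asks whether an adjacency exists, the same idea can be wrapped in a function that returns a bool. This is a sketch under that assumption; the name `has_adjacency_masked` is hypothetical, and the test arrays are the True example and the diagonal example from the question.

```python
import numpy as np
import numpy.ma as ma

def has_adjacency_masked(a):
    """Hypothetical wrapper around the masked-array diff shown above."""
    x = ma.masked_equal(a, 0)                     # mask the zeros
    vertical = np.diff(x, axis=0).filled(0)       # masked differences become 0
    horizontal = np.diff(x, axis=1).filled(0)
    return bool(np.count_nonzero(vertical) + np.count_nonzero(horizontal))

touching = np.array([[0, 0, 0, 0, 0, 0, 0, 1],
                     [0, 2, 2, 1, 1, 0, 0, 0],
                     [0, 0, 2, 0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0, 0, 0, 0],
                     [0, 3, 3, 0, 0, 0, 0, 0]])

diagonal_only = np.array([[0, 0, 0, 1, 1, 1, 1, 1],
                          [0, 2, 2, 0, 1, 0, 0, 0],
                          [0, 0, 2, 0, 0, 0, 0, 0],
                          [0, 3, 0, 0, 0, 0, 0, 0],
                          [3, 3, 3, 0, 0, 0, 0, 0]])

print(has_adjacency_masked(touching))
print(has_adjacency_masked(diagonal_only))
```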

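【Comments】:

【Solution 5】:

This can be done with numpy.diff; however, the fact that zeros must not be considered complicates things a little.

You can set the zeros to a value large or small enough that it does not cause problems: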

a[a == 0] = -999

Or use a floating-point array (an integer array cannot hold nan or inf) and set them to nan or inf:

a[a == 0] = numpy.nan

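Then, with the nan variant, look in each direction for first-order differences that are finite and non-zero. Any difference that involves a replaced zero is nan, and a non-zero test is used rather than a test for a difference of exactly 1, so that neighbours such as 1 and 3 are also caught:

d0, d1 = numpy.diff(a, axis=0), numpy.diff(a, axis=1)
numpy.any(numpy.isfinite(d0) & (d0 != 0)) or numpy.any(numpy.isfinite(d1) & (d1 != 0))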
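The pieces of this answer can be put together as one function. The following is a sketch of the nan variant only, with a hypothetical name `has_adjacency_nan`. It assumes that neighbours with any different non-zero values should count, so it tests for a finite non-zero difference rather than a difference of exactly 1.

```python
import numpy as np

def has_adjacency_nan(a):
    """Hypothetical helper: replace zeros with nan, then inspect the diffs."""
    f = a.astype(float)        # astype returns a copy; nan needs a float dtype
    f[f == 0] = np.nan
    for axis in (0, 1):
        d = np.diff(f, axis=axis)
        # nan marks a difference that involves a former zero; ignore those
        if np.any(np.isfinite(d) & (d != 0)):
            return True
    return False

print(has_adjacency_nan(np.array([[1, 3]])))      # 1 next to 3
print(has_adjacency_nan(np.array([[1, 0, 3]])))   # separated by a zero
```

Working on a float copy also leaves the caller's integer array untouched, unlike the in-place assignment.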

【Comments】: