Identify all unique combinations along the third dimension of stacked 2D numpy arrays (answers)

【Question title】: Identify all unique combinations along the third dimension of stacked 2D numpy arrays
【Posted】: 2020-01-29 15:16:37
【Question description】:

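For 2 or more 2D integer numpy arrays stacked along axis=0, I am interested in:

Identifying all unique combinations of numbers along the third dimension.
Labelling each combination with a new numeric value (a 'label').
Generating a new 2D array whose values are the labels representing the combination of values in the source arrays.

Sample data: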

import numpy as np
arr1 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)
arr2 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)

A list of tuples with the combinations of interest can be obtained like this:

xx, yy = np.meshgrid(arr1, arr2, sparse=True)
combis = np.stack([xx.reshape(arr1.size), yy.reshape(arr2.size)])
u_combis = np.unique(combis, axis=1)
u_combis_lst = list(map(tuple, u_combis.T))

Generate a dictionary that maps each combination to a label:

labels = [x for x in range(0, len(u_combis_lst))]
label_dict = dict(zip(u_combis_lst, labels))

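Points 1 and 2 now seem to be covered. My questions are:

How can I apply label_dict to the combination of arr1 and arr2?
How can my suggested code be improved?
How can the code be made to work for more than 2 arrays?

For completeness, my goal is to recreate the functionality of the 'combine' function in ArcGIS Pro.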

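【Question discussion】:

Tags: python arrays numpy

【Solution 1】:

An alternative is to build a dictionary lookup table from the unique tuple combinations of the array values.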

# start with flattened arrays
arr1 = np.random.randint(low=0, high=4, size=25)
arr2 = np.random.randint(low=0, high=4, size=25)

# create tuples and store the unique tuples
combis = list(zip(arr1, arr2)) 

u_combis = set(combis) # get unique combinations

# create a dictionary of the unique tuples with the unique values
u_combi_dict = {combi:n for n, combi in enumerate(u_combis)}

# use the unique dictionary combinations to match the tuples
combi_arr = np.array([u_combi_dict[combi] for combi in combis])

# if needed, reshape back to original extent for spatial analysis
combi_arr_grid = combi_arr.reshape(5, 5)

A generic function that accepts any number of input arrays could work as follows:

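def combine(input_arrays):
    # input_arrays: sequence of flattened (1D) arrays of equal length;
    # 2D arrays must be flattened first, as in the example above

    combis = list(zip(*input_arrays))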
    u_combis = set(combis)

    u_combi_dict = {combi: n for n, combi in enumerate(u_combis)}
    combi_arr = np.array([u_combi_dict[combi] for combi in combis])

    return combi_arr
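The dictionary lookup above loops over every cell in Python. As a possible alternative (not part of the original answer), the same labelling can be sketched with `np.unique(..., axis=0, return_inverse=True)`, which finds the unique rows and the label of each row in one call. The function name `combine_vectorized` is hypothetical, and the sketch assumes all input arrays share the same shape.

```python
import numpy as np

def combine_vectorized(input_arrays):
    """Label each unique combination of values across same-shaped arrays.

    Hypothetical helper, shown as a sketch: returns the label array
    (same shape as the inputs) and the table of unique combinations.
    """
    arrays = [np.asarray(a) for a in input_arrays]
    shape = arrays[0].shape
    # One row per cell, one column per input array
    stacked = np.stack([a.ravel() for a in arrays], axis=1)
    # Unique rows are the unique combinations; inverse holds one label per cell
    u_combis, inverse = np.unique(stacked, axis=0, return_inverse=True)
    # reshape() copes with the differing inverse shapes across NumPy versions
    return inverse.reshape(shape), u_combis

rng = np.random.default_rng(0)
arr1 = rng.integers(0, 4, size=(5, 5))
arr2 = rng.integers(0, 4, size=(5, 5))
arr3 = rng.integers(0, 4, size=(5, 5))

labels, u_combis = combine_vectorized([arr1, arr2, arr3])
```

Here `u_combis[label]` gives back the combination behind a label. Labels follow the sorted order of the combinations, so the numbers will generally differ from those produced by the set-based version, whose ordering is arbitrary.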

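【Discussion】:

This answer is a rare gem!

【Solution 2】:

If your numbers are small, such as np.uint8 (for example, labels from an unsupervised classification), you can shift and OR the layers together into a single 64-bit integer and combine on that. This lets you combine up to 8 np.uint8 layers or 4 np.int16 layers, for example.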

#!/usr/bin/env python3

import numpy as np

# Ensure repeatable, deterministic randomness!
np.random.seed(42)

# Generate test arrays
arr2 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)
arr1 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)

# Build a FatThing by shifting and ORing arrays together; for 3 arrays use FatThing = arr1 | (arr2<<8) | (arr3<<16)
FatThing = arr1 | (arr2<<8)

# Find unique values in FatThing
uniques = np.unique(FatThing)

# Make lookup table of labels corresponding to each fat value
FatThing2label = {uniques[i]:i for i in range(len(uniques))}

# Lookup label of each fat value
result = [FatThing2label[int(x)] for x in np.nditer(FatThing)]
result = np.array(result).reshape(arr1.shape)
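The shift-and-OR step and the label lookup can also be sketched for any number of layers, using `np.unique(..., return_inverse=True)` in place of the dictionary and the `np.nditer` loop. The helper name `pack_and_label` and its `bits` parameter are assumptions made for this illustration; the sketch assumes non-negative integer layers that each fit in `bits` bits.

```python
import numpy as np

def pack_and_label(layers, bits=8):
    """Pack small non-negative integer layers into one uint64 array and
    label each unique packed value (hypothetical helper, sketch only)."""
    if len(layers) * bits > 64:
        raise ValueError("layers do not fit into a 64-bit integer")
    packed = np.zeros(np.shape(layers[0]), dtype=np.uint64)
    for i, layer in enumerate(layers):
        layer = np.asarray(layer)
        if int(layer.min()) < 0 or int(layer.max()) >= (1 << bits):
            raise ValueError("layer values do not fit into the given bit width")
        # Layer i occupies bits [i*bits, (i+1)*bits) of the packed value
        packed |= layer.astype(np.uint64) << np.uint64(bits * i)
    # inverse holds, for every cell, the index of its packed value in uniques
    uniques, inverse = np.unique(packed, return_inverse=True)
    return packed, inverse.reshape(packed.shape)

# Top-left 2x3 corner of the arr1 and arr2 arrays shown in this answer
a1 = np.array([[1, 1, 1], [0, 0, 3]])
a2 = np.array([[2, 3, 0], [3, 0, 0]])

packed, labels = pack_and_label([a1, a2])
print(packed)  # [[513 769 1] [768 0 3]]
print(labels)  # [[3 5 1] [4 0 2]]
```

The labels here differ from those in the full 5x5 example because only six distinct packed values occur in this corner.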

This generates arr1 as:

array([[1, 1, 1, 3, 3],
       [0, 0, 3, 1, 1],
       [0, 3, 0, 0, 2],
       [2, 2, 1, 3, 3],
       [3, 3, 2, 1, 1]])

and arr2 as:

array([[2, 3, 0, 2, 2],
       [3, 0, 0, 2, 1],
       [2, 2, 2, 2, 3],
       [0, 3, 3, 3, 2],
       [1, 0, 1, 3, 3]])

That makes FatThing look like this:

array([[513, 769,   1, 515, 515],
       [768,   0,   3, 513, 257],
       [512, 515, 512, 512, 770],
       [  2, 770, 769, 771, 515],
       [259,   3, 258, 769, 769]])

and result look like this:

array([[ 8, 11,  1,  9,  9],
       [10,  0,  3,  8,  4],
       [ 7,  9,  7,  7, 12],
       [ 2, 12, 11, 13,  9],
       [ 6,  3,  5, 11, 11]])
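To see which source values a label stands for, the packed values can be unpacked again by masking and shifting. The sketch below is an illustration rather than part of the original answer; it hard-codes the 14 unique values that occur in the FatThing array shown above, and `label_to_combo` is a hypothetical name.

```python
import numpy as np

# Sorted unique values of the FatThing array shown above
uniques = np.array([0, 1, 2, 3, 257, 258, 259, 512, 513, 515, 768, 769, 770, 771])

# The low 8 bits hold the arr1 value, the next 8 bits hold the arr2 value
arr1_vals = uniques & 0xFF
arr2_vals = (uniques >> 8) & 0xFF

# Map each label (index into uniques) back to its (arr1, arr2) combination
label_to_combo = {
    label: (int(a), int(b))
    for label, (a, b) in enumerate(zip(arr1_vals, arr2_vals))
}

print(label_to_combo[8])   # (1, 2), since 513 == (2 << 8) | 1
print(label_to_combo[13])  # (3, 3), since 771 == (3 << 8) | 3
```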

【Discussion】: