Sorting an array of coordinates by Euclidean distance: answers

【Question title】: Sorting array of coordinates by their Euclidean distance
【Posted】: 2021-05-18 08:43:42
【Question】:

I have an array A of length x and an array B of length y, where y > x.

For example:

A = [[14,44],[16,47],[27,79]]
B = [[14,46],[16,46],[18,89],[27,79],[45,127]]

The output I want is B sorted like this:

B = [[14,46],[16,46],[27,79],[18,89],[45,127]]

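I want to sort array B so that each vector in A lines up with the vector in B at the smallest Euclidean distance from it. There is a threshold, so that B vectors that are not close to any A vector are placed at the end of B.

Here is my code: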

from scipy.spatial import distance

def align_by_dist(A, B):
    for i in range(len(B)):
        D = []  # distances below the threshold found so far for B[i]
        for j in range(len(A)):
            dist = distance.euclidean(A[j], B[i])
            if dist < 3:  # threshold on the Euclidean distance
                D.append(dist)
                if dist == min(D):
                    # Smallest distance so far: move B[i] to the same index as A[j]
                    B[j], B[i] = B[i], B[j]
    return A, B

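My problem is that when vectors are close to each other, as in my example, B gets reordered by the first vector whose Euclidean distance is below the threshold, even if it is not the smallest distance.

This is the result my code produces: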

B = [[16,46],[14,46],[27,79],[18,89],[45,127]]

The first and second vectors should be swapped.

【Comments】:

Tags: python arrays sorting distance

【Solution 1】:

I may have managed to fix my code:

import numpy as np
from scipy.spatial import distance

def align_by_dist(A, B):
    for i in range(len(A)):
        D = []  # (distance, index in B, index in A) for every B[j] within the threshold of A[i]
        for j in range(len(B)):
            dist = distance.euclidean(A[i], B[j])  # distance between the prediction at index i and the target at index j
            if dist <= 4:  # threshold on the Euclidean distance
                D.append(np.array([dist, j, i]))
        if D:  # at least one target lies within the threshold of A[i]
            value = min(D, key=lambda elem: elem[0])  # candidate with the smallest distance
            j_best, i_best = int(value[1]), int(value[2])
            B[i_best], B[j_best] = B[j_best], B[i_best]  # move the closest target to index i
            A[i_best] = [1000000, 1000000]  # push the used prediction far away so it cannot be matched again
    return B

【Comments】:

Running this code with the input arrays produces a duplicated point: [[14.0, 46.0], [16.0, 46.0], [27.0, 79.0], [27.0, 79.0], [45.0, 127.0]]
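
The duplicate is consistent with how a tuple swap behaves on NumPy rows. When B is a NumPy array, B[i] on the right-hand side is a view rather than a copy, so the first assignment overwrites the data that the second assignment reads. The sketch below reproduces the effect and shows one common workaround, fancy indexing, which copies the rows before writing. The helper name swap_rows is hypothetical and is not part of the answer above.

```python
import numpy as np

def swap_rows(arr, i, j):
    """Swap rows i and j of a 2-D array in place (hypothetical helper)."""
    # Fancy indexing on the right-hand side makes a copy, so both rows are
    # read before either one is overwritten.
    arr[[i, j]] = arr[[j, i]]

B = np.array([[14.0, 46], [16, 46], [18, 89], [27, 79], [45, 127]])

# Tuple swap on views: row 2 is overwritten first, then copied onto row 3.
broken = B.copy()
broken[2], broken[3] = broken[3], broken[2]

# Swap through fancy indexing: rows 2 and 3 really change places.
fixed = B.copy()
swap_rows(fixed, 2, 3)

print(broken)
print(fixed)
```

With plain Python lists the tuple swap works as intended, because list elements are rebound rather than overwritten in place.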

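【Solution 2】:

Since you are already using numpy, let's try to come up with a vectorized solution. The squared Euclidean distance between two points is given by d^2 = (x2 - x1)^2 + (y2 - y1)^2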

A = np.array([[14.0,44],[16,47],[27,79]])
B = np.array([[14.0,46],[16,46],[18,89],[27,79],[45,127]])

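A[:, 0, None] gives us the X values of all points in A as an array of shape (3, 1).
B[:, 0, None].T gives us the X values of all points in B as an array of shape (1, 5).

NumPy can broadcast these shapes, so A[:, 0, None] - B[:, 0, None].T gives us an array of shape (3, 5) whose element i, j is A[i, 0] - B[j, 0]. Squaring it element-wise gives the first term of the squared Euclidean distance formula. Doing the same with column 1 of A and B (instead of column 0) gives the second term.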

dist_sqr = (A[:, 0, None] - B[:, 0, None].T)**2 + (A[:, 1, None] - B[:, 1, None].T)**2

Now dist_sqr[i, j] gives you the squared distance between the points A[i, :] and B[j, :].
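
As a sanity check on the broadcasting, the sketch below prints the intermediate shapes and compares the result with SciPy's pairwise distance routine. Using scipy.spatial.distance.cdist with the "sqeuclidean" metric is an addition for verification only and is not part of the answer.

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[14.0, 44], [16, 47], [27, 79]])
B = np.array([[14.0, 46], [16, 46], [18, 89], [27, 79], [45, 127]])

dx = A[:, 0, None] - B[:, 0, None].T  # shape (3, 5): differences in x
dy = A[:, 1, None] - B[:, 1, None].T  # shape (3, 5): differences in y
dist_sqr = dx**2 + dy**2

# Cross-check against SciPy's pairwise squared Euclidean distances.
reference = cdist(A, B, metric="sqeuclidean")

print(dx.shape, dy.shape, dist_sqr.shape)
print(np.allclose(dist_sqr, reference))
```

For instance, dist_sqr[0, 0] compares [14, 44] with [14, 46], giving 0^2 + 2^2 = 4.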

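For each point in A (that is, each row), the column holding the minimum distance identifies the closest point in B.

To pick the column index of the minimum distance, we use np.argmin() with axis=1.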

min_dist_pt = np.argmin(dist_sqr, axis=1)

This gives us a three-element vector, array([0, 1, 3], dtype=int64).

reordered_B = B[min_dist_pt, :]

# array([[14., 46.],
#       [16., 46.],
#       [27., 79.]])

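This is the order we want. Now we need to fill in the remaining points of B. You don't seem to require any particular order for them, so I'll fill them in the order they appear in B. To do this, I convert range(num_pts) and min_dist_pt to sets and take the set difference.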

num_pts = B.shape[0]
remaining_indices = sorted(set(range(num_pts)) - set(min_dist_pt))  # sorted() keeps the original order of B
remaining_B = B[remaining_indices, :]

# array([[ 18.,  89.],
#       [ 45., 127.]])

Finally, we stack the reordered_B and remaining_B arrays:

np.vstack((reordered_B, remaining_B))

# array([[ 14.,  46.],
#       [ 16.,  46.],
#       [ 27.,  79.],
#       [ 18.,  89.],
#       [ 45., 127.]])
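
One caveat with the np.argmin step: each row of A is matched independently, so two points in A can select the same point in B, which would then appear twice in the stacked result. If a one-to-one matching is needed, one option (a different technique from the answer above) is scipy.optimize.linear_sum_assignment, which picks the assignment with the smallest total cost. A minimal sketch, with made-up points chosen so that both rows of A are closest to B[0]:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Made-up points for illustration: both points in A are nearest to B[0].
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.4, 0.0], [3.0, 0.0], [20.0, 20.0]])

cost = cdist(A, B, metric="sqeuclidean")

# Independent nearest-neighbour lookup: the same column is chosen twice.
greedy = np.argmin(cost, axis=1)

# One-to-one matching that minimises the total squared distance.
row_ind, col_ind = linear_sum_assignment(cost)

print(greedy)   # column chosen for each row of A, independently
print(col_ind)  # column assigned to each row of A, without repeats
```

Whether the globally optimal assignment or the per-point nearest neighbour is the right notion of "aligned" depends on the data, so treat this as one possible choice.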

All together as a single function:

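import numpy as np

def align_by_dist_2(A, B):
    A = np.asarray(A, np.float64)
    B = np.asarray(B, np.float64)

    # Squared Euclidean distance between every A[i] and every B[j], shape (len(A), len(B))
    dist_sqr = (A[:, 0, None] - B[:, 0, None].T)**2 + (A[:, 1, None] - B[:, 1, None].T)**2

    # For each point in A, the index of the closest point in B
    min_dist_pt = np.argmin(dist_sqr, axis=1)
    reordered_B = B[min_dist_pt, :]

    # Points of B not matched to any point in A, in their original order
    num_pts = B.shape[0]
    remaining_indices = sorted(set(range(num_pts)) - set(min_dist_pt))
    remaining_B = B[remaining_indices, :]

    return np.vstack((reordered_B, remaining_B))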
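
The question also asks for a distance threshold, which the function above does not apply. The following is a sketch of one way to add it, under stated assumptions: the name align_by_dist_thresh and the parameter max_dist are hypothetical, the default of 3.0 is taken from the question's code, and repeated matches are dropped so that no point of B appears twice. Points of A with no match within the threshold are skipped, so the output lines up with A index by index only when every point in A finds a match.

```python
import numpy as np

def align_by_dist_thresh(A, B, max_dist=3.0):
    """Hypothetical variant of align_by_dist_2 that applies a distance threshold."""
    A = np.asarray(A, np.float64)
    B = np.asarray(B, np.float64)

    # Pairwise squared distances, shape (len(A), len(B)).
    dist_sqr = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

    # Nearest point in B for each point in A.
    nearest = np.argmin(dist_sqr, axis=1)

    # Keep only matches within the threshold (compared on squared distances).
    within = dist_sqr[np.arange(len(A)), nearest] <= max_dist**2

    # Drop repeated indices while preserving the order given by A.
    matched = np.array(list(dict.fromkeys(nearest[within].tolist())), dtype=np.intp)

    # Every other point of B goes to the end, in its original order.
    remaining = np.setdiff1d(np.arange(len(B)), matched)
    return np.vstack((B[matched], B[remaining]))

A = np.array([[14.0, 44], [16, 47], [27, 79]])
B = np.array([[14.0, 46], [16, 46], [18, 89], [27, 79], [45, 127]])
print(align_by_dist_thresh(A, B))
```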

Some comparisons between your loop approach (align_by_dist) and my vectorized approach (align_by_dist_2):

A = np.array([[14.0,44],[16,47],[27,79]])
B = np.array([[14.0,46],[16,46],[18,89],[27,79],[45,127]])

%timeit align_by_dist(A, B)
210 µs ± 6.96 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit align_by_dist_2(A, B)
45.3 µs ± 6.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The vectorized approach shows a speedup of about 5x.

With larger arrays:

A = np.random.random((100, 2))
B = np.random.random((110, 2))

%timeit align_by_dist(A, B)
189 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit align_by_dist_2(A, B)
227 µs ± 27.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

With larger arrays, the speedup is even greater: ~800x!

【Comments】: