【Question title】: Find distance to nearest neighbor in 2d array
【Posted】: 2023-12-08 23:29:01
【Question description】:

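I have a 2D array and I want to find, as fast as possible, the distance from each (x, y) point to its nearest neighbor.

I can do this using scipy.spatial.distance.cdist: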

import numpy as np
from scipy.spatial.distance import cdist

# Random data
data = np.random.uniform(0., 1., (1000, 2))
# Distance between the array and itself
dists = cdist(data, data)
# Sort by distances
dists.sort()
# Take column 1: after sorting, column 0 is always 0
# (the distance of each point to itself)
nn_dist = dists[:, 1]

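This works, but I feel it does too much work, and KDTree should be able to handle this, though I'm not sure how. I'm not interested in the coordinates of the nearest neighbor; I just want the distance (and as fast as possible).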
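Part of the extra work in the cdist approach is the full sort of every row, when only the smallest non-self distance is needed. A minimal sketch of one way to avoid the sort, assuming the full n-by-n distance matrix still fits in memory: mask the diagonal and take the row-wise minimum. The function name `nn_dist_min` and the seeded generator are illustrative and not part of the original post.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Seeded generator, used here only so the example is reproducible
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, (1000, 2))

def nn_dist_min(points):
    # Full pairwise distance matrix, shape (n, n)
    dists = cdist(points, points)
    # Mask each point's distance to itself so it cannot win the minimum
    np.fill_diagonal(dists, np.inf)
    # The row-wise minimum is the distance to the nearest other point
    return dists.min(axis=1)

nn_dist = nn_dist_min(data)
```

This still builds the whole distance matrix, so memory grows quadratically with the number of points; it only replaces the per-row sort with a single pass.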

【Question comments】:

  • So why not just go ahead and try cKDTree? It only takes a few lines of code.
  • I hadn't thought of trying cKDTree. I'll give it a try.

Tags: python numpy scipy nearest-neighbor euclidean-distance


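【Solution 1】:

KDTree can do this. The process is almost the same as with cdist. However, in the timings below cdist is much faster than KDTree. As pointed out in the comments, cKDTree is faster still: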

import numpy as np
from scipy.spatial.distance import cdist
from scipy.spatial import KDTree
from scipy.spatial import cKDTree
import timeit

# Random data
data = np.random.uniform(0., 1., (1000, 2))

def scipy_method():
    # Distance between the array and itself
    dists = cdist(data, data)
    # Sort by distances
    dists.sort()
    # Take column 1: after sorting, column 0 is always 0
    # (the distance of each point to itself)
    nn_dist = dists[:, 1]
    return nn_dist

def KDTree_method():
    # You have to create the tree to use this method.
    tree = KDTree(data)
    # Then you find the closest two as the first is the point itself
    dists = tree.query(data, 2)
    nn_dist = dists[0][:, 1]
    return nn_dist

def cKDTree_method():
    tree = cKDTree(data)
    dists = tree.query(data, 2)
    nn_dist = dists[0][:, 1]
    return nn_dist

print(timeit.timeit('cKDTree_method()', number=100, globals=globals()))
print(timeit.timeit('scipy_method()', number=100, globals=globals()))
print(timeit.timeit('KDTree_method()', number=100, globals=globals()))

Output (seconds for 100 runs, in the order cKDTree, cdist, KDTree):

0.34952507635557595
7.904083715193579
20.765962179145546

Proof once again that C is great!
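Since only the distance to the nearest other point is wanted, the tree query can be asked for just the second-nearest hit, and the result can be checked against the brute-force cdist version. The following is a sketch under assumptions, not the answer's own code: it relies on `query` accepting a list for `k` (with neighbor counting starting at 1), and the helper names `nn_dist_tree` and `nn_dist_brute` are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

# Seeded generator, used here only so the example is reproducible
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, (1000, 2))

def nn_dist_tree(points):
    tree = cKDTree(points)
    # k=[2] requests only the 2nd-nearest hit; the 1st is the query point itself
    dists, _ = tree.query(points, k=[2])
    # Flatten the single-column result to a 1-D array of length n
    return np.ravel(dists)

def nn_dist_brute(points):
    # Reference result: sort every row of the full distance matrix
    dists = cdist(points, points)
    dists.sort(axis=1)
    return dists[:, 1]

tree_result = nn_dist_tree(data)
brute_result = nn_dist_brute(data)
# True when both methods agree to floating-point tolerance
agree = np.allclose(tree_result, brute_result)
```

The SciPy documentation states that from version 1.6.0 onward KDTree and cKDTree are functionally identical, so the large gap between them in the timings above may not reproduce on recent installations.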

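【Comments】:

  • Wow, that makes a big difference in runtime. I just assumed KDTree would be much faster, not sure why. Thanks @Akaisteph7!
  • I followed Paul's advice and tried cKDTree instead of KDTree (just two letters changed in the code above), and it is orders of magnitude faster.
  • Can this be used to measure the (x, y) distance to the nearest 0 value in a binary file?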