Find all nearest neighbors within a specific distance (answers)

【Question title】: Find all nearest neighbors within a specific distance
【Posted】: 2015-12-02 04:18:08
【Question】:

I have a large list of x and y coordinates stored in a numpy array.

Coordinates = [[ 60037633 289492298]
 [ 60782468 289401668]
 [ 60057234 289419794]]
...
...

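What I want is to find all nearest neighbors within a specific distance (say, 3 meters) and store the result, so that I can run further analysis on it later.

Most packages I found require me to specify how many nearest neighbors should be found, but I simply want all of them within the set distance.

How can I achieve this? What is the fastest and best approach for large datasets (millions of points)?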

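【Comments】:

Have you tried to do this yourself? What does your code look like right now? Can you give an example of what you are trying to compute (i.e. what does 3 meters mean)? Are these GPS coordinates?
This is what I tried before:

from scipy import spatial
myTreeName = spatial.cKDTree(Coordinates, leafsize=100)
for item in Coordinates:
    TheResult = myTreeName.query(item, k=20, distance_upper_bound=3)

Here, though, I have to specify how many nearest neighbors I want to find. Yes, these are GPS coordinates (X, Y), and I want to find all nearest neighbors within a 3-meter radius of every point in the dataset.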
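To illustrate why the k-based attempt in the comment above is awkward, here is a minimal sketch of how `query` behaves when fewer than `k` points lie inside `distance_upper_bound`. The sample coordinates are made up for the illustration and are not from the question.

```python
import numpy as np
from scipy import spatial

# Hypothetical sample data: three points close together and one far away.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [50.0, 50.0]])
tree = spatial.cKDTree(coords)

# Ask for up to k=4 neighbors of the first point, no farther than 3 units.
dist, idx = tree.query(coords[0], k=4, distance_upper_bound=3)

# Slots with no neighbor inside the bound are padded:
# the distance is inf and the index equals tree.n (one past the last valid index).
found = idx[np.isfinite(dist)]
print(found)
```

The padded slots have to be filtered out by hand, and `k` still caps how many neighbors can be returned, so a radius query is usually the more direct tool when every neighbor within the distance is wanted.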

Tags: python numpy nearest-neighbor

【Solution 1】:

You can use scipy.spatial.cKDTree:

import numpy as np
import scipy.spatial as spatial
points = np.array([(1, 2), (3, 4), (4, 5)])
point_tree = spatial.cKDTree(points)
# This finds the index of all points within distance 1 of [1.5,2.5].
print(point_tree.query_ball_point([1.5, 2.5], 1))
# [0]

# This gives the point in the KDTree which is within 1 unit of [1.5, 2.5]
print(point_tree.data[point_tree.query_ball_point([1.5, 2.5], 1)])
# [[1 2]]

# More than one point is within 3 units of [1.5, 1.6].
print(point_tree.data[point_tree.query_ball_point([1.5, 1.6], 3)])
# [[1 2]
#  [3 4]]

Here is an example showing how to find all the nearest neighbors for a set of points with a single call to point_tree.query_ball_point:

import numpy as np
import scipy.spatial as spatial
import matplotlib.pyplot as plt
np.random.seed(2015)

centers = [(1, 2), (3, 4), (4, 5)]
points = np.concatenate([pt+np.random.random((10, 2))*0.5 
                         for pt in centers])
point_tree = spatial.cKDTree(points)

cmap = plt.get_cmap('copper')
colors = cmap(np.linspace(0, 1, len(centers)))
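# query_ball_point returns one list of indices per center, in the same order.
for center, group, color in zip(centers, point_tree.query_ball_point(centers, 0.5), colors):
    cluster = point_tree.data[group]
    x, y = cluster[:, 0], cluster[:, 1]
    # color= takes a single RGBA value; c= could misread it as per-point values.
    plt.scatter(x, y, color=color, s=200)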

plt.show()
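For the case in the question, where every point in the dataset is also a query point, the same call can be run against the tree's own points. The following is a minimal sketch under two assumptions: the coordinates are made-up sample values, and they are in a planar (projected) system whose unit matches the radius, since cKDTree measures plain Euclidean distance.

```python
import numpy as np
import scipy.spatial as spatial

# Hypothetical coordinates, assumed to be planar with the same unit as `radius`.
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
radius = 3.0

tree = spatial.cKDTree(points)

# One list of indices per point; each list also contains the point itself.
groups = tree.query_ball_point(points, radius)

# Drop the self-match so that only true neighbors remain.
neighbors = [sorted(j for j in group if j != i) for i, group in enumerate(groups)]
print(neighbors)

# Alternative: the set of unordered index pairs (i, j), i < j, within `radius`.
pairs = tree.query_pairs(radius)
print(sorted(pairs))
```

`query_pairs` stores each close pair once, which may be the more compact form when the per-point lists are not needed for the later analysis.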

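【Comments】:

I think spatial.cKDTree is the recommended one. (I believe the only difference is the implementation... the behavior and interface are the same.)
Thanks for the correction, @askewchan. cKDTree should be faster.
OK, if I want to query many points, how can I store the nearest points that were found together with the query point they belong to? So in your example something like: (1.5 : 1 2) (1.6: 3 4), using an index, dictionary, tuple, or something similar?
I added an example showing how to perform the query on an array of points.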
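One possible way to keep each query point together with its neighbors, as asked in the comments, is a dictionary keyed by the query point. This is only a sketch: the query points and the names `queries` and `result` are chosen for the illustration and do not come from the answer.

```python
import numpy as np
import scipy.spatial as spatial

points = np.array([(1, 2), (3, 4), (4, 5)])
point_tree = spatial.cKDTree(points)

# Hypothetical query points; tuples are used so they can serve as dict keys.
queries = [(1.5, 2.5), (3.5, 4.5)]
radius = 1

# One list of neighbor indices per query point, in query order.
groups = point_tree.query_ball_point(queries, radius)

# Map each query point to the indices of the points found within `radius`.
result = {q: sorted(group) for q, group in zip(queries, groups)}

for q, idx in result.items():
    # points[idx] turns the stored indices back into coordinates.
    print(q, idx, points[idx].tolist())
```

Storing indices rather than coordinates keeps the dictionary small and lets the same indices be used to look up any other per-point data later.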