This can be solved neatly with scipy.spatial.distance.pdist.
First, let's create an example array that stores points in 3D space:
import numpy as np
N = 10 # The number of points
points = np.random.rand(N, 3)
print(repr(points))  # repr() shows the array([...]) form reproduced below
Output:
array([[ 0.23087546, 0.56051787, 0.52412935],
[ 0.42379506, 0.19105237, 0.51566572],
[ 0.21961949, 0.14250733, 0.61098618],
[ 0.18798019, 0.39126363, 0.44501143],
[ 0.24576538, 0.08229354, 0.73466956],
[ 0.26736447, 0.78367342, 0.91844028],
[ 0.76650234, 0.40901879, 0.61249828],
[ 0.68905082, 0.45289896, 0.69096152],
[ 0.8358694 , 0.61297944, 0.51879837],
[ 0.80963247, 0.1680279 , 0.87744732]])
We compute the distance from each point to every other point:
from scipy.spatial import distance
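# pdist returns the condensed (upper-triangle) pairwise distances; squareform expands them into an N x N matrix
D = distance.squareform(distance.pdist(points))
print(repr(np.round(D, 1)))  # Rounding to fit the array on screen
Output: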
array([[ 0. , 0.4, 0.4, 0.2, 0.5, 0.5, 0.6, 0.5, 0.6, 0.8],
[ 0.4, 0. , 0.2, 0.3, 0.3, 0.7, 0.4, 0.4, 0.6, 0.5],
[ 0.4, 0.2, 0. , 0.3, 0.1, 0.7, 0.6, 0.6, 0.8, 0.6],
[ 0.2, 0.3, 0.3, 0. , 0.4, 0.6, 0.6, 0.6, 0.7, 0.8],
[ 0.5, 0.3, 0.1, 0.4, 0. , 0.7, 0.6, 0.6, 0.8, 0.6],
[ 0.5, 0.7, 0.7, 0.6, 0.7, 0. , 0.7, 0.6, 0.7, 0.8],
[ 0.6, 0.4, 0.6, 0.6, 0.6, 0.7, 0. , 0.1, 0.2, 0.4],
[ 0.5, 0.4, 0.6, 0.6, 0.6, 0.6, 0.1, 0. , 0.3, 0.4],
[ 0.6, 0.6, 0.8, 0.7, 0.8, 0.7, 0.2, 0.3, 0. , 0.6],
[ 0.8, 0.5, 0.6, 0.8, 0.6, 0.8, 0.4, 0.4, 0.6, 0. ]])
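To make explicit what pdist and squareform produce, here is a small sketch that checks the result against a plain NumPy broadcasting computation of the same Euclidean distances. The seeded generator (`np.random.default_rng(0)`) is an assumption added only so the example is reproducible; it is not part of the original answer.

```python
import numpy as np
from scipy.spatial import distance

# Reproducible example points; the seed 0 is an arbitrary illustrative choice
rng = np.random.default_rng(0)
points = rng.random((10, 3))

# pdist returns the condensed form: only the N*(N-1)/2 pairwise distances
condensed = distance.pdist(points)  # default metric is 'euclidean'
D = distance.squareform(condensed)  # expand to a symmetric N x N matrix

# The same matrix computed directly with NumPy broadcasting:
# diff[i, j] = points[i] - points[j], shape (N, N, 3)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
D_manual = np.sqrt((diff ** 2).sum(axis=-1))

print(condensed.shape)           # (45,)
print(D.shape)                   # (10, 10)
print(np.allclose(D, D_manual))  # True
```

The broadcasting version builds an (N, N, 3) temporary array, so for large N the pdist route is the more memory-friendly of the two.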
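This is how to read the distance matrix: the distance between points 1 and 5 is D[0, 4] (NumPy indices start at 0). You can also see that every point is at distance 0 from itself, e.g. D[6, 6] == 0.
We argsort each row of the distance matrix to get, for every point, the indices of all points ordered from closest to farthest: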
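A short sketch of the properties described above, again on seeded random points (the seed is an illustrative assumption): each entry matches `np.linalg.norm` of the difference vector, the diagonal is zero, and the matrix is symmetric.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(1)  # seed chosen only for reproducibility
points = rng.random((10, 3))
D = distance.squareform(distance.pdist(points))

# D[i, j] is the Euclidean distance between points[i] and points[j]
# (0-based), so "point 1" and "point 5" are rows 0 and 4
d_15 = D[0, 4]
print(np.isclose(d_15, np.linalg.norm(points[0] - points[4])))  # True

# The diagonal holds each point's distance to itself, which is 0 ...
print(np.all(np.diag(D) == 0))  # True
# ... and the matrix is symmetric: D[i, j] == D[j, i]
print(np.array_equal(D, D.T))   # True
```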
closest = np.argsort(D, axis=1)
print(closest)
Output:
[[0 3 1 2 5 7 4 6 8 9]
[1 2 4 3 7 0 6 9 8 5]
[2 4 1 3 0 7 6 9 5 8]
[3 0 2 1 4 7 6 5 8 9]
[4 2 1 3 0 7 9 6 5 8]
[5 0 7 3 6 2 8 4 1 9]
[6 7 8 9 1 0 3 2 4 5]
[7 6 8 9 1 0 3 2 4 5]
[8 6 7 9 1 0 3 5 2 4]
[9 6 7 1 8 4 2 0 3 5]]
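Because the random example makes it hard to check the ordering by eye, here is a sketch on four hand-picked points on the x-axis (x = 0, 1, 3 and 7). The coordinates are illustrative assumptions, chosen so that no row contains tied distances and the argsort result is unambiguous.

```python
import numpy as np
from scipy.spatial import distance

# Hand-picked points on the x-axis at x = 0, 1, 3 and 7 (no tied distances)
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [7.0, 0.0, 0.0]])
D = distance.squareform(distance.pdist(points))

# Row i lists point indices ordered from nearest to farthest from point i
closest = np.argsort(D, axis=1)
print(closest)
# [[0 1 2 3]
#  [1 0 2 3]
#  [2 1 0 3]
#  [3 2 1 0]]
```

For example, point at x = 3 (row 2) is nearest to itself, then to x = 1 (distance 2), then x = 0 (distance 3), then x = 7 (distance 4).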
Again, we see that every point is closest to itself (column 0). Ignoring that column, we can now select the k closest points:
k = 3 # For each point, find the 3 closest points
print(closest[:, 1:k+1])  # Skip column 0 (the point itself), keep the next k columns
Output:
[[3 1 2]
[2 4 3]
[4 1 3]
[0 2 1]
[2 1 3]
[0 7 3]
[7 8 9]
[6 8 9]
[6 7 9]
[6 7 1]]
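Slicing off column 0 assumes the point itself always sorts first. If the data contains duplicate points, another point can also be at distance 0 and the tie may put it in column 0 instead. A minimal sketch of one way around this, assuming it is acceptable to modify a copy of D: set the diagonal to +inf before selecting. It also swaps the full argsort for `np.argpartition`, which only isolates the k smallest entries per row before sorting them. `k_nearest` is a hypothetical helper name, not a SciPy function.

```python
import numpy as np
from scipy.spatial import distance

def k_nearest(points, k):
    """Return an (N, k) array of neighbour indices for each point, nearest first.

    Hypothetical helper (not part of SciPy). The diagonal is set to +inf so a
    point is never chosen as its own neighbour, even when duplicate points
    create extra zero distances. Requires 1 <= k <= N - 1.
    """
    D = distance.squareform(distance.pdist(points))
    np.fill_diagonal(D, np.inf)
    # argpartition moves the k smallest distances of each row into the
    # first k columns (in arbitrary order) without fully sorting the row
    idx = np.argpartition(D, k - 1, axis=1)[:, :k]
    # Sort only those k candidates by their distance
    rows = np.arange(D.shape[0])[:, np.newaxis]
    order = np.argsort(D[rows, idx], axis=1)
    return idx[rows, order]

# Same illustrative points on the x-axis at x = 0, 1, 3 and 7
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [7.0, 0.0, 0.0]])
print(k_nearest(points, 2))
# [[1 2]
#  [0 2]
#  [1 0]
#  [2 1]]
```

For the 10 random points in the answer, `k_nearest(points, 3)` should give the same neighbours as `closest[:, 1:k+1]`, apart from the order of any tied distances.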
For example, for point 4 (row index 3), the k = 3 closest points are points 1, 3 and 2 (indices 0, 2 and 1).
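The example mixes 1-based point labels with 0-based array indices. This sketch hard-codes the first four rows of the k = 3 table printed above and converts between the two conventions for point 4.

```python
import numpy as np

# First four rows of the k = 3 neighbour table printed above (0-based indices)
neighbours = np.array([[3, 1, 2],
                       [2, 4, 3],
                       [4, 1, 3],
                       [0, 2, 1]])

point = 4                    # 1-based point label, as used in the text
row = neighbours[point - 1]  # 0-based row index 3
print(row)      # [0 2 1]
print(row + 1)  # [1 3 2] -> points 1, 3 and 2 in 1-based labels
```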