If you fully vectorize the distance calculation, a few thousand points should not take that long:
In [1]:
from numpy import *
In [3]:
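def lg_lat_distance(p1, p2):  # based on the spherical law of cosines
    # Inputs are in radians. Index 0 is read as longitude and index 1 as latitude.
    # The sample data below is listed as (latitude, longitude); swap its columns
    # if true great-circle distances are needed.
    lg1 = p1[0]
    la1 = p1[1]
    lg2 = p2[0]
    la2 = p2[1]
    return arccos(sin(la1)*sin(la2) + cos(la1)*cos(la2)*cos(lg1 - lg2))*6371  # in km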
In [14]:
data=array([(42.385305, -87.963793),
(41.703427, -88.121665),
(41.889764, -87.978553),
(41.995931, -87.787501),
(42.25875, -87.948199)]) #5 elements
data=data/180*pi
In [16]:
dist_matrix=(lg_lat_distance(hstack([data,]*5).reshape(-1,2).T, vstack([data,]*5).T)).reshape(5,5)
print(dist_matrix)
[[ 9.49352980e-05 1.77442357e+01 2.54929710e+00 1.96682533e+01
   1.80515399e+00]
 [ 1.77442357e+01 0.00000000e+00 1.59289162e+01 3.71753501e+01
   1.94041828e+01]
 [ 2.54929710e+00 1.59289162e+01 0.00000000e+00 2.12484793e+01
   3.67668607e+00]
 [ 1.96682533e+01 3.71753501e+01 2.12484793e+01 0.00000000e+00
   1.79018035e+01]
 [ 1.80515399e+00 1.94041828e+01 3.67668607e+00 1.79018035e+01
   9.49352980e-05]]
In [17]:
%timeit dist_matrix=(lg_lat_distance(hstack([data,]*5).reshape(-1,2).T, vstack([data,]*5).T)).reshape(5,5)
1000 loops, best of 3: 245 µs per loop
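The same pairwise computation can also be written with NumPy broadcasting instead of stacking copies of the data. The following is only a sketch under a few assumptions: the name `pairwise_distance_km` is hypothetical, each input row is taken to be (latitude, longitude) in degrees, and the cosine is clipped before `arccos` to guard against rounding. Because of that column order, its results will not match the matrix printed above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def pairwise_distance_km(points_deg):
    """Pairwise great-circle distances (spherical law of cosines).

    points_deg: (n, 2) array of (latitude, longitude) in degrees (assumed order).
    Returns an (n, n) array of distances in km.
    """
    pts = np.radians(np.asarray(points_deg, dtype=float))
    lat = pts[:, 0]
    lon = pts[:, 1]
    # An (n, 1) column broadcast against a (1, n) row covers every pair at once.
    cos_angle = (np.sin(lat)[:, None] * np.sin(lat)[None, :]
                 + np.cos(lat)[:, None] * np.cos(lat)[None, :]
                 * np.cos(lon[:, None] - lon[None, :]))
    # Rounding can push the value slightly outside [-1, 1]; clip before arccos.
    return np.arccos(np.clip(cos_angle, -1.0, 1.0)) * EARTH_RADIUS_KM


# Same five points as above, kept in degrees.
data_deg = np.array([(42.385305, -87.963793),
                     (41.703427, -88.121665),
                     (41.889764, -87.978553),
                     (41.995931, -87.787501),
                     (42.25875, -87.948199)])
dist_matrix = pairwise_distance_km(data_deg)
print(dist_matrix)
```

The diagonal may come out as a tiny positive number rather than exactly zero, because `arccos` is very sensitive to rounding near 1.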
Once you have dist_matrix, I think things become easy. You can use boolean indexing to filter the pairwise distances.
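As a rough illustration of that filtering step, the sketch below selects all pairs closer than a cutoff. The small matrix and the `threshold_km` value are hypothetical, chosen only to keep the example self-contained; with real data you would pass your own `dist_matrix`.

```python
import numpy as np

# Hypothetical symmetric distance matrix in km (illustrative values only).
dist_matrix = np.array([[0.0, 17.7, 2.5],
                        [17.7, 0.0, 15.9],
                        [2.5, 15.9, 0.0]])
threshold_km = 5.0  # hypothetical cutoff

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
upper = np.triu(np.ones(dist_matrix.shape, dtype=bool), k=1)
mask = upper & (dist_matrix < threshold_km)

close_pairs = np.argwhere(mask)      # index pairs (i, j) with i < j
close_distances = dist_matrix[mask]  # distances for those pairs, in the same order

print(close_pairs)
print(close_distances)
```

Masking with the upper triangle is a design choice: without it, every pair would be reported twice and each point would match itself on the diagonal.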