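Adapted from this answer. The linked answer shows how to compute the distance between each row and a fixed longitude/latitude point; my adaptation makes it work for your case, where each row is compared with the next one.
First, use shift to get all the values you need onto the same row: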
df['lat2'] = df['lat'].shift(-1)
df['lon2'] = df['lon'].shift(-1)
This gives:
    id      lat      lon     lat2     lon2
0    1      NaN      NaN   40.121   23.749
1    1   40.121   23.749  -56.154  -39.572
2    1  -56.154  -39.572   21.908   17.537
3    1   21.908   17.537   31.221  -36.186
4    1   31.221  -36.186  -56.655    0.016
5    1  -56.655    0.016      NaN      NaN
6    2      NaN      NaN  -36.438   14.874
7    2  -36.438   14.874  -21.422   81.271
8    2  -21.422   81.271   43.961  -95.551
9    2   43.961  -95.551      NaN      NaN
10   3      NaN      NaN   79.821  -56.781
11   3   79.821  -56.781      NaN      NaN
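In the data above, the NaN rows at the start of each id keep points from different ids from being paired with each other. If your real data has no such separator rows, a grouped shift is one way to get the same effect. The following is only a small sketch on made-up data; the frame `demo` and the column names `lat2_plain` and `lat2_grouped` are hypothetical and not part of the question.

```python
import pandas as pd

# Hypothetical data: two points for id 1, one point for id 2, no NaN separators
demo = pd.DataFrame({
    'id':  [1, 1, 2],
    'lat': [10.0, 20.0, 30.0],
    'lon': [1.0, 2.0, 3.0],
})

# Plain shift(-1): each row receives the next row's value, whatever its id
demo['lat2_plain'] = demo['lat'].shift(-1)

# Grouped shift(-1): the next value is taken only from rows with the same id
demo['lat2_grouped'] = demo.groupby('id')['lat'].shift(-1)

print(demo)
```

With the plain shift, the last point of id 1 is paired with the first point of id 2; with the grouped shift it gets NaN instead, so no distance would be computed across ids.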
Then define a function to compute the distance:
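from math import radians

from numpy import arcsin, cos, sin, sqrt

def haversine(row):
    # Great-circle distance in km between (lat, lon) and (lat2, lon2) of one row
    lon1 = row['lon']
    lat1 = row['lat']
    lon2 = row['lon2']
    lat2 = row['lat2']
    # Convert degrees to radians before applying the trigonometric functions
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    km = 6367 * c  # 6367 km approximates the Earth's radius
    return km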
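If you want to sanity-check the function before applying it to the whole frame, you can call it on a plain dict, since it only uses `row[...]` lookups. The sketch below repeats the function so that it runs on its own; the variable names `quarter` and `first` are mine. It checks a case with a known answer (a quarter of the way around the equator, which should be the radius times pi/2) and then the first real pair of points from the data above.

```python
from math import pi, radians

from numpy import arcsin, cos, sin, sqrt

def haversine(row):
    # Same formula as above, repeated so this snippet is self-contained
    lon1, lat1, lon2, lat2 = map(
        radians, [row['lon'], row['lat'], row['lon2'], row['lat2']]
    )
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    return 6367 * 2 * arcsin(sqrt(a))

# Known case: 90 degrees of longitude along the equator is a quarter circle
quarter = haversine({'lat': 0.0, 'lon': 0.0, 'lat2': 0.0, 'lon2': 90.0})
print(quarter, 6367 * pi / 2)

# First complete pair of points from the data above
first = haversine({'lat': 40.121, 'lon': 23.749,
                   'lat2': -56.154, 'lon2': -39.572})
print(first)
```

The two numbers on the first printed line should agree, and the second line should be close to the distance shown for row 1 in the result table.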
And use apply to run it on each row of your data:
df['distance'] = df.apply(haversine, axis=1)
This gives:
    id      lat      lon     lat2     lon2      distance
0    1      NaN      NaN   40.121   23.749           NaN
1    1   40.121   23.749  -56.154  -39.572  12237.017692
2    1  -56.154  -39.572   21.908   17.537  10187.684397
3    1   21.908   17.537   31.221  -36.186   5387.540299
4    1   31.221  -36.186  -56.655    0.016  10343.267833
5    1  -56.655    0.016      NaN      NaN           NaN
6    2      NaN      NaN  -36.438   14.874           NaN
7    2  -36.438   14.874  -21.422   81.271   6543.302199
8    2  -21.422   81.271   43.961  -95.551  17480.809345
9    2   43.961  -95.551      NaN      NaN           NaN
10   3      NaN      NaN   79.821  -56.781           NaN
11   3   79.821  -56.781      NaN      NaN           NaN
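I believe this is the result you are looking for (I checked the first distance and it appears to be correct).
If you like, you can drop the two helper latitude/longitude columns once the calculation is done: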
df.drop(['lat2', 'lon2'], axis=1, inplace=True)
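I should note that this solution will not give you the fastest computation, because apply calls the function once per row. If performance is a top priority here, see the second half of the answer I linked for how to improve on this, although it will need adapting to your case.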
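To give an idea of what a faster variant could look like, here is a rough sketch that works on whole columns at once instead of row by row. This is not the code from the linked answer; the function name `haversine_np` and the `radius_km` parameter are my own, and the small frame only copies the first three rows of the data above.

```python
import numpy as np
import pandas as pd

def haversine_np(lat1, lon1, lat2, lon2, radius_km=6367):
    # Vectorized haversine: inputs are in degrees and may be scalars,
    # NumPy arrays, or pandas Series of matching length
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))

# Hypothetical sample: the first three rows of the data shown above
df = pd.DataFrame({
    'id':  [1, 1, 1],
    'lat': [np.nan, 40.121, -56.154],
    'lon': [np.nan, 23.749, -39.572],
})

# The shifted columns are passed in directly, so no helper columns are needed
df['distance'] = haversine_np(
    df['lat'], df['lon'], df['lat'].shift(-1), df['lon'].shift(-1)
)
print(df)
```

Rows with a missing point on either side come out as NaN, the same as with the apply version, and there is nothing to drop afterwards.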