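【Posted】: 2020-11-27 17:39:49
【Question】:
I have locations in a dataframe, under the column names lat and lon. I want to show how far each one is from the lat/lon of the nearest train station, which is held in a separate dataframe.
For example, I have a lat/lon of (-37.814563, 144.970267), and I have a list of other geospatial points as shown below. I want to find the closest point and then the distance between the two points, as an extra column in the suburbs dataframe.
Here is a sample of the train station dataset: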
<bound method NDFrame.to_clipboard of
   STOP_ID  STOP_NAME                                          LATITUDE  \
0  19970    Royal Park Railway Station (Parkville)             -37.781193
1  19971    Flemington Bridge Railway Station (North Melbo...  -37.788140
2  19972    Macaulay Railway Station (North Melbourne)         -37.794267
3  19973    North Melbourne Railway Station (West Melbourne)   -37.807419
4  19974    Clifton Hill Railway Station (Clifton Hill)        -37.788657

   LONGITUDE   TICKETZONE  ROUTEUSSP  \
0  144.952301  1           Upfield
1  144.939323  1           Upfield
2  144.936166  1           Upfield
3  144.942570  1           Flemington,Sunbury,Upfield,Werribee,Williamsto...
4  144.995417  1           Mernda,Hurstbridge

   geometry
0  POINT (144.95230 -37.78119)
1  POINT (144.93932 -37.78814)
2  POINT (144.93617 -37.79427)
3  POINT (144.94257 -37.80742)
4  POINT (144.99542 -37.78866)  >
Here is a sample of the suburbs dataset:
<bound method NDFrame.to_clipboard of
      postcode  suburb              state  lat         lon
4901  3000      MELBOURNE           VIC    -37.814563  144.970267
4902  3002      EAST MELBOURNE      VIC    -37.816640  144.987811
4903  3003      WEST MELBOURNE      VIC    -37.806255  144.941123
4904  3005      WORLD TRADE CENTRE  VIC    -37.822262  144.954856
4905  3006      SOUTHBANK           VIC    -37.823258  144.965926>
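I want to show the distance from each suburb's lat/lon to the closest train station in a new column of the suburbs list.
I tried a solution but got a strange output, and I wonder whether it is correct.
Here are the two solutions I tried: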
from sklearn.neighbors import NearestNeighbors
from haversine import haversine
NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(trains_shape[['LATITUDE', 'LONGITUDE']])
indices = NN.kneighbors(df_complete[['lat', 'lon']])[1]
indices = [index[0] for index in indices]
distances = NN.kneighbors(df_complete[['lat', 'lon']])[0]
df_complete['closest_station'] = trains_shape.iloc[indices]['STOP_NAME'].reset_index(drop=True)
df_complete['closest_station_distances'] = distances
print(df_complete)
This is the output:
<bound method NDFrame.to_clipboard of
   postcode  suburb        state  lat         lon         Venues Cluster  \
1  3040      aberfeldie    VIC    -37.756690  144.896259  4.0
2  3042      airport west  VIC    -37.711698  144.887037  1.0
4  3206      albert park   VIC    -37.840705  144.955710  0.0
5  3020      albion        VIC    -37.775954  144.819395  2.0
6  3078      alphington    VIC    -37.780767  145.031160  4.0

   #1                    #2                    #3  \
1  Café                  Electronics Store     Grocery Store
2  Fast Food Restaurant  Café                  Supermarket
4  Café                  Pub                   Coffee Shop
5  Café                  Fast Food Restaurant  Grocery Store
6  Café                  Park                  Bar

   #4                     ...  #6  \
1  Coffee Shop            ...  Bakery
2  Grocery Store          ...  Italian Restaurant
4  Breakfast Spot         ...  Burger Joint
5  Vietnamese Restaurant  ...  Pub
6  Pizza Place            ...  Vegetarian / Vegan Restaurant

   #7                     #8                   #9  \
1  Shopping Mall          Japanese Restaurant  Indian Restaurant
2  Portuguese Restaurant  Electronics Store    Middle Eastern Restaurant
4  Bar                    Bakery               Gastropub
5  Chinese Restaurant     Gym                  Bakery
6  Italian Restaurant     Gastropub            Bakery

   #10                Ancestry Cluster  ClosestStopId  \
1  Greek Restaurant   8.0               20037
2  Convenience Store  5.0               20032
4  Beach              6.0               22180
5  Convenience Store  5.0               20004
6  Coffee Shop        5.0               19931

   ClosestStopName  \
1  Essendon Railway Station (Essendon)
2  Glenroy Railway Station (Glenroy)
4  Southern Cross Railway Station (Melbourne City)
5  Albion Railway Station (Sunshine North)
6  Alphington Railway Station (Alphington)

   closest_station                                  closest_station_distances
1  Glenroy Railway Station (Glenroy)                0.019918
2  Southern Cross Railway Station (Melbourne City)  0.031020
4  Alphington Railway Station (Alphington)          0.023165
5  Altona Railway Station (Altona)                  0.005559
6  Newport Railway Station (Newport)                0.002375
And here is the second function:
def ClosestStop(r):
    # Cartesian distance: square root of (x2-x1)^2 + (y2-y1)^2
    distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
    # Stop with minimum distance from the suburb
    closestStationId = distances[distances == distances.min()].index.to_list()[0]
    return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]
df_complete[['ClosestStopId', 'ClosestStopName']] = df_complete.apply(ClosestStop, axis=1)
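Oddly, this gives a different answer, which makes me think something is wrong with this code. The distances in km also appear to be wrong.
I am completely unsure how to solve this. Hoping for some guidance, thanks!
【Comments】: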
-
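You need 1. a function distance(lat1, lon1, lat2, lon2), 2. to apply it to every combination of suburb and station, 3. to take the station with the shortest distance for each suburb and add it to the dataframe. (Or use sklearn's NearestNeighbors.)
-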
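In the first solution you use 'haversine' as the distance function in NN. That is sklearn's built-in haversine distance, which is expressed in radians. You can see a link to the documentation in my answer. To get the haversine distance in km, use the imported haversine package as the distance in NN. This is also shown in my answer.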
-
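Can you share the number of cities and stations you want to compute distances for? I don't yet have an example here of the scalable BallTree algorithm, which is what you need when the numbers grow.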
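
The three steps from the first comment can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain (name, lat, lon) tuples copied from the sample rows in the question instead of the real dataframes, and the names haversine_km, nearest_station and EARTH_RADIUS_KM are made up for this sketch, not taken from the original post.

```python
import math

# Assumed mean Earth radius in kilometres (not from the original post).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (station_name, distance_km) for the closest station.

    `stations` is an iterable of (name, lat, lon) tuples; every station
    is checked, so this is the brute-force approach.
    """
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon))
         for name, s_lat, s_lon in stations),
        key=lambda pair: pair[1],
    )


# A few sample rows copied from the tables in the question.
stations = [
    ("Royal Park Railway Station (Parkville)", -37.781193, 144.952301),
    ("Macaulay Railway Station (North Melbourne)", -37.794267, 144.936166),
    ("North Melbourne Railway Station (West Melbourne)", -37.807419, 144.942570),
    ("Clifton Hill Railway Station (Clifton Hill)", -37.788657, 144.995417),
]
suburbs = [
    ("MELBOURNE", -37.814563, 144.970267),
    ("WEST MELBOURNE", -37.806255, 144.941123),
]

results = {name: nearest_station(lat, lon, stations) for name, lat, lon in suburbs}
for suburb, (station, km) in results.items():
    print(f"{suburb}: {station} ({km:.2f} km)")
```

Two things seem likely to explain the mismatch in the question. First, sklearn's 'haversine' metric expects [latitude, longitude] in radians and returns the distance in radians, so degree inputs would need converting first and the result multiplying by the Earth's radius to get km. The second function measures a straight-line distance in degrees instead, which is a different quantity. Second, the station names are assigned after reset_index(drop=True), so pandas aligns them on the labels 0, 1, 2, ... while df_complete has the labels 1, 2, 4, 5, 6. The posted output is consistent with that shift: row 1's closest_station equals row 2's ClosestStopName. Assigning the values positionally (for example via .to_numpy()) would probably avoid it.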