Using Pandas to calculate distance between coordinates from imported csv: answers

【Question Title】: Using Pandas to calculate distance between coordinates from imported csv
【Posted】: 2016-01-06 21:05:27
【Question】:

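I am trying to import a .csv containing two columns of location data (lat/long), compute the distance between points, write the distance to a new column, loop the function over to the next set of coordinates, and then write the output dataframe to a new .csv. I have written the following code: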

import pandas as pd
import numpy as np
pd.read_csv("input.csv")

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km

    #conversion to radians
    d_lat = np.radians(lat2-lat1)
    d_lon = np.radians(lon2-lon1)

    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)

    #haversine formula
    a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

    haversine = 2 * R * np.arcsin(np.sqrt(a))

    return haversine

lat1 = row['lat1'] #first row of location.lat column here
lon1 = row['lon1'] #first row of location.long column here
lat2 = row['lat2'] #second row of location.lat column here
lon2 = row['lon2'] #second row of location.long column here

print(dist_from_coordinates(lat1, lon1, lat2, lon2), 'km')

df.to_csv('output.csv')

I get the following error:

Traceback (most recent call last):
  File "Test.py", line 22, in <module>
    lat1 = row['lat1'] #first row of location.lat column here
NameError: name 'row' is not defined
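The `NameError` is raised because `row` is never assigned anywhere in the script; it only exists as a loop variable, for example the one produced by `DataFrame.iterrows()`, which yields `(index, row)` pairs. (The script also never stores the result of `pd.read_csv` in `df`.) A minimal sketch, assuming four columns named `lat1`, `lon1`, `lat2`, `lon2`; the small DataFrame with hypothetical values only stands in for `pd.read_csv("input.csv")`:

```python
import numpy as np
import pandas as pd

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    """Haversine distance in km (same formula as in the question)."""
    R = 6371  # Earth radius in km
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)
    a = np.sin(d_lat / 2.0) ** 2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.0) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# Hypothetical sample data standing in for df = pd.read_csv("input.csv")
df = pd.DataFrame({"lat1": [0.0, 0.0], "lon1": [0.0, 0.0],
                   "lat2": [0.0, 1.0], "lon2": [1.0, 0.0]})

distances = []
for index, row in df.iterrows():  # `row` is a Series holding one line of the CSV
    distances.append(dist_from_coordinates(row["lat1"], row["lon1"],
                                           row["lat2"], row["lon2"]))

print(distances)  # both pairs are 1 degree apart on a great circle, about 111.19 km
```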

Could you give some further guidance on how to loop this formula over the data successfully?
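The comments in the question's code (`#first row of location.lat column here`, `#second row of location.lat column here`) suggest that `lat2`/`lon2` are meant to be the *next row* of the same two columns, i.e. the distance between consecutive points. Under that assumption no explicit loop is needed: `Series.shift(-1)` aligns every row with its successor, and the NumPy-based haversine function works on whole columns. The column names `lat` and `lon` and the sample values are hypothetical; adjust them to the real CSV headers.

```python
import numpy as np
import pandas as pd

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    """Haversine distance in km; works on scalars or whole pandas Series."""
    R = 6371  # Earth radius in km
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    a = (np.sin(d_lat / 2.0) ** 2
         + np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(d_lon / 2.0) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))

# Hypothetical track with one point per row (column names are assumptions)
df = pd.DataFrame({"lat": [0.0, 0.0, 1.0], "lon": [0.0, 1.0, 1.0]})

# shift(-1) pairs each row with the following one; the last row has no successor -> NaN
df["Distance"] = dist_from_coordinates(df["lat"], df["lon"],
                                       df["lat"].shift(-1), df["lon"].shift(-1))
print(df)
```

Writing the result is then a single `df.to_csv("output.csv", index=False)`. If the distance should instead be stored on the *second* point of each pair, `shift(1)` would be used in place of `shift(-1)`.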

【Comments】:

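Please add the full error traceback to your question.
Forget my previous comment. Have you tried printing line to see what it actually contains? It seems to be a list with more than the 3 fields you are assuming.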

Tags: python csv numpy pandas

【Answer 1】:

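I'm assuming your input.csv has 4 columns holding the values of lat1, lon1, lat2 and lon2. After processing, output.csv is a separate file that contains the original 4 columns plus a 5th column with the distance. You can do this with a for loop: the approach shown here reads each row, computes the distance and appends it to an initially empty list, which becomes the new column "Distance"; finally it writes output.csv. Adapt it wherever necessary. Keep in mind that this works for a 4-column csv file with any number of coordinate rows. Hope this helps. Have a nice day.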

import pandas as pd
import numpy as np

input_file = "input.csv"
output_file = "output.csv"
df = pd.read_csv(input_file)                    # read the CSV into a DataFrame
df = df.apply(pd.to_numeric, errors="coerce")   # force numeric columns (DataFrame.convert_objects no longer exists in current pandas)

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km

    # conversion to radians
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)

    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)

    # haversine formula
    a = np.sin(d_lat / 2.) ** 2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.) ** 2

    haversine = 2 * R * np.arcsin(np.sqrt(a))

    return haversine

new_column = []                      # empty list that will hold the distances
for index, row in df.iterrows():     # row is one line of the CSV as a Series
    lat1 = row['lat1']  # lat1 value of the current row
    lon1 = row['lon1']  # lon1 value of the current row
    lat2 = row['lat2']  # lat2 value of the current row
    lon2 = row['lon2']  # lon2 value of the current row
    value = dist_from_coordinates(lat1, lon1, lat2, lon2)  # get the distance
    new_column.append(value)         # append the distance to the list

# 4 is the (0-based) position of the new column, "Distance" its header, new_column its values
df.insert(4, "Distance", new_column)

df.to_csv(output_file, index=False)  # write output.csv (overwrites any existing file instead of appending a second copy)
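Because `dist_from_coordinates` uses only NumPy ufuncs (`np.radians`, `np.sin`, `np.cos`, `np.arcsin`, `np.sqrt`), it also accepts whole pandas columns, so the `iterrows` loop can be replaced by one vectorized call, which is usually much faster on large files. A minimal sketch, assuming the same four-column layout as the answer; the small DataFrame with hypothetical values stands in for `pd.read_csv("input.csv")`:

```python
import numpy as np
import pandas as pd

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    """Haversine distance in km; accepts scalars, NumPy arrays or pandas Series."""
    R = 6371  # Earth radius in km
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)
    a = np.sin(d_lat / 2.0) ** 2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.0) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# Hypothetical sample standing in for pd.read_csv("input.csv")
df = pd.DataFrame({"lat1": [0.0, 10.0], "lon1": [0.0, 20.0],
                   "lat2": [0.0, 10.0], "lon2": [1.0, 20.0]})

# One call computes the distance for every row; the new column is appended at the end
df["Distance"] = dist_from_coordinates(df["lat1"], df["lon1"], df["lat2"], df["lon2"])
print(df)
```

The result matches the loop version row for row; `df.to_csv("output.csv", index=False)` then writes it out.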

【Comments】: