How to add new columns at a specific position with python pandas: answers

【Question title】: How can I use python pandas add new columns in specific index
【Posted】: 2015-03-26 02:27:38
【Question description】:

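I want to use the Google API to get the latitude and longitude of each 'location' in a CSV file. I can get 'lat' and 'lng' through the Google API module, but I can't save them back to the original file, inserted after 'location'.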

My original file looks like this:

date            time     location  birdName count birdName count birdName  count                     
1990-02-10   0900:1200   balabala    bird1    15    bird2    10    bird3    20                   
1990-02-28   1300:1500   balabala    bird4    40    bird5    10    bird6    25       
1990-03-01   0900-1200   balabala    bird7    45    bird8    15    bird9    30                       
  ...          ...         ...        ...    ...     ...     ...    ...    ...

I want to insert the 'lat' and 'lng' columns after 'location', like this:

date            time     location   lat   lng  birdName count birdName count birdName  count                     
1990-02-10   0900:1200   balabala   xxx   xxx   bird1    15    bird2    10    bird3    20                   
1990-02-28   1300:1500   balabala   xxx   xxx   bird4    40    bird5    10    bird6    25       
1990-03-01   0900-1200   balabala   xxx   xxx   bird7    45    bird8    15    bird9    30                       
  ...          ...         ...      ...   ...    ...     ...     ...     ...    ...    ...

Google API module: https://drive.google.com/open?id=0B6SUWnrBmDwSb3BabFdEcXV3LUU&authuser=0

My code:

# -*- coding: utf-8 -*-
import pandas as pd
from geocodequery import GeocodeQuery

def addrs(location):
  for addrs in location:
    addr= addrs
    gq = GeocodeQuery("zh-tw", "tw")
    gq.get_geocode(addr)
    lng=gq.get_lng()
    lat=gq.get_lat()
    df['lat']=lat
    df['lng']=lng         
    df.to_csv('./birdsIwant.csv')   


 df = pd.read_csv('./birdsIwant.csv',low_memory=False)
 addrs(df['location'])

What should I do?

【Discussion】:

Tags: python google-maps csv pandas

【Solution 1】:

You can use fancy indexing to change the column order:

In [179]:
# add the columns
df['lat'] = np.random.randn(len(df))
df['lng'] = np.random.randn(len(df))
df
Out[179]:
         date       time  location birdName  count birdName.1  count.1  \
0  1990-02-10  0900:1200  balabala    bird1     15      bird2       10   
1  1990-02-28  1300:1500  balabala    bird4     40      bird5       10   
2  1990-03-01  0900-1200  balabala    bird7     45      bird8       15   

  birdName.2  count.2       lat       lng  
0      bird3       20 -0.915371 -1.508814  
1      bird6       25 -0.716439  1.008078  
2      bird9       30  0.609510 -1.185927  
In [185]:
# get a list of the columns
col_list = list(df)
# insert column names at new positions
col_list.insert(3,'lat')
col_list.insert(4,'lng')
# slice off the last 2 values
col_list=col_list[:-2]
print(col_list)

['date', 'time', 'location', 'lat', 'lng', 'birdName', 'count', 'birdName.1', 'count.1', 'birdName.2', 'count.2']
In [187]:
# use loc and pass the new column order to reorder the columns
# (the original answer used df.ix, which was removed in pandas 1.0)
df = df.loc[:, col_list]
df
Out[187]:
         date       time  location       lat       lng birdName  count  \
0  1990-02-10  0900:1200  balabala -0.915371 -1.508814    bird1     15   
1  1990-02-28  1300:1500  balabala -0.716439  1.008078    bird4     40   
2  1990-03-01  0900-1200  balabala  0.609510 -1.185927    bird7     45   

  birdName.1  count.1 birdName.2  count.2  
0      bird2       10      bird3       20  
1      bird5       10      bird6       25  
2      bird8       15      bird9       30
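
As an alternative to appending the columns and reordering afterwards, pandas' `DataFrame.insert` can place a column at a given position directly. The following is a minimal sketch, assuming a small stand-in frame shaped like the question's file; the coordinate values are made-up placeholders, not real geocoding results.

```python
import pandas as pd

# Stand-in data shaped like the question's file; coordinates are placeholders.
df = pd.DataFrame({
    "date": ["1990-02-10", "1990-02-28"],
    "time": ["0900:1200", "1300:1500"],
    "location": ["balabala", "balabala"],
    "birdName": ["bird1", "bird4"],
    "count": [15, 40],
})

# Position immediately after the 'location' column.
pos = df.columns.get_loc("location") + 1

# Insert 'lat' first, then 'lng' one slot further to the right.
df.insert(pos, "lat", [25.0, 25.0])
df.insert(pos + 1, "lng", [121.5, 121.5])

print(list(df.columns))
```

`insert` modifies the frame in place, so no column list has to be built or sliced by hand.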

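Edit

Your code writes the CSV on every iteration, so even if it did set the correct lat and lng values, you would be overwriting them on each iteration; you should write the CSV outside the function. In any case, the following is cleaner and should work: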

def addrs(location):
    gq = GeocodeQuery("zh-tw", "tw")
    gq.get_geocode(location)
    return pd.Series([gq.get_lat(), gq.get_lng()])

df[['lat','lng']] = df['location'].apply(addrs)
df.to_csv('./birdsIwant.csv')
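
The same idea can be sketched so that each distinct location is looked up only once and the CSV is produced a single time at the end. This is only an illustration under assumptions: `geocode` is a hypothetical stand-in for the `GeocodeQuery` calls (the real module is not used here), and the location names and coordinates are invented.

```python
import pandas as pd

def geocode(location):
    # Hypothetical stand-in for the GeocodeQuery lookup; returns (lat, lng).
    fake = {"parkA": (25.03, 121.56), "parkB": (24.15, 120.67)}
    return fake[location]

df = pd.DataFrame({
    "date": ["1990-02-10", "1990-02-28", "1990-03-01"],
    "location": ["parkA", "parkB", "parkA"],
    "birdName": ["bird1", "bird4", "bird7"],
})

# Look up each distinct location once, then map the results back to the rows.
coords = {loc: geocode(loc) for loc in df["location"].unique()}
lat = df["location"].map(lambda loc: coords[loc][0])
lng = df["location"].map(lambda loc: coords[loc][1])

# Place the new columns right after 'location'.
pos = df.columns.get_loc("location") + 1
df.insert(pos, "lat", lat)
df.insert(pos + 1, "lng", lng)

# Serialize once, after every row is filled; index=False avoids an extra index column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a path to `to_csv` instead of calling it without arguments would write the file rather than return the text.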

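【Discussion】:

Thanks, but there is one problem. After saving the csv file, the 'lat' and 'lng' data are always those of the last 'location'. How can I fix this? Please help me again.
Your code has an error: on each iteration you assign that iteration's value to the whole column of your df, so only the last value survives
EdChum, sorry to bother you again. The code you gave runs into a new problem, namely "raise URLError(err) URLError: ". When I change "df[['lat','lng']] = df['location'].apply(addrs)" to "df[['lat','lng']] = df['location'][1:300].apply(addrs)", it works. But I want to do this in a single run; is there any way to solve it?
Sorry, you will have to research whether that API has a query limit. I don't know that API; if there is a limit, it will be hard to get around. You could iterate over chunks of your df and process them. By the way, index values are zero-based, so [1:300] will miss the first row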
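
The chunking suggestion in the last comment could look roughly like the sketch below. Everything API-related here is an assumption: `geocode` is a hypothetical stand-in for the real lookup, and the chunk size and pause length are arbitrary illustrative values, since the actual query limit (if any) is not known from this thread.

```python
import time
import pandas as pd

def geocode(location):
    # Hypothetical stand-in for the real API call; returns (lat, lng).
    return (float(len(location)), -float(len(location)))

def geocode_in_chunks(locations, chunk_size=300, pause_seconds=0.0):
    """Geocode a Series chunk by chunk, pausing between chunks."""
    results = []
    # Start at 0 so the first row is included (slices are zero-based).
    for start in range(0, len(locations), chunk_size):
        chunk = locations.iloc[start:start + chunk_size]
        results.extend(geocode(loc) for loc in chunk)
        if start + chunk_size < len(locations):
            # Assumed way to stay under a rate limit; tune to the real API.
            time.sleep(pause_seconds)
    return pd.DataFrame(results, columns=["lat", "lng"], index=locations.index)

df = pd.DataFrame({"location": ["a", "bb", "ccc", "dddd", "eeeee"]})
coords = geocode_in_chunks(df["location"], chunk_size=2)

# Attach the coordinates to the original rows by index.
df = df.join(coords)
print(df)
```

Using `iloc` with an explicit start of 0 makes the zero-based slicing visible, which avoids the skipped first row that `[1:300]` would cause.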