【发布时间】:2021-01-21 13:16:51
【问题描述】:
我正在尝试创建一个包含城市名称的新列。我还有一个列表,其中包含所需的城市名称以及在不同列名下具有城市名称的 CSV 文件。
我要做的是检查列表中的城市名称是否存在于 CSV 文件的特定列范围内,并将该特定城市名称填写在新列 City 中。
我的代码是:
import pandas as pd
import numpy as np
City_Name_List = ['Amsterdam', 'Antwerp', 'Brussels', 'Ghent', 'Asheville', 'Austin', 'Boston', 'Broward County',
'Cambridge', 'Chicago', 'Clark County Nv', 'Columbus', 'Denver', 'Hawaii', 'Jersey City', 'Los Angeles',
'Nashville', 'New Orleans', 'New York City', 'Oakland', 'Pacific Grove', 'Portland', 'Rhode Island', 'Salem Or', 'San Diego']
data = {'host_identity_verified':['t','t','t','t','t','t','t','t','t','t'],
'neighbourhood':['Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands', 'NaN',
'Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands',
'Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands', 'NaN',
'Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands'],
'neighbourhood_cleansed':['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West',
'Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West'],
'neighbourhood_group_cleansed': ['NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN'],
'latitude':[ 52.36575, 52.36509, 52.37297, 52.38761, 52.36719, 52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}
df = pd.DataFrame(data)
df['City'] = [x for x in City_Name_List if x in df.loc[:,'host_identity_verified':'latitude'].values][0]
当我运行代码时,我收到以下消息:
Traceback (most recent call last):
File "C:/Users/YAZAN/PycharmProjects/Yazan_Work/try.py", line 63, in <module>
df['City'] = [x for x in City_Name_List if x in df.loc[:,'host_identity_verified':'latitude'].values][0]
IndexError: list index out of range
这是由于数据中的City Amsterdam后面是其他词。
我希望我的输出如下:
0 Amsterdam
1 Amsterdam
2 Amsterdam
3 Amsterdam
4 Amsterdam
5 Amsterdam
6 Amsterdam
7 Amsterdam
8 Amsterdam
9 Amsterdam
Name: City, dtype: object
我不断尝试解决这个问题。我尝试使用endswith、startswith、正则表达式,但无济于事。我可能错误地使用了这两种方法。我希望有人可以帮助我。
【问题讨论】:
标签: python pandas list dataframe list-comprehension