【Posted】: 2020-11-18 00:04:05
【Question】:
I have a DataFrame with a column of Turkish province names:
df['province']
2078982 Adana
2078983 Adana
2078984 Adana
2078985 Adana
2078986 Adana
2210113 Zonguldak
2210114 Zonguldak
2210115 Zonguldak
2210116 Zonguldak
2210117 Zonguldak
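I want to write a loop with if statements, or a function, that creates a new column classifying each province by region. To do that, I created 7 lists, one per region, each containing the provinces that belong to that region: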
aegean = ['Izmir', 'Aydin', 'Manisa', 'Uşak', 'Afyonkarahisar', 'Denizli', 'Kütahya', 'Muğla']
blacksea = ['Amasya', 'Gümüşhane', 'Bartın', 'Bolu', 'Giresun', 'Kastamonu', 'Karabük','Ordu', 'Rize', 'Samsun',
'Sinop', 'Tokat', 'Trabzon', 'Zonguldak', 'Artvin', 'Bayburt', 'Çorum', 'Düzce']
cen_ana= ['Aksaray', 'Kırıkkale', 'Kırşehir', 'Nevşehir', 'Ankara', 'Çankırı', 'Eskisehir', 'Karaman', 'Kayseri', 'Konya', 'Sivas', 'Yozgat']
eas_ana= ['Ağrı', 'Bingöl', 'Elazığ', 'Hakkari', 'Iğdır', 'Kars', 'Tunceli', 'Van', 'Ardahan', 'Erzurum','Şırnak']
marmara=['Edirne', 'Istanbul', 'Kırklareli', 'Kocaeli', 'Tekirdağ', 'Yalova', 'Balıkesir', 'Bilecik', 'Bursa', 'Çanakkale', 'Sakarya']
medite=['Adana', 'Antalya', 'Mersin', 'Burdur', 'Hatay', 'Isparta', 'Osmaniye','Kahramanmaraş' ]
sou_ana=['Adiyaman', 'Batman','Diyarbakır', 'Gaziantep', 'Siirt', 'Mardin', 'Şanlıurfa']
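One way to express "classify each province by region" is to invert lists like these into a single province-to-region lookup and apply it with `Series.map`. The following is only a minimal sketch under stated assumptions: the lists are shortened copies of the ones above, and the names `regions` and `province_to_region` are hypothetical helpers, not part of the original code.

```python
import pandas as pd

# Shortened copies of three of the lists above, to keep the sketch small.
aegean = ['Izmir', 'Aydin', 'Manisa']
blacksea = ['Amasya', 'Zonguldak']
medite = ['Adana', 'Antalya']

# Hypothetical helper: region label -> list of provinces.
regions = {
    "Aegean Region": aegean,
    "Black Sea Region": blacksea,
    "Mediterranean": medite,
}

# Invert it to province -> region label.
# strip() guards against stray whitespace in the province names.
province_to_region = {
    province.strip(): region
    for region, provinces in regions.items()
    for province in provinces
}

# Small stand-in for the real DataFrame.
df = pd.DataFrame({"province": ["Adana", "Zonguldak", "Ankara"]})

# map() looks up each value; provinces missing from the dict become NaN,
# which fillna() turns into "Other".
df["Region"] = df["province"].map(province_to_region).fillna("Other")
```

With the full lists, the same dictionary would cover all seven regions, and any province not listed would fall back to "Other".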
After that, I iterated over the dataset with a for loop and if statements:
for i, row in df.iterrows():
    df['Region']='something'
    if any(e in df["province"] for e in aegean):
        df['Region']=="Aegean Region"
    elif any(q in df["province"] for q in blacksea):
        df['Region']=="Black Sea Region"
    elif any(s in df["province"] for s in cen_ana):
        df['Region']=="Central Anatolia"
    elif any(c in df["province"] for c in eas_ana):
        df['Region']=="Eastern Anatolia"
    elif any(v in df["province"] for v in sou_ana):
        df['Region']=="Southern Anatolia"
    elif any(g in df["province"] for g in marmara):
        df['Region']=="Marmara"
    elif any(h in df["province"] for h in medite):
        df['Region']=="Mediterranean"
    else:
        df['Region']=="Other"
But for some reason, all I end up with is the value 'something' in every row of the new column:
df['Region']
Out[148]:
2078982 something
2078983 something
2078984 something
2078985 something
2078986 something
2210113 something
2210114 something
2210115 something
2210116 something
2210117 something
Name: Region, Length: 15901, dtype: object
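Two details of the loop above are consistent with this output. The `in` operator applied to a pandas Series tests the index labels rather than the values, and `==` is a comparison that returns a boolean Series without storing anything. A minimal sketch on a two-row stand-in for the data (the variable names are illustrative only):

```python
import pandas as pd

# Two-row stand-in for df["province"], using index labels from the output above.
s = pd.Series(["Adana", "Zonguldak"], index=[2078982, 2210113], name="province")

label_found = 2078982 in s            # `in` on a Series checks the index labels
value_found_naive = "Adana" in s      # so a value is not found this way
value_found = "Adana" in s.values     # checking the underlying values works

df = s.to_frame()
df["Region"] = "something"

# `==` only compares; the result is a boolean Series that is discarded here.
df["Region"] == "Mediterranean"
after_comparison = df["Region"].tolist()

# `=` assigns, but to the whole column at once, not to one row.
df["Region"] = "Mediterranean"
after_assignment = df["Region"].tolist()
```

Because the assignment form writes the entire column, a per-row result needs either a row-level assignment such as `df.loc[i, 'Region'] = ...` or a vectorized lookup.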
I also tried a function-based approach that some examples suggested:
def regionaler(x):
    if any(e in df["province"] for e in aegean):
        return "Aegean Region"
    elif any(e in df["province"] for e in blacksea):
        return "Black Sea Region"
    elif any(e in df["province"] for e in cen_ana):
        return "Central Anatolia"
    elif any(e in df["province"] for e in eas_ana):
        return "Eastern Anatolia"
    elif any(e in df["province"] for e in sou_ana):
        return "Southern Anatolia"
    elif any(e in df["province"] for e in marmara):
        return "Marmara"
    elif any(e in df["province"] for e in medite):
        return "Mediterranean"
    else:
        return "Other"
But the result is just as wrong:
df['Region'] = df.apply(regionaler,axis=1)
df['Region']
Out[151]:
2078982 Other
2078983 Other
2078984 Other
2078985 Other
2078986 Other
2210113 Other
2210114 Other
2210115 Other
2210116 Other
2210117 Other
Name: Region, Length: 15901, dtype: object
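Note that `regionaler` never uses its argument `x`: every condition tests the whole `df["province"]` column, so each row gets the same answer. A minimal sketch of a per-value variant, assuming shortened copies of three of the lists and a hypothetical function name `region_of`:

```python
import pandas as pd

# Shortened copies of three of the lists, to keep the sketch small.
aegean = ['Izmir', 'Aydin']
blacksea = ['Amasya', 'Zonguldak']
medite = ['Adana', 'Antalya']

def region_of(province):
    """Return the region label for a single province name."""
    # `in` on a plain Python list checks its elements, as intended.
    if province in aegean:
        return "Aegean Region"
    elif province in blacksea:
        return "Black Sea Region"
    elif province in medite:
        return "Mediterranean"
    return "Other"

# Small stand-in for the real DataFrame.
df = pd.DataFrame({"province": ["Adana", "Zonguldak", "Ankara"]})

# Series.apply passes one province string at a time to the function.
df["Region"] = df["province"].apply(region_of)
```

Applying the function to the `province` column instead of to the whole frame with `axis=1` means it receives a plain string, so no row indexing is needed inside it.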
I have a feeling I am making some very silly mistake that is easy to fix, but I can't figure it out. Many thanks to anyone who can help!
【Comments】:
Tags: python python-3.x pandas string dataframe