[Posted]: 2020-10-29 03:33:49
[Question]:
I want to merge two DataFrames, Df1 and Df2:
import pandas as pd

# Df1: key columns, some values missing (None)
Df1 = pd.DataFrame({
    'name' : ['jack', None, None],
    'Surname' : ['Peterson', 'Macleans', None],
    'city' : ['Sydney', 'Delhi', 'New york']})

# Df2: reference table that holds the Rating to bring over
Df2 = pd.DataFrame({
    'name' : ['jack', 'Riti', 'Aadi','Jeff'],
    'Surname' : ['Peterson', 'Macleans', 'McDonald','Cooper'],
    'city' : ['Sydney', 'Delhi', 'New york','Tokyo'],
    'Rating' : ['AAA', 'AA', 'A','BBB']})
I want pandas to merge on the first column first; if that match fails, on the second column; and if that also fails, on the third column.
I used:
new_df = pd.DataFrame([])
new_df = pd.merge(Df1, Df2, how='left', left_on=['name','Surname','city'], right_on = ['name','Surname','city'])
But this only matches rows where all three columns agree, so it does not produce the DataFrame I want:
Final_Df = pd.DataFrame({
'name' : ['jack', None, None],
'Surname' : ['Peterson', 'Macleans', None],
'city' : ['Sydney', 'Delhi', 'New york'],
'Rating' : ['AAA', 'AA', 'A']})
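To see why the single three-column merge falls short, here is a sketch that re-runs it with `indicator=True`, which adds a `_merge` column recording whether each left row found a partner in Df2. It uses only the data from the question:

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york']})
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

# A multi-key merge requires ALL listed keys to agree at once,
# so only row 0 ('jack', 'Peterson', 'Sydney') finds a partner.
merged = pd.merge(Df1, Df2, how='left', on=['name', 'Surname', 'city'],
                  indicator=True)
print(merged[['name', 'Surname', 'city', 'Rating', '_merge']])
```

Rows 1 and 2 come back as `left_only` with a missing Rating, which is why a per-column fallback is needed instead of one combined key.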
Edit 1: Thanks to "Quang Hoang" for the answer!
Let's try a for loop:
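import numpy as np

# Start with an all-missing Rating column
Df1['Rating']=np.nan
# Df1.columns[:-1] is every column except the new 'Rating', in priority order
for col in Df1.columns[:-1]:
    # Look up Rating by this column; fillna keeps ratings already found by earlier columns
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))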
Output:
name Surname city Rating
0 jack Peterson Sydney AAA
1 None Macleans Delhi AA
2 None None New york A
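Because `fillna` only fills values that are still missing, a column earlier in the loop wins whenever two columns point at different rows of Df2. A minimal sketch with hypothetical data (not from the question) where `name` and `Surname` disagree:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'jack' matches right row 0, but 'Macleans' matches row 1.
left = pd.DataFrame({'name': ['jack'], 'Surname': ['Macleans']})
right = pd.DataFrame({'name': ['jack', 'Riti'],
                      'Surname': ['Peterson', 'Macleans'],
                      'Rating': ['AAA', 'AA']})

left['Rating'] = np.nan
for col in ['name', 'Surname']:  # priority order: name is tried first
    # fillna leaves ratings filled by an earlier column untouched
    left['Rating'] = left['Rating'].fillna(
        left[col].map(right.set_index(col)['Rating']))

print(left['Rating'].tolist())  # ['AAA']
```

Swapping the order of the list to `['Surname', 'name']` would give `'AA'` instead, so the list order is what encodes "first column, then second, then third".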
Edit 2: If Df1 has extra columns that Df2 does not have, `Df1.columns[:-1]` would pick those up too, so list the key columns explicitly:
import numpy as np
Df1['Rating']=np.nan
for col in ['name', 'Surname','city']:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))
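The same loop can be packaged as a small helper so that the key columns and the value column are passed in explicitly. `cascade_lookup` is a hypothetical name (not a pandas API), and the extra `id` column in Df1 is made up to mimic the situation in Edit 2:

```python
import numpy as np
import pandas as pd

def cascade_lookup(left, right, keys, value_col):
    """Hypothetical helper: fetch right[value_col] by trying keys in order."""
    # All-missing object Series aligned with left's index
    out = pd.Series(np.nan, index=left.index, dtype=object)
    for col in keys:  # earlier keys take precedence
        mapping = right.set_index(col)[value_col]
        out = out.fillna(left[col].map(mapping))
    return out

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
    'id': [101, 102, 103]})  # extra column that Df2 does not have
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

Df1['Rating'] = cascade_lookup(Df1, Df2, ['name', 'Surname', 'city'], 'Rating')
print(Df1)
```

Since only the listed keys are used, the extra `id` column is carried along untouched.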
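Edit 3: If a key column in Df2 contains duplicate values, `set_index(col)` produces a non-unique index and `map` fails; dropping the duplicates first (keeping the first occurrence) makes the code work: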
import numpy as np
Df1['Rating']=np.nan
for col in ['name', 'Surname','city']:
    # drop_duplicates(col) keeps the first row per key so the lookup index is unique
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.drop_duplicates(col).set_index(col)['Rating']))
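A small sketch of the duplicate problem, using hypothetical data (not from the question) in which `'jack'` appears twice in Df2. Mapping through a Series whose index is not unique fails, while the `drop_duplicates` version keeps the first `'jack'` row:

```python
import numpy as np
import pandas as pd

Df1 = pd.DataFrame({'name': ['jack', None],
                    'Surname': ['Peterson', 'Macleans']})
# Hypothetical: 'jack' is duplicated in Df2 with different ratings
Df2 = pd.DataFrame({'name': ['jack', 'jack', 'Riti'],
                    'Surname': ['Peterson', 'Smith', 'Macleans'],
                    'Rating': ['AAA', 'B', 'AA']})

try:
    # The lookup Series has a non-unique index ('jack' twice)
    Df1['name'].map(Df2.set_index('name')['Rating'])
except Exception as exc:  # pandas raises InvalidIndexError here
    print('map failed:', type(exc).__name__)

Df1['Rating'] = np.nan
for col in ['name', 'Surname']:
    # keep='first' is the default, so 'jack' maps to 'AAA'
    lookup = Df2.drop_duplicates(col).set_index(col)['Rating']
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(lookup))

print(Df1['Rating'].tolist())  # ['AAA', 'AA']
```

Which duplicate wins is a choice: `drop_duplicates(col, keep='last')` would pick the last occurrence instead.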
[Discussion]:
Tags: python pandas dataframe merge