【Question Title】: Python Merge Based on Multiple Columns
【Posted】: 2020-10-29 03:33:49
【Question】:

I want to merge two DataFrames, Df1 and Df2:

import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york']})

Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

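I want pandas to merge on the first column first; if that match fails, merge on the second column; and if that fails too, merge on the third column.

I tried: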

# Left merge that requires name, Surname and city to match together
new_df = pd.merge(Df1, Df2, how='left', on=['name', 'Surname', 'city'])

But this does not produce the DataFrame I want:

Final_Df = pd.DataFrame({
        'name' : ['jack', None, None],
        'Surname' : ['Peterson', 'Macleans', None],
        'city' : ['Sydney', 'Delhi', 'New york'],
        'Rating' : ['AAA', 'AA', 'A']})
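
A likely reason the merge attempt falls short: passing three keys to `pd.merge` asks for all three values to match in the same row, so there is no fallback from one column to the next. The sketch below rebuilds the two frames from the question and runs the same left merge; the `unmatched` variable is introduced here only for illustration.

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york']})

Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

# A multi-key merge compares the whole (name, Surname, city) tuple at once.
new_df = pd.merge(Df1, Df2, how='left', on=['name', 'Surname', 'city'])

# Rows whose tuple has no exact counterpart in Df2 keep a missing Rating.
unmatched = new_df['Rating'].isna()
print(new_df)
print(unmatched.tolist())
```

Only the first row agrees with Df2 on all three columns, so it is the only row that receives a rating; the rows with a missing name stay unmatched.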

Edit 1: Thanks to "Quang Hoang" for the answer!

Let's try a for loop:

import numpy as np

Df1['Rating'] = np.nan

# [:-1] skips the Rating column that was just added
for col in Df1.columns[:-1]:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))

Output:

   name   Surname      city Rating
0  jack  Peterson    Sydney    AAA
1  None  Macleans     Delhi     AA
2  None      None  New york      A

Edit 2: If Df1 has extra columns that do not exist in Df2, the correct code looks like this:

import numpy as np

Df1['Rating']=np.nan

for col in ['name', 'Surname','city']:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))
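
To make Edit 2 concrete, here is a minimal sketch that uses the `Occupation` column from the comment thread. It first shows that looking up a column Df2 lacks raises a `KeyError`, then loops over the shared columns only. The `key_cols`, `rating`, and `missing_column_fails` names are illustrative, and the running `rating` variable is a small variation on the code above: it starts from the first column's lookup instead of an all-NaN column.

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
    'Occupation': ['Teacher', 'Student', 'Professor']})

Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

# Looping over every Df1 column would reach 'Occupation', which Df2 lacks.
try:
    Df2.set_index('Occupation')
    missing_column_fails = False
except KeyError:
    missing_column_fails = True

key_cols = ['name', 'Surname', 'city']  # only the columns shared with Df2
rating = None
for col in key_cols:
    mapped = Df1[col].map(Df2.set_index(col)['Rating'])
    # Keep ratings found earlier; fill only the rows that are still missing.
    rating = mapped if rating is None else rating.fillna(mapped)

Df1['Rating'] = rating
print(Df1)
```

The `Occupation` column passes through untouched because it is never used as a lookup key.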

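Edit 3: If a key column contains duplicate values, the following code works. It drops the duplicates from Df2 before indexing, because map needs a lookup with a unique index.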

import numpy as np

Df1['Rating']=np.nan

for col in ['name', 'Surname','city']:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.drop_duplicates(col).set_index(col)['Rating']))
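
The duplicate case in Edit 3 can be sketched as follows. The data is hypothetical: this Df2 repeats 'Delhi' so that the city lookup has a non-unique index. The `lookup`, `rating`, and `duplicate_lookup_fails` names are illustrative, and `rating` again starts from the first column's lookup instead of an all-NaN column.

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'Delhi']})

# Hypothetical Df2 in which 'Delhi' appears twice.
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi'],
    'Surname': ['Peterson', 'Macleans', 'McDonald'],
    'city': ['Sydney', 'Delhi', 'Delhi'],
    'Rating': ['AAA', 'AA', 'A']})

# With a repeated key the lookup Series has a non-unique index,
# and map() cannot decide which value to return.
try:
    Df1['city'].map(Df2.set_index('city')['Rating'])
    duplicate_lookup_fails = False
except Exception:
    duplicate_lookup_fails = True

rating = None
for col in ['name', 'Surname', 'city']:
    # drop_duplicates keeps the first row for each key value.
    lookup = Df2.drop_duplicates(subset=col).set_index(col)['Rating']
    mapped = Df1[col].map(lookup)
    rating = mapped if rating is None else rating.fillna(mapped)

Df1['Rating'] = rating
print(Df1)
```

Note that `drop_duplicates` keeps the first occurrence by default, so the second 'Delhi' row in Df2 is silently ignored. If a different row should win, sorting Df2 beforehand or passing `keep='last'` are options.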

【Question Comments】:

    Tags: python pandas dataframe merge


    【Solution 1】:

    Let's try a for loop:

    import numpy as np

    Df1['Rating'] = np.nan

    # [:-1] skips the Rating column that was just added
    for col in Df1.columns[:-1]:
        Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))
    

    Output:

       name   Surname      city Rating
    0  jack  Peterson    Sydney    AAA
    1  None  Macleans     Delhi     AA
    2  None      None  New york      A
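
To see why the loop produces this output, it may help to look at what each column contributes on its own. The sketch below is an illustration rather than part of the original answer: it collects the per-column lookups in a hypothetical `candidates` table and then chains `fillna` in the same order as the loop.

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york']})

Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

keys = ['name', 'Surname', 'city']

# One column per key: the rating each key would find if used alone.
candidates = pd.DataFrame(
    {col: Df1[col].map(Df2.set_index(col)['Rating']) for col in keys})
print(candidates)

# Earlier keys take priority; later keys only fill the remaining gaps.
rating = (candidates['name']
          .fillna(candidates['Surname'])
          .fillna(candidates['city']))
print(rating.tolist())
```

A missing key in Df1 simply finds nothing in the lookup, so that row stays empty until a later column supplies a value.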
    

    【Comments】:

    • Thank you so much! You saved my day! Just one quick question: if Df1 has a column that is not available in Df2, for example Occupation, how can this code be made to work? Df1 = pd.DataFrame({ 'name' : ['jack', None, None], 'Surname' : ['Peterson', 'Macleans', None], 'city' : ['Sydney', 'Delhi', 'New york'], 'Occupation':['Teacher','Student', 'Professor']})
    • for col in list_col_to_map:...?