【Question title】: python pandas merge dataframe
【Posted】: 2021-06-25 13:13:20
【Question description】:

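How can I merge these two tables by origin, destination, and carrier?

The first table has a `medium need` column, and I need to add this field to the second table based on the origin + destination + carrier values. The result should go in a new column of table 2.

I tried pandas.merge(1st table, 2 table), but it didn't help.

Please help me solve this.

Table 1: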

{'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19}, 'origin': {0: 'NEW YORK', 1: 'NEW YORK', 2: 'NEW YORK', 3: 'NEW YORK', 4: 'NEW YORK', 5: 'NEW YORK', 6: 'NEW YORK', 7: 'NEW YORK', 8: 'NEW YORK', 9: 'NEW YORK', 10: 'NEW YORK', 11: 'NEW YORK', 12: 'NEW YORK', 13: 'NEW YORK', 14: 'NEW YORK', 15: 'NEW YORK', 16: 'NEW YORK', 17: 'NEW YORK', 18: 'NEW YORK', 19: 'NEW YORK'}, 'destination': {0: 'Aqaba', 1: 'Aqaba', 2: 'Batumi', 3: 'Benghazi', 4: 'Benghazi', 5: 'Bremerhaven', 6: 'El Khoms', 7: 'El Khoms', 8: 'El Khoms', 9: 'Jebel Ali', 10: 'Jebel Ali', 11: 'Jebel Ali', 12: 'Klaipeda', 13: 'Klaipeda', 14: 'MISURATA', 15: 'MISURATA', 16: 'MISURATA', 17: 'Novorossiysk', 18: 'Odessa', 19: 'Odessa'}, 'carrier_name': {0: 'HAPAG LLOYD', 1: 'MEDITERRANEAN SHIPPING CORP', 2: 'MEDITERRANEAN SHIPPING CORP', 3: 'CGM', 4: 'MAERSK LINES, INC.', 5: 'CGM', 6: 'CGM', 7: 'HAPAG LLOYD', 8: 'MAERSK LINES, INC.', 9: 'HAPAG LLOYD', 10: 'MAERSK LINES, INC.', 11: 'ONE NETWORK EXPRESS', 12: 'CGM', 13: 'EVERGREEN INTERNATIONAL (U S A)', 14: 'CGM', 15: 'HAPAG LLOYD', 16: 'MAERSK LINES, INC.', 17: 'MEDITERRANEAN SHIPPING CORP', 18: 'CGM', 19: 'Cosco Container Line'}, 'medium need': {0: 20.0, 1: 19.0, 2: 5.0, 3: 30.0, 4: 26.0, 5: 28.0, 6: 15.0, 7: 11.0, 8: 12.0, 9: 15.0, 10: 18.0, 11: 16.0, 12: 16.0, 13: 10.0, 14: 7.0, 15: 6.0, 16: 7.0, 17: 6.0, 18: 42.0, 19: 26.0}}

Table 2:

{'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19}, 'origin': {0: 'NEW YORK', 1: 'NEW YORK', 2: 'NEW YORK', 3: 'NEW YORK', 4: 'NEW YORK', 5: 'NEW YORK', 6: 'NEW YORK', 7: 'NEW YORK', 8: 'NEW YORK', 9: 'NEW YORK', 10: 'NEW YORK', 11: 'NEW YORK', 12: 'NEW YORK', 13: 'NEW YORK', 14: 'NEW YORK', 15: 'NEW YORK', 16: 'NEW YORK', 17: 'NEW YORK', 18: 'NEW YORK', 19: 'NEW YORK'}, 'destination': {0: 'Aqaba ', 1: 'Aqaba ', 2: 'Aqaba ', 3: 'Aqaba ', 4: 'Aqaba ', 5: 'Aqaba ', 6: 'Aqaba ', 7: 'Aqaba ', 8: 'Aqaba ', 9: 'Aqaba ', 10: 'Aqaba ', 11: 'Aqaba ', 12: 'Aqaba ', 13: 'Aqaba ', 14: 'Aqaba ', 15: 'Aqaba ', 16: 'Aqaba ', 17: 'Aqaba ', 18: 'Aqaba ', 19: 'Aqaba '}, 'from_': {0: '3/22/2021', 1: '3/29/2021', 2: '4/05/2021', 3: '3/29/2021', 4: '4/05/2021', 5: '4/12/2021', 6: '3/22/2021', 7: '3/29/2021', 8: '4/12/2021', 9: '4/05/2021', 10: '4/05/2021', 11: '4/12/2021', 12: '4/12/2021', 13: '4/12/2021', 14: '3/22/2021', 15: '3/22/2021', 16: '3/22/2021', 17: '3/22/2021', 18: '4/12/2021', 19: '3/29/2021'}, 'to_': {0: '3/29/2021', 1: '4/05/2021', 2: '4/12/2021', 3: '4/05/2021', 4: '4/12/2021', 5: '4/19/2021', 6: '3/29/2021', 7: '4/05/2021', 8: '4/19/2021', 9: '4/12/2021', 10: '4/12/2021', 11: '4/19/2021', 12: '4/19/2021', 13: '4/19/2021', 14: '3/29/2021', 15: '3/29/2021', 16: '3/29/2021', 17: '3/29/2021', 18: '4/19/2021', 19: '4/05/2021'}, 'carrier_name': {0: 'MEDITERRANEAN SHIPPING CORP', 1: 'MEDITERRANEAN SHIPPING CORP', 2: 'HAPAG LLOYD', 3: 'MEDITERRANEAN SHIPPING CORP', 4: 'MEDITERRANEAN SHIPPING CORP', 5: 'MEDITERRANEAN SHIPPING CORP', 6: 'HAPAG LLOYD', 7: 'HAPAG LLOYD', 8: 'MEDITERRANEAN SHIPPING CORP', 9: 'MEDITERRANEAN SHIPPING CORP', 10: 'MAERSK LINES, INC.', 11: 'MAERSK LINES, INC.', 12: 'HAPAG LLOYD', 13: 'HAPAG LLOYD', 14: 'MAERSK LINES, INC.', 15: 'MAERSK LINES, INC.', 16: 'MEDITERRANEAN SHIPPING CORP', 17: 'HAPAG LLOYD', 18: 'CGM', 19: 'MEDITERRANEAN SHIPPING CORP'}, 
'vessel_name': {0: 'MSC RANIA', 1: 'MSC RANIA', 2: 'CMA CGM IVANHOE', 3: 'SEAMAX BRIDGEPORT', 4: 'SEAMAX BRIDGEPORT', 5: 'NAVARINO', 6: 'EXPRESS ATHENS', 7: 'EXPRESS ATHENS', 8: 'NAVIOS UTMOST', 9: 'NAVIOS UTMOST', 10: 'MAERSK SEBAROK', 11: 'MAERSK COLUMBUS', 12: 'EXPRESS ROME', 13: 'EXPRESS ROME', 14: 'MAERSK ATLANTA', 15: 'MAERSK ATLANTA', 16: 'MSC RANIA', 17: 'OOCL WASHINGTON', 18: 'OOCL EUROPE', 19: 'MSC RANIA'}, 'doc_cut': {0: '3/29/2021', 1: '3/29/2021', 2: '4/9/2021', 3: '4/5/2021', 4: '4/5/2021', 5: '4/19/2021', 6: '3/29/2021', 7: '3/29/2021', 8: '4/12/2021', 9: '4/12/2021', 10: '4/6/2021', 11: '4/13/2021', 12: '4/16/2021', 13: '4/15/2021', 14: '3/26/2021', 15: '3/26/2021', 16: '3/29/2021', 17: '3/23/2021', 18: '4/15/2021', 19: '3/29/2021'}, 'container_type': {0: 'HV', 1: 'HV', 2: 'HV', 3: 'HV', 4: 'HV', 5: 'HV', 6: 'HV', 7: 'HV', 8: 'HV', 9: 'HV', 10: '45', 11: '45', 12: 'HV', 13: 'HV', 14: 'HV', 15: '45', 16: '45', 17: 'HV', 18: '2B', 19: '45'}, 'count': {0: 32, 1: 32, 2: 26, 3: 15, 4: 15, 5: 14, 6: 13, 7: 13, 8: 12, 9: 12, 10: 8, 11: 7, 12: 7, 13: 6, 14: 5, 15: 2, 16: 1, 17: 1, 18: 1, 19: 1}}

【Question discussion】:

    Tags: python pandas


    【Solution 1】:

    You can try doing a vlookup between the two tables, as follows:

    # Look up "medium need" in df1 for each distinct key combination in df2.
    keys = ["origin", "destination", "carrier_name"]
    for origin, destination, carrier_name in df2[keys].drop_duplicates().itertuples(index=False):
        mask1 = (
            (df1["origin"] == origin)
            & (df1["destination"] == destination)
            & (df1["carrier_name"] == carrier_name)
        )
        try:
            # .item() raises ValueError unless exactly one row matches.
            medium_need = df1.loc[mask1, "medium need"].item()
        except ValueError:
            continue  # no match (or an ambiguous one) in df1: skip this key
        mask2 = (
            (df2["origin"] == origin)
            & (df2["destination"] == destination)
            & (df2["carrier_name"] == carrier_name)
        )
        df2.loc[mask2, "medium need"] = medium_need
    

    【Comments】:

    • It gives me a python ValueError: can only convert an array of size 1 to a Python scalar
    • The second and third loops were missing a ".values", sorry. I've updated my answer accordingly.
    • Sorry, same result(((
    • There was one more typo (the name "country" was used by mistake instead of "origin"). Try again with the code I just updated.
    • Actually, I had already changed that. But it's still the same: Traceback (most recent call last): File "C:/Users/proje/PycharmProjects/wheellable/123.py", line 26, in <module> medium_need = df_report.loc[mask1, "medium need"].item() File "C:\Users\proje\PycharmProjects\wheellable\venv\lib\site-packages\pandas\core\base.py", line 420, in item raise ValueError("can only convert an array of size 1 to a Python scalar") ValueError: can only convert an array of size 1 to a Python scalar
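
The ValueError in the comments above is what `Series.item()` raises whenever the selection does not hold exactly one element. A minimal sketch, using a made-up two-row frame rather than the asker's data, and a hypothetical `lookup` helper: a key with a trailing space matches no rows, so the selection is empty and `.item()` fails.

```python
import pandas as pd

# Illustrative data only: a tiny stand-in for table 1.
df1 = pd.DataFrame({
    "destination": ["Aqaba", "Batumi"],
    "medium need": [20.0, 5.0],
})


def lookup(destination):
    """Return the single matching "medium need", or None unless exactly one row matches."""
    matches = df1.loc[df1["destination"] == destination, "medium need"]
    if len(matches) != 1:
        return None  # .item() would raise ValueError here
    return matches.item()


exact = lookup("Aqaba")               # one matching row
padded = lookup("Aqaba ")             # trailing space: zero matching rows
stripped = lookup("Aqaba ".strip())   # stripping the key restores the match

# Calling .item() directly on the empty selection reproduces the error.
try:
    df1.loc[df1["destination"] == "Aqaba ", "medium need"].item()
    raised = False
except ValueError:
    raised = True
```

Checking the number of matches before calling `.item()` makes the failure mode explicit instead of relying on the exception.
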
    【Solution 2】:
    total_df = pd.merge(table_2_df, table_1_df, how='left', on=['origin', 'destination', 'carrier_name'])
    

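    Edit

    After looking at your data, the destination field in table 2 appears to have trailing whitespace (for example 'Aqaba '), so strip it before merging: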

    import pandas as pd
    
    table_1_data = {
        'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14,
                  15: 15, 16: 16, 17: 17, 18: 18, 19: 19},
        'origin': {0: 'NEW YORK', 1: 'NEW YORK', 2: 'NEW YORK', 3: 'NEW YORK', 4: 'NEW YORK', 5: 'NEW YORK', 6: 'NEW YORK',
                   7: 'NEW YORK', 8: 'NEW YORK', 9: 'NEW YORK', 10: 'NEW YORK', 11: 'NEW YORK', 12: 'NEW YORK',
                   13: 'NEW YORK', 14: 'NEW YORK', 15: 'NEW YORK', 16: 'NEW YORK', 17: 'NEW YORK', 18: 'NEW YORK',
                   19: 'NEW YORK'},
        'destination': {0: 'Aqaba', 1: 'Aqaba', 2: 'Batumi', 3: 'Benghazi', 4: 'Benghazi', 5: 'Bremerhaven', 6: 'El Khoms',
                        7: 'El Khoms', 8: 'El Khoms', 9: 'Jebel Ali', 10: 'Jebel Ali', 11: 'Jebel Ali', 12: 'Klaipeda',
                        13: 'Klaipeda', 14: 'MISURATA', 15: 'MISURATA', 16: 'MISURATA', 17: 'Novorossiysk', 18: 'Odessa',
                        19: 'Odessa'},
        'carrier_name': {0: 'HAPAG LLOYD', 1: 'MEDITERRANEAN SHIPPING CORP', 2: 'MEDITERRANEAN SHIPPING CORP', 3: 'CGM',
                         4: 'MAERSK LINES, INC.', 5: 'CGM', 6: 'CGM', 7: 'HAPAG LLOYD', 8: 'MAERSK LINES, INC.',
                         9: 'HAPAG LLOYD', 10: 'MAERSK LINES, INC.', 11: 'ONE NETWORK EXPRESS', 12: 'CGM',
                         13: 'EVERGREEN INTERNATIONAL (U S A)', 14: 'CGM', 15: 'HAPAG LLOYD', 16: 'MAERSK LINES, INC.',
                         17: 'MEDITERRANEAN SHIPPING CORP', 18: 'CGM', 19: 'Cosco Container Line'},
        'medium need': {0: 20.0, 1: 19.0, 2: 5.0, 3: 30.0, 4: 26.0, 5: 28.0, 6: 15.0, 7: 11.0, 8: 12.0, 9: 15.0, 10: 18.0,
                        11: 16.0, 12: 16.0, 13: 10.0, 14: 7.0, 15: 6.0, 16: 7.0, 17: 6.0, 18: 42.0, 19: 26.0}}
    
    
    table_1_df = pd.DataFrame(table_1_data)
    
    table_2_data = {
        'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14,
                  15: 15, 16: 16, 17: 17, 18: 18, 19: 19},
        'origin': {0: 'NEW YORK', 1: 'NEW YORK', 2: 'NEW YORK', 3: 'NEW YORK', 4: 'NEW YORK', 5: 'NEW YORK', 6: 'NEW YORK',
                   7: 'NEW YORK', 8: 'NEW YORK', 9: 'NEW YORK', 10: 'NEW YORK', 11: 'NEW YORK', 12: 'NEW YORK',
                   13: 'NEW YORK', 14: 'NEW YORK', 15: 'NEW YORK', 16: 'NEW YORK', 17: 'NEW YORK', 18: 'NEW YORK',
                   19: 'NEW YORK'},
        'destination': {0: 'Aqaba ', 1: 'Aqaba ', 2: 'Aqaba ', 3: 'Aqaba ', 4: 'Aqaba ', 5: 'Aqaba ', 6: 'Aqaba ',
                        7: 'Aqaba ', 8: 'Aqaba ', 9: 'Aqaba ', 10: 'Aqaba ', 11: 'Aqaba ', 12: 'Aqaba ', 13: 'Aqaba ',
                        14: 'Aqaba ', 15: 'Aqaba ', 16: 'Aqaba ', 17: 'Aqaba ', 18: 'Aqaba ', 19: 'Aqaba '},
        'from_': {0: '3/22/2021', 1: '3/29/2021', 2: '4/05/2021', 3: '3/29/2021', 4: '4/05/2021', 5: '4/12/2021',
                  6: '3/22/2021', 7: '3/29/2021', 8: '4/12/2021', 9: '4/05/2021', 10: '4/05/2021', 11: '4/12/2021',
                  12: '4/12/2021', 13: '4/12/2021', 14: '3/22/2021', 15: '3/22/2021', 16: '3/22/2021', 17: '3/22/2021',
                  18: '4/12/2021', 19: '3/29/2021'},
        'to_': {0: '3/29/2021', 1: '4/05/2021', 2: '4/12/2021', 3: '4/05/2021', 4: '4/12/2021', 5: '4/19/2021',
                6: '3/29/2021', 7: '4/05/2021', 8: '4/19/2021', 9: '4/12/2021', 10: '4/12/2021', 11: '4/19/2021',
                12: '4/19/2021', 13: '4/19/2021', 14: '3/29/2021', 15: '3/29/2021', 16: '3/29/2021', 17: '3/29/2021',
                18: '4/19/2021', 19: '4/05/2021'},
        'carrier_name': {0: 'MEDITERRANEAN SHIPPING CORP', 1: 'MEDITERRANEAN SHIPPING CORP', 2: 'HAPAG LLOYD',
                         3: 'MEDITERRANEAN SHIPPING CORP', 4: 'MEDITERRANEAN SHIPPING CORP',
                         5: 'MEDITERRANEAN SHIPPING CORP', 6: 'HAPAG LLOYD', 7: 'HAPAG LLOYD',
                         8: 'MEDITERRANEAN SHIPPING CORP', 9: 'MEDITERRANEAN SHIPPING CORP', 10: 'MAERSK LINES, INC.',
                         11: 'MAERSK LINES, INC.', 12: 'HAPAG LLOYD', 13: 'HAPAG LLOYD', 14: 'MAERSK LINES, INC.',
                         15: 'MAERSK LINES, INC.', 16: 'MEDITERRANEAN SHIPPING CORP', 17: 'HAPAG LLOYD', 18: 'CGM',
                         19: 'MEDITERRANEAN SHIPPING CORP'},
        'vessel_name': {0: 'MSC RANIA', 1: 'MSC RANIA', 2: 'CMA CGM IVANHOE', 3: 'SEAMAX BRIDGEPORT',
                        4: 'SEAMAX BRIDGEPORT', 5: 'NAVARINO', 6: 'EXPRESS ATHENS', 7: 'EXPRESS ATHENS', 8: 'NAVIOS UTMOST',
                        9: 'NAVIOS UTMOST', 10: 'MAERSK SEBAROK', 11: 'MAERSK COLUMBUS', 12: 'EXPRESS ROME',
                        13: 'EXPRESS ROME', 14: 'MAERSK ATLANTA', 15: 'MAERSK ATLANTA', 16: 'MSC RANIA',
                        17: 'OOCL WASHINGTON', 18: 'OOCL EUROPE', 19: 'MSC RANIA'},
        'doc_cut': {0: '3/29/2021', 1: '3/29/2021', 2: '4/9/2021', 3: '4/5/2021', 4: '4/5/2021', 5: '4/19/2021',
                    6: '3/29/2021', 7: '3/29/2021', 8: '4/12/2021', 9: '4/12/2021', 10: '4/6/2021', 11: '4/13/2021',
                    12: '4/16/2021', 13: '4/15/2021', 14: '3/26/2021', 15: '3/26/2021', 16: '3/29/2021', 17: '3/23/2021',
                    18: '4/15/2021', 19: '3/29/2021'},
        'container_type': {0: 'HV', 1: 'HV', 2: 'HV', 3: 'HV', 4: 'HV', 5: 'HV', 6: 'HV', 7: 'HV', 8: 'HV', 9: 'HV',
                           10: '45', 11: '45', 12: 'HV', 13: 'HV', 14: 'HV', 15: '45', 16: '45', 17: 'HV', 18: '2B',
                           19: '45'},
        'count': {0: 32, 1: 32, 2: 26, 3: 15, 4: 15, 5: 14, 6: 13, 7: 13, 8: 12, 9: 12, 10: 8, 11: 7, 12: 7, 13: 6, 14: 5,
                  15: 2, 16: 1, 17: 1, 18: 1, 19: 1}}
    
    table_2_df = pd.DataFrame(table_2_data)
    table_2_df['destination'] = table_2_df['destination'].str.strip()
    
    
    total_df = pd.merge(table_2_df, table_1_df, how='left', on=['origin', 'destination', 'carrier_name'])
    
    print(total_df)
        index_x    origin destination  ... count index_y medium need
    0         0  NEW YORK       Aqaba  ...    32     1.0        19.0
    1         1  NEW YORK       Aqaba  ...    32     1.0        19.0
    2         2  NEW YORK       Aqaba  ...    26     0.0        20.0
    3         3  NEW YORK       Aqaba  ...    15     1.0        19.0
    4         4  NEW YORK       Aqaba  ...    15     1.0        19.0
    5         5  NEW YORK       Aqaba  ...    14     1.0        19.0
    6         6  NEW YORK       Aqaba  ...    13     0.0        20.0
    7         7  NEW YORK       Aqaba  ...    13     0.0        20.0
    8         8  NEW YORK       Aqaba  ...    12     1.0        19.0
    9         9  NEW YORK       Aqaba  ...    12     1.0        19.0
    10       10  NEW YORK       Aqaba  ...     8     NaN         NaN
    11       11  NEW YORK       Aqaba  ...     7     NaN         NaN
    12       12  NEW YORK       Aqaba  ...     7     0.0        20.0
    13       13  NEW YORK       Aqaba  ...     6     0.0        20.0
    14       14  NEW YORK       Aqaba  ...     5     NaN         NaN
    15       15  NEW YORK       Aqaba  ...     2     NaN         NaN
    16       16  NEW YORK       Aqaba  ...     1     1.0        19.0
    17       17  NEW YORK       Aqaba  ...     1     0.0        20.0
    18       18  NEW YORK       Aqaba  ...     1     NaN         NaN
    19       19  NEW YORK       Aqaba  ...     1     1.0        19.0
    [20 rows x 12 columns]
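
The output above carries `index_x` and `index_y` because both tables have an `index` column that is not part of the join keys. One possible variation, sketched here on small illustrative frames rather than the full data from the question, is to take only the key columns plus `medium need` from table 1 before merging. The `validate="many_to_one"` argument is an optional extra that asks pandas to raise if a key combination is duplicated in table 1.

```python
import pandas as pd

# Illustrative stand-ins for the two tables (not the full data from the question).
table_1_df = pd.DataFrame({
    "index": [0, 1],
    "origin": ["NEW YORK", "NEW YORK"],
    "destination": ["Aqaba", "Aqaba"],
    "carrier_name": ["HAPAG LLOYD", "MEDITERRANEAN SHIPPING CORP"],
    "medium need": [20.0, 19.0],
})
table_2_df = pd.DataFrame({
    "index": [0, 1, 2],
    "origin": ["NEW YORK", "NEW YORK", "NEW YORK"],
    "destination": ["Aqaba ", "Aqaba ", "Aqaba "],  # trailing space, as in the question
    "carrier_name": ["MEDITERRANEAN SHIPPING CORP", "HAPAG LLOYD", "CGM"],
    "count": [32, 26, 1],
})

keys = ["origin", "destination", "carrier_name"]

# Strip whitespace from the join key that is padded in table 2.
table_2_df["destination"] = table_2_df["destination"].str.strip()

# Keep only the keys and the wanted column from table 1, so no index_x/index_y appear.
lookup = table_1_df[keys + ["medium need"]]

# Left join: every row of table 2 is kept; unmatched keys get NaN in "medium need".
total_df = table_2_df.merge(lookup, how="left", on=keys, validate="many_to_one")
print(total_df)
```

Rows of table 2 whose key combination is absent from table 1 (the CGM row here) keep NaN in the new column, which matches the behaviour of the left join in the answer.
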
    

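    【Comments】:

    • It actually gives me a column with no values
    • Are all the key columns (origin, destination, carrier_name) formatted identically? Is whitespace padding some of the entries? Does one table use uppercase where the other uses lowercase (or vice versa)? Do the origin-destination-carrier_name combinations actually map across both tables?
    • @Vova Could you add some of your sample data to the question so that we can build the dataframes exactly as you see them?
    • Yes, they have the same format. I removed the whitespace with column.str().strip(), but now it doesn't even create the column. What does "map across both tables" mean?
    • Mapping means that the keys in table 2 exist in table 1. From the picture you posted in the question, it looks like there are matches. But it would help if you wrote some of the table data as code so that we can work on a solution together.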
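
The checks suggested in these comments (whitespace, letter case, and whether the keys of table 2 exist in table 1) can be sketched as follows. The data is illustrative, `normalize_keys` is a hypothetical helper, and upper-casing the keys is an assumption that only makes sense if case differences are not meaningful in the real data.

```python
import pandas as pd

# Illustrative data only; column names follow the question.
table_1_df = pd.DataFrame({
    "origin": ["NEW YORK", "NEW YORK"],
    "destination": ["Aqaba", "Odessa"],
    "carrier_name": ["HAPAG LLOYD", "Cosco Container Line"],
    "medium need": [20.0, 26.0],
})
table_2_df = pd.DataFrame({
    "origin": ["NEW YORK", "NEW YORK"],
    "destination": ["Aqaba ", "Aqaba "],
    "carrier_name": ["HAPAG LLOYD", "CGM"],
})

keys = ["origin", "destination", "carrier_name"]


def normalize_keys(df):
    """Return a copy with the key columns stripped and upper-cased (hypothetical helper)."""
    out = df.copy()
    for col in keys:
        out[col] = out[col].str.strip().str.upper()
    return out


left = normalize_keys(table_2_df)
right = normalize_keys(table_1_df)

# indicator=True adds a "_merge" column: "both" for matched rows, "left_only" otherwise.
checked = left.merge(right, how="left", on=keys, indicator=True)

# Key combinations from table 2 that have no counterpart in table 1.
unmatched = checked.loc[checked["_merge"] == "left_only", keys]
print(checked["_merge"].value_counts())
print(unmatched)
```

If `unmatched` lists every row, the keys differ in some way the normalization does not cover; if it lists only a few rows, those combinations are simply absent from table 1.
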