【Question title】: DataFrame merging with ordered indices and different columns
【Posted】: 2018-03-25 03:07:10
【Question】:

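I have two pandas DataFrames that I want to merge. They have different columns and overlapping indices. I want to merge them while keeping the order of the index intact.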

DataFrame (d1)

                              Dec 16 Dec 15   
Balance Sheet                     
NON-CURRENT LIABILITIES          NaN    NaN   <-- 'all Nan' row
Other Long Term Liabilities     8.37   9.30
Long Term Provisions           13.53  12.74   <-- Not present in d2
Total Non-Current Liabilities  21.90  22.04
CURRENT LIABILITIES              NaN    NaN   <-- 'all Nan' row
Trade Payables                 32.49  24.26

DataFrame (d2)

                               Dec 11 Dec 10
Balance Sheet                     
NON-CURRENT LIABILITIES           NaN    NaN
Deferred Tax Liabilities [Net]   0.00   7.40   <-- Not present in d1
Other Long Term Liabilities     14.13   0.00
Total Non-Current Liabilities   14.13   7.40
CURRENT LIABILITIES               NaN    NaN
Trade Payables                  77.35  60.40

I tried the following approaches to merge the DataFrames, but none of them worked.

d1.merge(d2, how='left', left_index=True,right_index=True)

d1.merge(d2, how='outer', left_index=True,right_index=True)

pd.merge_ordered(d1,d2,left_on=['Dec 16'],right_on=['Dec 11'])

pd.concat([d1.merge(d2, how='left', left_index=True,right_index=True),d1.merge(d2, how='right', left_index=True,right_index=True)]).drop_duplicates(subset='Dec 16',keep='last')

I would like the resulting DataFrame to look like this:

                              Dec 16 Dec 15 Dec 11 Dec 10
Balance Sheet                    
NON-CURRENT LIABILITIES          NaN    NaN  NaN    NaN
Deferred Tax Liabilities [Net]   NaN    NaN  0.00   7.40    <-- from d2
Other Long Term Liabilities     8.37   9.30  14.13  0.00    <-- d1+d2 merged
Long Term Provisions           13.53  12.74  NaN    NaN     <-- from d1
Total Non-Current Liabilities  21.90  22.04  14.13  7.40    <-- d1+d2 merged
CURRENT LIABILITIES              NaN    NaN  NaN    NaN
Trade Payables                 32.49  24.26  77.35  60.40

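Note that the overall order matters (for example, the all-NaN rows must stay in the same order), but the order of the merged index labels between two all-NaN rows does not. The columns of d1 should also come before the columns of d2.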

【Question comments】:

    Tags: python-3.x pandas join merge


    【Solution 1】:

    Use merge with how='outer', then reindex with a custom order

    In [1424]: order_index =  ['NON-CURRENT LIABILITIES',  'Deferred Tax Liabilities [Net]',  
                               'Other Long Term Liabilities',  'Long Term Provisions',  
                               'Total Non-Current Liabilities',  'CURRENT LIABILITIES',
                               'Trade Payables']
    
    In [1425]: df1.merge(df2,how='outer',left_index=True,right_index=True).reindex(order_index)
    Out[1425]:
                                    Dec 16  Dec 15  Dec 11  Dec 10
    Balance Sheet
    NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
    Deferred Tax Liabilities [Net]     NaN     NaN    0.00     7.4
    Other Long Term Liabilities       8.37    9.30   14.13     0.0
    Long Term Provisions             13.53   12.74     NaN     NaN
    Total Non-Current Liabilities    21.90   22.04   14.13     7.4
    CURRENT LIABILITIES                NaN     NaN     NaN     NaN
    Trade Payables                   32.49   24.26   77.35    60.4
    

    Alternatively, join also works

    In [1426]: df1.join(df2, how='outer').reindex(order_index)
    Out[1426]:
                                    Dec 16  Dec 15  Dec 11  Dec 10
    Balance Sheet
    NON-CURRENT LIABILITIES            NaN     NaN     NaN     NaN
    Deferred Tax Liabilities [Net]     NaN     NaN    0.00     7.4
    Other Long Term Liabilities       8.37    9.30   14.13     0.0
    Long Term Provisions             13.53   12.74     NaN     NaN
    Total Non-Current Liabilities    21.90   22.04   14.13     7.4
    CURRENT LIABILITIES                NaN     NaN     NaN     NaN
    Trade Payables                   32.49   24.26   77.35    60.4
    

    Details

    In [1417]: df1
    Out[1417]:
                                   Dec 16  Dec 15
    Balance Sheet
    NON-CURRENT LIABILITIES           NaN     NaN
    Other Long Term Liabilities      8.37    9.30
    Long Term Provisions            13.53   12.74
    Total Non-Current Liabilities   21.90   22.04
    CURRENT LIABILITIES               NaN     NaN
    Trade Payables                  32.49   24.26
    
    In [1418]: df2
    Out[1418]:
                                    Dec 11  Dec 10
    Balance Sheet
    NON-CURRENT LIABILITIES            NaN     NaN
    Deferred Tax Liabilities [Net]    0.00     7.4
    Other Long Term Liabilities      14.13     0.0
    Total Non-Current Liabilities    14.13     7.4
    CURRENT LIABILITIES                NaN     NaN
    Trade Payables                   77.35    60.4
    

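    【Comments】:

    • Thanks for the reply, but in both cases the index comes out sorted. The requirement is that at least the 'all NaN' rows stay in their original order.
    • Updated with reindex. Does that help?
    • Thanks, yes, that definitely helps. Is there a way to build order_index dynamically from the union of d1.index and d2.index? The names can differ and are not known in advance. Many thanks for all the help.
    • But what is the ordering logic? I can't see that they are in alphabetical order.
    • The ordering logic is that the upper-case index labels (the all-NaN rows) appear in both DataFrames, and the union of all index labels needs to be built between them.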
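The thread never settles how to build order_index dynamically, so here is one possible sketch rather than the answerer's method. The helper name `merged_order` is hypothetical. It assumes index labels are unique and that the labels shared by both frames (such as the all-NaN header rows) appear in the same relative order in each. It keeps d1's order and slots every d2-only label in directly after the label that precedes it in d2.

```python
import numpy as np
import pandas as pd


def merged_order(left_index, right_index):
    """Union of two label sequences: left order kept, right-only labels slotted in.

    Hypothetical helper; assumes unique labels and that shared labels
    appear in the same relative order on both sides.
    """
    order = list(left_index)
    seen = set(order)
    prev = None  # most recent right-hand label, already present in `order`
    for label in right_index:
        if label not in seen:
            # Insert the right-only label directly after its right-hand
            # predecessor (or at the front when it has no predecessor).
            pos = order.index(prev) + 1 if prev is not None else 0
            order.insert(pos, label)
            seen.add(label)
        prev = label
    return order


# Sample frames rebuilt from the tables in the question.
d1 = pd.DataFrame(
    {"Dec 16": [np.nan, 8.37, 13.53, 21.90, np.nan, 32.49],
     "Dec 15": [np.nan, 9.30, 12.74, 22.04, np.nan, 24.26]},
    index=pd.Index(
        ["NON-CURRENT LIABILITIES", "Other Long Term Liabilities",
         "Long Term Provisions", "Total Non-Current Liabilities",
         "CURRENT LIABILITIES", "Trade Payables"],
        name="Balance Sheet"),
)
d2 = pd.DataFrame(
    {"Dec 11": [np.nan, 0.00, 14.13, 14.13, np.nan, 77.35],
     "Dec 10": [np.nan, 7.40, 0.00, 7.40, np.nan, 60.40]},
    index=pd.Index(
        ["NON-CURRENT LIABILITIES", "Deferred Tax Liabilities [Net]",
         "Other Long Term Liabilities", "Total Non-Current Liabilities",
         "CURRENT LIABILITIES", "Trade Payables"],
        name="Balance Sheet"),
)

# Build the row order from both indices, then apply it to the outer join.
order_index = merged_order(d1.index, d2.index)
result = d1.join(d2, how="outer").reindex(order_index)
print(result)
```

For the sample data this gives the same row order as the hand-written order_index in the answer. If the shared labels were ordered differently in the two frames, the result would follow d1's order and d2's order would not be fully preserved.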