【Question title】: Merge two dataframes with multi-index
【Posted】: 2018-03-15 02:02:38
【Question】:

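I have read several posts about this, but I can't work out how merge, join, and concat would handle it. How can I merge two dataframes so that rows are matched on their index?

In: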

import pandas as pd
import numpy as np
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
index_arrays = [np.array(['first', 'first', 'second', 'second']), np.array(['one','two','one','two'])]
df1 = pd.DataFrame([row_x1,row_x2,row_x3,row_x4], columns=list('ABC'), index=index_arrays)
print(df1)

Out:

             A   B   C
first  one  a1  b1  c1
       two  a2  b2  c2
second one  a3  b3  c3
       two  a4  b4  c4

In:

row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
df2 = pd.DataFrame([row_y1,row_y2], columns=list('DEF'), index=['first','second'])
print(df2)

Out:

         D   E   F
first   d1  e1  f1
second  d2  e2  f2

In other words, how can I merge them to get df3 (below)?

row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']

row_z1 = row_x1 + row_y1
row_z2 = row_x2 + row_y1
row_z3 = row_x3 + row_y2
row_z4 = row_x4 + row_y2

df3 = pd.DataFrame([row_z1,row_z2,row_z3,row_z4], columns=list('ABCDEF'), index=index_arrays)
print(df3)

Out:

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

【Question discussion】:

    Tags: python pandas merge concat multi-index


    【Solution 1】:

    Option 1
    Use pd.DataFrame.reindex + pd.DataFrame.join
    reindex has a convenient level parameter that lets you broadcast rows across index levels that the frame does not have.

    df1.join(df2.reindex(df1.index, level=0))
    
                 A   B   C   D   E   F
    first  one  a1  b1  c1  d1  e1  f1
           two  a2  b2  c2  d1  e1  f1
    second one  a3  b3  c3  d2  e2  f2
           two  a4  b4  c4  d2  e2  f2
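
For readers who want to see the intermediate step, here is a minimal self-contained sketch of Option 1 using the frames from the question. The variable names expanded and result are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question.
index_arrays = [np.array(['first', 'first', 'second', 'second']),
                np.array(['one', 'two', 'one', 'two'])]
df1 = pd.DataFrame([['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'],
                    ['a3', 'b3', 'c3'], ['a4', 'b4', 'c4']],
                   columns=list('ABC'), index=index_arrays)
df2 = pd.DataFrame([['d1', 'e1', 'f1'], ['d2', 'e2', 'f2']],
                   columns=list('DEF'), index=['first', 'second'])

# Broadcast df2's rows across level 0 of df1's MultiIndex:
# each ('first', *) row receives df2's 'first' row, and so on.
expanded = df2.reindex(df1.index, level=0)

# expanded now carries the same MultiIndex as df1, so a plain join aligns row by row.
result = df1.join(expanded)
print(result)
```

Because the expanded frame already has df1's index, the join itself needs no extra arguments.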
    

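    Option 2
    You can name the index levels so that join works: when the two frames share an index level name (here 'a'), join aligns on that level.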

    df1.rename_axis(['a', 'b']).join(df2.rename_axis('a'))
    
                 A   B   C   D   E   F
    a      b                          
    first  one  a1  b1  c1  d1  e1  f1
           two  a2  b2  c2  d1  e1  f1
    second one  a3  b3  c3  d2  e2  f2
           two  a4  b4  c4  d2  e2  f2
    

    You can follow up with another rename_axis to remove the level names and get the desired result

    df1.rename_axis(['a', 'b']).join(df2.rename_axis('a')).rename_axis([None, None])
    
                 A   B   C   D   E   F
    first  one  a1  b1  c1  d1  e1  f1
           two  a2  b2  c2  d1  e1  f1
    second one  a3  b3  c3  d2  e2  f2
           two  a4  b4  c4  d2  e2  f2
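
As a rough end-to-end check of Option 2, the sketch below rebuilds the frames from the question, joins on the shared level name, and then strips the names again. The level names 'a' and 'b' are arbitrary placeholders, as in the answer, and the variable names named and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question.
index_arrays = [np.array(['first', 'first', 'second', 'second']),
                np.array(['one', 'two', 'one', 'two'])]
df1 = pd.DataFrame([['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'],
                    ['a3', 'b3', 'c3'], ['a4', 'b4', 'c4']],
                   columns=list('ABC'), index=index_arrays)
df2 = pd.DataFrame([['d1', 'e1', 'f1'], ['d2', 'e2', 'f2']],
                   columns=list('DEF'), index=['first', 'second'])

# Give df1's levels the names 'a' and 'b', and df2's single index the name 'a'.
# join then matches df2's index against the level of df1 with the same name.
named = df1.rename_axis(['a', 'b']).join(df2.rename_axis('a'))

# Drop the temporary level names so the index looks like the one in df3.
result = named.rename_axis([None, None])
print(result)
```

The renaming is only needed because the original indexes are unnamed; if the levels already had matching names, the join alone would be enough.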
    

    【Discussion】:
