Question title: Merging two dataframes with hierarchical columns
Posted: 2020-09-25 09:11:13
Question:

This is my first time using a MultiIndex in pandas, and I need some help merging two dataframes with hierarchical columns. Here are my two dataframes:

import numpy as np
import pandas as pd
col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))

     a         b         c     
     w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0  1.0

df2 = pd.DataFrame(np.zeros([2,6]),columns=col_index, index=range(2))

     a         b         c     
     w    x    w    x    w    x
0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0

When I use merge, I get the following result:

pd.merge(df1, df2, how='left', suffixes=('', '_2'), left_index=True, right_index=True)

     a         b         c       a_2       b_2       c_2     
     w    x    w    x    w    x    w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
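Why does the suffix end up on a, b and c? A plausible explanation, which is an assumption about pandas internals rather than documented behavior: merge appends the suffix to each individual label that matches an overlapping column key, and a MultiIndex accepts a bare label as a partial key only on its first level. The sketch below checks both parts of that on the question's data:

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([["a", "b", "c"], ["w", "x"]])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# A bare label counts as a (partial) key only for the first level.
top_is_key = "a" in col_index    # True: matches ("a", "w") and ("a", "x")
lower_is_key = "w" in col_index  # False: "w" is not a first-level label

# Assumed mechanism: only labels that are keys of the overlap get the suffix,
# so the right-hand copies come back as ("a_2", "w"), ("a_2", "x"), ...
merged = pd.merge(df1, df2, how="left", suffixes=("", "_2"),
                  left_index=True, right_index=True)
print(top_is_key, lower_is_key)
print(merged.columns.tolist()[6:8])
```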

But I want to merge the two dataframes so that the suffixes are applied at the lower level (['w', 'x']), like this:

     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

Comments on the question:

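  • If you don't specify the on parameter, merge joins on the columns the two frames share by name. If you are only merging on the index, use join instead.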
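To illustrate the comment: for an index-on-index left join, df.join produces the same frame as the longer merge call, because join aligns on the row index and defaults to how="left". A minimal check on the question's data:

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([["a", "b", "c"], ["w", "x"]])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# join: row-index alignment and how="left" are the defaults.
via_join = df1.join(df2, rsuffix="_2")

# merge: the index-based left join has to be spelled out.
via_merge = pd.merge(df1, df2, how="left", left_index=True, right_index=True,
                     suffixes=("", "_2"))

print(via_join.equals(via_merge))
```

Note that join still puts the suffix on the top level here; it only shortens the call.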

Tags: python pandas multi-index


Answer 1:

You can use join or merge together with swaplevel() or reorder_levels(). Then call .sort_index() with axis=1 to sort the column index.

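  • When you merge on the index like this, .join() is the better fit.
  • .swaplevel() is simpler when there are two levels (as in this example), while .reorder_levels() is better with three or more levels.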
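To make the "three or more levels" point concrete, here is a sketch of a reusable helper built on reorder_levels. The name join_suffix_on_level and its parameters are hypothetical, not part of pandas. It assumes, as the question's output suggests, that join only suffixes labels on the first column level: it moves the target level to the front, joins, restores the original level order with the inverse permutation, and sorts the columns.

```python
import numpy as np
import pandas as pd


def join_suffix_on_level(left, right, rsuffix, level=-1, how="left"):
    """Hypothetical helper: join two frames on their row index and put
    rsuffix on the labels of column level `level` (default: the last one)."""
    nlevels = left.columns.nlevels
    level = level % nlevels  # accept negative level numbers such as -1
    # Move the target level to the front, where join's suffix takes effect
    # (assumed from the behavior shown in the question).
    order = [level] + [i for i in range(nlevels) if i != level]
    # Inverse permutation that restores the original level order.
    inverse = [order.index(i) for i in range(nlevels)]
    joined = left.reorder_levels(order, axis=1).join(
        right.reorder_levels(order, axis=1), how=how, rsuffix=rsuffix
    )
    return joined.reorder_levels(inverse, axis=1).sort_index(axis=1)


# Two-level case from the question.
col_index = pd.MultiIndex.from_product([["a", "b", "c"], ["w", "x"]])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))
result = join_suffix_on_level(df1, df2, "_2")

# Three-level case: suffix the middle level instead of the last one.
cols3 = pd.MultiIndex.from_product([["a", "b"], ["w", "x"], ["p", "q"]])
left3 = pd.DataFrame(np.ones([2, 8]), columns=cols3)
right3 = pd.DataFrame(np.zeros([2, 8]), columns=cols3)
result3 = join_suffix_on_level(left3, right3, "_2", level=1)
```

For two levels this is the same sequence as the answer's first example; the explicit permutation is what lets it handle deeper indexes.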

Below are four combinations of these methods. For this particular example, I think .join() with .swaplevel() is the most common choice (see the last example):

df3 = (df1.reorder_levels([1,0],axis=1)
       .join(df2.reorder_levels([1,0],axis=1), rsuffix='_2')
       .reorder_levels([1,0],axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[1]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (pd.merge(df1.reorder_levels([1, 0], axis=1),
                df2.reorder_levels([1, 0], axis=1),
                how='left', left_index=True, right_index=True, suffixes=('', '_2'))
       .reorder_levels([1, 0], axis=1)
       .sort_index(axis=1, level=[0, 1]))
df3
Out[2]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (pd.merge(df1.swaplevel(axis=1),
                df2.swaplevel(axis=1),
                how='left', left_index=True, right_index=True, suffixes=('', '_2'))
       .swaplevel(axis=1)
       .sort_index(axis=1, level=[0, 1]))
df3
Out[3]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (df1.swaplevel(i=0,j=1, axis=1)
       .join(df2.swaplevel(axis=1), rsuffix='_2')
       .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[4]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
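A different technique, not one of the four above and not the answerer's method: rename the lower level of df2 before joining, so no labels overlap and no suffix is needed at all. DataFrame.rename accepts a function for columns plus a level argument that limits the renaming to one level of a MultiIndex. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([["a", "b", "c"], ["w", "x"]])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# Append "_2" to the second column level of df2 only: ("a", "w") -> ("a", "w_2").
df2_renamed = df2.rename(columns=lambda label: f"{label}_2", level=1)

# No overlapping columns remain, so a plain left join on the index works.
df3 = df1.join(df2_renamed).sort_index(axis=1)
print(df3)
```

This avoids swapping levels back and forth, at the cost of doing the suffixing yourself.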

Comments:

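  • In case you weren't aware: here, df.swaplevel(axis=1) can replace each of your calls to df.reorder_levels(..., axis=1). Just wanted to share another way to get the same result. Here is the documentation for that function.
  • Thanks @CameronRiddell! Yes, I'm updating my answer to show both ways.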
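A quick check of the first comment: called without level arguments, swaplevel swaps the last two levels (its defaults are i=-2, j=-1), which for a two-level column index is the same permutation as reorder_levels([1, 0]):

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([["a", "b", "c"], ["w", "x"]])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))

swapped = df1.swaplevel(axis=1)                 # swap the last two levels
reordered = df1.reorder_levels([1, 0], axis=1)  # explicit new level order

print(swapped.columns.equals(reordered.columns))
print(swapped.columns[:2].tolist())
```

Column order is unchanged by either call; only the levels inside each tuple move, which is why both answers still need sort_index afterwards.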