Concatenate dataframes with different levels of index in pandas

【Question title】: concatenate dataframes with different levels of index in pandas
【Posted】: 2013-03-16 12:09:58
【Question】:

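I am having trouble understanding how the pandas MultiIndex works. Specifically:

How do I combine two dataframes with different levels of index (row-wise)?
How do I change the index levels of a dataframe?

Using the example from a previous question: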

import numpy as np
import pandas as pd

d1 = pd.DataFrame(
    {'StudentID': ['x1', 'x10', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9'],
     'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
     'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
     'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
     'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
     'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes']},
    columns=['StudentID', 'StudentGender', 'ExamenYear', 'Exam', 'Participated', 'Passed'])

I compute two summary tables:

def ZahlOccurence_0(x):
     return pd.Series({'All': len(x['StudentID']),
                   'Part': sum(x['Participated'] == 'yes'),
                   'Pass' :  sum(x['Passed'] == 'yes')})
t1 = d1.groupby(['ExamenYear', 'Exam']).apply(ZahlOccurence_0)   
t2 = d1.groupby('ExamenYear').apply(ZahlOccurence_0)

How do I combine t1 and t2 row-wise?

print(t1)
                    All  Part  Pass
ExamenYear Exam                    
2007       algebra    1     0     0
           bio        1     1     1
           stats      1     1     1
2008       algebra    2     1     1
           stats      2     2     2
2009       algebra    1     1     1
           bio        2     2     1

print(t2)

            All  Part  Pass
ExamenYear                 
2007          3     2     2
2008          4     3     3
2009          3     3     2

I tried the following:

t2 = t2.set_index([t2.index, np.array(['tot']* 3)], append = False)

but

 pd.concat(t1,t2)

raises the error

ValueError: Cannot call bool() on DataFrame.

What am I doing wrong?

Thanks in advance

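【Comments】:

If you type help(pd.concat), you'll see that it takes a collection of objects as its first argument. IOW, it should be pd.concat([t1, t2]). I'm not sure what output you're hoping for, though: is it something like pd.concat([t1, t2]).sort()?
Maybe this is what you're looking for? - t2['Exam'] = 'tot' ; pd.concat([t1.reset_index(),t2.reset_index()])
Silly me! Thanks!!!!!! DSM: I simply forgot to put the objects in a list :-( @user1827356: that is what I was looking for, and I had been going about it in a complicated way. Could you post your comment as an answer so that I can accept it and close the question?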

Tags: pandas multi-index

【Solution 1】:

As @DSM pointed out, the DataFrame objects need to be in a list:

pd.concat([t1, t2])
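
To make the difference concrete, here is a minimal sketch using two small stand-in frames (a and b are made-up names, not the t1 and t2 from the question). It assumes only that pd.concat rejects a bare DataFrame as its first argument; the exact exception type and message depend on the pandas version, so the snippet catches both TypeError and ValueError.

```python
import pandas as pd

# Two tiny stand-in frames (hypothetical data, for illustration only).
a = pd.DataFrame({"All": [1, 2]}, index=["x", "y"])
b = pd.DataFrame({"All": [3]}, index=["z"])

# Passing the frames as separate positional arguments fails, because the
# first parameter of pd.concat is expected to be a collection of objects.
try:
    pd.concat(a, b)
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("concat(a, b) failed:", type(exc).__name__)

# Wrapping the frames in a list stacks them row-wise.
stacked = pd.concat([a, b])
print(stacked)
```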

I do have to perform calculations similar to yours. This is my preferred approach:

t2['Exam'] = 'tot'
            All  Part  Pass Exam
ExamenYear
2007          3     2     2  tot
2008          4     3     3  tot
2009          3     3     2  tot

pd.concat([t1.reset_index(),t2.reset_index()], ignore_index=True)

   All     Exam ExamenYear  Part  Pass
0    1  algebra       2007     0     0
1    1      bio       2007     1     1
2    1    stats       2007     1     1
3    2  algebra       2008     1     1
4    2    stats       2008     2     2
5    1  algebra       2009     1     1
6    2      bio       2009     2     1
7    3      tot       2007     2     2
8    4      tot       2008     3     3
9    3      tot       2009     3     2
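
If the goal is to keep the two-level index rather than flatten it, one possible variant (a sketch, not part of the answer above) is to add the 'tot' label as a second index level with set_index(..., append=True) before concatenating. The tables are rebuilt by hand from the printed t1 and t2 so that the snippet runs on its own; t2_multi and combined are names introduced here for illustration.

```python
import pandas as pd

# t1 and t2 rebuilt by hand from the printed output (assumption: same values).
t1 = pd.DataFrame(
    {"All": [1, 1, 1, 2, 2, 1, 2],
     "Part": [0, 1, 1, 1, 2, 1, 2],
     "Pass": [0, 1, 1, 1, 2, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [("2007", "algebra"), ("2007", "bio"), ("2007", "stats"),
         ("2008", "algebra"), ("2008", "stats"),
         ("2009", "algebra"), ("2009", "bio")],
        names=["ExamenYear", "Exam"]),
)
t2 = pd.DataFrame(
    {"All": [3, 4, 3], "Part": [2, 3, 3], "Pass": [2, 3, 2]},
    index=pd.Index(["2007", "2008", "2009"], name="ExamenYear"),
)

# Add 'Exam' as a column, then append it to the existing index as a second level.
t2_multi = t2.assign(Exam="tot").set_index("Exam", append=True)

# Both frames now share the (ExamenYear, Exam) index, so they stack row-wise.
combined = pd.concat([t1, t2_multi]).sort_index()
print(combined)
```

Sorting the index groups the rows by year, and each year's 'tot' row lands after that year's exam rows because 'tot' sorts after the exam names used here.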

【Comments】:

Thanks. I'm new to pandas, but I'm intrigued by what it can do.