Applying group aggregation over a MultiIndex

【Question title】: Applying group aggregation over multiindex
【Posted】: 2016-11-21 17:48:10
【Question description】:

I'm playing around with pandas. Given

day  pokemon     date                          cp
14   Abra        2016-11-14 14:08:37.205617    377
                 2016-11-14 22:47:02.467526    374
     Bellsprout  2016-11-14 09:02:41.420506    460
                 2016-11-14 09:31:29.026961    541
                 2016-11-14 09:42:49.151360    125

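I want to add a new column that holds, for each pokemon value, the mean cp of its group. As you may have guessed, this is a MultiIndex structure, where (day, pokemon, date) is the index tuple.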

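So far I have tried to solve this by merging this DataFrame with the one obtained by grouping by day and pokemon and applying a mean. The grouped result loses the date field, though, and I still can't merge it back into the DataFrame I posted above.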

My expected result is something like this:

day  pokemon     date                          cp   mean
14   Abra        2016-11-14 14:08:37.205617    377  
                 2016-11-14 22:47:02.467526    374  375.5
     Bellsprout  2016-11-14 09:02:41.420506    460
                 2016-11-14 09:31:29.026961    541
                 2016-11-14 09:42:49.151360    125  375.3

How would you solve this? Thanks, FB

【Question comments】:

Tags: python pandas

【Solution 1】:

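I think you first need transform to broadcast the group mean to every row, then boolean indexing with a mask built by inverting duplicated, so that all rows except the last one in each group become NaN: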

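# group by the first two index levels (day, pokemon)
g = df.groupby(level=[0, 1], group_keys=False)
# broadcast each group's mean cp to every row of the group
df['mean'] = g['cp'].transform('mean')
# keep the mean only on the last row of each group; other rows become NaN
df['mean'] = df['mean'][g['mean'].apply(lambda x: ~x.duplicated(keep='last'))]
print(df)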
                                            cp        mean
day pokemon    date                                       
14  Abra       2016-11-14 14:08:37.205617  377         NaN
               2016-11-14 22:47:02.467526  374  375.500000
    Bellsprout 2016-11-14 09:02:41.420506  460         NaN
               2016-11-14 09:31:29.026961  541         NaN
               2016-11-14 09:42:49.151360  125  375.333333
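
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as a variation on the answer above, builds the "last row of each group" mask directly from the index with `droplevel` and `duplicated` instead of a groupby `apply`. The names `rows`, `group_mean` and `is_last` are just illustrative, and the DataFrame construction is an assumption about how the original data was built.

```python
import pandas as pd

# Sample data reconstructed from the question (construction is an assumption)
rows = [
    (14, "Abra", "2016-11-14 14:08:37.205617", 377),
    (14, "Abra", "2016-11-14 22:47:02.467526", 374),
    (14, "Bellsprout", "2016-11-14 09:02:41.420506", 460),
    (14, "Bellsprout", "2016-11-14 09:31:29.026961", 541),
    (14, "Bellsprout", "2016-11-14 09:42:49.151360", 125),
]
df = pd.DataFrame(rows, columns=["day", "pokemon", "date", "cp"])
df["date"] = pd.to_datetime(df["date"])
df = df.set_index(["day", "pokemon", "date"])

# Broadcast each (day, pokemon) group's mean cp back onto every row
group_mean = df.groupby(level=["day", "pokemon"])["cp"].transform("mean")

# True only on the last row of each (day, pokemon) group
is_last = ~df.index.droplevel("date").duplicated(keep="last")

# Keep the mean on the last row of each group, NaN elsewhere
df["mean"] = group_mean.where(is_last)
print(df)
```

If the mean is wanted on every row rather than only the last one, assigning `group_mean` to the new column directly and skipping the mask should be enough.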

【Comments】:

So simple! Can I get just one value per group?
Yes, the second part was the hard one.