Python - Group-by multiple columns with .mean() and .agg(): answers

【Question title】: Python - Group-by multiple columns with .mean() and .agg()
【Posted】: 2018-12-31 22:07:49
【Question description】:

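I want to group by three columns and then compute the mean of a fourth, numeric column over all rows that share the same values in the first three columns. I can achieve this with the following call: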

df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False)['col4'].mean()

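The problem is that I also want a fifth column that combines the values from all rows in each group, and I don't know how to do that on top of the previous call. For example: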

df 
index    col1        col2       col3       col4       col5
0        Week_1      James      John       1          when and why?
1        Week_1      James      John       3          How?
2        Week_2      James      John       2          Do you know when?
3        Week_2      Mark       Jim        3          What time?
4        Week_2      Andrew     Simon      1          How far is it?
5        Week_2      Andrew     Simon      2          Are you going?


CURRENT (with the above function):
index    col1        col2       col3       col4
0        Week_1      James      John       2
1        Week_2      James      John       2
2        Week_2      Mark       Jim        3
3        Week_2      Andrew     Simon      1.5

DESIRED:
index    col1        col2       col3       col4       col5
0        Week_1      James      John       2          when and why?, How?
2        Week_2      James      John       2          Do you know when?
3        Week_2      Mark       Jim        3          What time?
4        Week_2      Andrew     Simon      1.5        How far is it?, Are you going?

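I have tried here and here, but the .mean() function I am using complicates the process. Any help would be appreciated. (If possible, I would also like to specify a custom separator for the col5 strings when aggregating.)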

【Question discussion】:

Tags: python pandas aggregate mean

【Solution 1】:

You can define an aggregation function for each column:

df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False).agg({'col4': 'mean', 'col5': ','.join})
print(df2)
     col1    col2   col3  col4                           col5
0  Week_1   James   John   2.0             when and why?,How?
1  Week_2  Andrew  Simon   1.5  How far is it?,Are you going?
2  Week_2   James   John   2.0              Do you know when?
3  Week_2    Mark    Jim   3.0                     What time?
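The question also asks for a custom separator and shows the groups in their original order. The following is a minimal, self-contained sketch of the per-column approach above, rebuilt on the sample data from the question. The separator value `" | "` and the variable name `sep` are illustrative assumptions, and `sort=False` is added here only to keep groups in order of first appearance; neither is part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 1, 2],
    "col5": ["when and why?", "How?", "Do you know when?",
             "What time?", "How far is it?", "Are you going?"],
})

sep = " | "  # hypothetical separator; any string works

df2 = (
    # sort=False keeps groups in order of first appearance
    df.groupby(["col1", "col2", "col3"], as_index=False, sort=False)
      # mean for the numeric column, joined text for the string column
      .agg({"col4": "mean", "col5": sep.join})
)
print(df2)
```

Because `sep.join` is just a bound string method, changing `sep` is all that is needed to switch separators.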

A general solution is to aggregate numeric columns with mean and all other columns with join:

import numpy as np

# Average numeric columns; join the values of every other column with ', '
f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else ', '.join(x)
df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False).agg(f)
print(df2)

     col1    col2   col3  col4                            col5
0  Week_1   James   John   2.0             when and why?, How?
1  Week_2  Andrew  Simon   1.5  How far is it?, Are you going?
2  Week_2   James   John   2.0               Do you know when?
3  Week_2    Mark    Jim   3.0                      What time?
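The general approach can also be written as a named function with a configurable separator. This is only a sketch under stated assumptions: the helper name `mean_or_join` is hypothetical, and `pandas.api.types.is_numeric_dtype` is swapped in for the `np.issubdtype` check used above, since it also accepts pandas extension dtypes. Values are cast to `str` before joining so that non-string columns do not raise.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def mean_or_join(sep=", "):
    """Return an aggregator: mean for numeric columns, sep-joined text otherwise.

    Hypothetical helper for illustration; not part of pandas.
    """
    def _aggregate(s):
        if is_numeric_dtype(s.dtype):
            return s.mean()
        # Cast to str so non-string, non-numeric values can be joined too
        return sep.join(s.astype(str))
    return _aggregate


# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 1, 2],
    "col5": ["when and why?", "How?", "Do you know when?",
             "What time?", "How far is it?", "Are you going?"],
})

# The callable is applied to each non-key column within each group
df2 = df.groupby(["col1", "col2", "col3"], as_index=False).agg(mean_or_join("; "))
print(df2)
```

A factory function is used here so the separator is chosen once at the call site rather than hard-coded inside the aggregator.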

【Discussion】:

@LaurieBamber - Thank you, and the same to you :)