【Question title】: Python - Group-by multiple columns with .mean() and .agg()
【Posted】: 2018-12-31 22:07:49
【Question description】:

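I want to group by three columns and then, for all rows that share the same values in those three columns, find the mean of a fourth, numeric column. I can do this with the following call: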

df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False)['col4'].mean()

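The problem is that I also want a fifth column that aggregates the values from all the rows combined by the groupby, and I don't know how to do that on top of the call above. For example: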

df 
index    col1        col2       col3       col4       col5
0        Week_1      James      John       1          when and why?
1        Week_1      James      John       3          How?
2        Week_2      James      John       2          Do you know when?
3        Week_2      Mark       Jim        3          What time?
4        Week_2      Andrew     Simon      1          How far is it?
5        Week_2      Andrew     Simon      2          Are you going?


CURRENT (with above function):
index    col1        col2       col3       col4
0        Week_1      James      John       2
1        Week_2      James      John       2
2        Week_2      Mark       Jim        3
3        Week_2      Andrew     Simon      1.5

DESIRED:
index    col1        col2       col3       col4       col5
0        Week_1      James      John       2          when and why?, How?
2        Week_2      James      John       2          Do you know when?
3        Week_2      Mark       Jim        3          What time?
4        Week_2      Andrew     Simon      1.5        How far is it?, Are you going?

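I have tried the approaches here and here, but the .mean() function I am using complicates the process. Any help would be appreciated. (If possible, I would also like to specify a custom delimiter to separate the col5 strings when aggregating.)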

【Question comments】:

    Tags: python pandas aggregate mean


    【Solution 1】:

    You can define an aggregation function for each column:

    df2=df.groupby(['col1','col2','col3'], as_index=False).agg({'col4':'mean', 'col5':','.join})
    print (df2)
         col1    col2   col3  col4                           col5
    0  Week_1   James   John   2.0             when and why?,How?
    1  Week_2  Andrew  Simon   1.5  How far is it?,Are you going?
    2  Week_2   James   John   2.0              Do you know when?
    3  Week_2    Mark    Jim   3.0                     What time?
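
The question also asks for a custom delimiter. Because the per-column dict accepts any callable, one way to sketch this is to pass the bound `join` method of whichever separator string you want. The DataFrame below is rebuilt from the sample table in the question, and `SEP` is a hypothetical name introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 1, 2],
    "col5": ["when and why?", "How?", "Do you know when?",
             "What time?", "How far is it?", "Are you going?"],
})

SEP = " | "  # hypothetical separator; any string can be used here

# Mean for col4, separator-joined text for col5, one row per group.
df2 = (
    df.groupby(["col1", "col2", "col3"], as_index=False)
      .agg({"col4": "mean", "col5": SEP.join})
)
print(df2)
```

With this separator, the Week_1 / James / John row carries `when and why? | How?` in `col5`. Note that `str.join` expects every value in `col5` to be a string, so a column containing missing values or numbers would need converting first.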
    

    A more general solution is to aggregate numeric columns with mean and all other columns with join:

    import numpy as np

    # mean for numeric columns, comma-joined strings for everything else
    f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else ', '.join(x)
    df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False).agg(f)
    print (df2)
    
         col1    col2   col3  col4                            col5
    0  Week_1   James   John   2.0             when and why?, How?
    1  Week_2  Andrew  Simon   1.5  How far is it?, Are you going?
    2  Week_2   James   John   2.0               Do you know when?
    3  Week_2    Mark    Jim   3.0                      What time?
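
The outputs above are sorted by the group keys, whereas the DESIRED table in the question lists the groups in order of first appearance. A minimal sketch of one way to get that ordering is to pass `sort=False` to `groupby`. This variant also swaps the answer's `np.issubdtype` check for `pandas.api.types.is_numeric_dtype`. The helper name `mean_or_join` and its `sep` parameter are hypothetical, introduced only for this example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 1, 2],
    "col5": ["when and why?", "How?", "Do you know when?",
             "What time?", "How far is it?", "Are you going?"],
})


def mean_or_join(series, sep=", "):
    """Hypothetical helper: average numeric columns, join the rest as text."""
    if is_numeric_dtype(series):
        return series.mean()
    # astype(str) guards against non-string values; NaN would become "nan".
    return sep.join(series.astype(str))


# sort=False keeps groups in the order they first appear in df.
df2 = (
    df.groupby(["col1", "col2", "col3"], as_index=False, sort=False)
      .agg(mean_or_join)
)
print(df2)
```

To use a different separator with this sketch, wrap the helper, for example `.agg(lambda s: mean_or_join(s, sep="; "))`.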
    

    【Comments】:

    • @LaurieBamber - Thank you, and the same to you :)