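【Posted】: 2018-12-31 22:07:49
【Question】:
I want to group by three columns and then compute the mean of a fourth, numeric column over all rows that share the same values in the first three columns. I can do this with the following call: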
df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False)['col4'].mean()
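The problem is that I also want a fifth column that aggregates the col5 strings of all rows collapsed into each group, and I don't know how to do that on top of the call above. For example: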
df
index  col1    col2    col3   col4  col5
0      Week_1  James   John   1     when and why?
1      Week_1  James   John   3     How?
2      Week_2  James   John   2     Do you know when?
3      Week_2  Mark    Jim    3     What time?
4      Week_2  Andrew  Simon  1     How far is it?
5      Week_2  Andrew  Simon  2     Are you going?
CURRENT (with the above function):
index  col1    col2    col3   col4
0      Week_1  James   John   2
1      Week_2  James   John   2
2      Week_2  Mark    Jim    3
3      Week_2  Andrew  Simon  1.5
DESIRED:
index  col1    col2    col3   col4  col5
0      Week_1  James   John   2     when and why?, How?
2      Week_2  James   John   2     Do you know when?
3      Week_2  Mark    Jim    3     What time?
4      Week_2  Andrew  Simon  1.5   How far is it?, Are you going?
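I have tried the approaches here and here, but the .mean() call I am using complicates things. Any help would be appreciated. (If possible, I would also like to specify a custom separator for joining the col5 strings during aggregation.)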
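For reference, one possible way to get the DESIRED table is to pass a per-column mapping to `agg`, so that col4 is averaged and col5 is joined in the same groupby. This is only a sketch under some assumptions: the DataFrame below is rebuilt by hand from the example above, `sep` is a hypothetical variable name for the custom separator, and `sort=False` is added so the groups keep their order of first appearance (the resulting index is renumbered 0..3 rather than matching the index values shown under DESIRED).

```python
import pandas as pd

# Example data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 1, 2],
    "col5": ["when and why?", "How?", "Do you know when?",
             "What time?", "How far is it?", "Are you going?"],
})

# Hypothetical name: the custom separator used to join the col5 strings.
sep = ", "

# Aggregate each column with its own function:
# the mean for col4, a string join for col5.
df2 = (
    df.groupby(["col1", "col2", "col3"], as_index=False, sort=False)
      .agg({"col4": "mean", "col5": sep.join})
)

print(df2)
```

Changing `sep` (for example to `" | "`) should change only how the col5 strings are joined; the col4 means are unaffected.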
【Comments】:
Tags: python pandas aggregate mean