How to prevent aggregate function removing a column? (Answers)

【Question title】: How to prevent aggregate function removing a column?
【Posted】: 2020-02-11 10:18:51
【Question】:

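I have a dataframe with 3 columns. I group by one column and want to aggregate the maximum of another column within each group, but I also want to keep my third column. The dataframe I start with is shown in the edit below. I then group by "neighbourhood" and aggregate the maximum of "Amount".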

import numpy as np

agg_dict = {"Amount": np.max}
listings_group.groupby("neighbourhood").agg(agg_dict).reset_index()

The dataframe I end up with (also shown in the edit below) almost does what I want, but I would also like to keep my "room_type" column.

Edit

Dataframe before the groupby

   neighbourhood  room_type        Amount
0  Allerton       Entire home/apt  7
1  Allerton       Private room     14
2  Allerton       Shared room      2
3  Arden Heights  Private room     4
4  Arrochar       Entire home/apt  12
5  Arrochar       Private room     3
6  Arverne        Entire home/apt  29
7  Arverne        Private room     43
8  Arverne        Shared room      2

Dataframe after the groupby

   neighbourhood      Amount
0  Allerton           14
1  Arden Heights      4
2  Arrochar           12
3  Arverne            43
4  Astoria            458
5  Bath Beach         7
6  Battery Park City  45
7  Bay Ridge          55
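For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the first nine rows shown above and runs the same groupby. It is only an illustration under assumptions: the variable name `result` is mine, and I pass the string "max" instead of np.max, which should be equivalent here.

```python
import pandas as pd

# Sample rows copied from the "before the groupby" table in the question.
listings_group = pd.DataFrame({
    "neighbourhood": ["Allerton", "Allerton", "Allerton", "Arden Heights",
                      "Arrochar", "Arrochar", "Arverne", "Arverne", "Arverne"],
    "room_type": ["Entire home/apt", "Private room", "Shared room", "Private room",
                  "Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Shared room"],
    "Amount": [7, 14, 2, 4, 12, 3, 29, 43, 2],
})

# agg only returns the columns named in the dict (plus the group key),
# so "room_type" does not appear in the result.
result = listings_group.groupby("neighbourhood").agg({"Amount": "max"}).reset_index()
print(result)
```

The aggregation computes one value per group and has no way of knowing which original row that value came from, which is why the third column disappears.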

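【Comments】:

Please add the dataframe (or a sample of the data) in a form that can be copied into a text editor.
Since you want the other column, I assume you also want all the rows rather than the aggregated number of rows. One way is to join the newly created dataframe onto the existing one.
I have made the edit. Is this the right way to do it?
@Rohith Merging is not an option in this case. I am not really interested in "Amount" itself. It is just the number of times room_type x, y and z occur in a given neighbourhood. What I want is the most frequently used room_type per neighbourhood.
Given your requirement, I think you need this: listings_group.loc[listings_group.groupby("neighbourhood")['Amount'].idxmax()]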

Tags: python pandas aggregate pandas-groupby

【Solution 1】:

Another way, using scipy.stats

import scipy.stats

listings_group.groupby('neighbourhood')['room_type'].agg(lambda x: scipy.stats.mode(x)[0]).reset_index()
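A possible pandas-only variant of the same idea, as a sketch under assumptions rather than the answerer's method: as far as I know, newer SciPy releases reject non-numeric input in scipy.stats.mode, so Series.mode may be the safer choice for string columns. Note that a mode only makes sense on raw data with one row per listing, not on the pre-counted frame from the question. The frame `raw_listings` and its values below are hypothetical.

```python
import pandas as pd

# Hypothetical raw listings: one row per listing, not the pre-counted frame.
raw_listings = pd.DataFrame({
    "neighbourhood": ["Allerton", "Allerton", "Allerton",
                      "Arrochar", "Arrochar", "Arrochar"],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Entire home/apt", "Private room"],
})

# Series.mode returns all modes in sorted order; .iloc[0] picks the first one.
most_common = (
    raw_listings.groupby("neighbourhood")["room_type"]
    .agg(lambda s: s.mode().iloc[0])
    .reset_index()
)
print(most_common)
```

If two room types are tied within a neighbourhood, this picks the alphabetically first one, which may or may not be what you want.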

【Comments】:

【Solution 2】:

Try slicing with groupby idxmax on Amount

listings_group.loc[listings_group.groupby("neighbourhood")['Amount'].idxmax()]

Out[347]:
   neighbourhood        room_type  Amount
1  Allerton       Private room     14
3  Arden Heights  Private room     4
4  Arrochar       Entire home/apt  12
7  Arverne        Private room     43

Step by step:

groupby idxmax on the Amount column returns, for each group, the index label of the row where Amount is at its maximum.

m = listings_group.groupby("neighbourhood")['Amount'].idxmax()

Out[348]:
neighbourhood
Allerton         1
Arden Heights    3
Arrochar         4
Arverne          7
Name: Amount, dtype: int64

.loc with m slices the dataframe and returns only the rows whose index labels equal the values in m

listings_group.loc[m]

Out[352]:
   neighbourhood        room_type  Amount
1  Allerton       Private room     14
3  Arden Heights  Private room     4
4  Arrochar       Entire home/apt  12
7  Arverne        Private room     43
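The steps above can be put together into one self-contained sketch using the sample rows from the question. The second part is my own addition, not part of the answer: idxmax returns only the first row per group when several rows share the maximum, so if ties matter, comparing against transform("max") keeps all of them. The names `top_rows`, `is_max` and `all_top_rows` are just for illustration.

```python
import pandas as pd

# Sample rows copied from the "before the groupby" table in the question.
listings_group = pd.DataFrame({
    "neighbourhood": ["Allerton", "Allerton", "Allerton", "Arden Heights",
                      "Arrochar", "Arrochar", "Arverne", "Arverne", "Arverne"],
    "room_type": ["Entire home/apt", "Private room", "Shared room", "Private room",
                  "Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Shared room"],
    "Amount": [7, 14, 2, 4, 12, 3, 29, 43, 2],
})

# Index label of the row with the largest Amount in each neighbourhood.
m = listings_group.groupby("neighbourhood")["Amount"].idxmax()

# Select those rows; all three columns are kept.
top_rows = listings_group.loc[m]
print(top_rows)

# Variant that keeps every row tied for the maximum within its group.
is_max = listings_group["Amount"] == listings_group.groupby("neighbourhood")["Amount"].transform("max")
all_top_rows = listings_group[is_max]
print(all_top_rows)
```

On this sample there are no ties, so both selections return the same four rows.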

【Comments】: