Summarize values in pandas data frames: answers

【Question title】: Summarize values in pandas data frames
【Posted】: 2018-10-10 02:32:49
【Question】:

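I want to find the maximum Revenue for each year and display the sector together with that value. For example, from the screenshot, I would like to show:

2010: Telecom 781
2011: Tech 973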

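I tried:

df.groupby(['Year', 'Sector'])['Revenue'].max()

but this returns a maximum for every Year/Sector combination instead of the name of the single sector with the highest value in each year.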
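A minimal sketch of what the attempt above returns, assuming a small made-up frame in which several sectors share each year (only the 2010 Telecom 781 and 2011 Tech 973 figures come from the question; every other row is hypothetical). Grouping by both keys keeps one maximum per Year/Sector pair, while grouping by Year alone finds the right maxima but drops the Sector column:

```python
import pandas as pd

# Hypothetical data: only the 2010 Telecom (781) and 2011 Tech (973) rows
# come from the question; the other revenues are made up for illustration.
df = pd.DataFrame({
    'Year':    [2010, 2010, 2010, 2011, 2011, 2011],
    'Sector':  ['Telecom', 'Tech', 'Construction', 'Telecom', 'Tech', 'Construction'],
    'Revenue': [781, 540, 312, 455, 973, 610],
})

# The attempt from the question: one maximum per (Year, Sector) pair,
# so every sector of every year is still present in the result.
per_pair = df.groupby(['Year', 'Sector'])['Revenue'].max()
print(per_pair)

# Grouping by Year alone yields the correct per-year maxima,
# but the result no longer says which sector produced them.
per_year = df.groupby('Year')['Revenue'].max()
print(per_year)
```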

【Question comments】:

【Answer 1】:

Try idxmax together with loc:

df.loc[df.groupby('Year')['Revenue'].idxmax()]

MCVE:

import pandas as pd
import numpy as np

np.random.seed(123)
# Sample data: 5 sectors/years repeated 3 times, with random revenues
df = pd.DataFrame({'Sector':['Telecom','Tech','Financial Service','Construction','Heath Care']*3,
                   'Year':[2010,2011,2012,2013,2014]*3,
                   'Revenue':np.random.randint(101,999,15)})

# idxmax gives the index label of the highest-Revenue row in each year;
# loc then returns those complete rows, Sector included
df.loc[df.groupby('Year')['Revenue'].idxmax()]

Output:

               Sector  Year  Revenue
5             Telecom  2010      843
1                Tech  2011      466
12  Financial Service  2012      838
3        Construction  2013      423
9          Heath Care  2014      224
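A sketch of the idxmax approach on a frame where several sectors share each year, grouping by 'Year' alone so that each year yields exactly one row, and printing in the layout the question asks for. The data is hypothetical: only the 2010 Telecom 781 and 2011 Tech 973 figures come from the question.

```python
import pandas as pd

# Hypothetical data: only the 2010 Telecom (781) and 2011 Tech (973) rows
# come from the question; the other revenues are made up.
df = pd.DataFrame({
    'Year':    [2010, 2010, 2010, 2011, 2011, 2011],
    'Sector':  ['Telecom', 'Tech', 'Construction', 'Telecom', 'Tech', 'Construction'],
    'Revenue': [781, 540, 312, 455, 973, 610],
})

# For each Year, idxmax returns the index label of the row with the largest
# Revenue; loc then selects those whole rows, so Sector comes along.
top = df.loc[df.groupby('Year')['Revenue'].idxmax()]

# Print in the "2010: Telecom 781" layout asked for in the question.
for row in top.itertuples(index=False):
    print(f"{row.Year}: {row.Sector} {row.Revenue}")
```

This prints `2010: Telecom 781` and `2011: Tech 973`. It relies on `df` having unique index labels (as the default RangeIndex does), and if two sectors tie for a year's maximum, idxmax keeps only the first of them.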

【Comments】:

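【Answer 2】:

There is also .sort_values + .tail: sort by Revenue, group by Year, and keep the last (largest) row of each group. Using the data from @Scott Boston's answer: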

df.sort_values('Revenue').groupby('Year').tail(1)

Output:

               Sector  Year  Revenue
9          Heath Care  2014      224
3        Construction  2013      423
1                Tech  2011      466
12  Financial Service  2012      838
5             Telecom  2010      843
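A sketch that applies the sort + tail idea to a hypothetical frame with several sectors per year (only 2010 Telecom 781 and 2011 Tech 973 come from the question) and cross-checks it against idxmax grouped by Year alone. Because `groupby(...).tail` keeps rows in their sorted-by-Revenue order, the result is re-sorted by Year before comparing:

```python
import pandas as pd

# Hypothetical data: only 2010 Telecom (781) and 2011 Tech (973) come from
# the question; the other rows are invented for illustration.
df = pd.DataFrame({
    'Year':    [2010, 2010, 2010, 2011, 2011, 2011],
    'Sector':  ['Telecom', 'Tech', 'Construction', 'Telecom', 'Tech', 'Construction'],
    'Revenue': [781, 540, 312, 455, 973, 610],
})

# Sort ascending by Revenue, then keep the last (largest) row of each Year;
# tail preserves the sorted order, so re-sort by Year for readability.
top_tail = (df.sort_values('Revenue')
              .groupby('Year')
              .tail(1)
              .sort_values('Year'))

# Cross-check against idxmax grouped by Year alone.
top_idx = df.loc[df.groupby('Year')['Revenue'].idxmax()]
print(top_tail.equals(top_idx))
```

With this data the check prints `True`. Note that the default `sort_values` algorithm (`kind='quicksort'`) is not stable, so when two sectors tie for a year's maximum, which of them `tail(1)` keeps is not guaranteed.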

【Comments】: