Groupby max value and return corresponding row in pandas dataframe (answers)

【Question title】: Groupby max value and return corresponding row in pandas dataframe
【Posted】: 2017-12-12 01:56:43
【Question】:

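My dataframe contains students, dates, and test scores. I want to find the max date for each student and return the corresponding row (ultimately, I am most interested in each student's most recent score). How can I do this in pandas?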

Suppose my dataframe looks like this (abbreviated version):

Student_id  Date     Score
Tina1       1/17/17   .95
John2       1/18/17   .8
Lia1        12/13/16  .845
John2       1/25/17   .975
Tina1       1/1/17    .78
Lia1        6/12/16   .89

This is what I want:

Student_id  Date     Score
Tina1       1/17/17   .95
Lia1        12/13/16  .845
John2       1/25/17   .975

I found this on SO, but it gives me a "positional indexers are out-of-bounds" error.

df.iloc[df.groupby('student_id').apply(lambda x: x['date'].idxmax())]

What other ways are there to achieve the same goal?

【Comments on the question】:

Tags: python pandas dataframe group-by max

【Solution 1】:

You can sort the dataframe by date and then use groupby.tail to get the most recent record for each student:

df.iloc[pd.to_datetime(df.Date, format='%m/%d/%y').argsort()].groupby('Student_id').tail(1)

#Student_id     Date    Score
#2     Lia1 12/13/16    0.845
#0    Tina1  1/17/17    0.950
#3    John2  1/25/17    0.975
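
For readers who want to run the sort-then-tail approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and uses sort_values on a temporary column instead of argsort; the helper column name `parsed` and the variable name `latest_sorted` are assumptions introduced for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Student_id": ["Tina1", "John2", "Lia1", "John2", "Tina1", "Lia1"],
    "Date": ["1/17/17", "1/18/17", "12/13/16", "1/25/17", "1/1/17", "6/12/16"],
    "Score": [0.95, 0.8, 0.845, 0.975, 0.78, 0.89],
})

# Parse the strings into real datetimes so the ordering is chronological
# rather than lexicographic. "parsed" is a hypothetical helper column.
latest_sorted = (
    df.assign(parsed=pd.to_datetime(df["Date"], format="%m/%d/%y"))
      .sort_values("parsed")      # oldest first, newest last
      .groupby("Student_id")
      .tail(1)                    # last (newest) row of each group
      .drop(columns="parsed")     # remove the helper column again
)
print(latest_sorted)
```

Parsing the dates first matters because the raw strings would sort "12/13/16" after "1/25/17" even though it is the earlier date.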

Or, to avoid sorting, use idxmax (this works as long as the index has no duplicates):

df.loc[pd.to_datetime(df.Date, format='%m/%d/%y').groupby(df.Student_id).idxmax()]

# Student_id       Date Score
#3     John2    1/25/17 0.975
#2      Lia1   12/13/16 0.845
#0     Tina1    1/17/17 0.950
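
The idxmax approach can be sketched as a runnable example as follows. idxmax returns index labels, not positions, so it pairs with loc; passing the same values to iloc can go out of bounds whenever the index is not the default 0..n-1. That may be what happened in the question, although the original dataframe's index is not shown. The custom index 10..15 and the names `latest_labels`, `latest_idxmax` and `iloc_failed` are assumptions made for this illustration.

```python
import pandas as pd

# Sample data from the question, given a non-default index (10..15)
# purely to show the difference between labels and positions.
df = pd.DataFrame(
    {
        "Student_id": ["Tina1", "John2", "Lia1", "John2", "Tina1", "Lia1"],
        "Date": ["1/17/17", "1/18/17", "12/13/16", "1/25/17", "1/1/17", "6/12/16"],
        "Score": [0.95, 0.8, 0.845, 0.975, 0.78, 0.89],
    },
    index=[10, 11, 12, 13, 14, 15],
)

# Parse dates, then find the index LABEL of the latest date per student.
dates = pd.to_datetime(df["Date"], format="%m/%d/%y")
latest_labels = dates.groupby(df["Student_id"]).idxmax()

# Labels go with .loc
latest_idxmax = df.loc[latest_labels]
print(latest_idxmax)

# Using the labels as positions fails here, because 13 is not a valid
# position in a six-row frame.
try:
    df.iloc[latest_labels.to_numpy()]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print("iloc raised IndexError:", iloc_failed)
```

The rows come back ordered by Student_id because groupby sorts its keys by default.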

【Discussion】: