Answers: Sort by both index and value in multi-indexed data of a Pandas dataframe

[Question title]: Sort by both index and value in Multi-indexed data of Pandas dataframe
[Posted]: 2020-06-21 07:30:22
[Question description]:

Suppose I have a dataframe like this:

    year    month   message
0   2018    2   txt1
1   2017    4   txt2
2   2019    5   txt3
3   2017    5   txt5
4   2017    5   txt4
5   2020    4   txt3
6   2020    6   txt3
7   2020    6   txt3
8   2020    6   txt4

I want to work out the top three messages for each year, so I grouped the data as follows:

df.groupby(['year','month']).count()

Result:

            message
year    month   
2017    4   1
        5   2
2018    2   1
2019    5   1
2020    4   1
        6   3
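To reproduce the grouping above, the sample frame can be rebuilt directly from the table in the question. This is a minimal sketch, assuming `message` holds plain strings as shown; the name `counts` is only for illustration.

```python
import pandas as pd

# The sample frame from the question, rebuilt row by row
df = pd.DataFrame({
    "year":    [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month":   [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

# count() tallies the non-null 'message' values per (year, month) pair;
# groupby sorts the group keys in ascending order by default (sort=True)
counts = df.groupby(["year", "month"]).count()
print(counts)
```

Because the keys are sorted rather than the counts, both index levels come out ascending, which is exactly the ordering the question wants to change.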

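Both index levels are sorted in ascending order. But how can I get a result like the one below, where the data is sorted by year (ascending) and by count (descending), keeping only the top n values per year? The order of the 'month' level is not constrained.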

            message
year    month   
2017    5   2
        4   1
2018    2   1
2019    5   1
2020    6   3
        4   1

[Question discussion]:

Tags: python python-3.x pandas sorting multi-index

[Solution 1]:

value_counts sorts the counts in descending order by default (here, within each year):

df.groupby('year')['month'].value_counts()

Output:

year  month
2017  5        2
      4        1
2018  2        1
2019  5        1
2020  6        3
      4        1
Name: month, dtype: int64

If you only want the 2 highest values per year, do another groupby:

(df.groupby('year')['month'].value_counts()
   .groupby('year').head(2)
)

Output:

year  month
2017  5        2
      4        1
2018  2        1
2019  5        1
2020  6        3
      4        1
Name: month, dtype: int64
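A self-contained version of this answer, with the number of months to keep pulled out into a variable `n` (a name introduced here for illustration). Note that the printed `Name:` line depends on the pandas version: older releases show `Name: month`, while pandas 2.x names the value_counts result `count`.

```python
import pandas as pd

df = pd.DataFrame({
    "year":    [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month":   [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

n = 2  # how many months to keep per year

# value_counts() on the grouped 'month' column counts each month within its
# year and orders those counts in descending order inside every year group
counts = df.groupby("year")["month"].value_counts()

# A second groupby on the 'year' index level; head(n) keeps the first n rows
# of each year, which are the n largest counts because of the ordering above
top_n = counts.groupby("year").head(n)
print(top_n)

# With head(1) only the busiest month of each year remains
print(counts.groupby("year").head(1))
```

With this sample data `n = 2` drops nothing, since no year has more than two distinct months; `head(1)` shows the trimming effect.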

[Discussion]:

Thanks a lot. This is exactly what I was looking for.
We can also chain value_counts and head inside apply: df.groupby('year')['month'].apply(lambda x: x.value_counts().head(2))
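The apply variant from the comment above, as a runnable sketch. Because apply calls the Python lambda once per year, it is likely to be slower than the two-groupby version on data with many years; the result has the same values and (year, month) pairs here.

```python
import pandas as pd

df = pd.DataFrame({
    "year":    [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month":   [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

# For each year, count its months (largest count first) and keep the top 2;
# pandas prepends the year key to the index of each per-year result
top2 = df.groupby("year")["month"].apply(lambda months: months.value_counts().head(2))
print(top2)
```

Index level names may differ between pandas versions (the inner level can be unnamed in older releases), so rely on positions rather than names if you post-process this result.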

[Solution 2]:

This sorts by year (ascending) and then by count (descending):

df = df.groupby(['year', 'month']).count().sort_values(['year', 'message'], ascending=[True, False])

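[Discussion]:

Thanks, it seems to work. Actually, there is another part to my question: how can I limit the result to the top 2 values per year?
You can group df by 'year' again and apply head(n), where n is the number of rows you want per year: df = df.groupby('year').head(2)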
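Putting this answer and the follow-up comment together gives a complete pipeline. This is a sketch on the question's sample data; it relies on `sort_values` accepting the index level name `'year'` in `by`, which works here because after the groupby `'year'` is an index level and not also a column.

```python
import pandas as pd

df = pd.DataFrame({
    "year":    [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month":   [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

# Count messages per (year, month), then sort by the 'year' index level
# ascending and by the 'message' count column descending
counts = (
    df.groupby(["year", "month"]).count()
      .sort_values(["year", "message"], ascending=[True, False])
)

# Group on the 'year' index level and keep the first two rows of each year,
# which are that year's two largest counts after the sort above
top2 = counts.groupby("year").head(2)
print(top2)
```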

[Solution 3]:

You can use sort_index with ascending=[True, False], so that only the second level is sorted in descending order:

df = df.groupby(['year','month']).count().sort_index(ascending=[True,False])

              message
year month         
2017 5            2
     4            1
2018 2            1
2019 5            1
2020 6            3
     4            1

[Discussion]:

This does not sort 'count' in descending order.
@YoungWookBa You're right. Unfortunately, it does not work.
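A small hypothetical dataset (invented for illustration, not from the question) shows why: `sort_index(ascending=[True, False])` orders the month level descending, which only matches the count order when later months happen to have larger counts, as in the question's sample.

```python
import pandas as pd

# Hypothetical data: the earlier month (1) has the larger count
df = pd.DataFrame({
    "year":    [2021, 2021, 2021, 2021],
    "month":   [1, 1, 1, 9],
    "message": ["a", "b", "c", "d"],
})

counts = df.groupby(["year", "month"]).count()

# Sorting the month level descending puts month 9 first, regardless of counts
by_index = counts.sort_index(ascending=[True, False])
print(by_index["message"].tolist())  # [1, 3]

# Sorting on the count column puts the busiest month first
by_count = counts.sort_values(["year", "message"], ascending=[True, False])
print(by_count["message"].tolist())  # [3, 1]
```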

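[Solution 4]:

Here you go. The second sort_values should use a stable sort (kind='mergesort'); pandas' default quicksort is not guaranteed to keep the descending count order from the first sort within each year:

df.groupby(['year', 'month']).count().sort_values(by='message', ascending=False).sort_values(by='year', ascending=True, kind='mergesort')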

[Discussion]:

Thanks a lot, that seems to work. How can I limit the result to, say, the top 2 values per year?
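This follow-up is not answered in the thread; one way is to combine this answer with the `groupby('year').head(n)` step suggested under Solution 2. A sketch, assuming the question's sample data; the second sort uses `kind="mergesort"` because a two-step sort only preserves the first ordering within each year when the second sort is stable.

```python
import pandas as pd

df = pd.DataFrame({
    "year":    [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month":   [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4", "txt3", "txt3", "txt3", "txt4"],
})

n = 2  # rows to keep per year

counts = df.groupby(["year", "month"]).count()

result = (
    counts.sort_values(by="message", ascending=False)
          # Stable sort by the 'year' index level: rows of the same year keep
          # the descending-count order produced by the first sort
          .sort_values(by="year", ascending=True, kind="mergesort")
)

# Keep the first n rows of each year
top_n = result.groupby("year").head(n)
print(top_n)
```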

[Solution 5]:

You can use this code:

df.groupby(['year', 'month']).count().sort_index(axis=0, ascending=False).sort_values(by="year", ascending=True)

[Discussion]:

I tried it. It does not sort 'count' in descending order.