【Question title】: Subset of levels sorting in pandas MultiIndex Series troubles
【Posted】: 2020-02-15 05:16:32
【Question description】:

I have a Series with a 3-level MultiIndex:

print(ser_test)
                            Value
Date       Group Country         
2014-01-31 3     AE       example
                 AR       example
2014-02-28 3     AE       example
                 AR       example
2014-03-31 3     AE       example
                 AR       example
2014-04-30 3     AE       example
                 AR       example
2014-05-30 3     AR       example
2014-06-30 2     AE       example
           3     AR       example
2014-07-31 2     AE       example
           3     AR       example
2014-08-29 2     AE       example
           3     AR       example
2014-09-30 2     AE       example
           3     AR       example
2014-10-31 2     AE       example
           3     AR       example
2014-11-28 2     AE       example
           3     AR       example
2014-12-31 2     AE       example
           3     AR       example
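For anyone who wants to reproduce the data, here is a minimal sketch that rebuilds an object with the same index. It assumes a plain Series named `Value` whose entries are all the placeholder string `example`; the original object may well have been built differently.

```python
import pandas as pd

# Dates exactly as printed above, kept as literals so that the
# business-day month ends (2014-05-30, 2014-08-29, 2014-11-28) are preserved.
dates = pd.to_datetime([
    "2014-01-31", "2014-02-28", "2014-03-31", "2014-04-30",
    "2014-05-30", "2014-06-30", "2014-07-31", "2014-08-29",
    "2014-09-30", "2014-10-31", "2014-11-28", "2014-12-31",
])

rows = []
for i, date in enumerate(dates):
    if i < 4:       # January to April: AE and AR are both in group 3
        rows += [(date, 3, "AE"), (date, 3, "AR")]
    elif i == 4:    # May: only AR is present
        rows += [(date, 3, "AR")]
    else:           # June onwards: AE is in group 2, AR stays in group 3
        rows += [(date, 2, "AE"), (date, 3, "AR")]

index = pd.MultiIndex.from_tuples(rows, names=["Date", "Group", "Country"])
ser_test = pd.Series("example", index=index, name="Value")
print(ser_test)
```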

My goal is to sort the Series by Country first and then by Date, ignoring the Group level, to get the following result:

                            Value
Date       Group Country         
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example

I also still need the Group level, so I can't simply drop it.

So I tried the sort_index method like this:

print(ser_test.sort_index(level = ['Country', 'Date']))

or like this:

print(ser_test.sort_index(level = ['Country', 'Date'], sort_remaining = False))

In both cases I got a result in which the Group level took part in the sort and took precedence over the Date level:

                            Value
Date       Group Country         
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example

I tried all the options of sort_index and unexpectedly succeeded with this code:

print(ser_test.sort_index(level = ['Country', 'Date'], ascending = [True, True]))

                            Value
Date       Group Country         
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example

This is strange, and I'm not sure it is a general way to get a guaranteed sort order, while keeping the MultiIndex is essential for me.

So, could you help me understand how sort_index works and share a piece of code for this particular case?
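One way to get the Country-then-Date order without depending on how sort_index treats the unlisted level is to move the index into columns, sort on the chosen columns, and rebuild the index. This is only a sketch: `sort_by_levels` is a hypothetical helper, not a pandas function, and the sample data is a shortened copy of the rows shown above with numeric values so that the alignment can be checked.

```python
import pandas as pd

def sort_by_levels(ser, by):
    """Sort a MultiIndex Series on the named index levels only (hypothetical helper)."""
    names = list(ser.index.names)
    value_col = "__value__"  # temporary column name, assumed not to clash with a level name
    frame = ser.rename(value_col).reset_index()  # index levels become ordinary columns
    frame = frame.sort_values(by)                # only the listed columns are sort keys
    out = frame.set_index(names)[value_col]      # restore all levels, Group included
    out.name = ser.name
    return out

# Shortened sample based on the question's data (April to July 2014).
rows = [
    (pd.Timestamp("2014-04-30"), 3, "AE"), (pd.Timestamp("2014-04-30"), 3, "AR"),
    (pd.Timestamp("2014-05-30"), 3, "AR"),
    (pd.Timestamp("2014-06-30"), 2, "AE"), (pd.Timestamp("2014-06-30"), 3, "AR"),
    (pd.Timestamp("2014-07-31"), 2, "AE"), (pd.Timestamp("2014-07-31"), 3, "AR"),
]
index = pd.MultiIndex.from_tuples(rows, names=["Date", "Group", "Country"])
ser = pd.Series(range(len(rows)), index=index, name="Value")

result = sort_by_levels(ser, ["Country", "Date"])
print(result)
```

The Group level is carried along as data rather than used as a sort key, so it survives in the result but never influences the order.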

【Comments】:

    Tags: python-3.x pandas sorting multi-index


    【Solution 1】:

    You could try upgrading to the latest version of pandas; I tested this in pandas 0.25.0 and it works fine:

    print(df.sort_index(level = ['Country', 'Date']))
                                Value
    Date       Group Country         
    2014-01-31 3     AE       example
    2014-02-28 3     AE       example
    2014-03-31 3     AE       example
    2014-04-30 3     AE       example
    2014-06-30 2     AE       example
    2014-07-31 2     AE       example
    2014-08-29 2     AE       example
    2014-09-30 2     AE       example
    2014-10-31 2     AE       example
    2014-11-28 2     AE       example
    2014-12-31 2     AE       example
    2014-01-31 3     AR       example
    2014-02-28 3     AR       example
    2014-03-31 3     AR       example
    2014-04-30 3     AR       example
    2014-05-30 3     AR       example
    2014-06-30 3     AR       example
    2014-07-31 3     AR       example
    2014-08-29 3     AR       example
    2014-09-30 3     AR       example
    2014-10-31 3     AR       example
    2014-11-28 3     AR       example
    2014-12-31 3     AR       example
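Since the outcome reported in this thread differs between pandas versions, it may be safer to check the order than to assume it. The sketch below prints the installed version, sorts a shortened copy of the question's data with the same sort_index call, and asserts that the result is ordered by Country and then by Date within each Country. The sample rows and the variable names are illustrative assumptions.

```python
import pandas as pd

print(pd.__version__)  # the asker reports 0.24.2 giving a different order

# Shortened sample based on the question's data (April to July 2014).
rows = [
    (pd.Timestamp("2014-04-30"), 3, "AE"), (pd.Timestamp("2014-04-30"), 3, "AR"),
    (pd.Timestamp("2014-05-30"), 3, "AR"),
    (pd.Timestamp("2014-06-30"), 2, "AE"), (pd.Timestamp("2014-06-30"), 3, "AR"),
    (pd.Timestamp("2014-07-31"), 2, "AE"), (pd.Timestamp("2014-07-31"), 3, "AR"),
]
index = pd.MultiIndex.from_tuples(rows, names=["Date", "Group", "Country"])
ser_test = pd.Series("example", index=index, name="Value")

by_levels = ser_test.sort_index(level=["Country", "Date"])

# Verify the order instead of trusting it.
countries = by_levels.index.get_level_values("Country")
dates = by_levels.index.get_level_values("Date")
assert countries.is_monotonic_increasing
for country in countries.unique():
    # Dates must be ascending inside each Country, whatever the Group is.
    assert dates[countries == country].is_monotonic_increasing

print(by_levels)
```

If either assertion fails on a given installation, the Group level has influenced the order, which matches the behaviour the question describes for the older release.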
    

    【Comments】:

    • Many thanks, @jezrael! I've updated pandas from 0.24.2 to 0.25.1, and sort_index now works as I expected!