Custom sorting of the level 1 index of a multiindex Pandas DataFrame according to the level 0 index

【Question title】: Custom sorting of the level 1 index of a multiindex Pandas DataFrame according to the level 0 index
【Posted】: 2018-07-08 11:58:33
【Question】:

I have a multiindex DataFrame, df:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo'],
          ['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two']]

df = pd.DataFrame(np.ones([8, 4]), index=arrays)

which looks like:

             0    1    2    3
bar one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0
baz one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0
    three  1.0  1.0  1.0  1.0
    four   1.0  1.0  1.0  1.0
foo one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0

I now need to put the 'baz' sub-level into a new order, to create something like df_end:

arrays_end = [['bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo'],
              ['one', 'two', 'two', 'four', 'three', 'one', 'one', 'two']]

df_end = pd.DataFrame(np.ones([8, 4]), index=arrays_end)

which looks like:

             0    1    2    3
bar one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0
baz two    1.0  1.0  1.0  1.0
    four   1.0  1.0  1.0  1.0
    three  1.0  1.0  1.0  1.0
    one    1.0  1.0  1.0  1.0
foo one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0

I thought I could reindex the baz rows:

new_index = ['two','four','three','one']

df.loc['baz'].reindex(new_index)

This gives:

         0    1    2    3
two    1.0  1.0  1.0  1.0
four   1.0  1.0  1.0  1.0
three  1.0  1.0  1.0  1.0
one    1.0  1.0  1.0  1.0

...and insert these values back into the original DataFrame:

df.loc['baz'] = df.loc['baz'].reindex(new_index)

But the result is:

             0    1    2    3
bar one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0
baz one    NaN  NaN  NaN  NaN
    two    NaN  NaN  NaN  NaN
    three  NaN  NaN  NaN  NaN
    four   NaN  NaN  NaN  NaN
foo one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0

This is not what I wanted! So my question is how I can use new_index to reorder the rows within the baz index. Any advice would be greatly appreciated.

【Question discussion】:

This seems like a better solution to me: stackoverflow.com/questions/43073254/…

Tags: python pandas sorting dataframe multi-index

【Solution 1】:

Edit: (to match the desired layout)

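import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo'],
          ['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two']]

df = pd.DataFrame(np.arange(32).reshape([8, 4]), index=arrays)
# Full (level 0, level 1) keys for the baz block, in the desired order
new_baz_index = [('baz', i) for i in ['two','four','three','one']]
# Copy the index as an object array of tuples and overwrite only the baz part
index = df.index.values.copy()
index[df.index.get_loc('baz')] = new_baz_index
# reindex returns a new DataFrame; df itself is left unchanged
df.reindex(index)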

df.index.get_loc('baz') returns the position of the baz block as a slice object, so only that part of the index is replaced.
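The same idea can also be sketched with a plain Python list of index tuples instead of a NumPy object array. This is only a variant of the answer above, not the answerer's own code: the names new_order, tuples, positions and df_end are chosen here for illustration, and it assumes the baz block has exactly as many rows as new_order has labels.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo'],
          ['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two']]
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=arrays)

# Desired order of the level-1 labels inside the baz block (illustrative name)
new_order = ['two', 'four', 'three', 'one']

# Start from the existing index as a list of (level 0, level 1) tuples
tuples = list(df.index)

# Integer positions of the rows whose level-0 label is 'baz'
positions = np.flatnonzero(df.index.get_level_values(0) == 'baz')

# Overwrite only those positions with the keys in the desired order
for pos, sub in zip(positions, new_order):
    tuples[pos] = ('baz', sub)

# Reindexing against the rearranged keys moves each row's data with its label
df_end = df.reindex(pd.MultiIndex.from_tuples(tuples))
print(df_end)
```

Using np.arange rather than np.ones makes it possible to see that each row's values travel with its label, rather than only the labels being renamed.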

【Discussion】:

Thanks for the very quick and helpful reply, Tai. This is great :)
@tomp No problem. Glad it helped.

【Solution 2】:

Update :-)

pd.concat([
    df[df.index.get_level_values(level=0) != 'baz'],
    df.reindex(list(zip(['baz'] * 4, ['two', 'four', 'three', 'one']))),
])
Out[1156]: 
             0    1    2    3
bar one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0
foo one    1.0  1.0  1.0  1.0
    two    1.0  1.0  1.0  1.0
baz two    1.0  1.0  1.0  1.0
    four   1.0  1.0  1.0  1.0
    three  1.0  1.0  1.0  1.0
    one    1.0  1.0  1.0  1.0
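Note that the output above places the baz block last rather than between bar and foo. A minimal sketch of one way to keep the original level-0 order is to walk the level-0 groups in order of appearance, reorder only the groups that have a custom order, and concatenate the pieces. The custom_orders mapping and the df_sorted name are hypothetical, introduced here for illustration, and np.arange is used instead of np.ones so the row movement is visible.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo'],
          ['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two']]
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=arrays)

# Hypothetical mapping: level-0 label -> desired order of its level-1 labels
custom_orders = {'baz': ['two', 'four', 'three', 'one']}

pieces = []
# sort=False keeps the level-0 groups in their order of first appearance
for key, group in df.groupby(level=0, sort=False):
    order = custom_orders.get(key)
    if order is not None:
        # Reorder this block only; other blocks pass through untouched
        group = group.reindex(pd.MultiIndex.from_product([[key], order]))
    pieces.append(group)

df_sorted = pd.concat(pieces)
print(df_sorted)
```

Because the mapping is keyed by the level-0 label, further blocks (for example foo) could be given their own order by adding entries to custom_orders.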

【Discussion】:

Hi again Wen, thanks for looking at this! The df in your answer matches the input df in my question, not the output. Is there a missing step?