Pandas - Create Multiindex columns during rename (Answers)

【Question Title】: Pandas - Create Multiindex columns during rename
【Posted】: 2019-01-07 08:04:31
【Question】:

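I'm trying to find a simple way to rename a flat column index into a hierarchical MultiIndex column set. I came across one approach, but it seems a bit clunky. Is there a better way to do this in Pandas?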

#!/usr/bin/env python
import pandas as pd
import numpy as np

flat_df = pd.DataFrame(np.random.randint(0,100,size=(4, 4)), columns=list('ACBD'))

print(flat_df)

#      A   C   B   D
#  0  27  67  35  36
#  1  80  42  93  20
#  2  64   9  18  83
#  3  85  69  60  84


nested_columns = {'A': ('One', 'a'),
                  'C': ('One', 'c'),
                  'B': ('Two', 'b'),
                  'D': ('Two', 'd'),
                  }

tuples = sorted(nested_columns.values(), key=lambda x: x[1]) # Sort by second value
nested_df = flat_df.sort_index(axis=1) # Sort dataframe by column name
nested_df.columns = pd.MultiIndex.from_tuples(tuples)
nested_df = nested_df.sort_index(level=0, axis=1) # Sort to group first level

print(nested_df)

#    One     Two    
#      a   c   b   d
#  0  27  67  35  36
#  1  80  42  93  20
#  2  64   9  18  83
#  3  85  69  60  84

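Sorting both the hierarchical column spec and the DataFrame, then assuming they will line up, seems fragile. Sorting three times also seems absurd. The alternative I would prefer is nested_df = flat_df.rename(columns=nested_columns), but it seems rename can't go from a flat column index to MultiIndex columns. Am I missing something?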

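Edit: I realized this breaks if the tuples sorted by their second value are not in the same order as the sorted flat column names. Definitely the wrong approach.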
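One way to avoid the alignment problem is to never sort the spec at all: look each column's tuple up in the frame's own column order, so label and data cannot drift apart. The sketch below assumes every flat column appears in nested_columns (an unmapped column would raise KeyError here); the final sort_index call is only there to group the first level for display.

```python
import numpy as np
import pandas as pd

flat_df = pd.DataFrame(np.random.randint(0, 100, size=(4, 4)), columns=list('ACBD'))

nested_columns = {'A': ('One', 'a'),
                  'C': ('One', 'c'),
                  'B': ('Two', 'b'),
                  'D': ('Two', 'd')}

# Build the new labels in the frame's existing column order, so each tuple
# stays attached to the column it describes without sorting either side.
nested_df = flat_df.copy()
nested_df.columns = pd.MultiIndex.from_tuples([nested_columns[c] for c in flat_df.columns])

# Optional: sort only to group the first level together when printing.
nested_df = nested_df.sort_index(axis=1, level=0)
print(nested_df)
```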

Edit 2: In response to @wen's answer:

nested_df = flat_df.rename(columns=nested_columns)
print(nested_df)
#    (One, a)  (One, c)  (Two, b)  (Two, d)
# 0        18         0        51        48
# 1        69        68        78        24
# 2         2        20        99        46
# 3         1        80        11        11
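The comments under Solution 1 report that rename produced a MultiIndex on pandas 0.22.0 but a flat Index of tuples (as shown above) on 0.23.3. A version-agnostic sketch, assuming the renamed labels are all tuples, is to check the result and rebuild it with pd.MultiIndex.from_tuples only when needed; this is the same idea as Solution 2, just guarded.

```python
import numpy as np
import pandas as pd

flat_df = pd.DataFrame(np.random.randint(0, 100, size=(4, 4)), columns=list('ACBD'))
nested_columns = {'A': ('One', 'a'), 'C': ('One', 'c'),
                  'B': ('Two', 'b'), 'D': ('Two', 'd')}

renamed = flat_df.rename(columns=nested_columns)

# Some pandas versions leave a flat Index whose labels are tuples; rebuilding
# from those tuples yields a real MultiIndex in either case.
if not isinstance(renamed.columns, pd.MultiIndex):
    renamed.columns = pd.MultiIndex.from_tuples(list(renamed.columns))
print(renamed)
```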

Edit 3:

Based on @ScottBoston's answer, here is a working solution that handles flat columns not mentioned in nested_columns:

#!/usr/bin/env python
import pandas as pd
import numpy as np

flat_df = pd.DataFrame(np.random.randint(0,100,size=(4, 5)), columns=list('ACBDE'))

print(flat_df)
#     A   C   B   D   E
# 0  27  68   4  98  16
# 1   0   9   9  72  68
# 2  91  17  19  54  99
# 3  14  96  54  79  28

nested_columns = {'A': ('One', 'e'),
                  'C': ('One', 'h'),
                  'B': ('Two', 'f'),
                  'D': ('Two', 'g'),
                  }

nested_df = flat_df.rename(columns=nested_columns)
nested_df.columns = [c if isinstance(c, tuple) else ('', c) for c in nested_df.columns]
nested_df.columns = pd.MultiIndex.from_tuples(nested_df.columns)

print(nested_df)
#   One     Two        
#     e   h   f   g   E
# 0  27  68   4  98  16
# 1   0   9   9  72  68
# 2  91  17  19  54  99
# 3  14  96  54  79  28

【Question Discussion】:

Nice update to the solution.

Tags: python pandas multi-index

【Solution 1】:

IIUC, just use rename:

flat_df.rename(columns=nested_columns)
Out[224]: 
  One     Two    
    a   c   b   d
0  36  19  53  46
1  17  85  63  36
2  40  80  75  86
3  31  83  75  16

Update

flat_df.columns.map(nested_columns.get)
Out[15]: 
MultiIndex(levels=[['One', 'Two'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 2, 1, 3]])
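The pandas documentation for Index.map states that a mapper returning tuples produces a MultiIndex. With nested_columns.get alone, an unmapped column such as 'E' from Edit 3 would map to None, so the sketch below adds a fallback. The ('', name) default mirrors what Edit 3 does and is an assumption about where unmapped columns should go, not part of this answer.

```python
import numpy as np
import pandas as pd

flat_df = pd.DataFrame(np.random.randint(0, 100, size=(4, 5)), columns=list('ACBDE'))

nested_columns = {'A': ('One', 'e'),
                  'C': ('One', 'h'),
                  'B': ('Two', 'f'),
                  'D': ('Two', 'g')}

# Unmapped columns (here 'E') fall back to ('', name), so every label is a
# 2-tuple and Index.map returns a MultiIndex rather than a mixed Index.
nested_df = flat_df.copy()
nested_df.columns = flat_df.columns.map(lambda c: nested_columns.get(c, ('', c)))
print(nested_df)
```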

【Discussion】:

Hmm... when I try this, I get tuples as the column headers instead of a pd.MultiIndex.
@ScottBoston umm, what version of pandas do you have?
pd.__version__ "0.23.3"
@ScottBoston I have pd.__version__ Out[7]: '0.22.0'
@Wen That was the first thing I tried. Sadly, it doesn't work with Pandas 0.23.3.

【Solution 2】:

You can try:

flat_df.columns = pd.MultiIndex.from_tuples(flat_df.rename(columns=nested_columns).columns)
flat_df

Output:

  One     Two    
    a   c   b   d
0  27  67  35  36
1  80  42  93  20
2  64   9  18  83
3  85  69  60  84

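【Discussion】:

This is very close to a perfect answer! The only problem is that not every column comes out of df.rename as a tuple, but I can always preprocess the columns after renaming to make sure they are all tuples.
For example, if I keep everything else in the original question the same but use flat_df = pd.DataFrame(np.random.randint(0,100,size=(4, 5)), columns=list('ACBDE')), this solution breaks.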