Select from a DataFrame based on several levels of the MultiIndex (answers)

【Question title】: Select from a DataFrame based on several levels of the MultiIndex
【Posted】: 2019-05-06 17:28:14
【Question description】:

How can the logic of selecting from a DataFrame based on the first N-1 levels of its MultiIndex be extended to N > 2?

For example, consider a DataFrame:

import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))
In[11]: df
Out[11]: 
   0                 1               
  10    20    30    10    20    30   
   a  b  a  b  a  b  a  b  a  b  a  b
0  1  1  1  1  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1  1  1  1  1

Here, it is easy to select the columns with 0 or 1 in the first level:

df[[0, 1]]

But the same logic does not extend to selecting the columns with 0 or 1 in the first level and 10 or 20 in the second level:

In[13]: df[[(0, 10), (0, 20), (1, 10), (1, 20)]]
ValueError: operands could not be broadcast together with shapes (4,2) (3,) (4,2)

The following works:

df.loc[:, pd.IndexSlice[[0, 1], [10, 20], :]]

but is cumbersome, especially when the selector needs to be extracted from another DataFrame with a 2-level MultiIndex:

idx = df.columns.droplevel(2)
In[16]: idx
Out[16]: 
MultiIndex(levels=[[0, 1], [10, 20, 30]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, ... 1, 2, 2]])
In[17]: df[idx]
ValueError: operands could not be broadcast together with shapes (12,2) (3,) (12,2)

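Edit: Ideally, I would also like to be able to order the columns this way, not only select them, again in the spirit of df[[1, 0]] being able to order the columns by the first level.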

【Question discussion】:

Tags: python-3.x pandas multi-index

【Solution 1】:

If possible, you can filter by boolean indexing with get_level_values and isin:

m1 = df.columns.get_level_values(0).isin([0,1])
m2 = df.columns.get_level_values(1).isin([10,20])

print (m1)
[ True  True  True  True  True  True  True  True  True  True  True  True]
print (m2)
[ True  True  True  True False False  True  True  True  True False False]
print (m1 & m2)
[ True  True  True  True False False  True  True  True  True False False]

df1 = df.loc[:, m1 & m2]
print (df1)
   0           1         
  10    20    10    20   
   a  b  a  b  a  b  a  b
0  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1

df.columns = df.columns.droplevel(2)
print (df)
   0                 1               
  10 10 20 20 30 30 10 10 20 20 30 30
0  1  1  1  1  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1  1  1  1  1

df2 = df.loc[:, m1 & m2]
print (df2)
   0           1         
  10 10 20 20 10 10 20 20
0  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1
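
Note that m1 & m2 selects the full product of the two value lists. When the selector is instead a list of specific (level 0, level 1) pairs, for example the columns of another DataFrame with a 2-level MultiIndex, a similar mask can be built by dropping the last level and calling isin with tuples. The following is only a sketch under that assumption; the names other, pairs and picked are made up for illustration:

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))

# Illustrative second frame whose 2-level columns act as the selector.
other = pd.DataFrame(
    0, index=[0], columns=pd.MultiIndex.from_tuples([(0, 10), (1, 20)])
)

# Plain list of (level 0, level 1) tuples taken from the selector frame.
pairs = list(other.columns)

# Compare only the first two levels of df's columns against those pairs.
mask = df.columns.droplevel(2).isin(pairs)

# Keeps every third-level column under (0, 10) and (1, 20), in df's order.
picked = df.loc[:, mask]
print(picked)
```

Like any boolean mask, this keeps the columns in their existing order, so it only selects and does not reorder.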

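【Discussion】:

Thanks, this is perfectly acceptable. However, there are two issues: 1) this is hardly less cumbersome than the pd.IndexSlice approach I proposed; 2) this cannot order the columns, only select them.
@IgorPozdeev - Yes, what you need is not easy. It requires a custom function that automatically fills in : for each remaining level, which is what selects everything under those levels. That is not easy :(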
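
One possible shape for such a custom function, offered only as a rough sketch: instead of padding a slicer with :, it compares the leading levels of every column with each requested key and collects the matching positions in the order the keys are given, so it both selects and orders. select_and_order is a hypothetical helper, not part of pandas, and the frame below uses distinct values only so the resulting order is visible:

```python
import numpy as np
import pandas as pd


def select_and_order(df, keys):
    """Hypothetical helper: pick columns whose leading levels match each key.

    Columns are returned in the order of `keys`; within one key they keep
    the order they have in `df`.
    """
    positions = []
    for key in keys:
        key = tuple(key)
        n = len(key)
        # Iterating a MultiIndex yields one tuple per column.
        positions.extend(
            i for i, col in enumerate(df.columns) if col[:n] == key
        )
    return df.iloc[:, positions]


midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
# Distinct values (0..35) instead of all ones, purely for illustration.
df = pd.DataFrame(np.arange(36).reshape(3, 12), columns=midx)

# Ask for (1, 20) first and (0, 10) second.
out = select_and_order(df, [(1, 20), (0, 10)])
print(out)
```

The loop is plain Python and scans all columns once per key, which is fine for a handful of keys but is not tuned for very wide frames.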