Subset a multi-indexed DataFrame based on multiple level 1 columns: answers

【Question title】: Subset multi-indexed DataFrame based on multiple level 1 columns
【Posted】: 2021-10-16 20:16:18
【Question】:

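I have a multi-indexed DataFrame, and for each level 0 variable I want to keep two of the level 1 columns (namely the "one" and "two" columns). I can subset them individually, but I would like to do it in one step so that the values stay side by side.

Here is the DataFrame: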

import numpy as np
import pandas as pd
index = pd.MultiIndex.from_tuples(list(zip(*[['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'], ['one', 'two', 'three', 'two', 'one', 'four']])))
df = pd.DataFrame(np.random.randn(2, 6), columns=index)

Here is how to subset a single column label in level 1:

df.iloc[:, df.columns.get_level_values(1)== 'one']
# or 
df.xs('one', level=1, axis=1)

# but passing two labels to either command does not work, e.g.
df.xs(('one', 'two'), level=1, axis=1)

This would be the expected output:

         bar1        foo1       foo2         bar3
          one         two        two          one
0   -0.508272   -0.195379   0.865563     2.002205
1   -0.771565    1.360479   1.900931    -1.589277

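Any suggestions are welcome, many thanks!

【Comments】:

You can think of an xs selection as a vertical selection through the levels, hence a key like ('bar1', 'one'). For selecting several labels within one level, the solutions provided are a good fit.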
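To illustrate the comment above, here is a minimal sketch of the two kinds of key that xs accepts on the columns. The frame reuses the question's column labels, but fills it with np.arange values instead of random numbers (an assumption made only so the results are reproducible); the names `columns`, `one`, and `single` are illustrative.

```python
import numpy as np
import pandas as pd

# Same column labels as in the question; deterministic values are an
# assumption made for reproducibility.
columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# One label at one level: every column whose level 1 label is "one".
one = df.xs("one", level=1, axis=1)

# A tuple without `level` is read as one label per level, i.e. a single
# fully specified column, not as "either of these labels".
single = df.xs(("bar1", "one"), axis=1)

print(one.columns.tolist())
print(single.tolist())
```

This is why a tuple such as ('one', 'two') does not act as a list of alternatives for level 1.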

Tags: python pandas multi-index

【Solution 1】:

Here is one way, using pd.IndexSlice:

idnx = pd.IndexSlice[:, ['one', 'two']]
df.loc[:, idnx]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424
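As a self-contained sketch of the pd.IndexSlice approach, the snippet below rebuilds the question's columns with deterministic values (an assumption, in place of np.random.randn) and applies the same selection. The names `columns`, `idx`, and `subset` are illustrative.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

idx = pd.IndexSlice
# ":" keeps every level 0 label; the list restricts level 1 to "one"/"two".
subset = df.loc[:, idx[:, ["one", "two"]]]

print(subset.columns.tolist())
```

Note that in the output shown above the columns come back grouped as "one" first and then "two", which differs from the order in the question's expected output, so it seems safest not to rely on a particular column order here.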

Another way, using axis, a lesser-known parameter of pd.DataFrame.loc:

df.loc(axis=1)[:, ['one', 'two']]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424

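Note: this parameter is not listed in the documented API for pd.DataFrame.loc, but it is referenced, with an example, in the user guide's MultiIndex / Advanced indexing section, midway through the "Using slicers" subsection.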
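The following sketch checks that the two spellings in this answer select the same columns. It assumes the question's column layout with deterministic values for reproducibility; `with_axis` and `with_slice` are illustrative names.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# With axis=1 fixed, the tuple inside [] is read as one entry per
# column level, so no IndexSlice object is needed.
with_axis = df.loc(axis=1)[:, ["one", "two"]]

# The IndexSlice form spells out the row selector (":") explicitly.
with_slice = df.loc[:, pd.IndexSlice[:, ["one", "two"]]]

print(with_axis.columns.tolist() == with_slice.columns.tolist())
```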

【Discussion】:

The axis=1 argument of loc is great!

【Solution 2】:

Take a look at the old-fashioned get_level_values:

out = df.loc[:, df.columns.get_level_values(1).isin(['one', 'two'])]

Output:
       bar1      foo1      foo2      bar3
        one       two       two       one
0 -0.705540 -1.175132 -0.572076 -1.549703
1  0.277905  1.789925  1.104225  0.104453
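To make the intermediate steps visible, here is a small sketch that builds the boolean mask separately before passing it to loc. The deterministic frame and the names `level1`, `mask`, and `subset` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Flat Index holding the level 1 label of every column.
level1 = df.columns.get_level_values(1)

# One boolean per column: True where the label is "one" or "two".
mask = level1.isin(["one", "two"])

# A boolean mask keeps the columns in their original order.
subset = df.loc[:, mask]

print(mask.tolist())
print(subset.columns.tolist())
```

Because the selection is a plain boolean mask, the result keeps the original column order, which matches the expected output in the question.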

【Discussion】:

【Solution 3】:

You can reindex and specify the level:

df.reindex(['one', 'two'], axis=1, level=1)

       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.276056  1.956400 -1.495128  1.582220
1 -0.383178  1.159138 -1.646173  0.821942
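A runnable sketch of the reindex approach follows, again assuming the question's column labels with deterministic values. The check only compares which columns survive, since the resulting column order is not something this sketch relies on; `subset` and `expected` are illustrative names.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Match the requested labels against level 1 of the column MultiIndex.
subset = df.reindex(["one", "two"], axis=1, level=1)

expected = {("bar1", "one"), ("foo1", "two"), ("foo2", "two"), ("bar3", "one")}
print(set(subset.columns) == expected)
```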

【Discussion】:

【Solution 4】:

We can use Index.isin at a specific level to build a boolean index, then select with loc:

df.loc[:, df.columns.isin(['one', 'two'], level=1)]

Output:

       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.042062 -0.233098  0.620974  0.330957
1  0.524495 -0.394930  0.572631  0.499279
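As a closing sketch, the snippet below runs the approaches from all four answers on one frame and compares which columns each returns. The deterministic frame and the names `wanted`, `results`, and `column_sets` are assumptions for illustration. Only the sets of columns are compared, because the outputs shown in the answers list the columns in different orders.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

wanted = ["one", "two"]
results = {
    "IndexSlice": df.loc[:, pd.IndexSlice[:, wanted]],
    "loc(axis=1)": df.loc(axis=1)[:, wanted],
    "get_level_values": df.loc[:, df.columns.get_level_values(1).isin(wanted)],
    "reindex": df.reindex(wanted, axis=1, level=1),
    "isin(level=1)": df.loc[:, df.columns.isin(wanted, level=1)],
}

# Compare the selected columns as sets, ignoring order.
column_sets = {name: set(res.columns) for name, res in results.items()}
for name, res in results.items():
    print(name, res.columns.tolist())
```

If the original column order matters, the two boolean-mask variants (get_level_values and isin with level=1) are the ones that preserve it by construction.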

【Discussion】: