Error in slicing pandas DataFrame based on missing column names: answers

【Question title】: Error in slicing pandas DataFrame based on missing column names
【Posted】: 2018-06-17 07:34:41
【Question description】:

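I have a pandas DataFrame with a MultiIndex on the rows and several columns. I want to slice it by a list of column names, but sometimes some of the given names are not in the DataFrame. Pandas issues a warning recommending .reindex instead of .loc, but .reindex gives me strange results. To clarify, let's load the DataFrame: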

import pandas as pd
d2 = pd.read_csv('https://docs.google.com/uc?id=1Ufx6pvnSC6zQdTAj05ObmV027fA4-Mr3&export=download', index_col=[0,1])
d2.head(3)

The result is:

..............................................
:          :      : ind475 : ind476 : ind456 :
:..........:......:........:........:........:
: Country  : Year :        :        :        :
: Argentin : 1966 :   6.15 :   7.77 : NaN    :
:          : 1967 :   8.33 :   9.81 : NaN    :
:          : 1968 :   9.19 :   10.2 : NaN    :
:..........:......:........:........:........:

If we slice with columns that exist, there is no problem:

indicators_list = ['ind475', 'ind456']
idx = pd.IndexSlice
d3 = d2.loc[idx[:,:], idx[indicators_list]]
d3.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Output >> (10006, 2)

But if we slice with one or more missing columns, pandas emits a warning, although the slice still works:

indicators_list = ['ind475', 'ind179']
d4 = d2.loc[idx[:,:], idx[indicators_list]]
d4.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Output >> (2672, 1), with a red warning:

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_nested_tuple(tup)

I tried using reindex, as the warning suggests and as shown in this post, but the result is empty!

indicators_list = ['ind475', 'ind179']
d5 = d2.reindex(columns=[indicators_list])
d5.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Output >> (0, 0)

How can I slice without warnings or errors and still get the correct shape?

【Question comments】:

When I run your code with .reindex in Python 3.5.2, it seems to work fine. The shape of d5.dropna(...) is (2672, 1), as expected...
@Snowbunting, I am running it on Python 3.6, with all the latest libraries installed through Anaconda.
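
The comment above reports the expected shape with .reindex. One plausible reason for the empty result in the question (an assumption, not confirmed in this thread) is the extra pair of brackets in columns=[indicators_list], which hands reindex a list containing a list rather than a flat list of labels. The sketch below uses a small made-up frame (toy, with invented values) in place of d2 and passes the flat list:

```python
import numpy as np
import pandas as pd

# Toy stand-in for d2 (hypothetical data): a (Country, Year) row MultiIndex
# and flat column labels.
rows = pd.MultiIndex.from_product(
    [["Argentina", "Brazil"], [1966, 1967]], names=["Country", "Year"]
)
toy = pd.DataFrame(
    {"ind475": [6.15, 8.33, np.nan, 9.19], "ind456": [np.nan] * 4},
    index=rows,
)

indicators_list = ["ind475", "ind179"]  # "ind179" is not a column of toy

# Pass the list itself, not a list that contains the list.
picked = toy.reindex(columns=indicators_list)

# The missing label comes back as an all-NaN column, which dropna removes.
result = picked.dropna(axis=0, how="all").dropna(axis=1, how="all")
print(result.shape)
```

With a flat list, reindex raises no warning for the missing label and simply fills that column with NaN, so the two dropna calls reduce the frame to the rows and columns that hold data.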

Tags: python pandas dataframe reindex

【Solution 1】:

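I believe you need to filter the column names with isin (and then drop all-NaN columns if necessary). Here df2 stands for your DataFrame (d2 in the question):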

indicators_list = ['ind475', 'ind179']
print (df2.loc[:, df2.columns.isin(indicators_list)])

Or:

print (df2[df2.columns[df2.columns.isin(indicators_list)]])

If the columns are a MultiIndex, use get_level_values:

print (df2.loc[:, df2.columns.get_level_values(0).isin(indicators_list)])
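
As a minimal sketch of the isin approach above, the example below builds two small made-up frames (flat and multi are hypothetical names with invented values) and selects only the requested columns that actually exist, first with flat column labels and then with MultiIndex columns:

```python
import pandas as pd

indicators_list = ["ind475", "ind179"]  # "ind179" exists in neither frame

# Flat columns: the boolean mask keeps only labels that are present.
flat = pd.DataFrame({"ind475": [1.0, 2.0], "ind476": [3.0, 4.0]})
flat_pick = flat.loc[:, flat.columns.isin(indicators_list)]

# MultiIndex columns: compare the first level against the list.
cols = pd.MultiIndex.from_tuples(
    [("ind475", "raw"), ("ind475", "adj"), ("ind476", "raw")]
)
multi = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=cols)
multi_mask = multi.columns.get_level_values(0).isin(indicators_list)
multi_pick = multi.loc[:, multi_mask]

print(list(flat_pick.columns))
print(list(multi_pick.columns))
```

Because the mask is computed from the existing columns, a missing label is silently ignored instead of producing a warning or an all-NaN column.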

【Comments】:

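Ah, I was arriving at the same approach.
You mean if the columns also have a MultiIndex, right? Because my example has a MultiIndex on the rows.
Yes. For a MultiIndex in the index (rows) you need df2[df2.index.get_level_values(0).isin(indicators_list)]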
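
Filtering rows by one level of a row MultiIndex, as discussed in the last comment, could look like the sketch below. The frame toy and the list countries are made up for illustration. The mask is built from the row index so that its length matches the number of rows:

```python
import pandas as pd

# Hypothetical frame with a (Country, Year) MultiIndex on the rows.
rows = pd.MultiIndex.from_product(
    [["Argentina", "Brazil", "Chile"], [1966, 1967]], names=["Country", "Year"]
)
toy = pd.DataFrame({"ind475": range(6)}, index=rows)

countries = ["Argentina", "Peru"]  # "Peru" is not in the index

# Boolean mask over the rows, taken from the "Country" level of the index.
row_mask = toy.index.get_level_values("Country").isin(countries)
subset = toy[row_mask]

print(subset.shape)
```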