Delete columns based on index name string operation [duplicate]: answers

【Question title】: delete columns based on index name string operation [duplicate]
【Posted】: 2020-01-15 02:14:55
【Question】:

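I have a large DataFrame with many columns and would like to delete some of them based on a string operation on the column names.

Consider the following example: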

import pandas as pd

df_tmp = pd.DataFrame(data=[(1, 2, 3, "foo"), ("bar", 4, 5, 6), (7, "baz", 8, 9)],
                      columns=["test", "anothertest", "egg", "spam"])

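Now I would like to delete every column whose name contains test. I tried to adapt the answers given here (string operation on column content) and here (addressing the name), without success.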

df_tmp = df_tmp[~df_tmp.index.str.contains("test")]
# AttributeError: Can only use .str accessor with string values!

df_tmp[~df_tmp.name.str.contains("test")]
# AttributeError: 'DataFrame' object has no attribute 'name'

Could anyone point me in the right direction? Thanks in advance. :)

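【Comments】:

Use: df_tmp.loc[:,~df_tmp.columns.str.contains("test")]. Your code does not work because it checks the row index instead of the columns. The correct way to keep only the columns where the mask is True is a boolean selection on the column axis with df.loc[].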
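The comment's point can be made visible by looking at the intermediate boolean mask. The following is a minimal sketch using the example frame from the question; the variable names `mask` and `result` are introduced here for illustration only.

```python
import pandas as pd

df_tmp = pd.DataFrame(
    data=[(1, 2, 3, "foo"), ("bar", 4, 5, 6), (7, "baz", 8, 9)],
    columns=["test", "anothertest", "egg", "spam"],
)

# The row index holds integers (0, 1, 2), which is why the .str accessor
# fails on df_tmp.index. The column labels, in contrast, are strings.
mask = df_tmp.columns.str.contains("test")
print(mask.tolist())  # [True, True, False, False]

# Invert the mask and apply it on the column axis with .loc.
result = df_tmp.loc[:, ~mask]
print(result.columns.tolist())  # ['egg', 'spam']
```

The mask has one entry per column, so it has to go in the second position of `.loc[rows, columns]`; passing it directly to `df_tmp[...]` would be interpreted as a row filter.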

Tags: python-3.x string pandas

【Solution 1】:

The best option is probably df.filter()...

>>> df_tmp
  test anothertest  egg spam
0    1           2    3  foo
1  bar           4    5    6
2    7         baz    8    9

Result:

>>> df_tmp.loc[:,~df_tmp.columns.str.contains("test")]
   egg spam
0    3  foo
1    5    6
2    8    9

>>> df_tmp.drop(df_tmp.filter(like='test').columns, axis=1)
   egg spam
0    3  foo
1    5    6
2    8    9

>>> df_tmp.drop(df_tmp.filter(regex='test').columns, axis=1)
   egg spam
0    3  foo
1    5    6
2    8    9

>>> df_tmp.filter(regex='^((?!test).)*$')
   egg spam
0    3  foo
1    5    6
2    8    9
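As a quick sanity check, the four variants above can be run side by side to confirm that they keep the same columns on the example frame. This is only a sketch; the names `by_mask`, `by_like`, `by_regex` and `by_lookahead` are hypothetical labels chosen for this comparison.

```python
import pandas as pd

df_tmp = pd.DataFrame(
    data=[(1, 2, 3, "foo"), ("bar", 4, 5, 6), (7, "baz", 8, 9)],
    columns=["test", "anothertest", "egg", "spam"],
)

# 1. Boolean mask over the column labels.
by_mask = df_tmp.loc[:, ~df_tmp.columns.str.contains("test")]
# 2. Select the matching columns by substring, then drop them.
by_like = df_tmp.drop(df_tmp.filter(like="test").columns, axis=1)
# 3. Same idea, but matching with a regular expression.
by_regex = df_tmp.drop(df_tmp.filter(regex="test").columns, axis=1)
# 4. Keep only the columns whose name never contains "test".
by_lookahead = df_tmp.filter(regex=r"^((?!test).)*$")

for variant in (by_mask, by_like, by_regex, by_lookahead):
    print(variant.columns.tolist())
```

Variants 2 and 3 select the unwanted columns first and then drop them, while variants 1 and 4 select the wanted columns directly.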

Regex explanation

'^((?!test).)*$'

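^         # Anchor the match at the beginning of the string.
(?!test)  # Negative lookahead: the text at this position must not start with "test".
.         # Match any single character except a line break (line breaks too in single-line mode).
(...)*    # Repeat the lookahead-plus-character pair zero or more times, so the check runs at every position.
$         # Anchor at the end of the string, so the whole name has to pass the check.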

A nice explanation of regex negative lookahead
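The lookahead pattern can also be tried outside pandas. The sketch below applies it to the four column names from the example with the standard `re` module; `names` and `kept` are illustrative variable names.

```python
import re

# Matches only strings in which "test" does not start at any position.
pattern = re.compile(r"^((?!test).)*$")

names = ["test", "anothertest", "egg", "spam"]

# Keep the names on which the pattern matches.
kept = [name for name in names if pattern.search(name)]
print(kept)  # ['egg', 'spam']
```

For "anothertest" the repetition consumes "another" and then stops, because the lookahead fails at the position where "test" begins; the `$` anchor can then no longer be satisfied, so the name is rejected.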

【Comments】: