Identifying multiple columns by name in Pandas: answers

【Question title】: Identifying multiple columns by name in Pandas
【Posted】: 2014-04-11 14:22:06
【Question】:

Is there a way to select a subset of columns using text matching or a regular expression?

In R it would look like this:

attach(iris) #Load the 'Stairway to Heaven' of R's built-in data sets
iris[grep(names(iris),pattern="Length")] #Prints only columns containing the word "Length"

【Comments】:

Tags: python pandas

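【Solution 1】:

You can use the filter method for this (with axis=1 so that it filters on column names). It offers several options:

Substring matching, equivalent to if 'Length' in col: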
```
df.filter(like='Length', axis=1)
```
Using a regular expression (note that it uses re.search rather than re.match, so you may need to adjust your pattern):
```
df.filter(regex=r'\.Length$', axis=1)
```
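As a minimal sketch of the two calls above, the following builds a small stand-in for the iris data (the values and the R-style dotted column names are assumptions made for illustration) and shows which columns each filter keeps:

```python
import pandas as pd

# Hypothetical stand-in for the iris data; only the column names matter here.
df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Sepal.Width": [3.5, 3.0],
    "Petal.Length": [1.4, 1.4],
    "Petal.Width": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
})

# Substring match on the column labels.
by_like = df.filter(like="Length", axis=1)

# Regex match with re.search semantics: labels that end in ".Length".
by_regex = df.filter(regex=r"\.Length$", axis=1)

print(list(by_like.columns))
print(list(by_regex.columns))
```

With these column names both calls select the same two columns; they would differ for a label such as "Length.Total", which contains "Length" but does not end in ".Length".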

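【Comments】:

Very nice info, @joris. But I also need to get column names that contain some other characters in addition to the base name. For example, my column names are "Length_1", "Length_2", "Width_1", "Width_2", and so on. My filter call looks like df.filter(like=col+'_', axis=1), where col takes values such as "Length", "Width", etc., but I am not getting the values. Any idea what I should correct?
You should be able to do this with a regular expression, e.g. regex=r"Length|Width"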
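The suggestion in this comment thread can be sketched as follows. The DataFrame, its values, and the extra "Height" column are hypothetical, and the anchored pattern is one possible tightening of the regex given in the reply, not the commenter's exact code:

```python
import pandas as pd

# Hypothetical frame using the column names mentioned in the comment.
df = pd.DataFrame({
    "Length_1": [1.0],
    "Length_2": [2.0],
    "Width_1": [3.0],
    "Width_2": [4.0],
    "Height": [5.0],
})

# One filter call per base name, using substring matching.
per_name = {
    col: df.filter(like=col + "_", axis=1)
    for col in ["Length", "Width"]
}

# A single regex that keeps every "Length_*" and "Width_*" column.
both = df.filter(regex=r"^(Length|Width)_", axis=1)

print(list(per_name["Width"].columns))
print(list(both.columns))
```

Anchoring with ^ and requiring the trailing underscore keeps unrelated labels that merely contain "Length" or "Width" out of the result.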

【Solution 2】:

Using Python's in operator, it works like this:

# Assuming iris is already loaded as a DataFrame called 'iris' and has a proper header
iris = iris[[col for col in iris.columns if 'Length' in col]]
print(iris.head())

Or, using a regular expression:

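import re
# re.search scans the whole name; re.match would only try the start and never match r'\.Length$'
iris = iris[[col for col in iris.columns if re.search(r'\.Length$', col)]]
print(iris.head())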

The first will run faster, but the second will be more precise.
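One detail worth checking in the regex variant is the difference between re.match, which only tries to match at the start of the string, and re.search, which scans the whole string. The sketch below uses a plain list of column names (assumed to follow R's dotted iris naming) so it runs without pandas:

```python
import re

# Assumed column names, following R's iris data set.
columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
pattern = r"\.Length$"

# re.match anchors at position 0, so a pattern that starts with a literal dot
# cannot match names such as "Sepal.Length".
matched_with_match = [c for c in columns if re.match(pattern, c)]

# re.search looks for the pattern anywhere in the name.
matched_with_search = [c for c in columns if re.search(pattern, c)]

print(matched_with_match)
print(matched_with_search)
```

To keep re.match with such names, the pattern would need a leading wildcard, for example r".*\.Length$".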

【Comments】: