Iterating over a pandas df returning all values that do not match a regex: answers

【Question title】: Iterating over pandas df returning all values not matching a regex
【Posted】: 2025-12-14 23:15:02
【Question description】:

I am trying to iterate over a column to identify invalid entries. This works:

weirdos = df.loc[df[column] == '7282'][['col1', 'col2']]

But when I try the same approach with a regex:

regex = "^[a-zA-Z]{2}[*]{1}[a-zA-Z0-9]{3}[*]{1}[a-zA-Z0-9*]{0,30}$"
weirdos = df.loc[re.search(regex, df[column]) is not None][['col1', 'col2']]

I keep getting the error TypeError: expected string or bytes-like object. Any hints?

【Question discussion】:

Tags: python python-3.x regex pandas

【Solution 1】:

Assuming column (not wrapped in quotes ') is a string variable holding the name of the column to check, use:

weirdos = df.loc[~df[column].str.contains(regex)][['col1', 'col2']]

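Note that you need str.contains() rather than str.match() to stay consistent with the re.search() call in your original code. str.contains() uses re.search() under the hood, whereas str.match() uses re.match(), which only looks for a match at the beginning of the text.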
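As a small illustration of that difference, the sketch below runs the same pattern through three string methods. The Series `s` and its values are made up for this example; `str.fullmatch()` is not mentioned in the answer and is included only for comparison (it requires pandas 1.1 or later).

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
s = pd.Series(["xaa", "aab", "aa"])

# str.contains uses re.search: the pattern may match anywhere in the string
contains_result = s.str.contains(r"aa").tolist()

# str.match uses re.match: the pattern must match at the start of the string
match_result = s.str.match(r"aa").tolist()

# str.fullmatch uses re.fullmatch: the whole string must match the pattern
fullmatch_result = s.str.fullmatch(r"aa").tolist()

print(contains_result)   # [True, True, True]
print(match_result)      # [False, True, True]
print(fullmatch_result)  # [False, False, True]
```

Because the regex in the question is already anchored with ^ and $, all three methods should classify its values the same way; the choice matters for unanchored patterns such as the one above.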

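The ~ is added to the filter condition because your question title asks for values that do not match the regex. If you want the matching rows instead, remove it.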

One tip: define the regex as a raw string, i.e. regex = r'....', so that backslashes in the pattern do not have to be doubled.
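A quick sketch of what the raw-string prefix changes. The pattern and the sample text are invented for illustration; note that regex metacharacters such as the dot still need their own backslash either way.

```python
import re

plain = "\\d+\\.\\d+"   # ordinary string: every backslash must be doubled
raw = r"\d+\.\d+"       # raw string: each backslash is written once

# Both literals spell exactly the same pattern
same_pattern = plain == raw
print(same_pattern)      # True

# Hypothetical sample text
found = re.search(raw, "price: 12.50").group()
print(found)             # 12.50
```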

Test run

import pandas as pd

data = {'col_0': ['baa', 'bbc', 'ccd'], 'col1': [10, 20, 30], 'col2': [100, 200, 300]}
df = pd.DataFrame(data)
print(df)
Output:
    col_0   col1    col2
0   baa       10     100
1   bbc       20     200
2   ccd       30     300

regex = r'aa'           # containing 'aa' anywhere in string
column = 'col_0'

weirdos = df.loc[~df[column].str.contains(regex)][['col1', 'col2']]     # filtering those NOT containing 'aa' anywhere in string
print(weirdos)

Output:
    col1    col2
1   20       200
2   30       300
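The same filter can be tried with the regex from the question. This is only a sketch: the column name `code` and the sample values are assumptions, since the question does not show the real data. It also passes `na=False`, which the answer does not use, so that missing values count as non-matching instead of producing NaN in the mask, and it selects rows and columns in a single `.loc` call.

```python
import pandas as pd

# Regex from the question, written as a raw string
regex = r"^[a-zA-Z]{2}[*]{1}[a-zA-Z0-9]{3}[*]{1}[a-zA-Z0-9*]{0,30}$"
column = "code"  # hypothetical column name

# Hypothetical sample data: two valid codes, one invalid code, one missing value
df = pd.DataFrame({
    "code": ["DE*ABC*E123", "7282", None, "de*a1c*"],
    "col1": [1, 2, 3, 4],
    "col2": [10, 20, 30, 40],
})

# True where the value matches the pattern; missing values are treated as non-matching
valid = df[column].str.contains(regex, na=False)

# Keep the rows that do NOT match, and only the two columns of interest
weirdos = df.loc[~valid, ["col1", "col2"]]
print(weirdos)
```

With this data, the rows at index 1 (`7282`) and 2 (the missing value) are the ones reported as invalid.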

【Discussion】:

Great, thank you very much for the detailed answer.

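【Solution 2】:

Use this syntax (it selects the rows that match; prefix the condition with ~ to select the rows that do not):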

df=pd.DataFrame({'A':['foo','buz','foz'],'B':range(3)})
df.loc[df['A'].str.match(r'^f')]['B']
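For the "not matching" case asked about in the title, the same idea might be written as follows. This is a sketch rather than the answerer's own code: the variable names are illustrative, and the row mask and column label are passed to a single `.loc` call instead of chaining two indexing steps.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "buz", "foz"], "B": range(3)})

# Boolean mask: True where column A starts with "f"
# (str.match already anchors at the start, so ^ is optional)
starts_with_f = df["A"].str.match(r"f")

matching = df.loc[starts_with_f, "B"]       # rows that match
not_matching = df.loc[~starts_with_f, "B"]  # rows that do not match

print(matching.tolist())      # [0, 2]
print(not_matching.tolist())  # [1]
```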

【Discussion】: