Scoring multiple TRUES in Python RE Search

【Question title】: Scoring multiple TRUES in Python RE Search
【Posted】: 2020-04-30 23:57:36
【Question description】:

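Background

I have a list of "bad words" in a file called bad_words.conf, which reads as follows (I've changed the words to keep this post clean, but in real life they are swear words):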

wrote (some )?rubbish
swore

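I have a user input field that is sanitized and stripped of dangerous characters before being passed as data to the following script, score.py (for this example I've just typed in the value of data):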

import re

data = 'I wrote some rubbish and swore too'

# Get list of bad words
bad_words = open("bad_words.conf", 'r')
lines = bad_words.read().split('\n')
combine = "(" + ")|(".join(lines) + ")"

#set score in case there are no results
score = 0

#search for bad words
if re.search(combine, data):
    #add one for a hit
    score += 1

#show me the score
print(str(score))
bad_words.close()

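As expected, this finds a match and adds 1 to the score, with no looping.

Problem

I need to modify this script so that the score goes up by 1 for each line of bad_words.conf that is found in the text. So in the example above, with data = 'I wrote some rubbish and swore too', I want the score to be 2.

That is 1 for "wrote some rubbish" and +1 for "swore".

Thanks for your help!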

【Question discussion】:

Tags: regex python-3.x loops python-re

【Solution 1】:

Change combine to:

combine = "|".join(lines)

and use re.findall():

In [33]: re.findall(combine,data)
Out[33]: ['rubbish', 'swore']

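The problem with having multiple capture groups, as you did originally, is that re.findall() returns a tuple for each match with one entry per group, and the groups that did not match appear as empty strings.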
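To make the capture-group behaviour concrete, here is a minimal sketch rather than a definitive implementation. It assumes the two sample patterns from the question, plus a blank entry like the one split('\n') produces when the file ends with a newline. The helper name count_hits is hypothetical. Note that with the sample patterns, (some )? is itself a capture group, so re.findall() would report that group's text instead of the whole match; counting match objects from re.finditer() avoids depending on what the groups contain.

```python
import re

data = "I wrote some rubbish and swore too"

# Sample patterns from the question; the trailing "" mimics the blank entry
# that split('\n') yields when bad_words.conf ends with a newline.
lines = ["wrote (some )?rubbish", "swore", ""]

# Original style: every line wrapped in its own capture group.
# findall() then returns one tuple per match, with "" for groups that
# did not take part in that match.
grouped = "(" + ")|(".join(lines[:2]) + ")"
print(re.findall(grouped, data))


def count_hits(text, patterns):
    """Count every non-overlapping match of any pattern in text."""
    # Drop blank entries: an empty alternative would match at every position.
    cleaned = [p.strip() for p in patterns if p.strip()]
    if not cleaned:
        return 0
    # (?:...) keeps each line's alternation self-contained without capturing.
    combined = "|".join(f"(?:{p})" for p in cleaned)
    # finditer() yields one match object per hit, regardless of any capture
    # groups left inside the individual patterns.
    return sum(1 for _ in re.finditer(combined, text))


print(count_hits(data, lines))
```

This counts occurrences, so a word that appears twice scores twice. If each line of bad_words.conf should score at most once, one alternative is to run re.search() once per pattern and add up the hits.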

【Discussion】: