Find words of length 4 using a regular expression

【Question title】: find words of length 4 using regular expression
【Posted】: 2015-06-23 17:35:39
【Question】:

I am trying to find words of length 4 using a regular expression.

This is what I tried, but I get an empty list:

import re

# words that have length of 4
s = input("please enter an expression: ")
print(re.findall(r'/^[a-zA-Z]{4}$/',s))

What is wrong with my code?

My input is: here we are having fun these days

My expected output: ['here', 'days']

My actual output: []

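【Comments】:

By adding ^ and $, you are requiring the whole string to be a single word of length 4, rather than finding every word of length 4 inside it.
Look at the regex \b word-boundary assertion instead of ^ and $.
@AvinashRaj please check the edit

Tags: python regex python-3.x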

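【Solution 1】:

Use the word boundary \b. When you add anchors to the regex, as in ^[a-zA-Z]{4}$, it matches only a line that consists of exactly four letters; it does not check each word. ^ asserts that we are at the start and $ asserts that we are at the end. \b matches between a word character and a non-word character (or vice versa), so it matches at the start of a word (zero width) or at the end of a word (zero width).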

>>> import re
>>> s = "here we are having fun these days"
>>> re.findall(r'\b[a-zA-Z]{4}\b', s)
['here', 'days']
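
To make the difference concrete, here is a minimal sketch that runs the anchored pattern and the word-boundary pattern side by side on the question's input. It also tries the original pattern with its surrounding slashes: in Python's re module, / is an ordinary character and not a delimiter, so that pattern asks for a literal slash before the start of the string and cannot match. The variable names are illustrative only.

```python
import re

s = "here we are having fun these days"

# Original pattern from the question: the slashes are literal characters,
# so a "/" would have to appear before the start of the string.
slash_delimited = re.findall(r'/^[a-zA-Z]{4}$/', "here")

# Anchored pattern: ^ and $ require the entire string to be four letters.
anchored_sentence = re.findall(r'^[a-zA-Z]{4}$', s)
anchored_single = re.findall(r'^[a-zA-Z]{4}$', "here")

# Word-boundary pattern: matches every four-letter word inside the string.
bounded = re.findall(r'\b[a-zA-Z]{4}\b', s)

print(slash_delimited)    # []
print(anchored_sentence)  # []
print(anchored_single)    # ['here']
print(bounded)            # ['here', 'days']
```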

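【Comments】:

It might be useful to mention that ^ matches the start of the string and $ the end, to show that you need to remove them in order to match substrings located anywhere in the input.
Alternatively, you could use re.findall(r'(?<!\S)[a-zA-Z]{4}(?!\S)', s).
I recommend reading the following post: Regex to match words of a certain length.
@Avinash Raj \b[a-zA-Z]{4}\b ... does it work if the word is at the start or the end?
@JoseRicardoBustosM. Yes, because the start and the end of the string are not word characters.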
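
The two patterns mentioned in this thread agree on the question's input but can differ once punctuation is involved. The sketch below uses a made-up sentence (not from the original post) to show this: \b treats a comma or apostrophe as a boundary, while the lookaround version requires whitespace or the string edge on both sides.

```python
import re

# Hypothetical input chosen to include punctuation next to four-letter runs.
text = "well, that's fine here"

# \b sees a boundary between a letter and "," or "'".
with_boundary = re.findall(r'\b[a-zA-Z]{4}\b', text)

# (?<!\S) and (?!\S) only accept whitespace or the string edge as neighbours.
with_lookaround = re.findall(r'(?<!\S)[a-zA-Z]{4}(?!\S)', text)

print(with_boundary)    # ['well', 'that', 'fine', 'here']
print(with_lookaround)  # ['fine', 'here']
```

Which behaviour is preferable depends on whether "that" in "that's" should count as a four-letter word.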

【Solution 2】:

No need for a (potentially) complicated regex; you can use a list comprehension:

>>> s = "here we are having fun these days"
>>> [word for word in s.split() if len(word) == 4 and word.isalpha()]
['here', 'days']

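【Comments】:

It has to be [word for word in s.split() if len(word) == 4 and word.isalpha()]
This is fine, but it is usually much slower than the regex solution. As regexes go, this one really isn't complicated (and regexes are too useful not to learn).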
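
The speed claim above is easy to check on your own machine. The following is a rough sketch using timeit; the function names with_regex and with_split are made up for this example, and no timing result is asserted here because it depends on the input and the Python version. Note also that str.isalpha() accepts non-ASCII letters, whereas [a-zA-Z] does not, so the two approaches are not equivalent on every input.

```python
import re
import timeit

s = "here we are having fun these days"

def with_regex(text):
    # Word-boundary approach from Solution 1.
    return re.findall(r'\b[a-zA-Z]{4}\b', text)

def with_split(text):
    # List-comprehension approach from Solution 2.
    return [word for word in text.split() if len(word) == 4 and word.isalpha()]

# Both approaches agree on the question's input.
regex_result = with_regex(s)
split_result = with_split(s)
print(regex_result, split_result)

# Time each approach; compare the numbers rather than trusting a general rule.
regex_time = timeit.timeit(lambda: with_regex(s), number=10000)
split_time = timeit.timeit(lambda: with_split(s), number=10000)
print(f"regex: {regex_time:.4f}s  split: {split_time:.4f}s")
```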