Pattern matching with regex returns None while it should not

【Question title】: Pattern matching with regex returns None while it should not
【Posted】: 2017-05-25 12:31:07
【Question】:

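I am learning regular expressions and Beautiful Soup, and I am working through Google's tutorial on regular expressions. I am using the HTML file provided on the Google tutorial site (the exercise is set up in the Setup section of the tutorial).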

The code:

from bs4 import BeautifulSoup as bs
with open(filepath, "r") as f:
    soup = bs(f, 'lxml')
soup.title

Output:

<title>Popular Baby Names</title>

Code:

h3 = soup.find_all("h3")  # find_all() captures every <h3> tag (in fact only one exists,
                          # and it contains the year)

h3[0].get_text()

Output:

u'Popularity in 1990'

Code:

import re

pattern = re.compile(r'.+(\d\d\d\d).+')
string = h3[0].get_text()
pattern.match(string).group(0)

Output:

AttributeError                            Traceback (most recent call last)
<ipython-input-61-2e4daef3292c> in <module>()
----> 1 pattern.match(string).group(0)

AttributeError: 'NoneType' object has no attribute 'group'

I can't understand why match() doesn't capture the year as it should.

Any suggestions would be appreciated.

【Question comments】:

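Your string ends with 1990, so the trailing .+ has nothing left to match.
As other comments have said, your regex doesn't work. You can test it here: regex101.com/r/d2NjKz/1
Possible duplicate of Python: Extract numbers from a string
Thanks. The problem was indeed the trailing .+; when I removed it, it worked. I had tested the pattern on regex101, but there I pasted the tags as well as the text, which is why it matched.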
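The difference described in the last comment can be reproduced with the standard `re` module. The markup string below is an assumed stand-in for what was pasted into regex101, not the exact contents of the tutorial file: once anything (here a closing `</h3>` tag) follows the year, the trailing `.+` has something to consume and the original pattern matches.

```python
import re

pattern = re.compile(r'.+(\d\d\d\d).+')  # the pattern from the question

text_only = 'Popularity in 1990'           # what get_text() returns
with_tags = '<h3>Popularity in 1990</h3>'  # assumed markup, as pasted into regex101

print(pattern.match(text_only))           # None: nothing left for the trailing .+
print(pattern.match(with_tags).group(1))  # 1990: the trailing .+ consumes '</h3>'
```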

Tags: python regex beautifulsoup match

【Solution 1】:

Because the pattern expects at least one character after the year. Try .* instead of .+ at the end.
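A minimal sketch of the suggested change, assuming the input is exactly the `get_text()` output shown in the question. Note that `group(0)` returns the whole match; the year itself is in capture group 1, so `group(1)` is most likely what the question's code wants.

```python
import re

string = 'Popularity in 1990'

# Original pattern: the trailing .+ needs at least one character after the year.
original = re.compile(r'.+(\d\d\d\d).+')
print(original.match(string))  # None

# Suggested fix: the trailing .* may match zero characters.
fixed = re.compile(r'.+(\d\d\d\d).*')
m = fixed.match(string)
print(m.group(0))  # Popularity in 1990  (the whole match)
print(m.group(1))  # 1990                (the captured year)
```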

【Comments】:

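Why match .* at all?
* matches zero or more of the preceding character, so the pattern can match without any further characters.
It doesn't need to. I assumed he used .+ because he might want something after the year, but + requires at least one character; zero or more is *.
@palako But it isn't being used to match anything.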
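Following the point in these comments that the surrounding wildcards are not actually needed, here is a sketch that swaps `match()` for `re.search()`, which scans the whole string instead of anchoring at its start. This is an alternative technique, not the one proposed in Solution 1, and `extract_year` is a hypothetical helper name. It also checks for `None` before calling `group()`, which avoids the `AttributeError` from the question when nothing matches.

```python
import re

def extract_year(text):
    # Hypothetical helper: search() looks anywhere in the text, so no .+/.* padding is needed.
    m = re.search(r'\d{4}', text)
    # match()/search() return None when nothing matches; guard before calling group().
    return m.group(0) if m else None

print(extract_year('Popularity in 1990'))  # 1990
print(extract_year('No year here'))        # None
```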