Python re.match not finding characters in the middle of the string

【Question title】: Python re.match not finding characters in the middle of the string
【Posted】: 2019-04-10 14:11:41
【Question】:

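I have a list of website links that are identical except for the year, which is the part I want to extract. I am using re.match to find it, since the strings are exactly the same apart from 4 characters (20xx). For some reason it only returns None, and I don't know why.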

I have tried other re functions such as findall and fullmatch, but they did not work either.

import re

state_links = ["https://2009-2017.state.gov/r/pa/prs/ps/2009/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2010/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2011/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2012/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2013/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2014/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2015/index.htm",
               "https://2009-2017.state.gov/r/pa/prs/ps/2016/index.htm"]

for link in state_links:
    year = re.match(r"https://2009-2017.state.gov/r/pa/prs/ps/(.*)/index.htm", link)
    print(year)

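【Comments】:

It works fine for me; please check again.
You should escape all the . characters in the regular expression, although in this case it should make no difference.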

Tags: regex python-3.x

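【Solution 1】:

Your example works as shown and prints a series of re.Match instances. (That said, the . is not doing what you think it is: a plain . is a pattern for any character, and you probably want a literal period, \. instead. Using \d{4} in the capturing group is probably also more sensible than .*)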
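To illustrate the point about escaping, here is a minimal sketch of a stricter version of the question's pattern, with the dots escaped and the capture group limited to four digits. The names YEAR_PATTERN and extract_year, and the deliberately malformed third link, are made up for this example and are not part of the original post.

```python
import re

# Stricter version of the question's pattern: literal dots, four-digit year.
# YEAR_PATTERN and extract_year are hypothetical names for this sketch.
YEAR_PATTERN = re.compile(
    r"https://2009-2017\.state\.gov/r/pa/prs/ps/(\d{4})/index\.htm"
)

def extract_year(link):
    """Return the year as an int, or None if the link does not match."""
    match = YEAR_PATTERN.match(link)
    # re.match returns None on failure, so check before calling .group()
    return int(match.group(1)) if match else None

links = [
    "https://2009-2017.state.gov/r/pa/prs/ps/2009/index.htm",
    "https://2009-2017.state.gov/r/pa/prs/ps/2016/index.htm",
    "https://2009-2017xstate.gov/r/pa/prs/ps/2016/index.htm",  # 'x' instead of '.'
]
print([extract_year(link) for link in links])  # [2009, 2016, None]
```

With the original unescaped pattern, the third link would also match, because . accepts the x.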

In any case, if your links are always cleanly formatted, you can also just use str methods here:

>>> [int(i.rsplit("/", 2)[-2]) for i in state_links]
[2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]

This splits each link into three parts, with the year as the middle element:

>>> state_links[0].rsplit("/", 2)
['https://2009-2017.state.gov/r/pa/prs/ps', '2009', 'index.htm']

The [-2] index then picks out the year component.
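If the links might not always be cleanly formatted, a slightly more defensive variant of the same idea could parse the URL first and validate the segment before converting it. This is only a sketch: the helper name year_from_path and the rule that the year is the second-to-last path segment are assumptions based on the links in the question.

```python
from urllib.parse import urlsplit

def year_from_path(link):
    """Hypothetical helper: return the year from the URL path, or None."""
    # urlsplit isolates the path, e.g. "/r/pa/prs/ps/2009/index.htm"
    segments = urlsplit(link).path.split("/")
    # Assumption: the year is the second-to-last path segment
    candidate = segments[-2] if len(segments) >= 2 else ""
    # Only convert when the segment is exactly four decimal digits
    if len(candidate) == 4 and candidate.isdecimal():
        return int(candidate)
    return None

print(year_from_path("https://2009-2017.state.gov/r/pa/prs/ps/2009/index.htm"))  # 2009
print(year_from_path("https://2009-2017.state.gov/"))  # None
```

Unlike the bare rsplit version, this returns None instead of raising ValueError when the expected segment is missing or not numeric.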

【Solution 2】:

As @Drubio pointed out, your regex pattern is correct. However, check the rest of your code. The following works:

import re

regex = r"https://2009-2017\.state\.gov/r/pa/prs/ps/(\d{4})/index\.htm"
# re.finditer expects a single string, so join the list of links first
years = re.finditer(regex, "\n".join(state_links))
for year in years:
    # group(1) is the four-digit year captured by (\d{4})
    print(year.group(1))

Output:

2009
2010
2011
2012
2013
2014
2015
2016

Thanks to @Brad for the \d{4} suggestion/correction and for the .split option.
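The thread never establishes why the asker saw None, since the posted code works. One common cause in similar code, offered here as an assumption rather than a diagnosis, is that re.match only tries to match at the start of the string, so a shortened pattern aimed at the middle of the link fails. A small sketch of the difference between re.match, re.search and re.fullmatch:

```python
import re

link = "https://2009-2017.state.gov/r/pa/prs/ps/2012/index.htm"
pattern = r"/(\d{4})/index\.htm"

# re.match only tries at position 0; the string starts with "https", not "/"
print(re.match(pattern, link))            # None

# re.search scans the whole string for the first match
print(re.search(pattern, link).group(1))  # 2012

# re.fullmatch requires the pattern to cover the entire string
print(re.fullmatch(pattern, link))        # None
```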