re.findall only returning the last match

【Question title】: re.findall only returning the last match
【Posted】: 2020-02-08 23:54:41
【Question】:

I have the following HTML:

<tr>
<td style="text-align: left;" colspan="1">10:10</td>
<td style="text-align: left;" colspan="1">This is a description.</td>
</tr>
<tr>
<td colspan="1">10:30</td>
<td colspan="1">This is another description.</td>
</tr>

I want to get multiple matches, each with two groups: group 1 is the timestamp and group 2 is the description.

When I run

re.findall(r'<td.*>(\d\d:\d\d)<\/td><td.*>(.*?)<\/td>', HTML)

I only get the last match:

[('10:30', 'This is another description.')]

Can anyone tell me what's wrong with my regex?
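The reported output can be reproduced with a short script. One assumption: the pattern only matches where `</td><td` appear with nothing in between, so the HTML must really be on a single line (as the reported output implies); with the multi-line layout shown above, `findall` would return an empty list.

```python
import re

# Assumption: the snippet is on one line. The pattern requires "</td><td"
# with nothing in between, so the multi-line layout would match nothing.
HTML = (
    '<tr><td style="text-align: left;" colspan="1">10:10</td>'
    '<td style="text-align: left;" colspan="1">This is a description.</td></tr>'
    '<tr><td colspan="1">10:30</td>'
    '<td colspan="1">This is another description.</td></tr>'
)

pattern = r'<td.*>(\d\d:\d\d)<\/td><td.*>(.*?)<\/td>'

matches = re.findall(pattern, HTML)
print(matches)  # [('10:30', 'This is another description.')]

# Inspect the single match: it starts at the first <td and ends at the
# last </td>, so nothing is left over for a second match.
m = re.search(pattern, HTML)
print(m.group(0).startswith('<td style'), m.group(0).endswith('description.</td>'))  # True True
```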

【Comments】:

Try using [^>]* instead of .*; you are probably consuming too many characters, especially with the first .*
Why not parse it with BeautifulSoup?
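The second comment suggests BeautifulSoup, which is a third-party package. To keep this sketch dependency-free, it swaps in the standard library's `html.parser.HTMLParser` instead; `RowParser` is a hypothetical helper name, and the sketch assumes each row is a `<tr>` whose `<td>` cells contain plain text. Unlike the regex, it does not care whether the tags are separated by newlines.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Hypothetical helper: collects <td> text, grouped into one tuple per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a tuple of cell strings
        self._row = None   # cells of the <tr> currently being parsed
        self._cell = None  # text fragments of the <td> currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        # Text outside a <td> (e.g. the newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


HTML = """<tr>
<td style="text-align: left;" colspan="1">10:10</td>
<td style="text-align: left;" colspan="1">This is a description.</td>
</tr>
<tr>
<td colspan="1">10:30</td>
<td colspan="1">This is another description.</td>
</tr>"""

parser = RowParser()
parser.feed(HTML)
parser.close()
print(parser.rows)
# [('10:10', 'This is a description.'), ('10:30', 'This is another description.')]
```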

Tags: python html regex python-3.x

【Answer 1】:

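Your first .* is greedy and matches as many characters as possible, so you get a single match covering everything from the first <td to the last </td>, and the groups capture only the last row. Using [^>]* instead of .* in the two <td.*> parts keeps each one inside a single opening tag, so every row becomes its own match.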
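A minimal sketch of the fix described in the answer, assuming the HTML is on one line as the question's output implies. The `\s*` variant at the end is an extra assumption for the case where the cells are separated by newlines, as in the layout shown in the question.

```python
import re

# Assumption: single-line HTML, as the question's output implies.
HTML = (
    '<tr><td style="text-align: left;" colspan="1">10:10</td>'
    '<td style="text-align: left;" colspan="1">This is a description.</td></tr>'
    '<tr><td colspan="1">10:30</td>'
    '<td colspan="1">This is another description.</td></tr>'
)

# [^>]* cannot cross a '>', so each <td...> stays inside one opening tag.
# (The \/ escapes from the question are unnecessary in Python and are dropped.)
fixed = r'<td[^>]*>(\d\d:\d\d)</td><td[^>]*>(.*?)</td>'
rows = re.findall(fixed, HTML)
print(rows)
# [('10:10', 'This is a description.'), ('10:30', 'This is another description.')]

# If the cells are separated by newlines, allowing optional whitespace
# between </td> and <td> is one way to cope.
HTML_MULTILINE = """<tr>
<td style="text-align: left;" colspan="1">10:10</td>
<td style="text-align: left;" colspan="1">This is a description.</td>
</tr>
<tr>
<td colspan="1">10:30</td>
<td colspan="1">This is another description.</td>
</tr>"""

tolerant = r'<td[^>]*>(\d\d:\d\d)</td>\s*<td[^>]*>(.*?)</td>'
print(re.findall(tolerant, HTML_MULTILINE))  # same two tuples as above
```

Note that [^>]* would still misbehave if an attribute value itself contained a >, which is one reason the parser-based approach suggested in the comments is more robust.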

【Comments】: