Select tags with some specified text in Beautiful Soup

[Question title]: Select tags with some specified text in Beautiful Soup
[Posted]: 2021-07-10 09:18:41
[Question description]:

On some HTML pages I have a bunch of tags that look like this:

<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>

In BeautifulSoup, I need to select only the Germany tags whose year is 2019 (the example tag above would not qualify, because its year is 2021).

What is the best way to do this? I have only just started learning BS, and so far all I can do is this:

germany = germany_soup.find_all(attrs={"title": "Germany"})

and then check, for each tag in germany, whether its text attribute contains 2019.

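My question: is this the conventional way to solve the problem? Is there a way to specify '2019' in find_all, so that I don't have to check "manually" in a loop whether each tag.text contains '2019'?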
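For reference, the two-step approach described in the question can be sketched roughly as follows. The sample markup and the name germany_2019 are assumptions added for illustration; only germany_soup and the find_all call come from the question itself.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the tag shown in the question.
html = """
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>
<a class="country" href="www.google.com" title="France">10:02, 3 May 2019</a>
"""

germany_soup = BeautifulSoup(html, "html.parser")

# Step 1: select by attribute, exactly as in the question.
germany = germany_soup.find_all(attrs={"title": "Germany"})

# Step 2: keep only the tags whose text mentions 2019.
# germany_2019 is a hypothetical name, not part of the original post.
germany_2019 = [tag for tag in germany if "2019" in tag.text]

for tag in germany_2019:
    print(tag)
```

This works, but it filters in Python after the search rather than inside find_all, which is the part the question asks how to avoid.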


Tags: text beautifulsoup tags

[Solution 1]:

You can use the re module to match specific text inside the tags, so that find_all returns only the tags you want:

import re
from bs4 import BeautifulSoup

html = """<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>
    <a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>
    <a class="country" href="www.google.com" title="Germany">07:11, 9 July 2019</a>
    <a class="country" href="www.google.com" title="Germany">09:18, 9 July 2010</a>
    """

soup = BeautifulSoup(html, "html.parser")
# Filter on the title attribute and on the tag's text in a single call.
# string= is the name used since Beautiful Soup 4.4.0; older versions call it text=.
soup.find_all("a", attrs={"title": "Germany"}, string=re.compile("2019"))

Output:

[<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>,
 <a class="country" href="www.google.com" title="Germany">07:11, 9 July 2019</a>]
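As an alternative sketch, find_all also accepts a function that receives each tag and returns True or False. That makes it possible to parse the timestamp properly instead of searching for the substring "2019". The helper name is_germany_2019 and the format string "%H:%M, %d %B %Y" are assumptions based on the sample tags, and %B assumes English month names in the active locale.

```python
from datetime import datetime

from bs4 import BeautifulSoup

html = """<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>
<a class="country" href="www.google.com" title="Germany">07:11, 9 July 2019</a>
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2010</a>
"""


def is_germany_2019(tag):
    """Return True for <a title="Germany"> tags whose text is a 2019 timestamp.

    Hypothetical helper: the timestamp format is inferred from the sample
    tags and may not match every page.
    """
    if tag.name != "a" or tag.get("title") != "Germany":
        return False
    try:
        stamp = datetime.strptime(tag.get_text(strip=True), "%H:%M, %d %B %Y")
    except ValueError:
        # Text that is not a timestamp in the assumed format is skipped.
        return False
    return stamp.year == 2019


soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(is_germany_2019)

for tag in matches:
    print(tag)
```

The regular expression version is shorter. The function version is probably worth the extra lines only when the text needs real parsing, for example to select a range of years.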
