Beautiful Soup find_all() that accepts the start of a word: answers

[Question title]: Beautiful Soup find_all() that accepts start of word
[Posted]: 2021-03-28 13:20:12
[Question]:

I'm scraping a website with Beautiful Soup, and its class names look like this:

<a class="Component-headline-0-2-109" data-key="card-headline" href="/article/politics-senate-elections-legislation-coronavirus-pandemic-bills-f100b3a3b4498a75d6ce522dc09056b0">

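The main problem is that the class name always starts with Component-headline- but ends in a random number. When I use Beautiful Soup's soup.find_all('class','Component-headline'), it doesn't scrape anything because of that unique number. Is it possible to use find_all but grab every class that starts with "Component-headline"?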

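I also considered using data-key="card-headline" with soup.find_all('data-key','card-headline'), but for some reason that doesn't work either, so I assume I can't search by data-key, though I'm not sure. Any suggestions?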
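Both attempts pass an attribute name where find_all() expects a tag name: the first positional argument is the tag name, so find_all('data-key', 'card-headline') looks for <data-key> elements, which don't exist. Because data-key contains a hyphen it can't be written as a keyword argument, but Beautiful Soup accepts it through the attrs dictionary. A minimal sketch using a trimmed copy of the question's link (the href is dropped and the link text "Headline" is made up for illustration):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the link text is illustrative only.
html = '<a class="Component-headline-0-2-109" data-key="card-headline">Headline</a>'

soup = BeautifulSoup(html, "html.parser")

# The first positional argument is a tag name, so this searches for <data-key> tags.
print(soup.find_all("data-key", "card-headline"))  # []

# Attribute names that aren't valid Python identifiers go in the attrs dict.
matches = soup.find_all("a", attrs={"data-key": "card-headline"})
print([a.get_text() for a in matches])  # ['Headline']
```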

[Question discussion]:

Tags: python beautifulsoup

[Solution 1]:

Beautiful Soup accepts regular expressions as filters, so you can use re.compile to match part of a class name:

import re
# '^' anchors the pattern to the start of each class name
soup.find_all('a', class_=re.compile('^Component-headline'))
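A self-contained sketch of the regular-expression approach, using made-up sample markup. Beautiful Soup tests the pattern against each individual class of a multi-class element, and anchoring the pattern with ^ keeps it from matching class names that merely contain the prefix somewhere in the middle:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: two headline links with different numeric suffixes,
# one class that only contains the prefix, and one link with no class at all.
html = """
<a class="Component-headline-0-2-109" data-key="card-headline">First</a>
<a class="card Component-headline-0-2-117" data-key="card-headline">Second</a>
<a class="Old-Component-headline-1">Not a match</a>
<a href="/about">No class</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchored: only class values that begin with the prefix match.
headlines = soup.find_all("a", class_=re.compile(r"^Component-headline-"))
print([a.get_text() for a in headlines])  # ['First', 'Second']

# Unanchored: re.search also finds the prefix inside "Old-Component-headline-1".
loose = soup.find_all("a", class_=re.compile("Component-headline"))
print([a.get_text() for a in loose])  # ['First', 'Second', 'Not a match']
```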

You can also use a lambda:

# c is None for <a> tags that have no class attribute, so check for it first
soup.find_all('a', class_=lambda c: c is not None and c.startswith('Component-headline'))
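When class_ is a function, Beautiful Soup calls it once per class value and passes None for tags without a class attribute, so an unguarded c.startswith(...) raises AttributeError as soon as it reaches such a tag. A runnable sketch with hypothetical markup that checks for None first and also shows the unguarded version failing:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the last link has no class attribute.
html = """
<a class="Component-headline-0-2-109">First</a>
<a class="Component-headline-0-2-117">Second</a>
<a href="/about">No class</a>
"""

soup = BeautifulSoup(html, "html.parser")

def is_headline_class(css_class):
    # Called once per class value; css_class is None when the tag has no class.
    return css_class is not None and css_class.startswith("Component-headline-")

headlines = soup.find_all("a", class_=is_headline_class)
print([a.get_text() for a in headlines])  # ['First', 'Second']

# Without the None check, the classless link triggers an AttributeError.
try:
    soup.find_all("a", class_=lambda c: c.startswith("Component-headline"))
except AttributeError as exc:
    print("unguarded lambda failed:", exc)
```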

[Discussion]:

[Solution 2]:

Try the [attribute^=value] CSS selector.

find_all() doesn't take CSS selectors; use select() instead.

The following selects every element whose class attribute starts with Component-headline:

from bs4 import BeautifulSoup

# html holds the page source
soup = BeautifulSoup(html, "html.parser")

print(soup.select('[class^="Component-headline"]'))
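One caveat: in CSS, [class^=...] tests the start of the whole class attribute string, not each class, so an element whose class list begins with some other class is missed. The substring selector [class*=...] is a looser alternative, and the question's data-key attribute can be targeted directly. A sketch with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second link lists another class first.
html = """
<a class="Component-headline-0-2-109" data-key="card-headline">First</a>
<a class="card Component-headline-0-2-117" data-key="card-headline">Second</a>
"""

soup = BeautifulSoup(html, "html.parser")

# ^= compares against the start of the full value "card Component-headline-0-2-117".
starts = soup.select('a[class^="Component-headline"]')
print([a.get_text() for a in starts])  # ['First']

# *= matches the substring anywhere in the attribute value.
contains = soup.select('a[class*="Component-headline"]')
print([a.get_text() for a in contains])  # ['First', 'Second']

# An exact match on the data-key attribute.
by_key = soup.select('a[data-key="card-headline"]')
print([a.get_text() for a in by_key])  # ['First', 'Second']
```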

[Discussion]: