How to limit the number of elements found by BeautifulSoup? Answers

【Question title】: How to limit the number of elements found by BeautifulSoup?
【Posted】: 2020-09-28 14:30:09
【Question】:

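When scraping a web page with BeautifulSoup, is there a way to limit the number of elements found by the find family of methods?

For example, if I only want the first 5 tags, can I do that with BeautifulSoup?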

【Comments】:

Tags: web-scraping beautifulsoup

【Solution 1】:

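.find_all() and .select() return list-like results (a standard Python list or a subclass of it), so you can slice them, e.g. with [:5], to keep only the first 5 results: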

from bs4 import BeautifulSoup

txt = '''
<div>Tag 1</div>
<div>Tag 2</div>
<div>Tag 3</div>
<div>Tag 4</div>
<div>Tag 5</div>
<div>Tag 6</div>
<div>Tag 7</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

for div in soup.find_all('div')[:5]:
    print(div.text)

Prints:

Tag 1
Tag 2
Tag 3
Tag 4
Tag 5

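Edit: You can also use a CSS selector to select the first 5 elements. Note that :nth-of-type() counts positions among siblings of the same parent, so this picks the first 5 div elements under each parent rather than the first 5 in the whole document: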

from bs4 import BeautifulSoup

txt = '''
<div>Tag 1</div>
<div>Tag 2</div>
<div>Tag 3</div>
<div>Tag 4</div>
<div>Tag 5</div>
<div>Tag 6</div>
<div>Tag 7</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

for div in soup.select('div:nth-of-type(-n+5)'):
    print(div.text)

Prints:

Tag 1
Tag 2
Tag 3
Tag 4
Tag 5
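
To illustrate how the nth-of-type(-n+5) approach differs from slicing, here is a small sketch. The markup is made up for illustration (two section parents with four p children each), and it uses -n+2 instead of -n+5 to keep the output short. The selector counts positions under each parent separately, while slicing caps the overall result:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two parents, each holding four <p> siblings.
html = '''
<section><p>A1</p><p>A2</p><p>A3</p><p>A4</p></section>
<section><p>B1</p><p>B2</p><p>B3</p><p>B4</p></section>
'''

soup = BeautifulSoup(html, 'html.parser')

# -n+2 matches positions 1 and 2, counted separately under each parent.
per_parent = [p.text for p in soup.select('p:nth-of-type(-n+2)')]

# Slicing the full result list caps the total number of results instead.
overall = [p.text for p in soup.select('p')[:2]]

print(per_parent)  # ['A1', 'A2', 'B1', 'B2']
print(overall)     # ['A1', 'A2']
```

In the flat example above both approaches give the same five tags because all the div elements share one parent.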

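【Comments】:

This just selects the first 5 from the resulting list, right? I want to reduce the running time, and I'm already doing something like this.
@Karmah24 Yes, it returns the first 5 of all the elements found by .find_all().
Can I terminate the search after the first five?
Thanks. Can you tell me what nth-of-type(-n+5) does?
@Karmah24 Try this site. It is about :nth-child(), but similar rules apply here. Or see stackoverflow.com/questions/11922165/…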
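
On the question of terminating the search early: one option that may help is the limit argument of find_all(), which stops collecting once the given number of matches has been found. This is a minimal sketch that rebuilds the same seven div tags as the answer; the variable names are only illustrative:

```python
from bs4 import BeautifulSoup

# Same seven <div> tags as in the answer, built in a loop for brevity.
txt = ''.join(f'<div>Tag {i}</div>' for i in range(1, 8))
soup = BeautifulSoup(txt, 'html.parser')

# limit=5 makes find_all stop once five matches have been collected,
# instead of gathering every match and slicing afterwards.
first_five = soup.find_all('div', limit=5)
texts = [div.text for div in first_five]

print(texts)
```

Keep in mind that BeautifulSoup() still parses the whole document up front, so limit only shortens the search step, not the parsing.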