How to get a p tag's own text and the text of tags nested inside it, in order, with bs4

【Question title】: I want to get p tag's text and other tag's text inside the p tag in order by bs4
【Posted】: 2021-09-07 13:22:51
【Question description】:

I selected some content from the soup, and it comes in three forms.

<p>first</p>
<p><span>first</span></p>
<p>first<span>second</span>third</p>

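From the last one I want to get the text in order: "first", "second", "third". I tried ".text" first, but for the last one it returns "firstsecondthird". I want to get the pieces of text one by one. Is there a way to do that?

I edited the question to give you more details.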

contents_list = soup.select('blabla')

# contents_list =
# ['<p>first</p>',
# '<p><span>first</span></p>',
# '<p>first<span>second</span>third</p>']

for content in contents_list:
  print(content.text)

# I want to get
# first
# first
# first, second, third

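【Comments】:

Please edit your question and show us what you have already tried.
@MendelG I edited my question. I'm new to Stack Overflow, so I'm not familiar with it. Sorry.

Tags: python html beautifulsoup web-crawler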

【Solution 1】:

To separate the text of nested tags with a space, you can use the get_text() method and pass a space as the separator argument: .get_text(separator=" ").

from bs4 import BeautifulSoup


html = """
<p>first</p>
<p><span>first</span></p>
<p>first<span>second</span>third</p>

"""

soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("p"):
    print(tag.get_text(separator=" "))

Output:

first
first
first second third
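
If the pieces are needed individually rather than as one joined string (the question asks for "first, second, third"), one possible variation is to iterate over the tag's stripped_strings generator, which yields each text node in document order with surrounding whitespace removed. This is only a sketch on the same sample HTML; the name pieces_per_tag is made up for illustration and is not from the original answer.

```python
from bs4 import BeautifulSoup

html = """
<p>first</p>
<p><span>first</span></p>
<p>first<span>second</span>third</p>
"""

soup = BeautifulSoup(html, "html.parser")

# For each <p>, collect its text nodes one by one, in document order.
# pieces_per_tag is a hypothetical name used only in this sketch.
pieces_per_tag = [list(tag.stripped_strings) for tag in soup.find_all("p")]

for pieces in pieces_per_tag:
    # Join with ", " to match the format requested in the question.
    print(", ".join(pieces))
```

Each inner list holds the separate strings, so they can be processed individually before or instead of joining them.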

【Discussion】: