Extract text from HTML using BeautifulSoup: answers

【Question title】: Extract text from html using BeautifulSoup
【Posted】: 2021-07-12 20:56:50
【Question】:

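I'm new to Python and BeautifulSoup and need help writing a for loop to retrieve some text values from HTML. I'm also new to Stack Overflow :-)

I can scrape the web page and, using the td tag below, find the rows that contain the company employees I want to add to a list. What I don't know is how to write a for loop that ignores the tags, retrieves only the text value (i.e. the employee name) from each row, and appends it to a new list, employees. So in the example below, how do I get John Doe, Bob Smith, etc. into a list? Any help is appreciated.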

import requests
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import re

url = 'my target URL'
target_url= uReq(url)
target_html = target_url.read()
soupy = soup(target_html, 'html.parser')
print(soupy.prettify())


employees = []
employees = soupy.findAll('td', headers= 'table5593r1')
employees

<td headers="'table5593r1"><a href="https://www.acme.org/about-acme/people/john-doe" target="_blank">Mr John Doe</a></td>,
 <td headers="'table5593r1"><a href="https://www.acme.org/about-acme/people/bob-smith">Dr Bob Smith</a></td>,
 <td headers="'table5593r1"><a href="https://www.acme.org/about-acme/people/jane-do">Dr Jane Do</a></td>,
 <td headers="'table5593r1"><a href="https://www.acme.org/about-acme/people/mary-jane">Ms Mary Jane</a></td>,

【Question comments】:

Tags: beautifulsoup

【Solution 1】:

This post shows how to get the text of an HTML element/tag. To add the employee names to a new list, you can do the following:


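# find_all is the current name for findAll; both return the matching <td> tags
employees = soupy.find_all('td', headers='table5593r1')
employeeNames = []

for employee in employees:
    # .text returns only the text inside the tag, including nested tags such as <a>
    employeeName = employee.text
    # strip() removes any surrounding whitespace before storing the name
    employeeNames.append(employeeName.strip())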

I'd also recommend looking further into this post about looping over a list of HTML elements.
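To try the approach without fetching a live page, here is a minimal self-contained sketch. It assumes the rows look like the ones in the question, with the `headers` attribute written as plain `table5593r1` (the stray quote in the pasted output is dropped here, which is an assumption about the real page). The extra "Not an employee" row and the name `employee_names` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the rows in the question. The headers value
# is assumed to be "table5593r1"; the last row is a made-up non-match.
html = """
<table>
  <tr><td headers="table5593r1"><a href="https://www.acme.org/about-acme/people/john-doe" target="_blank">Mr John Doe</a></td></tr>
  <tr><td headers="table5593r1"><a href="https://www.acme.org/about-acme/people/bob-smith">Dr Bob Smith</a></td></tr>
  <tr><td headers="other"><a href="#">Not an employee</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Select only the cells whose headers attribute matches, then keep the
# text of each cell; get_text(strip=True) trims surrounding whitespace.
cells = soup.find_all("td", headers="table5593r1")
employee_names = [cell.get_text(strip=True) for cell in cells]

print(employee_names)  # ['Mr John Doe', 'Dr Bob Smith']
```

The list comprehension does the same work as the for loop in the answer. Either form is fine; the loop is easier to extend if each row needs more processing later.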

【Comments】: