[Posted]: 2021-12-30 08:59:20
[Question]:
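I am new to Python web scraping. I am trying to build a script that fetches only the normal (non-bold) text below the bold headings on this page: https://www.state.gov/cuba-restricted-list/list-of-restricted-entities-and-subentities-associated-with-cuba-effective-january-8-2021/
That is, I only want text such as "MINFAR — Ministerio de las Fuerzas Armadas Revolucionarias" and "MININT — Ministerio del Interior" under "Ministries", and so on down to "Additional Subentities of Habaguanex", stored as a list. I tried the code below to get them, but I cannot extract those normal text values on their own.
Here is my code: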
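import requests
from bs4 import BeautifulSoup

URL = "https://www.state.gov/cuba-restricted-list/list-of-restricted-entities-and-subentities-associated-with-cuba-effective-january-8-2021/"
# Download the page and parse it with the lxml parser (requires lxml to be installed)
page = requests.get(URL)
soup = BeautifulSoup(page.text, "lxml")
# Select every <div> whose class attribute is exactly ["entry-content"]
content = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['entry-content'])
# This prints the whole div: bold headings and normal text together
print(content)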
Any ideas are welcome. Please feel free to share your thoughts. Thanks in advance :)
[Discussion]:
Tags: python html web-scraping beautifulsoup python-requests
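One possible approach, sketched under assumptions: walk every text node inside the entry-content div, treat text that sits inside a <strong> or <b> tag as a section heading, and collect every other non-empty text node as an entry under the most recent heading. This assumes the live page marks its headings with <strong> or <b>, which should be checked in the browser's inspector. The function name extract_non_bold_text and SAMPLE_HTML are hypothetical; the sample only mimics the layout described in the question, not the page's actual markup. It uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample that only mimics the layout described in the question;
# the real page's markup may differ.
SAMPLE_HTML = """
<div class="entry-content">
  <p><strong>Ministries</strong></p>
  <p>MINFAR — Ministerio de las Fuerzas Armadas Revolucionarias<br>
     MININT — Ministerio del Interior</p>
  <p><strong>Additional Subentities of Habaguanex</strong></p>
  <p>Placeholder entry</p>
</div>
"""


def extract_non_bold_text(html):
    """Map each bold heading to the non-bold text nodes that follow it.

    Assumption: headings are wrapped in <strong> or <b> inside div.entry-content.
    """
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="entry-content")
    if content is None:
        return {}

    sections = {}
    current = None
    for node in content.descendants:
        # Keep plain text nodes only; this skips tags as well as
        # NavigableString subclasses such as Comment.
        if type(node) is not NavigableString:
            continue
        text = node.strip()
        if not text:
            continue  # whitespace between tags
        if node.find_parent(["strong", "b"]) is not None:
            current = text  # bold text starts a new section
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(text)  # normal text under the heading
    return sections


if __name__ == "__main__":
    sections = extract_non_bold_text(SAMPLE_HTML)
    # Flatten into a single list of entries, as asked in the question.
    entries = [entry for items in sections.values() for entry in items]
    print(entries)
```

For the live page, pass requests.get(URL).text instead of SAMPLE_HTML. Walking text nodes rather than <p> elements is a design choice: it works whether the entries are split into separate <p> tags or separated by <br> inside one paragraph. A heading split across several tags (for example a bold heading containing an <em>) would be recorded as more than one heading.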