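【Posted】: 2020-10-17 12:39:46
【Question】:
I am trying to extract the text of the Risk Factors section of this 10-K report from the SEC's EDGAR database: https://www.sec.gov/Archives/edgar/data/101830/000010183019000022/sprintcorp201810-k.htm
As you can see, I have managed to locate the headings of the Risk Factors section (the section whose text I want) and the Unresolved Staff Comments section (the one that follows Risk Factors), but I have not been able to identify or scrape all the text between those two headings (that is, the text of the Risk Factors section).
As shown here, I have tried the "next_sibling" approach and a few other suggestions from SO, but I am still doing something wrong.
Code: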
import requests
import bs4 as bs

# Download the 10-K filing and parse it with the built-in HTML parser
file = requests.get('https://www.sec.gov/Archives/edgar/data/101830/000010183019000022/sprintcorp201810-k.htm')
soup = bs.BeautifulSoup(file.content, 'html.parser')

# Locate the first <a> tag whose text matches each section title
risk_factors_header = soup.find_all("a", text="Risk Factors")[0]
staff_comments_header = soup.find_all("a", text="Unresolved Staff Comments")[0]

# Attempt: take whatever node directly follows the "Risk Factors" link
risk_factors_text = risk_factors_header.next_sibling
print(risk_factors_text.contents)
Excerpt of the desired output (all of the text found in the Risk Factors section):
In addition to the other information contained in this Annual Report on Form 10-K, the following risk factors should be considered carefully in evaluating us. Our business, financial condition, liquidity or results of operations could be materially adversely affected by any of these risks.
Risks Relating to the Merger Transactions
The closing of the Merger Transactions is subject to many conditions, including the receipt of approvals from various governmental entities, which may not approve the Merger Transactions, may delay the approvals for, or may impose conditions or restrictions on, jeopardize or delay completion of, or reduce the anticipated benefits of, the Merger Transactions, and if these conditions are not satisfied or waived, the Merger Transactions will not be completed.
The completion of the Merger Transactions is subject to a number of conditions, including, among others, obtaining certain governmental authorizations, consents, orders or other approvals and the absence of any injunction prohibiting the Merger Transactions or any legal requ........
【Discussion】:
Tags: python web-scraping beautifulsoup
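One possible way to collect everything between two known tags is to walk the document in order from the first tag and stop at the second, instead of relying on next_sibling, which only returns the single node directly after the tag inside the same parent. The sketch below is a minimal illustration on a small made-up HTML snippet, not on the actual filing; the helper name text_between and the SAMPLE_HTML markup are hypothetical. It also assumes the two matched a tags really are the section headings. In the real 10-K they may instead be table-of-contents links, in which case the heading anchors they point to would have to be located first.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical stand-in for the filing; the real 10-K markup is far more complex.
SAMPLE_HTML = """
<div><a name="item1a">Risk Factors</a></div>
<p>In addition to the other information, consider these risks.</p>
<p><b>Risks Relating to the Merger Transactions</b></p>
<div><a name="item1b">Unresolved Staff Comments</a></div>
<p>None.</p>
"""


def text_between(start, end):
    """Return the non-empty text nodes found after `start` and before `end`,
    in document order. Text inside `start` itself is skipped."""
    chunks = []
    for node in start.next_elements:
        if node is end:
            break  # reached the next section heading
        if not isinstance(node, NavigableString) or isinstance(node, Comment):
            continue  # only plain text nodes are of interest
        if any(parent is start for parent in node.parents):
            continue  # this is the heading's own text
        text = node.strip()
        if text:
            chunks.append(text)
    return chunks


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
start = soup.find("a", string="Risk Factors")
end = soup.find("a", string="Unresolved Staff Comments")

section = text_between(start, end)
print("\n".join(section))
```

Unlike next_sibling, next_elements also descends into nested tags and continues past the end of the parent element, so it does not matter how deeply the two headings are wrapped in div or table markup.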