【Posted】: 2017-10-14 22:02:49
【Question】:
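I am trying to extract all of the text from the div with class "caselawcontent searchable-content". This code prints only the HTML, not the text of the web page. What am I missing to get the text?
The following link is in the file "filteredcasesdoc.txt":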
http://caselaw.findlaw.com/mo-court-of-appeals/1021163.html
import requests
from bs4 import BeautifulSoup

with open('filteredcasesdoc.txt', 'r') as openfile1:
    for line in openfile1:
        rulingpage = requests.get(line).text
        soup = BeautifulSoup(rulingpage, 'html.parser')
        doctext = soup.find('div', class_='caselawcontent searchable-content')
        print(doctext)
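A possible explanation, offered as a sketch rather than a confirmed diagnosis: `soup.find` returns a `Tag` object, and printing a `Tag` shows its markup. Calling `get_text()` on the tag returns only the text nodes inside it. The example below runs on an inline HTML sample so that it needs no network access; `SAMPLE_HTML` and the helper name `extract_div_text` are hypothetical and are not part of the original code or of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for a downloaded ruling page.
SAMPLE_HTML = """
<html><body>
<div class="caselawcontent searchable-content">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
</body></html>
"""


def extract_div_text(html, css_class='caselawcontent searchable-content'):
    """Return the text inside the first matching div, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # A class_ string containing a space is matched against the exact
    # value of the class attribute, so the order of the classes matters.
    div = soup.find('div', class_=css_class)
    if div is None:
        return None
    # get_text() drops the tags; strip=True trims each text fragment and
    # skips whitespace-only ones, separator joins what is left.
    return div.get_text(separator='\n', strip=True)


print(extract_div_text(SAMPLE_HTML))
```

In the loop above this would amount to passing `rulingpage` to the helper instead of printing `doctext` directly. Lines read from a file keep their trailing newline, so calling `line.strip()` before `requests.get` is a safe habit. If the helper returns `None` for the real page, the div is probably missing from the HTML that `requests` received, which is a separate problem from the one sketched here.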
【Comments】:
Tags: html python-3.x beautifulsoup python-requests