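[Posted]: 2016-11-27 11:57:56
[Question]:
I am scraping a website and then trying to split the text into paragraphs. Looking at the scraped text, I can clearly see that some paragraph breaks are not being split correctly. The code below reproduces the problem: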
from bs4 import BeautifulSoup
import requests
link = "http://www.presidency.ucsb.edu/ws/index.php?pid=111395"
response = requests.get(link)
soup = BeautifulSoup(response.content, 'html.parser')
paras = soup.findAll('p')
# Note that in printing the below, there are still a lot of "<p>" in that paragraph :(
print(paras[614])  # parenthesized form runs on both Python 2.7 and Python 3
I have tried other parsers and get a similar problem.
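One possible explanation, offered as an assumption because the page's raw markup is not shown here, is that the page opens new <p> tags without closing the previous ones. A parser that does not apply HTML's implied end-tag rule may then nest each new <p> inside the one before it, which would leave literal <p> tags inside a printed paragraph. The following is a minimal sketch of a workaround using only the standard library (Python 3), not BeautifulSoup itself: it treats every <p> start tag as closing the paragraph that is currently open. The names ParagraphSplitter, split_paragraphs, and sample are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser


class ParagraphSplitter(HTMLParser):
    """Collect paragraph text, treating each <p> as closing the previous one."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # finished paragraph strings
        self._chunks = None   # text pieces of the open paragraph, or None

    def _flush(self):
        # Store the open paragraph (if any) with whitespace normalized.
        if self._chunks is not None:
            text = " ".join("".join(self._chunks).split())
            if text:
                self.paragraphs.append(text)
        self._chunks = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # a new <p> implicitly ends the previous one
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text outside any <p> is ignored in this sketch.
        if self._chunks is not None:
            self._chunks.append(data)

    def close(self):
        super().close()
        self._flush()          # keep a final paragraph that was never closed


def split_paragraphs(html):
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Made-up sample markup with unclosed <p> tags (an assumption about the page).
sample = "<div><p>First paragraph.<p>Second one.<p>Third, with <b>bold</b> text.</div>"
print(split_paragraphs(sample))
```

To try this on the real page, response.text from the requests call above could be passed to split_paragraphs. Whether it helps depends on the markup actually matching the assumption about unclosed tags.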
[Discussion]:
Tags: python python-2.7 parsing web-scraping beautifulsoup