【Posted】: 2017-07-11 13:46:40
【Question】:
As a test, I am fetching data from edition.cnn.com/?refresh=1:
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Download the page and parse it with the lxml parser
my_url = urlopen("http://edition.cnn.com/?refresh=1")
sauce = my_url.read()
soup = BeautifulSoup(sauce, "lxml")

# Print every <section> inside the main container
my_div = soup.find("div", {"class": "pg-no-rail"})
my_sections = my_div.find_all("section")
for section in my_sections:
    print(section)
my_url.close()
But the output looks like this:
<section class="zn--idx-0 zn-empty"> </section>
<section class="zn--idx-1 zn-empty"> </section>
<section class="zn--idx-2 zn-empty"> </section>
<section class="zn--idx-3 zn-empty"> </section>
<section class="zn--idx-4 zn-empty"> </section>
<section class="zn--idx-5 zn-empty"> </section>
<section class="zn--idx-6 zn-empty"> </section>
<section class="zn--idx-7 zn-empty"> </section>
I want to reach the h2 element highlighted in the image.
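A minimal sketch of how one might check which sections actually contain an h2 in the HTML that was downloaded. The `zn-empty` class in the output above suggests the sections arrive as placeholders and are filled in later, possibly by JavaScript, but that is an assumption and not something the post confirms. The markup in `SAMPLE_HTML` and the helper name `section_headings` are hypothetical, made up for illustration; only the `pg-no-rail` container and the `section` class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from CNN: one placeholder section
# like those in the output above, and one section that already has a heading.
SAMPLE_HTML = """
<div class="pg-no-rail">
  <section class="zn--idx-0 zn-empty"> </section>
  <section class="zn--idx-1"><h2>Example headline</h2></section>
</div>
"""


def section_headings(html):
    """Map each section's class string to the h2 texts found inside it."""
    # "html.parser" ships with Python; "lxml" can be used instead if installed.
    page = BeautifulSoup(html, "html.parser")
    container = page.find("div", {"class": "pg-no-rail"})
    if container is None:
        return {}
    result = {}
    for section in container.find_all("section"):
        key = " ".join(section.get("class", []))
        result[key] = [h2.get_text(strip=True) for h2 in section.find_all("h2")]
    return result


headings = section_headings(SAMPLE_HTML)
for name, titles in headings.items():
    print(name, titles)
```

If every section maps to an empty list when this is run on the downloaded page, the h2 elements are not in the response that `urlopen` received, and no choice of selector will find them there.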
【Discussion】:
Tags: python web-scraping beautifulsoup