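【Posted】: 2021-06-26 15:23:22
【Question】:
I admit I'm really not good at scraping web pages with bs4, so here is the problem I'm facing. I have this HTML file. All I want is the number from every span whose class has the suffix -confirmed-vn.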
<div class="board-content--left">
<div class="board-detail">
<div class="board-col col1 text-blue">
Cases<br>
<div style="margin-right:10px;">
<span class="live-confirmed-vn">15115 </span>
<span class="plus-confirmed-vn">+578</span>
</div>
</div>
<div class="board-col col2">
<div class="board-col-child">
Recovered:
<span class="live-recovered-vn"> 5949</span>
<span class="plus-recovered-vn">+0</span>
</div>
<div class="board-col-child">
Deaths:
<span class="live-death-vn"> 74 </span>
<span class="plus-death-vn"></span>
</div>
</div>
</div>
</div>
This is what I'm doing right now:
import re
import requests
from bs4 import BeautifulSoup

url = "https://thanhnien.vn/e-magazine/toan-canh-covid-19-tin-tuc-so-lieu-phan-tich-1265104.html"
# The page at this URL contains the HTML structure shown above
req = requests.get(url)
soup = BeautifulSoup(req.text, features="html.parser")
# Find every span whose class matches the -confirmed-vn suffix
test = soup.find_all('span', class_=re.compile(r'.+-confirmed-vn'))
print(test)
# Output of print(test):
[<span class="live-confirmed-vn"></span>, <span class="plus-confirmed-vn"></span>, <span class="live-confirmed-vn text-red"></span>, <span class="live-confirmed-vn"></span>, <span class="live-confirmed-vn text-red"></span>, <span class="live-confirmed-vn"></span>]
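For reference, here is a minimal sketch of getting the text out of the matched spans. It runs against a trimmed copy of the sample markup from the question, not the live page, and the variable names `html`, `spans`, and `numbers` are only illustrative. Calling `get_text(strip=True)` on each tag returns its inner text without the surrounding whitespace. Note that the output above shows the spans coming back empty from the live URL, which suggests (an assumption, not verified here) that the page fills in the numbers with JavaScript after loading, in which case `requests` alone would never see them.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question, trimmed to the relevant spans.
html = """
<div class="board-col col1 text-blue">
  Cases<br>
  <div style="margin-right:10px;">
    <span class="live-confirmed-vn">15115 </span>
    <span class="plus-confirmed-vn">+578</span>
  </div>
</div>
<div class="board-col-child">
  Recovered:
  <span class="live-recovered-vn"> 5949</span>
  <span class="plus-recovered-vn">+0</span>
</div>
"""

soup = BeautifulSoup(html, features="html.parser")

# Match any span that has a class ending in "-confirmed-vn".
spans = soup.find_all("span", class_=re.compile(r"-confirmed-vn$"))

# Take the inner text of each tag, with surrounding whitespace removed.
numbers = [span.get_text(strip=True) for span in spans]
print(numbers)
```

If the strings are non-empty, they can be turned into integers with something like `int(text.lstrip("+"))`. On the live page, an empty result from `get_text` would be consistent with the values being injected client-side.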
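【Comments】:
-
I'm neither a Python expert nor a BeautifulSoup expert, but this question sounds like this one. Maybe it will get you going?
-
@thordarson Thanks for your help. Unfortunately, it only shows the span classes, not the numbers I want :(
Tags: html python-3.x beautifulsoup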