【Question title】: BeautifulSoup to get image name from P class picture tag in Python
【Posted】: 2020-06-10 13:57:28
【Question】:
The HTML looks like this:
<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>
260
</span>
</p>
I want to get
/assets/img/ratings/rating-4_5.png
How should I improve the following code?
img = soup.findAll('p',attrs={'class':'rating item-rating'})
for i in img:
    print(i.picture)
【Comments】:
Tags:
python
html
python-3.x
web-scraping
beautifulsoup
【Solution 1】:
You need to access the img tag, since its src attribute appears to hold the information you want.
from bs4 import BeautifulSoup
s = '''<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>
260
</span>
</p>'''
soup = BeautifulSoup(s, 'html.parser')
for p in soup.select('p.rating'):
    print(p.picture.img['src'])
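If some p.rating elements on the real page have no picture or img child, the attribute chain above would fail on them (p.picture would be None). A minimal sketch of a more defensive variant, assuming the same markup as in the question; the helper name rating_image_srcs is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def rating_image_srcs(html):
    """Collect the src of the first <img> inside each p.rating element.

    Hypothetical helper for illustration; elements without an <img>
    (or without a src attribute) are skipped instead of raising.
    """
    soup = BeautifulSoup(html, 'html.parser')
    srcs = []
    for p in soup.select('p.rating'):
        # find() searches all descendants, so it reaches the <img> nested in <picture>
        img = p.find('img')
        if img is not None and img.has_attr('src'):
            srcs.append(img['src'])
    return srcs


sample = '''<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>
260
</span>
</p>'''

print(rating_image_srcs(sample))
```

Returning a list rather than printing inside the loop makes the result easier to reuse, for example to download the images later.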
【Solution 2】:
You can easily get the src value from the img tag, for example:
from bs4 import BeautifulSoup
r = """<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>
260
</span>
</p>"""
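# Name the parser explicitly; the generic 'html' lets bs4 pick whichever HTML parser is installed
source = BeautifulSoup(r, 'html.parser')
# find_all is the current name for findAll
img = source.find_all('p', attrs={'class': 'rating item-rating'})
for parsing in img:
    # .img returns the first <img> descendant, here the one nested inside <picture>
    print(parsing.img['src'])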
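Note that matching on the string 'rating item-rating' only finds elements whose class attribute is exactly that string, in that order. As an alternative, here is a sketch that uses a CSS selector instead and, since the title asks for the image name, also derives the file name from the path. It assumes the same markup as in the question; the variable names are illustrative only:

```python
import posixpath

from bs4 import BeautifulSoup

html = """<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>
260
</span>
</p>"""

soup = BeautifulSoup(html, 'html.parser')

# p.rating.item-rating matches <p> elements carrying both classes, in any order;
# img[src] keeps only <img> descendants that actually have a src attribute
srcs = [img['src'] for img in soup.select('p.rating.item-rating img[src]')]

# posixpath.basename keeps the part after the last '/', i.e. the file name
names = [posixpath.basename(src) for src in srcs]

print(srcs)
print(names)
```

The selector does the nesting and the attribute check in one step, so no loop over the p elements is needed.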