【Question title】: BeautifulSoup to get image name from P class picture tag in Python
【Posted】: 2020-06-10 13:57:28
【Question description】:

The HTML looks like this:

<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>
260
</span>
</p>

I want to get

/assets/img/ratings/rating-4_5.png

How should I improve the following code?

img = soup.findAll('p',attrs={'class':'rating item-rating'})

for i in img:
    print(i.picture)

【Comments on the question】:

    Tags: python html python-3.x web-scraping beautifulsoup


    【Solution 1】:

    You need to go down to the img tag, since its src attribute holds the value you want.

    from bs4 import BeautifulSoup
    
    s = '''<p class="rating item-rating">
    <picture>
    <source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
    <img src="/assets/img/ratings/rating-4_5.png"/>
    </picture>
    <span>
    260
    </span>
    </p>'''
    
    soup = BeautifulSoup(s, 'html.parser')
    for p in soup.select('p.rating'):
        print(p.picture.img['src'])
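
A hedged variation on the loop above: `p.picture.img['src']` raises an exception when a matching `<p>` has no `<picture>` or no `<img>` inside it. The sketch below assumes the real page may contain such incomplete items (the second `<p>` in the sample is made up for illustration) and lets one CSS selector skip them. The names `html` and `srcs` are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''<p class="rating item-rating">
<picture>
<source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
<img src="/assets/img/ratings/rating-4_5.png"/>
</picture>
<span>260</span>
</p>
<p class="rating item-rating"><span>12</span></p>'''  # hypothetical item without an image

soup = BeautifulSoup(html, 'html.parser')

# Select only <img> tags that carry a src attribute and sit inside a
# <picture> within a <p> that has both classes; items without one are skipped.
srcs = [img['src'] for img in soup.select('p.rating.item-rating picture img[src]')]
print(srcs)
```

Because the selector only matches complete items, no `None` checks are needed inside the loop.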
    

    【Comments】:

      【Solution 2】:

      You can read the src value straight from the img tag, for example:

      from bs4 import BeautifulSoup

      r = """<p class="rating item-rating">
      <picture>
      <source srcset="/assets/img/ratings/rating-4_5.svg" type="image/svg+xml"/>
      <img src="/assets/img/ratings/rating-4_5.png"/>
      </picture>
      <span>
      260
      </span>
      </p>"""
      # Name the parser explicitly so the result does not depend on which parsers are installed
      source = BeautifulSoup(r, 'html.parser')
      
      # find_all is the current name for the older findAll alias
      img = source.find_all('p', attrs={'class': 'rating item-rating'})
      
      for parsing in img:
          # .img returns the first <img> descendant of the <p>
          print(parsing.img['src'])
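
The title asks for the image "name", while both answers print the full `src` path. If only the file name is wanted, a minimal sketch using the standard library could look like this. The helper `image_name` is a hypothetical name introduced here, not part of BeautifulSoup.

```python
import posixpath
from urllib.parse import urlparse


def image_name(src):
    """Return the last path segment of a URL or path (hypothetical helper)."""
    # urlparse drops any query string or fragment; basename keeps the last segment
    return posixpath.basename(urlparse(src).path)


print(image_name('/assets/img/ratings/rating-4_5.png'))  # rating-4_5.png
```

It can be applied to each value printed in the loops above, for example `image_name(parsing.img['src'])`.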
      

      【Comments】:
