【Question title】: Extract an image using Python's Beautiful Soup
【Posted】: 2016-02-14 13:05:13
【Question】:

I am using the following code to extract the HTML I need from an Amazon listing:

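import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.amazon.com/dp/B0007RXSB4")

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r.content, "html.parser")

soup.find_all("div", {"id": "imgTagWrapperId"})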

This gives me:

[<div class="imgTagWrapper" id="imgTagWrapperId">\n<img alt="Johnston         
&amp; Murphy Men's Greenwich Oxford,Black,6 D" class="a-dynamic-image 
a-stretch-vertical" data-a-dynamic-image='{"http://ecx.images-
amazon.com/images/I/81zwayZox-S._UY695_.jpg":
[695,695],"http://ecx.images-amazon.com/images/I/81zwayZox-
S._UY535_.jpg":[535,535],"http://ecx.images-
amazon.com/images/I/81zwayZox-S._UY500_.jpg":
[500,500],"http://ecx.images-amazon.com/images/I/81zwayZox-
S._UY575_.jpg":[575,575],"http://ecx.images-
amazon.com/images/I/81zwayZox-S._UY395_.jpg":
[395,395],"http://ecx.images-amazon.com/images/I/81zwayZox-
S._UY585_.jpg":[585,585]}' data-old-hires="http://ecx.images-
amazon.com/images/I/81zwayZox-S._UL1500_.jpg" id="landingImage" 
onload="this.onload='';setCSMReq('af');if(typeof addlongPoleTag === 
'function'){ addlongPoleTag('af','desktop-image-atf-
marker');};setCSMReq('cf')" src="http://ecx.images-
amazon.com/images/I/41KixMIlPNL._SY395_QL70_.jpg" style="max-
width:695px;max-height:695px;">\n</img></div>]

I just need to know how to extract

http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg

from the output above.

【Comments】:

    Tags: python html python-3.x beautifulsoup


    【Solution 1】:

    First, you need to locate the img tag inside the div you have already found. One way is to chain find() calls:

    img = soup.find("div", {"id": "imgTagWrapperId"}).find("img")
    

    Alternatively, use a CSS selector:

    img = soup.select_one("div#imgTagWrapperId > img")
    

    Then, if you need the image URL from the src attribute:

    img["src"]
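
Both find() and select_one() return None when nothing matches, so indexing the result directly can raise an error if the page layout differs from what is shown in the question. A minimal sketch of a more defensive lookup follows; SAMPLE_HTML is a trimmed-down stand-in for the Amazon markup, and image_src is a hypothetical helper name, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the Amazon markup shown in the question (assumption).
SAMPLE_HTML = """
<div class="imgTagWrapper" id="imgTagWrapperId">
<img id="landingImage" src="http://ecx.images-amazon.com/images/I/41KixMIlPNL._SY395_QL70_.jpg">
</div>
"""


def image_src(html):
    """Return the src of the img inside div#imgTagWrapperId, or None."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.select_one("div#imgTagWrapperId > img")
    if img is None:
        return None
    # .get() returns None instead of raising KeyError when src is absent.
    return img.get("src")


print(image_src(SAMPLE_HTML))
print(image_src("<p>no image here</p>"))
```

The first call prints the src URL from the sample markup; the second prints None because the wrapper div is missing.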
    

    If you need the image URLs stored in the data-a-dynamic-image attribute, I suggest using the json module to load that value into a Python dictionary and then reading its keys():

    import json
    
    img = soup.find("div", {"id": "imgTagWrapperId"}).find("img")
    data = json.loads(img["data-a-dynamic-image"])
    print(list(data.keys()))
    

    Prints:

    [
        u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg',
        u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY575_.jpg',     
        u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY500_.jpg',     
        u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY395_.jpg',     
        u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY535_.jpg',     
        u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY585_.jpg'
    ]
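
The question asks for one specific URL (the 695-pixel image), while keys() returns all of them. One possible way to pick it, assuming each value in the attribute is a pair of pixel dimensions as in the output quoted in the question, is to take the key with the largest area. RAW below is a shortened copy of that attribute value, and largest_image is a hypothetical helper name:

```python
import json

# Value of the data-a-dynamic-image attribute, shortened from the question (assumption).
RAW = (
    '{"http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg": [695, 695],'
    ' "http://ecx.images-amazon.com/images/I/81zwayZox-S._UY535_.jpg": [535, 535],'
    ' "http://ecx.images-amazon.com/images/I/81zwayZox-S._UY395_.jpg": [395, 395]}'
)


def largest_image(raw):
    """Return the URL whose pair of dimensions covers the largest area."""
    sizes = json.loads(raw)  # {url: [dimension, dimension], ...}
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])


print(largest_image(RAW))
```

With the real page, the same function could be called as largest_image(img["data-a-dynamic-image"]). Selecting by size avoids relying on the order in which the keys happen to be listed.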
    

    【Discussion】:
