【Question title】: Scrape IMG SRC under DIV tag using BeautifulSoup
【Posted】: 2019-06-03 12:43:39
【Question】:

I am trying to get the src of an image located under a div tag. My code raises an error: KeyError: 'src'

Here is my code:

for page in range(1,4):
    # code that gets dynamic URL
    url = sys.argv[1] + "{}".format(page)
    print(url)
    html=urlopen(url)
    soup=BeautifulSoup(html,"lxml")

    for article in soup.find_all('article',class_='o-hit'):
        div=soup.find('div',{"class":"o-rating_thumb@m-"})
        img_src = div.find('img').attrs['src']
        #img_src = article.find('div',class_ ='o-rating_thumb c-white').img['src']
        headline = article.h2.text.strip()

        summary = article.find('p',class_ ='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text.strip()

        #img_src = "none"

        print(headline)
        print(summary)
        print(img_src)
        csv_writer.writerow([headline,summary,img_src])

The web page is here: Engadget blog, page 10

【Comments】:

    Tags: python html beautifulsoup


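    【Solution 1】:

    For the topmost news item on each page, you can get the image source from the 'src' attribute itself.

    First use the find() method to navigate to the div that contains the image. Then, within that div, find the img tag and read its source from its attributes.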

    import requests
    from bs4 import BeautifulSoup
    url='https://www.engadget.com/reviews/latest/page/10/'
    res=requests.get(url)
    soup=BeautifulSoup(res.text,'html.parser')
    div=soup.find('div',{"class":"o-rating_thumb@m-"})
    print(div.find('img').attrs['src'])
    

    Output:

    https://o.aolcdn.com/images/dims?resize=810%2C455&crop=810%2C455%2C0%2C0&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1400%252C933%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1066%26image_uri%3Dhttp%253A%252F%252Fo.aolcdn.com%252Fhss%252Fstorage%252Fmidas%252F85a4e2b124ba329ab520e80e306f07eb%252F206517051%252FIMG_5243e.jpg%26client%3Da1acac3e1b3290917d92%26signature%3Dcea6158d0bf02768d31ee67f2694be6cafaf200c&client=amp-blogside-v2&signature=08a97a1109f1c3287c6766fa284104c6f78770fe
    

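    Edit, to scrape all the news items on the page:

    Although the first image has a src attribute, the subsequent images store their URL in the data-original attribute instead (you can view the page source to confirm this). I think that is why you are getting the KeyError.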
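The difference between the two attributes can be handled with a fallback lookup. The following is only a minimal sketch: the HTML string and the `image_url` helper are made up for illustration and are not taken from the real page. It relies on `Tag.get()`, which returns `None` for a missing attribute instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the situation described above:
# one image with src, one with only data-original, one article with no image.
html = """
<article class="o-hit"><img src="https://example.com/first.jpg"></article>
<article class="o-hit"><img data-original="https://example.com/second.jpg"></article>
<article class="o-hit"><p>no image here</p></article>
"""

def image_url(article):
    """Return the image URL for one article, or None if it cannot be found."""
    img = article.find("img")
    if img is None:
        return None
    # Tag.get() returns None for a missing attribute instead of raising KeyError.
    return img.get("data-original") or img.get("src")

soup = BeautifulSoup(html, "html.parser")
urls = [image_url(article) for article in soup.find_all("article", class_="o-hit")]
print(urls)
```

Trying data-original first and src second is a choice made for this sketch; swap the order if the page you are scraping puts a placeholder in data-original.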

    I was able to scrape all the news entries like this:

    import requests
    from bs4 import BeautifulSoup
    url='https://www.engadget.com/reviews/latest/page/10/'
    res=requests.get(url)
    soup=BeautifulSoup(res.text,'html.parser')
    articles=soup.find_all('article',{"class":"o-hit"})
    for article in articles:
        print("Heading: ", article.find('h2').text.strip())#heading
        print("Summary: ", article.find('p').text.strip())#summary
        print("Image Source:", article.find('img').attrs['data-original'])#image src
        print()
    

    Output:

    Heading:  Netflix will remove user reviews from its website next month
    Summary:  Last year five-star ratings got the ax, and now written reviews will fade away too.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F884e68f9a829f3a26db5b729f00ccd03%2F206508290%2FEnglish.jpg&client=amp-blogside-v2&signature=b37eb21e95cef8cebe1f3c741b8bb29eb3471dcc
    
    Heading:  Smart ForTwo Electric Drive quick spin review
    Summary:  The saddest way to spend $25,000.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2Fedbdfdfeff2e77567cd0c4a73484d108%2F206502307%2Fsmartfortwo.jpg&client=amp-blogside-v2&signature=a9fc05d80d4b4d8ba6ef33453510c138632bab81
    
    Heading:  Vivo's all-screen NEX S is a frustrating glimpse of the future
    Summary:  Spoiler alert: It's really cool, but don't bother importing one.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F29%2F5b36ac0e523dc352bd46785a%2F5b36aedc884c2354eb33d663_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=725c8033196a2ae3500e2144830d14b03e7abc0e
    
    Heading:  Sonos Beam review: Smart features trump minor audio compromises
    Summary:  Bringing the soundbar into the smart home era.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F27%2F5b32f579523dc352bd3f66f3%2F5b32fbf2884c2354eb33d62f_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=4ad311aeb5cb23907fd99ec12d962b148646163d
    
    Heading:  BlackBerry KEY2 review: The undisputed keyboard king
    Summary:  This is the best Android-powered BlackBerry, if that means anything to you.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F26%2F5b3188ee523dc36212a7ff02%2F5b318be5802b94347b7e586b_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=5438cdf814480be5856d38db73695f86ade186ea
    
    Heading:  Amazon Echo Look review: Good selfie taker, so-so stylist
    Summary:  An AI is no match for my style instincts.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F25%2F5b30cbfce880db6107cb7ad0%2F5b30cde61aa5fc22c7bbf187_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=308e9f00afcb968da05823ce0d0718ccc6e43cb4
    
    Heading:  Mitsubishi’s Outlander Plug-In Hybrid is an understated surprise
    Summary:  Mitsubishi is back, even though it actually never left.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bc80f523dc36212a2be79%2F5b2bc8a6884c2319c410c008_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=a00b8466fa281051de4d64b1223fe99f97315985
    
    Heading:  Amazon Fire TV Cube review: Alexa still needs work as a TV guide
    Summary:  This device was bound to be made at some point, but is it worth it?
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bb81edbaab36faf00ed0e%2F5b2bddfb884c2319c410c00c_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=baa2db64e12d013ab712d823238fc3efeee693f8
    
    Heading:  HTC U12+ review: Fundamentally flawed
    Summary:  The phone's pressure-sensitive power and volume keys are kinda the worst.
    Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b28cd94f50775726418990a%2F5b2bd7d4b46ab33c496c1607_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=8518ce5c141fb85b935794fbd3bd283d32508484
    

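    【Comments】:

    • Hi Bitto, could you elaborate on why you have "@m-" in the class name?
    • @Carlisle Because that is the class name. It is not something I added. I looked at the page source, and that is the class name I found.
    • I updated the code... I actually have a series of blog pages... it looks like the class names may differ from page to page. I will try your code to get it working for all pages.
    【Solution 2】: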
    from bs4 import BeautifulSoup
    import requests
    import time
    
    
    
    for page in range(1,11):
    
        url = 'https://www.engadget.com/reviews/latest/page/%s/' %(page)
        time.sleep(10)
    
        print ('Page: %s' %(page))    
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
    
        articles = soup.find_all('article',class_='o-hit')
    
        for article in articles:
    
            img_src = article.find('div',class_ ='o-rating_thumb c-white').img['data-original'] 
            headline = article.h2.text.strip()
            summary = article.find('p',class_ ='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text
    
            print(headline)
            print(summary)
            print(img_src)
            print('\n')
    

    Output (you could write this to a CSV file instead of printing it):

    Page: 1
    Surface Studio 2 review: A better all-in-one PC twist
    But Microsoft could still go further.
    https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-12%2F20%2F5c1bfc61c0e0af2854a7c103%2F5c1bfcaa3278fb29ca5cf249_o_U_v1.jpg&client=amp-blogside-v2&signature=3c4be6997ee8e877ee7f62ad8d52409232f02ce9
    
    
    Nikon Z6 review: The best full-frame mirrorless camera for video
    10-bit external video, in-body stabilization and a full sensor readout.
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1111%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1111%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Ff152c7e0-045b-11e9-bfc7-4d357297511c%26client%3Da1acac3e1b3290917d92%26signature%3Dd3865a04724a29f29b2bd3f6941dcddf9d494bcc&client=amp-blogside-v2&signature=7a72e03b6995fc31e3415c68279a4b038c979ea8
    
    
    Brava's light-powered smart oven is too expensive to make sense
    Preset cook programs can be limiting as well. 
    https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-12%2F19%2F5c1a9bf4fcd67b52d409586b%2F5c1a9f935b5d1b6ddb3a80d6_o_U_v1.png&client=amp-blogside-v2&signature=5f899a54c84b54b651d90824b67587f29677c858
    
    
    PlayStation Classic review: A disappointing dose of nostalgia
    Sony learned nothing from Nintendo.
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C928%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C928%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252F815fbe10-fd95-11e8-bde6-bdfd52a1c25a%26client%3Da1acac3e1b3290917d92%26signature%3D01ffdb2c7bc74497ae5f2a734feab08629996703&client=amp-blogside-v2&signature=98ae8e659929f4dd7f97d886d00b65303ab18059
    
    
    Moment's 58mm lens is a portrait machine
    The company's new tele lens fixes everything that was wrong with its 2014 model
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fresize%3D2000%252C2000%252Cshrink%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252F39511ab0-f8d1-11e8-bbae-a119d499ba30%26client%3Da1acac3e1b3290917d92%26signature%3Dc246fb260ca480d6a9a4acb6f91cf10974f32c9a&client=amp-blogside-v2&signature=7dd3cc98cb10045b323dba54e86f2c70c2aa99b4
    
    
    ’Super Smash Bros. Ultimate’ is the perfect nostalgia bomb
    It's a must-own for every Nintendo Switch owner.
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C900%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C900%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Fd8ab34b0-f90d-11e8-befe-815318929941%26client%3Da1acac3e1b3290917d92%26signature%3D4a5feb09e95c4f55ad0e1f8d6322734588ff76f6&client=amp-blogside-v2&signature=3a61128cae6560efafec7efc14211c2187851234
    
    
    Mercedes’ GLE sports impressive suspension technology
    MBUX and the new E-Active Body Control suspension enhance an already splendid SUV.
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fresize%3D2000%252C2000%252Cshrink%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Fc86bc1f0-f7fa-11e8-b77f-844a0908350f%26client%3Da1acac3e1b3290917d92%26signature%3D26e4fe19d43ad8d8f349edc95baf1790e10deecd&client=amp-blogside-v2&signature=f74436bc107af53e468d2d9ece4e88b37cff0a10
    
    
    Mighty Vibe review: A much improved iPod Shuffle for Spotify
    The second-gen model makes some much-needed improvements.
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252F1ffaaf40-f4e9-11e8-bbdf-b9d9c8fe5ee1%26client%3Da1acac3e1b3290917d92%26signature%3D18cbcc4c82d1c7d9f542dc80c57a4486c318526a&client=amp-blogside-v2&signature=e6d3badcf43ac938173f5353aa3a750c82b72bc3
    
    
    Google Pixel Slate review: The burden of bad software
    Back to the drawing board, Google.
    https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-11%2F30%2F5c008cf7600c9a1890e1305b%2F5c008d483a4f8c07678d8eb0_o_U_v1.jpg&client=amp-blogside-v2&signature=bbe180eb62cfe43f5241e74c1b7328c70da134c9
    
    
    Page: 2
    Dolby Dimension review: Excellent sound, exorbitant price
    At $599, these headphones are too expensive for most, no matter how good they are.
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252Fc6b58350-f250-11e8-8fff-afce4122ee12%26client%3Da1acac3e1b3290917d92%26signature%3D1bbe23e198c73bba2cd252a57f0598bcc32b374b&client=amp-blogside-v2&signature=65f2ee40f3cb346e473d6b188a6939317b4bebad
    
    
    Nikon Z7 review: Great photos, great video, imperfect autofocus
    It’s a strong full-frame mirrorless debut.
    
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1019%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1019%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252F07cabcc0-ed6f-11e8-af6d-2e14e29f20d0%26client%3Da1acac3e1b3290917d92%26signature%3D0a9704430483d81de5ebc6d4df50feca2228634e&client=amp-blogside-v2&signature=acf075d9ba15703a43abfcb705bd2ccd73e39ac7
    
    
    All of Amazon's new Echo speakers reviewed
    So how good do the new Echo Plus, Dot and Sub really sound?
    https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252Fa1d1ba20-ed16-11e8-b9ad-ed849065b748%26client%3Da1acac3e1b3290917d92%26signature%3D55c2bb811c59d95f942f200aec6af5fb35d6e0fc&client=amp-blogside-v2&signature=3a5c7fdde7829033245cec14439a5463c077d702
    
    ... ... ...
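Rather than printing, the same three fields can be written to a CSV file, as the question's `csv_writer.writerow(...)` call intends. This is a minimal sketch using only the standard library. The `write_articles` helper, the column names, and the file name `reviews.csv` are assumptions made for illustration.

```python
import csv

def write_articles(fileobj, rows):
    """Write (headline, summary, img_src) rows to an open text file as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["headline", "summary", "img_src"])  # header row (assumed names)
    writer.writerows(rows)

# Example rows shaped like the scraped fields; the URL is a placeholder.
rows = [
    ("HTC U12+ review: Fundamentally flawed",
     "The phone's pressure-sensitive power and volume keys are kinda the worst.",
     "https://example.com/image.jpg"),
]

# newline="" stops the csv module's line endings from being translated on Windows.
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    write_articles(f, rows)
```

Inside the scraping loop, you would collect one tuple per article and pass the whole list to `write_articles` once all pages have been processed.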
    

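    【Comments】:

    • I updated the code... sorry, I am looping through several blog pages. It seems to work for page 1, but when I extend the range to 4 pages, it hangs.
    • What is the URL? Is it just engadget.com/reviews/latest/page/10, with the trailing 10/ looped from 1 to 4?
    • What do you mean by "hangs"? Does it run with nothing happening, or does the script exit/break? I just tried it with a sleep/time delay. See the edit above.
    • OK, I added it above. You probably do not need a 10-second delay like I used (5 seconds may be enough). But I just ran it and it works fine.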
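On the hang discussed in these comments: one possible cause, which is an assumption here and was not confirmed in the thread, is a request that never returns because no timeout is set. The sketch below pairs the delay between pages with an explicit `timeout` on `requests.get`. The helper names `page_urls` and `fetch_pages` and the default values are made up for illustration.

```python
import time
import requests

BASE_URL = "https://www.engadget.com/reviews/latest/page/{}/"

def page_urls(first, last):
    """Build the listing URLs for pages first..last (inclusive)."""
    return [BASE_URL.format(page) for page in range(first, last + 1)]

def fetch_pages(urls, delay=5, timeout=10):
    """Yield (url, html) pairs, pausing between requests.

    timeout makes a stalled request raise an exception instead of blocking
    forever; delay spaces the requests out, as in the answer above.
    """
    for index, url in enumerate(urls):
        if index:  # no need to wait before the first request
            time.sleep(delay)
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors early
        yield url, response.text
```

A caller would loop with `for url, html in fetch_pages(page_urls(1, 4)):` and pass `html` to `BeautifulSoup(html, 'html.parser')` as in the answer.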