【Question Title】:Get image URL to show a single image name
【Posted】:2018-01-02 06:30:32
【Question Description】:

I have a question about this. I don't know how to display a single img. For example:

<img srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w" src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg">

As you can see above, there are several alternate images, but I am trying to scrape just one image to display.

import bs4 as bs
import urllib.request
import datetime
import random 
import re


random.seed(datetime.datetime.now())

sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

# 




title = soup.title
link = soup.link
# image = re.search(img 'srcset=img(.*?),)
# The regex attempt above doesn't work (it is not valid Python); not sure how to write it

strong = soup.strong
description = soup.description
location = soup.location


title = soup.find('h1', class_='publication-font')

image = soup.find('img')
strong = soup.find('strong')
location = soup.find('em').find('a')
description = soup.find('div', class_='description')


#Previous Code
print("H1:", title.text)
print("Article Link:", link)
print("Image Url:\n", image)
print("1st Paragraph:\n", strong.text)
print("2nd Paragraph:\n", description.string)
print("Location:\n", location.text)

My code is above, but my previous attempt displayed the following results:

Greater Manchester News
<link href="rss.xml" rel="alternate" title="Default home feed" type="application/rss+xml"/>
<img data-src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg" data-srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w"/>
        Family of dad stabbed in the neck while defending his fiancée from thugs speak of their heartbreak
        Mike Grimshaw, 34, died after being stabbed in the neck outside his home in Trafford last Thursday
Trafford

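The results show multiple image names, but I am trying to display only one image link. How can I do that?

Any ideas would be appreciated.

【Question Discussion】: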

    Tags: python python-3.x scrape imageurl


    【Solution 1】:

    You can access the data-src or data-srcset attribute to get the image you want:

    image = soup.find('img')
    single_img = image.get('data-src') # return the main image link
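
One caveat: an img tag like the first one in the question has only src and srcset, in which case looking up data-src yields nothing. Below is a minimal sketch of a fallback from data-src to src. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages; the names FirstImageParser and first_image_url and the shortened example.com URLs are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class FirstImageParser(HTMLParser):
    """Remember the attributes of the first <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "img" and self.attrs is None:
            self.attrs = dict(attrs)


def first_image_url(html):
    """Return data-src if present, otherwise src, otherwise None."""
    parser = FirstImageParser()
    parser.feed(html)
    attrs = parser.attrs or {}
    return attrs.get("data-src") or attrs.get("src")


# Hypothetical, shortened markup for illustration
lazy = '<img data-src="http://example.com/s615/photo.jpg" data-srcset="http://example.com/s180/photo.jpg 180w"/>'
plain = '<img src="http://example.com/s615/photo.jpg">'

print(first_image_url(lazy))
print(first_image_url(plain))
```

With BeautifulSoup, the same fallback could be written as image.get('data-src') or image.get('src').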
    

    import re
    image = soup.find('img')
    img_string = image.get('data-srcset') # this returns a string you have to parse
    img_set = re.findall(r'(https?://[^\s]+)', img_string) # regex to match only links
    

    You can then access any index you want in img_set (just check the length of the list first).

    【Discussion】:
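
As an alternative to indexing by position, the width descriptors (180w, 390w, ...) can be used to choose a candidate. The following is a rough sketch, not part of the original answer: parse_srcset and pick_image are hypothetical helper names, the example.com URLs are placeholders, and the code assumes the URLs contain no commas and that every entry uses a width descriptor.

```python
def parse_srcset(srcset):
    """Split a srcset string into (url, width) pairs.

    Assumes entries look like "<url> <number>w" separated by commas.
    """
    candidates = []
    for entry in srcset.split(","):
        parts = entry.split()
        # Keep only well-formed "<url> <digits>w" entries
        if len(parts) == 2 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            candidates.append((parts[0], int(parts[1][:-1])))
    return candidates


def pick_image(srcset, largest=True):
    """Return the URL with the largest (or smallest) width, or None if empty."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    chooser = max if largest else min
    return chooser(candidates, key=lambda pair: pair[1])[0]


# Placeholder string shaped like the data-srcset value in the question
img_string = (
    "http://example.com/s180/photo.jpg 180w, "
    "http://example.com/s390/photo.jpg 390w, "
    "http://example.com/s458/photo.jpg 458w"
)

print(pick_image(img_string))                 # widest candidate
print(pick_image(img_string, largest=False))  # narrowest candidate
```

Returning None for an empty or malformed string plays the same role as testing the list length before indexing.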
