Get image URL to show a single image name

【Question title】: Get image URL to show a single image name
【Posted】: 2018-01-02 06:30:32
【Question】:

I have a question about this. I don't know how to display a single img. For example:

<img srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w" src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg">

As you can see above, there are several alternate images, but I am trying to grab just one image to display.

import bs4 as bs
import urllib.request
import datetime
import random 
import re


random.seed(datetime.datetime.now())

sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

# 




title = soup.title
link = soup.link
# Attempted regex for the image. It is not valid Python and does not work; not sure how to fix it:
# image = re.search(img 'srcset=img(.*?),)

strong = soup.strong
description = soup.description
location = soup.location


title = soup.find('h1', class_ ='publication-font', )   

image = soup.find('img')
strong = soup.find('strong')
location = soup.find('em').find('a')
description = soup.find('div', class_='description')


#Previous Code
print("H1:", title.text)
print("Article Link:", link)
print("Image Url:\n", image)
print("1st Paragraph:\n", strong.text)
print("2nd Paragraph:\n", description.string)
print("Location:\n", location.text)

My code is above; my earlier attempt produced the following output:

Greater Manchester News
<link href="rss.xml" rel="alternate" title="Default home feed" type="application/rss+xml"/>
<img data-src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg" data-srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w"/>
        Family of dad stabbed in the neck while defending his fiancée from thugs speak of their heartbreak
        Mike Grimshaw, 34, died after being stabbed in the neck outside his home in Trafford last Thursday
Trafford

The result shows several image names, but I am trying to show only one image link. How can I do that?

Any ideas would be greatly appreciated.

【Comments】:

Tags: python python-3.x scrape imageurl

【Solution 1】:

You can read the data-src or data-srcset attribute to get the image you want:

image = soup.find('img')
single_img = image.get('data-src')  # returns the main image link
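
If some images on the page carry only a plain src attribute, or the page has no img tag at all, the lookup above returns None or raises an error. The following is a minimal sketch of a more defensive version, not the answerer's code: the helper name first_image_url and the example.com sample markup are made up for illustration, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup


def first_image_url(soup):
    """Return one URL for the first <img>, or None if there is no usable image.

    Hypothetical helper, not part of Beautiful Soup.
    """
    image = soup.find('img')
    if image is None:
        return None
    # Prefer the lazy-load attribute, then fall back to the plain src.
    return image.get('data-src') or image.get('src')


# Made-up sample markup shaped like the tag in the question.
html = (
    '<img data-src="http://example.com/s615/photo.jpg" '
    'data-srcset="http://example.com/s180/photo.jpg 180w, '
    'http://example.com/s390/photo.jpg 390w"/>'
)
soup = BeautifulSoup(html, 'html.parser')
print(first_image_url(soup))
```
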

Or:

import re
image = soup.find('img')
img_string = image.get('data-srcset')  # returns a string that you have to parse
img_set = re.findall(r'(https?://[^\s]+)', img_string)  # regex to match only the links

You can then access whichever index you want in img_set (just check the length of the list first).
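
Instead of picking an index by hand, the width descriptors (180w, 390w, ...) in the srcset string can be used to choose a candidate. This is a rough sketch rather than a full srcset parser: the function names parse_srcset and largest_image are hypothetical, the sample URLs are made up, and it assumes the URLs themselves contain no commas.

```python
def parse_srcset(srcset):
    """Split a srcset string into (url, width) pairs; width is None when absent.

    Assumes candidates are comma-separated and URLs contain no commas.
    """
    candidates = []
    for entry in srcset.split(','):
        parts = entry.split()
        if not parts:
            continue  # skip empty pieces, e.g. after a trailing comma
        url = parts[0]
        width = None
        # A width descriptor looks like "180w".
        if len(parts) > 1 and parts[1].endswith('w') and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        candidates.append((url, width))
    return candidates


def largest_image(srcset):
    """Return the URL with the largest width descriptor, or None if empty."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    # Candidates without a width are treated as width 0.
    return max(candidates, key=lambda c: c[1] or 0)[0]


# Made-up sample shaped like the data-srcset value in the question.
sample = (
    'http://example.com/s180/photo.jpg 180w, '
    'http://example.com/s390/photo.jpg 390w, '
    'http://example.com/s458/photo.jpg 458w'
)
print(largest_image(sample))
```

Passing image.get('data-srcset') in place of sample would apply the same selection to the scraped tag, provided that attribute is present.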

【Comments】: