【Posted】:2018-01-02 06:30:32
【Question】:
I have a question about this: I don't know how to get a single image URL out of an img tag. For example:
<img srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w" src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg">
As you can see above, the tag lists several alternative images, but I am trying to scrape just one image URL to display.
import bs4 as bs
import urllib.request
import datetime
import random
import re
random.seed(datetime.datetime.now())
sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
#
title = soup.title
link = soup.link
image = re.search(r'srcset="(.*?),', sauce.decode('utf-8', 'ignore'))
# this doesn't work, not sure how to
strong = soup.strong
description = soup.description
location = soup.location
title = soup.find('h1', class_='publication-font')
image = soup.find('img')
strong = soup.find('strong')
location = soup.find('em').find('a')
description = soup.find('div', class_='description')
#Previous Code
print("H1:", title.text)
print("Article Link:", link)
print("Image Url:\n", image)
print("1st Paragraph:\n", strong.text)
print("2nd Paragraph:\n", description.string)
print("Location:\n", location.text)
My code is above. In my previous attempt it printed the following results:
Greater Manchester News
<link href="rss.xml" rel="alternate" title="Default home feed"
type="application/rss+xml"/>
<img data-src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg" data-srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w"/>
Family of dad stabbed in the neck while defending his fiancée from thugs speak of their heartbreak
Mike Grimshaw, 34, died after being stabbed in the neck outside his
home in Trafford last Thursday
Trafford
The result shows several image URLs, but I am trying to display only one image link. How can I do that?
Any ideas would be greatly appreciated.
【Discussion】:
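One possible approach, offered as a sketch rather than a definitive answer: treat the srcset value as a plain string, split it into its candidates, and keep one of them. The helper names `parse_srcset` and `pick_largest` are made up for illustration, the example.com URLs are placeholders, and the code assumes the image URLs themselves contain no commas (true for the tag in the question, but not guaranteed in general).

```python
def parse_srcset(srcset):
    """Split a srcset string into (url, width) pairs.

    width is an int taken from a descriptor such as "390w",
    or None when the entry has no width descriptor.
    Assumes URLs contain no commas.
    """
    candidates = []
    for entry in srcset.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty pieces, e.g. from a trailing comma
        url = parts[0]
        width = None
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        candidates.append((url, width))
    return candidates


def pick_largest(srcset):
    """Return the URL with the largest width descriptor, or None if there is none."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    # Entries without a width descriptor are treated as width 0.
    return max(candidates, key=lambda c: c[1] or 0)[0]


# Placeholder srcset shaped like the one in the question.
example = (
    "http://example.com/s180/photo.jpg 180w, "
    "http://example.com/s390/photo.jpg 390w, "
    "http://example.com/s458/photo.jpg 458w"
)
print(pick_largest(example))
```

With the `image` tag from the question, the attribute value could be read with BeautifulSoup's `image.get('data-srcset')` (or `image.get('srcset')`) and passed to `pick_largest`. If the single fallback URL is enough, `image.get('data-src')` or `image.get('src')` avoids the parsing altogether.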
Tags: python python-3.x scrape imageurl