Python Scrapy: check whether an image URL extracted with XPath exists

【Question title】: Python Scrapy check xpath url image exists
【Posted】: 2018-04-10 09:02:09
【Question】:

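Is it possible to check the URL of an image extracted with XPath?

The idea is to check whether the URL is broken. If it is not, extract the image URL; otherwise use a reference image instead.

I have this code: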

    for ntp in response.css('div.content-1col-nobox'):
        for imgurl in ntp.xpath('//div/p[3]/img/@src'):
            if imgurl != 404:
                picUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())
            else:
                picUrl = ("https://i.ebayimg.com/images/g/ZIMAAOSwImRYLZw9/s-l1600.jpg")

        writer.writerow({
            'PicURL': picUrl})

Any help would be greatly appreciated.

【Comments】:

You can try import requests and check with if requests.get(imgurl).status_code != 404
I get this error: raise InvalidSchema("No connection adapters were found for '%s'" % url) requests.exceptions.InvalidSchema: No connection adapters were found for '<Selector xpath='//div/p[3]/img' data='<img style="max-width:620px;text-align:c'>
That is because @src does not look like an absolute URL (for example "http://somesite.com/images/image.jpg"). Can you extract the correct @src?
The absolute URL is extracted into picUrl: response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())
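
The relative-versus-absolute point in the comments above can be illustrated without Scrapy. The sketch below uses urllib.parse.urljoin from the standard library, which applies the same kind of resolution that response.urljoin performs against the page URL. PAGE_URL and to_absolute are hypothetical names introduced only for this example; they do not appear in the original thread.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
PAGE_URL = "http://somesite.com/news/article.html"


def to_absolute(src, base=PAGE_URL):
    """Resolve an @src value against the URL of the page it came from."""
    return urljoin(base, src)


# A root-relative path is resolved against the site root.
print(to_absolute("/images/image.jpg"))
# A bare file name is resolved against the page's directory.
print(to_absolute("image.jpg"))
# An already absolute URL is returned unchanged.
print(to_absolute("http://cdn.example.com/a.jpg"))
```

Only a string with a scheme such as http:// can be passed to an HTTP client, which is why the extracted @src string (not the Selector object) has to be joined with the page URL first.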

Tags: python url xpath scrapy

【Solution 1】:

@Andersson is right, the problem was the absolute URL.

This is what I did:

    import requests

    for ntp in response.css('div.content-1col-nobox'):
        # Build the absolute image URL once and reuse it below.
        imgUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())
        if requests.get(imgUrl).status_code != 404:
            picUrl = imgUrl
        else:
            # Fall back to the reference image when the URL returns 404.
            picUrl = "https://i.ebayimg.com/images/g/ZIMAAOSwImRYLZw9/s-l1600.jpg"
        writer.writerow({'PicURL': picUrl})

Edit: @Andersson, thanks for the correction.
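
A somewhat stricter variant of the check above might treat anything other than a 200 response as broken, including connection errors and malformed URLs. The following is a minimal sketch under stated assumptions, not the method used in this answer: it swaps requests for urllib.request from the standard library so that it has no third-party dependency, and the names head_status, pick_image_url and FALLBACK_URL are hypothetical. It sends a HEAD request to avoid downloading the image; some servers reject HEAD, in which case a GET would be needed.

```python
import urllib.error
import urllib.request

# Reference image from the question, used when the real image is unavailable.
FALLBACK_URL = "https://i.ebayimg.com/images/g/ZIMAAOSwImRYLZw9/s-l1600.jpg"


def head_status(url, timeout=5.0):
    """Return the HTTP status of a HEAD request, or None if it cannot be made."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return reply.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # Malformed URL, DNS failure, refused connection, timeout, etc.
        return None


def pick_image_url(img_url, get_status=head_status, fallback=FALLBACK_URL):
    """Return img_url if it answers with status 200, otherwise the fallback."""
    if img_url and get_status(img_url) == 200:
        return img_url
    return fallback
```

Inside the spider loop this could be used as picUrl = pick_image_url(imgUrl). The get_status parameter exists so that the decision logic can be exercised without network access, for example pick_image_url(url, get_status=lambda u: 404). Note that a blocking HTTP call inside a Scrapy callback stalls the crawler while it waits, so this is only reasonable for small crawls.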

【Discussion】:

The indentation of the if/else block looks incorrect. Shouldn't it be at the same indentation level as the imgUrl definition?