How to test if a webpage is an image

【Question title】: How to test if a webpage is an image
【Posted】: 2015-05-16 19:48:39
【Question】:

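Sorry, the title isn't very clear. Basically, I have a whole list of URLs and I intend to download the ones that are images. Is there a way to check whether a webpage is an image, so that I can skip the ones that aren't?

Thanks in advance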

【Comments on the question】:

Similar question: stackoverflow.com/questions/14644880/…

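【Solution 1】:

You can use the requests module. Make a HEAD request and check the Content-Type header. A HEAD request does not download the response body.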

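import requests

url = 'http://example.com/a.png'  # example URL; substitute each URL from your list
# requests.head() does not follow redirects by default, so enable them explicitly
response = requests.head(url, allow_redirects=True)
print(response.headers.get('content-type'))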

【Discussion】:

You can get the Content-Type header using only the stdlib.
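For what it's worth, here is a minimal stdlib-only sketch of the HEAD-request idea. The function names `head_content_type` and `is_image_content_type` and the 10-second timeout are my own choices for illustration, not something from the answers here, and the server may of course report a wrong or missing Content-Type.

```python
import urllib.request


def is_image_content_type(header_value):
    """Return True if a raw Content-Type header value names an image type."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def head_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header (or None)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

You would then keep a URL only if `is_image_content_type(head_content_type(url))` is true, wrapping the call in a try/except for `urllib.error.URLError` since some servers reject HEAD requests.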

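【Solution 2】:

There is no reliable way. But you may find a solution that is "good enough" in your case.

If a file extension is present in the url, you can look at it; e.g., .png or .jpg may indicate an image: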

>>> import os
>>> name = url2filename('http://example.com/a.png?q=1')
>>> os.path.splitext(name)[1]
'.png'
>>> import mimetypes
>>> mimetypes.guess_type(name)[0]
'image/png'

url2filename() function is defined here.
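Since the linked definition is not reproduced in this thread, here is a rough stand-in, assuming all that is needed is the last path component with the query string and fragment dropped. This is a guess at the behaviour, not the original `url2filename()`, and `looks_like_image_url` is a hypothetical helper added for illustration.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def url2filename(url):
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.basename(unquote(path))


def looks_like_image_url(url):
    """Guess from the file extension alone whether a URL points at an image."""
    guessed_type = mimetypes.guess_type(url2filename(url))[0]
    return guessed_type is not None and guessed_type.startswith("image/")
```

This only inspects the URL text, so it will miss images served from extensionless URLs and can be fooled by a misleading extension.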

You can check the Content-Type HTTP header:

>>> import urllib.request
>>> r = urllib.request.urlopen(url) # make HTTP GET request, read headers
>>> r.headers.get_content_type()
'image/png'
>>> r.headers.get_content_maintype()
'image'
>>> r.headers.get_content_subtype()
'png'

You can check whether the very beginning of the HTTP body contains a magic number that indicates an image file; e.g., a jpeg may start with b'\xff\xd8\xff\xe0', or:

>>> prefix = r.read(8)
>>> prefix # .png image
b'\x89PNG\r\n\x1a\n'

>>> import imghdr
>>> imghdr.what(None, b'\x89PNG\r\n\x1a\n')
'png'
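Note that imghdr was deprecated in Python 3.11 and removed in Python 3.13, so on newer interpreters you would have to compare the prefix yourself. A small sketch of that, assuming you only care about PNG, JPEG and GIF (the table below is deliberately incomplete, and `sniff_image_type` is a name made up for this example):

```python
# Leading bytes ("magic numbers") for a few common image formats.
IMAGE_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(prefix):
    """Return a format name if the bytes start with a known signature, else None."""
    for signature, name in IMAGE_SIGNATURES:
        if prefix.startswith(signature):
            return name
    return None
```

Reading the first 8 bytes of the response, as in `r.read(8)` above, is enough for every signature in this table.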

【Discussion】:

【Solution 3】:

You can use mimetypes: https://docs.python.org/3.0/library/mimetypes.html

import urllib
from mimetypes import guess_extension

url="http://example.com/image.png"
source = urllib.urlopen(url)
extension = guess_extension(source.info()['Content-Type'])
print extension

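This will print ".png" (guess_extension() includes the leading dot).

【Discussion】:

It doesn't work on Python 3 (the question has the python-3.x tag).
If you fix the imports, you can make it work. Also, it is unclear why you would guess the file extension here. The Content-Type itself is clear enough: it may even contain the word "image" (you can extract it, as shown in my answer).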
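For reference, a Python 3 port of the snippet in this answer might look roughly like the following. It is only a sketch: the function name `extension_from_url` and the timeout value are assumptions of mine, and it issues a full GET request just as the original does.

```python
import urllib.request
from mimetypes import guess_extension


def extension_from_url(url, timeout=10):
    """Fetch a URL and guess a file extension from its Content-Type header."""
    with urllib.request.urlopen(url, timeout=timeout) as source:
        # get_content_type() strips parameters such as "; charset=utf-8".
        content_type = source.headers.get_content_type()
    return guess_extension(content_type)
```

Calling `extension_from_url("http://example.com/image.png")` should give ".png" if the server reports image/png, and `None` when mimetypes does not recognise the reported type.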