【Question title】: Get image size without downloading it in Python
【Posted】: 2011-11-19 14:05:20
【Question】:

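How can I get the dimensions of an image without actually downloading it? Is it even possible? I have a list of image URLs and I want to record the width and size of each one.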

I know there is a way to do this locally (How to check dimensions of all images in a directory using python?), but I don't want to download all the images.

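Edit:

Following ed.'s suggestions, I edited the code and came up with this code. I'm not sure whether it downloads the whole file or only part of it (which is what I want).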

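【Comments on the question】:

The dimensions are usually in a header at the start of the file, so you may only need to download a few bytes. For example, 6 bytes are enough to get the dimensions of a JPEG: fastgraph.com/help/jpeg_header_format.html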

Tags: python image url

【Solution 1】:

I found a solution on this site that works well:

import urllib
import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    if size: size = int(size)
    p = ImageFile.Parser()
    while 1:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            return size, p.image.size
            break
    file.close()
    return size, None

print getsizes("http://www.pythonware.com/images/small-yoyo.gif")
# (10965, (179, 188))

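【Comments】:

Where does ImageFile come from?
PIL -- the Python Imaging Library
Watch the file descriptor in this code: if the image size is retrieved, the file is never closed.
Could you elaborate, @IvanDePazCenteno? Wouldn't a "file.close()" before returning the size solve this? And is it even a problem?
@FabianBosler Yes, adding a file.close() before that line solves it, although I would suggest using the with keyword to manage it, since that is good practice. And yes, it definitely is a problem. In some cases, such as a big loop, not closing allocated resources can become a disaster. Allocated resources should always be closed, even if the operating system or the interpreter itself can get rid of them.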

【Solution 2】:

This is based on ed's answer, mixed with other things I found on the web. I ran into the same issue with .read(24) as grotos did. Download getimageinfo.py from here and ReSeekFile.py from here.

import urllib2
import getimageinfo

imgdata = urllib2.urlopen(href)  # href is the URL of the image
image_type, width, height = getimageinfo.getImageInfo(imgdata)

Modify getimageinfo like this...

import ReseekFile

def getImageInfo(datastream):
    datastream = ReseekFile.ReseekFile(datastream)
    data = str(datastream.read(30))

#Skipping to jpeg

# handle JPEGs
elif (size >= 2) and data.startswith('\377\330'):
    content_type = 'image/jpeg'
    datastream.seek(0)
    datastream.read(2)
    b = datastream.read(1)
    try:
        while (b and ord(b) != 0xDA):
            while (ord(b) != 0xFF): b = datastream.read(1)
            while (ord(b) == 0xFF): b = datastream.read(1)
            if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                datastream.read(3)
                h, w = struct.unpack(">HH", datastream.read(4))
                break
            else:
                datastream.read(int(struct.unpack(">H", datastream.read(2))[0])-2)
            b = datastream.read(1)
        width = int(w)
        height = int(h)
    except struct.error:
        pass
    except ValueError:
        pass

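【Comments】:

Nice work. I had the same problems. Other helpful responses came from ed.
The source code for getimageinfo.py is no longer available. Here is the code for anyone looking for it in the future: gist.github.com/bmamouri/55ac6bfa7ba5eee03da2eb9e4f7469d9

【Solution 3】:

This is just a Python 3+ adaptation of an earlier answer here.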

from urllib import request as ulreq
from PIL import ImageFile


def getsizes(uri):
    # get file size *and* image size (None if not known)
    # the with block closes the connection on every exit path
    with ulreq.urlopen(uri) as file:
        size = file.headers.get("content-length")
        if size:
            size = int(size)
        p = ImageFile.Parser()
        while True:
            data = file.read(1024)
            if not data:
                break
            p.feed(data)
            if p.image:
                return size, p.image.size
    return size, None

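【Comments】:

【Solution 4】:

If you are willing to download the first 24 bytes of each file, then this function (mentioned in johnteslade's answer to the question you mention) will work out the dimensions.

That is probably the least downloading needed to do the job you want.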

import urllib2
start = urllib2.urlopen(image_url).read(24)

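Edit (1):

For JPEG files it seems to need more bytes. You could edit the function so that, instead of reading from StringIO.StringIO(data), it reads from the file handle returned by urlopen. It will then read exactly as much of the image as it needs to determine the width and height.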
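
To illustrate why 24 bytes are enough for some formats but not for JPEG, here is a minimal sketch that reads the dimensions of PNG and GIF images from the first bytes of the file. The name `dims_from_header` is made up for this example and is not the function linked above. JPEG is deliberately left out, because its frame header can sit at an arbitrary offset after the metadata segments.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def dims_from_header(data):
    """Return (width, height) from the first bytes of a PNG or GIF, else None.

    Hypothetical helper for illustration; JPEG is intentionally not handled.
    """
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then width and
    # height as big-endian unsigned 32-bit integers (bytes 16..23).
    if (len(data) >= 24 and data.startswith(PNG_SIGNATURE)
            and data[12:16] == b"IHDR"):
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then width and height as little-endian
    # unsigned 16-bit integers (bytes 6..9).
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    return None


# Build a fake 24-byte PNG header in memory, so nothing is downloaded here.
png_head = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
            + struct.pack(">II", 179, 188))
print(dims_from_header(png_head))  # (179, 188)
```

In Python 3, the bytes returned by `urllib.request.urlopen(image_url).read(24)` could be passed to this function directly.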

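【Comments】:

Using this solution, specifically .read(24), breaks the script. With read() everything works fine.
It is basically the same as the example in the Python docs (docs.python.org/library/urllib2.html). What error do you get with (24)? Just using read() will (as I guess you know) download the whole file...
If I run the getImageInfo function with read(24), I get this error: UnboundLocalError: local variable 'w' referenced before assignment
Hmm. Try running it with read(50) and see if that works. I think the error must come from the JPEG part of the function, so it may need more bytes.
Same thing. I think it only works with read(X) where X is large enough to cover the whole file.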

【Solution 5】:

The getimageinfo.py mentioned above does not work in Python 3, so use Pillow instead.

Pillow can be found on PyPI, or installed with pip: pip install pillow.

from io import BytesIO
from PIL import Image
import requests

hrefs = ['https://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg',
         'https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg',
         'https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
RANGE = 5000
for href in hrefs:
    req = requests.get(href, headers={'User-Agent': 'Mozilla5.0(Google spider)', 'Range': 'bytes=0-{}'.format(RANGE)})
    im = Image.open(BytesIO(req.content))
    print(im.size)

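【Comments】:

Doesn't this actually download the image? I believe that is what the OP was trying to avoid.
I suggest using a requests session to reuse the TCP connection and, where possible, plain HTTP instead of HTTPS. In some cases this can improve performance significantly.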

【Solution 6】:

Using the requests library:

To get the image size in bytes:

Fetch only the header data from the website (without downloading the image):

import requests

url = r"https://www.sulitest.org/files/source/Big%20image%20HD/elyx.png"

size = requests.get(url, stream = True).headers['Content-length']
print(size)
## output: 437495

## to see what other headers data you can get:
allheaders = requests.get(url, stream = True).headers
print(allheaders)
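
Note that `requests.get(..., stream=True)` still sends a GET request; the body is simply not read until it is accessed. If only the headers are wanted, a HEAD request asks the server not to send a body at all. Below is a minimal sketch of that variant using only the standard library. The helper names are invented for this example, and it assumes the server answers HEAD requests and reports a Content-Length, which not every server does.

```python
import urllib.request


def make_head_request(url):
    """Build (but do not send) a HEAD request for url. Hypothetical helper."""
    return urllib.request.Request(url, method="HEAD")


def parse_content_length(headers):
    """Return Content-Length as an int, or None if the header is absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None


def remote_file_size(url, timeout=10):
    """Send the HEAD request and return the reported size in bytes, or None."""
    with urllib.request.urlopen(make_head_request(url), timeout=timeout) as resp:
        return parse_content_length(resp.headers)
```

Calling `remote_file_size(url)` with the `url` defined above would then return the size as an integer rather than a string.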

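To get the image (width, height):

We have to download part of the image and let an image library read the image header and retrieve/parse the (width, height). Here I use Pillow.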

import requests
from PIL import ImageFile

resume_header = {'Range': 'bytes=0-2000000'}    ## the amount of bytes you will download
data = requests.get(url, stream = True, headers = resume_header).content

p = ImageFile.Parser()
p.feed(data)    ## feed the data to image parser to get photo info from data headers
if p.image:
    print(p.image.size) ## get the image size (Width, Height)
## output: (1400, 1536)

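【Comments】:

This works like a charm!

【Solution 7】:

There is no way to do this directly, but there is a workaround. If the files are on your own server, implement an API endpoint that takes the image name as a parameter and returns its size.

But if the files are on a different server, you have no option but to download them.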

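【Comments】:

Judging by the other answers to this question, this seems to be an incorrect assertion.
@SlaterTyranus No, all the other answers just suggest downloading the image (or part of it). This answer is the most correct one, but the others are all valid workarounds.

【Solution 8】:

Unfortunately I can't comment, so here it is as an answer:

Use a GET query with the header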

"Range": "bytes=0-30"

and then simply use

http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

If you use Python's "requests", it is straightforward:

r = requests.get(image_url, headers={
    "Range": "bytes=0-30"
})
image_info = get_image_info(r.content)

This fixes ed.'s answer and has no other dependencies (such as ReSeekFile.py).
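
One caveat: a server is free to ignore a Range header and answer with status 200 and the whole file, whereas a partial response comes back as 206. The sketch below, using only the standard library, builds the ranged request and reports which of the two happened. The helper names are made up for this example, and the get_image_info function used above is not reproduced here.

```python
import urllib.request


def make_range_request(url, last_byte):
    """Build a GET request for bytes 0..last_byte inclusive. Hypothetical helper."""
    return urllib.request.Request(
        url, headers={"Range": "bytes=0-%d" % last_byte})


def fetch_prefix(url, last_byte=30, timeout=10):
    """Return (data, was_partial) for the start of the resource at url."""
    request = make_range_request(url, last_byte)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # 206 Partial Content means the server honored the Range header.
        was_partial = resp.status == 206
        # Read at most last_byte + 1 bytes, even if the server ignored the
        # header and started sending the whole file.
        data = resp.read(last_byte + 1)
    return data, was_partial
```

The `data` returned by `fetch_prefix(image_url)` plays the same role as `r.content` in the snippet above.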

【Comments】:

The URL provided is invalid.

【Solution 9】:

My fixed "getimageInfo.py", working with Python 3.4+. Try it, it works great!

import io
import struct
import urllib.request as urllib2

def getImageInfo(data):
    data = data
    size = len(data)
    #print(size)
    height = -1
    width = -1
    content_type = ''

    # handle GIFs
    if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
        # Check to see if content_type is correct
        content_type = 'image/gif'
        w, h = struct.unpack(b"<HH", data[6:10])
        width = int(w)
        height = int(h)

    # See PNG 2. Edition spec (http://www.w3.org/TR/PNG/)
    # Bytes 0-7 are below, 4-byte chunk length, then 'IHDR'
    # and finally the 4-byte width, height
    elif ((size >= 24) and data.startswith(b'\211PNG\r\n\032\n')
          and (data[12:16] == b'IHDR')):
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[16:24])
        width = int(w)
        height = int(h)

    # Maybe this is for an older PNG version.
    elif (size >= 16) and data.startswith(b'\211PNG\r\n\032\n'):
        # Check to see if we have the right content type
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[8:16])
        width = int(w)
        height = int(h)

    # handle JPEGs
    elif (size >= 2) and data.startswith(b'\377\330'):
        content_type = 'image/jpeg'
        jpeg = io.BytesIO(data)
        jpeg.read(2)
        b = jpeg.read(1)
        try:
            while (b and ord(b) != 0xDA):
                while (ord(b) != 0xFF): b = jpeg.read(1)
                while (ord(b) == 0xFF): b = jpeg.read(1)
                if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                    jpeg.read(3)
                    h, w = struct.unpack(b">HH", jpeg.read(4))
                    break
                else:
                    jpeg.read(int(struct.unpack(b">H", jpeg.read(2))[0])-2)
                b = jpeg.read(1)
            width = int(w)
            height = int(h)
        except struct.error:
            pass
        except ValueError:
            pass

    return content_type, width, height



#from PIL import Image
#import requests
#hrefs = ['http://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg','https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg','https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
#RANGE = 5000
#for href in hrefs:
    #req  = requests.get(href,headers={'User-Agent':'Mozilla5.0(Google spider)','Range':'bytes=0-{}'.format(RANGE)})
    #im = getImageInfo(req.content)

    #print(im)
# A Range header needs the "bytes=first-last" form; a bare "5000" is not a valid range.
req = urllib2.Request("http://vn-sharing.net/forum/images/smilies/onion/ngai.gif", headers={"Range": "bytes=0-5000"})
r = urllib2.urlopen(req)
#f = open("D:\\Pictures\\1.jpg", "rb")
print(getImageInfo(r.read()))
# Output: >> ('image/gif', 50, 50)
#print(getImageInfo(f.read()))

Source: http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

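【Comments】:

Hey, this is interesting. Just curious what the RANGE variable is doing... does it limit how many bytes are downloaded?
This doesn't work for many JPEGs for me. I found an alternative function (posted as an answer).
It fails on many files. The first 5 KB are not enough.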
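
A likely reason for these failures is that a JPEG's frame header comes after any metadata segments (EXIF data, embedded thumbnails, color profiles), which can push it well past the first few kilobytes. One alternative to a fixed window is to read from the response stream segment by segment and stop at the frame header. The following is a minimal sketch of that idea, not the replacement function the commenter posted. The name `jpeg_size_from_stream` is made up here, and the code assumes `read(n)` returns `n` bytes unless the stream has ended.

```python
import io
import struct

# Start-of-frame markers that carry the image dimensions. 0xC4 (DHT),
# 0xC8 (JPG) and 0xCC (DAC) fall in the same range but are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_size_from_stream(stream):
    """Read a JPEG stream only as far as its frame header.

    Returns (width, height), or None if no frame header is found.
    Hypothetical helper; stream is any object whose read(n) returns bytes.
    """
    if stream.read(2) != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    while True:
        byte = stream.read(1)
        # Skip to the next 0xFF, then past any 0xFF fill bytes.
        while byte and byte != b"\xff":
            byte = stream.read(1)
        while byte == b"\xff":
            byte = stream.read(1)
        if not byte:
            return None  # ran out of data
        marker = byte[0]
        if marker in (0xD9, 0xDA):  # end of image or start of scan: give up
            return None
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue  # these markers have no length field
        header = stream.read(2)
        if len(header) < 2:
            return None
        (length,) = struct.unpack(">H", header)
        if length < 2:
            return None  # malformed segment
        if marker in SOF_MARKERS:
            # Payload starts with 1 byte precision, 2 bytes height, 2 bytes width.
            payload = stream.read(5)
            if len(payload) < 5:
                return None
            height, width = struct.unpack(">HH", payload[1:5])
            return width, height
        # Any other segment: skip its payload (length counts its own 2 bytes).
        if len(stream.read(length - 2)) < length - 2:
            return None


# A hand-built JPEG prefix: SOI, one APP0 segment, then a baseline frame header.
sample = (b"\xff\xd8"
          + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
          + b"\xff\xc0" + struct.pack(">HBHH", 17, 8, 188, 179) + bytes(10))
print(jpeg_size_from_stream(io.BytesIO(sample)))  # (179, 188)
```

With a real download, the response object returned by `urllib2.urlopen(req)` above could be passed in place of the `io.BytesIO` object, so that only the bytes up to the frame header are read.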

【Solution 10】:

import requests
from PIL import Image
from io import BytesIO

url = 'http://farm4.static.flickr.com/3488/4051378654_238ca94313.jpg'

img_data = requests.get(url).content    
im = Image.open(BytesIO(img_data))
print (im.size)

【Comments】:

Part of the question is "without actually downloading it". requests.get(url).content will download the image.