【Question Title】: How to calculate the resulting file size of Image.resize() in PIL
【Posted】: 2021-06-01 23:03:56
【Question】:

I have to reduce incoming files to a maximum of 1 MB. I use PIL for image manipulation and Python 3.5. The file size of an image is given by:

import os
src = 'testfile.jpg'
size = os.path.getsize(src)
print(size)

which in my case is 1531494 bytes. If I open the file with PIL, I can only get the pixel dimensions:

from PIL import Image
src = 'testfile.jpg'
image = Image.open(src)
size =  image.size
print(size)

which in my case is (1654, 3968).

Of course, I could loop over different sizes as below, saving the file each time and checking its file size. But there has to be a simpler way, because this takes far too long (if you have to shrink 1000 files of different sizes):

def resize_image(src, reduceby=1):
    '''
    resizes image to the percentage given in reduceby
    '''
    print(" process_image:", src, reduceby)
    org = Image.open(src)
    real_size = org.size
    reduced_size = (int(real_size[0] * reduceby / 100), int(real_size[1] * reduceby / 100))
    # resize() returns a new image; the result must be assigned
    org = org.resize(reduced_size, Image.ANTIALIAS)
    reduced_file = src[:-4] + "_" + str(reduceby) + src[-4:]
    org.save(reduced_file, optimize=True)
    print(" reduced_image:", reduced_file)
    reduced_filesize = os.path.getsize(reduced_file)
    return reduced_filesize, reduced_file

def loop_image(src, target_size):
    print("loop_image    :", src, target_size)
    file_size = os.path.getsize(src)
    reduced_file = src
    print("source        :", src, file_size)
    reduce_by = 1
    while file_size > target_size:
        file_size, reduced_file = resize_image(src, reduce_by)
        print("target       :", file_size, reduced_file)
        reduce_by += 1
    return reduced_file

This function works, but it reduces the image too much and takes too long. My question is: how can I calculate the resulting file size before I resize the image? Or is there a simpler way?

【Comments】:

  • Do you just want to keep the file's aspect ratio, but small enough to fit within 1 MB?
  • You can do this "in-memory" using io.BytesIO, like I did here: stackoverflow.com/a/52281257/2836621. Obviously you would reduce the side lengths rather than lower the quality, but the principle is the same, and the code uses a binary search, which is much faster.
  • @Thymen: Yes, I have to... because the images are needed in a program that only supports sizes up to 1 MB. The incoming files are between 0.8 and 2.9 MB. @Mark: I had never seen that before, thank you. But I have already experimented with the quality option, im.save(buffer, format="JPEG", quality=m). I think resizing gives better results.
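
Mark's in-memory idea from the comments can be sketched as a binary search over the JPEG quality setting (an illustrative sketch, not code from the thread; the asker ultimately prefers resizing over lowering quality):

```python
import io

from PIL import Image


def size_at_quality(im, quality):
    """Encode to an in-memory buffer and return the JPEG byte size."""
    buf = io.BytesIO()
    im.save(buf, format="JPEG", quality=quality)
    return buf.getbuffer().nbytes


def find_quality(im, max_bytes, lo=1, hi=95):
    """Binary-search the highest JPEG quality whose output fits in max_bytes."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if size_at_quality(im, mid) <= max_bytes:
            best = mid       # fits: try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1     # too big: try a lower quality
    return best
```

Because each probe is an in-memory encode, no temporary files are written, and the search needs only about log2(95) ≈ 7 encodes.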

Tags: python-3.x python-imaging-library


【Solution 1】:

Long story short: you cannot know in advance how well an image will compress, because that depends heavily on what kind of image it is. That said, we can optimize your code.

Some optimizations:

  • Estimate the bytes per pixel from the file size and the image dimensions.
  • Update the ratio based on the difference between the new and the old file size.

My solution applies both of the above, because applying either one on its own did not seem to converge very reliably. The following sections explain both parts in more depth and show the test cases I considered.

Reducing the image memory

The following code approximates the new image dimensions from the difference between the original file size (in bytes) and the preferred file size (in bytes). It approximates the bytes per pixel and then applies the ratio between the original and the preferred bytes per pixel to the image width and height (hence the square root).

I use opencv-python (cv2) for the rescaling, but this can be swapped for your own code.

import os

import cv2
import numpy as np


def reduce_image_memory(path, max_file_size: int = 2 ** 20):
    """
        Reduce the image memory by downscaling the image.

        :param path: (str) Path to the image
        :param max_file_size: (int) Maximum size of the file in bytes
        :return: (np.ndarray) downscaled version of the image
    """
    image = cv2.imread(path)
    height, width = image.shape[:2]

    original_memory = os.stat(path).st_size
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])

    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (max_file_size / original_memory)
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)

    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image
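
Plugging the question's numbers into this approximation gives a feel for what it predicts (a quick back-of-the-envelope sketch, assuming the file size from the question and ignoring compression effects):

```python
import math

original_memory = 1531494     # bytes, from the question
width, height = 1654, 3968    # pixels, from the question
max_file_size = 2 ** 20       # 1 MB target

bytes_per_pixel = original_memory / (width * height)   # ≈ 0.233
ratio = math.sqrt(max_file_size / original_memory)     # ≈ 0.827
new_size = (int(ratio * width), int(ratio * height))   # → (1368, 3283)
print(bytes_per_pixel, ratio, new_size)
```

Since JPEG compression is not perfectly linear in pixel count, this first guess usually lands near, but not exactly on, the target, which is why the ratio correction below is still needed.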

Applying the ratio

Most of the magic happens in ratio *= max_file_size / new_memory, where we calculate the error relative to the preferred size and use that value to correct our ratio.

The loop keeps searching as long as the following condition holds:

  • abs(1 - max_file_size / new_memory) > max_deviation_percentage

This means the new file size has to be reasonably close to the preferred file size. You control this closeness via delta. The higher the delta, the smaller your file is allowed to be (below max_file_size). The smaller the delta, the closer the new file size will be to max_file_size, but it will never be larger.

The trade-off is time: the smaller the delta, the longer it takes to find a ratio that satisfies the condition. Empirical testing suggests values between 0.01 and 0.05 work well.
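
As a sanity check on the correction rule, here is a toy simulation (illustrative only; it assumes the encoded size scales exactly linearly with the applied ratio, which real images only approximate):

```python
original_size = 1_531_494             # bytes, the file size from the question
delta = 0.01
max_file_size = 2 ** 20 * (1 - delta)


def simulated_size(ratio):
    # hypothetical stand-in for "resize, re-encode, measure"
    return original_size * ratio


ratio, steps = 1.0, 0
new_memory = simulated_size(ratio)
while abs(1 - max_file_size / new_memory) > delta:
    new_memory = simulated_size(ratio)
    ratio *= max_file_size / new_memory   # multiplicative correction
    steps += 1
print(steps, round(new_memory))           # converges in a couple of steps
```

Under this idealized model the loop converges in two steps; real images take a few more because compression efficiency changes as the image shrinks.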

if __name__ == '__main__':
    image_location = "test img.jpg"

    # delta denotes the maximum variation allowed around the max_file_size
    # The lower the delta, the more time it takes, but the closer it will be to `max_file_size`.
    delta = 0.01
    max_file_size = 2 ** 20 * (1 - delta)
    max_deviation_percentage = delta

    current_memory = new_memory = os.stat(image_location).st_size
    ratio = 1
    steps = 0

    # make sure that the comparison is within a certain deviation.
    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = reduce_image_memory(image_location, max_file_size=max_file_size * ratio)
        cv2.imwrite(f"resize {image_location}", new_image)

        new_memory = os.stat(f"resize {image_location}").st_size
        ratio *= max_file_size / new_memory
        steps += 1

    print(f"Memory resize: {current_memory / 2 ** 20:5.2f}, {new_memory / 2 ** 20:6.4f} MB, number of steps {steps}")

Test cases

For testing I used two different approaches: randomly generated images and sample images from Google.

For the random images I used the following code:

from typing import Tuple

import numpy as np


def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
        Generate a test image with fixed width height ratio and an approximate size.

        :param ratio: (Tuple[int, int]) screen ratio for the image
        :param file_size: (int) Approximate size of the image, note that this may be off due to image compression.
    """
    height, width = ratio  # Numpy reverse values
    scale = int(np.sqrt(file_size // (width * height)))  # np.int was removed in newer NumPy
    img = np.random.randint(0, 255, (width * scale, height * scale, 3), dtype=np.uint8)
    return img

Results

  • With a randomly generated image:
image_location = "test image random.jpg"
# Generate a large image with fixed ratio and a file size of ~1.7MB
image = generate_test_image(ratio=(16, 9), file_size=1531494)
cv2.imwrite(image_location, image)

Memory resize: 1.71, 0.99 MB, number of steps 2

In two steps it reduced the original size from 1.71 MB to 0.99 MB.

(before)

(after)

  • With a sample image from Google:

Memory resize: 1.51, 0.996 MB, number of steps 4

In four steps it reduced the original size from 1.51 MB to 0.996 MB.

(before)

(after)

Bonus

  • It also works for .png, .jpeg, .tiff, etc.
  • Besides downscaling, it can also be used to upscale an image to a certain memory consumption.
  • It preserves the image's aspect ratio as far as possible.
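
Since the question itself uses PIL rather than OpenCV, the same multiplicative ratio correction can be sketched with PIL alone (an illustrative adaptation, not part of this answer; the function name is made up):

```python
import io
import os

from PIL import Image


def limit_jpeg_size_pil(path, max_bytes=2 ** 20, delta=0.05, step_limit=10):
    """Downscale an image with PIL until its encoded JPEG size fits max_bytes."""
    im = Image.open(path)
    target = max_bytes * (1 - delta)
    new_size = os.path.getsize(path)
    ratio = 1.0
    data = None
    for _ in range(step_limit):
        if abs(1 - target / new_size) <= delta:
            break
        ratio *= target / new_size           # multiplicative correction
        scale = ratio ** 0.5                 # ratio targets the area, so sqrt per side
        w, h = im.size
        resized = im.resize((max(1, int(w * scale)), max(1, int(h * scale))),
                            Image.LANCZOS)
        buf = io.BytesIO()                   # encode in memory, no temp files
        resized.save(buf, format="JPEG", optimize=True)
        data = buf.getvalue()
        new_size = len(data)
    return data  # encoded JPEG bytes, or None if the original already fits
```

The returned bytes can be written straight to disk with `open(out_path, "wb").write(data)`, avoiding a second JPEG encode.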

Edit

I made the code more user-friendly and added Mark Setchell's suggestion to use an io.BytesIO buffer, which roughly doubled the speed. There is also a step_limit that prevents endless looping when delta is very small.

import io
import os
import time
from typing import Tuple

import cv2
import numpy as np
from PIL import Image


def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
        Generate a test image with fixed width height ratio and an approximate size.

        :param ratio: (Tuple[int, int]) screen ratio for the image
        :param file_size: (int) Approximate size of the image, note that this may be off due to image compression.
    """
    height, width = ratio  # Numpy reverse values
    scale = int(np.sqrt(file_size // (width * height)))  # np.int was removed in newer NumPy
    img = np.random.randint(0, 255, (width * scale, height * scale, 3), dtype=np.uint8)
    return img


def _change_image_memory(path, file_size: int = 2 ** 20):
    """
        Tries to match the image memory to a specific file size.

        :param path: (str) Path to the image
        :param file_size: (int) Size of the file in bytes
        :return: (np.ndarray) rescaled version of the image
    """
    image = cv2.imread(path)
    height, width = image.shape[:2]

    original_memory = os.stat(path).st_size
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])

    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (file_size / original_memory)
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)

    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image


def _get_size_of_image(image):
    # Encode into an in-memory buffer and measure the JPEG size
    # (cv2 arrays are BGR; channel order barely changes the encoded size)
    buffer = io.BytesIO()
    image = Image.fromarray(image)
    image.save(buffer, format="JPEG")
    size = buffer.getbuffer().nbytes
    return size


def limit_image_memory(path, max_file_size: int, delta: float = 0.05, step_limit=10):
    """
        Reduces an image to the required max file size.

        :param path: (str) Path to the original (unchanged) image.
        :param max_file_size: (int) maximum size of the image
        :param delta: (float) maximum allowed variation from the max file size.
            This is a value between 0 and 1, relative to the max file size.
        :param step_limit: (int) maximum number of refinement steps before giving up.
        :return: an image path to the limited image.
    """
    start_time = time.perf_counter()
    max_file_size = max_file_size * (1 - delta)
    max_deviation_percentage = delta
    new_image = None

    current_memory = new_memory = os.stat(path).st_size  # use the argument, not a global
    ratio = 1
    steps = 0

    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = _change_image_memory(path, file_size=max_file_size * ratio)
        new_memory = _get_size_of_image(new_image)
        ratio *= max_file_size / new_memory
        steps += 1

        # prevent endless looping
        if steps > step_limit:
            break

    print(f"Stats:"
          f"\n\t- Original memory size: {current_memory / 2 ** 20:9.2f} MB"
          f"\n\t- New memory size     : {new_memory / 2 ** 20:9.2f} MB"
          f"\n\t- Number of steps {steps}"
          f"\n\t- Time taken: {time.perf_counter() - start_time:5.3f} seconds")

    if new_image is not None:
        cv2.imwrite(f"resize {path}", new_image)
        return f"resize {path}"
    return path


if __name__ == '__main__':
    image_location = "your nice image.jpg"

    # Uncomment to generate random test images
    # test_image = generate_test_image(ratio=(16, 9), file_size=1567289)
    # cv2.imwrite(image_location, test_image)

    path = limit_image_memory(image_location, max_file_size=2 ** 20, delta=0.01)

【Comments】:

  • You don't need to actually write the file to physical disk and then stat() it. You can write it to memory using io.BytesIO, as in my comment above, and then get the size of the in-memory buffer. Once you have the right size, you can also write that memory buffer straight to disk without JPEG-encoding it again. Hopefully both of these will make it faster :-)
  • Let me see if I can implement these additions :)
  • They were just ideas/suggestions. You already have my vote :-)
  • I implemented that part and the code is now about twice as fast, though I think the gain will be more noticeable for larger images.
  • Thank you very much for the explanation, it is excellent. This is really a brilliant solution.