【Question title】: Faster way to iterate over a NumPy array
【Posted】: 2022-01-17 07:06:04
【Question】:

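I have a very large array that I need to iterate over. The array is a large TIFF image, and instead of each color value I want to insert a 4x4-bit 2D pattern. My code currently looks like the snippet below, and it takes a very long time to finish. tiff is an array of shape (x, y, 4), where x and y are very large. Values is a list that is matched against the value I am searching for and gives me the index of a matching pattern. The pattern array (Array_Pattern_4_4 in the code) holds the 4x4 patterns. Thanks.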

for iy, ix in np.ndindex(tiff[:, :, 0].shape):
    tiff[iy, ix, 0] = np.random.choice(np.argwhere(Values == tiff[iy, ix, 0])[:, 0], 1, replace=False)
    tiff[iy, ix, 1] = np.random.choice(np.argwhere(Values == tiff[iy, ix, 1])[:, 0], 1, replace=False)
    tiff[iy, ix, 2] = np.random.choice(np.argwhere(Values == tiff[iy, ix, 2])[:, 0], 1, replace=False)
    tiff[iy, ix, 3] = np.random.choice(np.argwhere(Values == tiff[iy, ix, 3])[:, 0], 1, replace=False)
    Rippedimage[iy * 8 : (iy + 1) * 8 - 4, ix * 8 : (ix + 1) * 8 - 4] = Array_Pattern_4_4[tiff[iy, ix, 0]]
    Rippedimage[iy * 8 : (iy + 1) * 8 - 4, ix * 8 + 4 : (ix + 1) * 8] = Array_Pattern_4_4[tiff[iy, ix, 1]]
    Rippedimage[iy * 8 + 4 : (iy + 1) * 8, ix * 8 : (ix + 1) * 8 - 4] = Array_Pattern_4_4[tiff[iy, ix, 2]]
    Rippedimage[iy * 8 + 4 : (iy + 1) * 8, ix * 8 + 4 : (ix + 1) * 8] = Array_Pattern_4_4[tiff[iy, ix, 3]]

(Image: the left side shows the input, the right side shows how the result should look.)

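【Comments】:

Rather than saying "I need to iterate over this array", could you tell us what your end goal is? What is the pattern? Why does the TIFF have 4 channels?
Thanks, I added a picture to illustrate what I intend to do. The tiff is split into 4 channels because I want an 8x8 pattern that is made up of four 4x4 subpatterns.
So is this some sort of dithering of a grayscale image? It sounds like you might have a better time copying the dithered version into a new array?
replace is kind of pointless if your size is 1, isn't it?
What is Values? What exactly is tiff? What kinds of numbers do they hold? Whether I can help you vectorize this depends on details like these.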
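As a rough illustration of the kind of vectorization the comments are hinting at: the per-pixel np.argwhere plus np.random.choice lookup can be done for the whole array at once if the candidate pattern indices are grouped by gray level first. This is only a sketch under assumptions the question does not confirm: that Values is a 1-D integer array with one gray level (0..255) per pattern, and that tiff holds integers in the same range. The arrays values and tiff below are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: values[i] is the gray level that pattern i represents.
values = np.repeat(np.arange(256), 3)  # three candidate patterns per level
rng.shuffle(values)
tiff = rng.integers(0, 256, size=(5, 7, 4))

# Group pattern indices by level: after a stable sort, the candidates for
# level v are order[starts[v] : starts[v] + counts[v]].
order = np.argsort(values, kind="stable")
counts = np.bincount(values, minlength=256)
starts = np.cumsum(counts) - counts

# Every level that occurs in the image needs at least one candidate pattern.
assert np.all(counts[tiff] > 0)

# Draw one random offset per pixel and channel, then map it to a pattern index.
offsets = rng.integers(0, counts[tiff])
pattern_idx = order[starts[tiff] + offsets]
```

pattern_idx has the same shape as tiff, and values[pattern_idx] reproduces tiff, so it plays the role of the four np.random.choice lines in the loop, computed in one pass.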

Tags: python numpy performance for-loop iterator

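【Solution 1】:

To be honest, it's a bit hard to tell what you're really looking for, but here is some code that:

generates a number of NxN random dither patterns for each shade of gray (assuming an 8-bit image)
picks a random pattern for each NxN pixel block of the original image to generate a dithered version

On my MacBook, dithering a 920x920 image takes about 17 ms (timings below are in milliseconds):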

image generation 4.377
pattern generation 6.06
dither generation 16.915

import time
from contextlib import contextmanager

import numpy as np


def generate_patterns(
    *,
    pattern_size: int = 8,
    pattern_options_per_shade: int = 8,
    shades: int = 256,
):
    patterns = []
    for shade in range(shades):
        shade_patterns = [
            np.random.random((pattern_size, pattern_size)) < (shade / shades)
            for i in range(pattern_options_per_shade)
        ]
        patterns.append(shade_patterns)
    return np.array(patterns)


def dither(image, patterns):
    # patterns is indexed as [shade, option, row, column]
    (
        shades,
        pattern_options_per_shade,
        pattern_height,
        pattern_width,
    ) = patterns.shape
    assert shades == 256  # TODO

    # image sampled once per pattern-sized block
    resampled = (
        image[::pattern_height, ::pattern_width].round().astype(np.uint8)
    )
    # mask of pattern option per pattern_size block
    pat_mask = np.random.randint(
        0, pattern_options_per_shade, size=resampled.shape
    )

    dithered = np.zeros_like(image)
    for (iy, ix), c in np.ndenumerate(resampled):
        pattern = patterns[c, pat_mask[iy, ix]]
        dithered[
            iy * pattern_height : (iy + 1) * pattern_height,
            ix * pattern_width : (ix + 1) * pattern_width,
        ] = pattern

    return dithered * 255


@contextmanager
def stopwatch(title):
    t0 = time.perf_counter()
    yield
    t1 = time.perf_counter()
    print(title, round((t1 - t0) * 1000, 3))


def main():
    with stopwatch("image generation"):
        img_size = 920
        image = (
            np.linspace(0, 255, img_size)
            .repeat(img_size)
            .reshape((img_size, img_size))
        )
        image[200:280, 200:280] = 0

    with stopwatch("pattern generation"):
        patterns = generate_patterns()

    with stopwatch("dither generation"):
        dithered = dither(image, patterns)

    import matplotlib.pyplot as plt

    plt.figure(dpi=450)
    plt.imshow(dithered, interpolation="none")
    plt.show()


if __name__ == "__main__":
    main()

The output image looks like this, for example (image not reproduced here).
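As a side note on generate_patterns() above: the nested Python loops can be replaced by a single broadcasted comparison. The following is a minimal sketch, not the answer's own code; the function name generate_patterns_vectorized and the optional rng parameter are additions for illustration.

```python
import numpy as np


def generate_patterns_vectorized(
    *,
    pattern_size: int = 8,
    pattern_options_per_shade: int = 8,
    shades: int = 256,
    rng=None,
):
    rng = np.random.default_rng() if rng is None else rng
    # One threshold per shade, shaped so it broadcasts over the other axes.
    thresholds = (np.arange(shades) / shades).reshape(shades, 1, 1, 1)
    noise = rng.random(
        (shades, pattern_options_per_shade, pattern_size, pattern_size)
    )
    # Boolean array indexed as [shade, option, row, column].
    return noise < thresholds


patterns = generate_patterns_vectorized(rng=np.random.default_rng(0))
```

The result has the same shape and meaning as the array returned by generate_patterns(), so it should be usable with dither() unchanged. Pattern generation is a small part of the total run time, so this is mostly a readability change.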

Edit

A version that upscales the source image into its dithered version (timings in milliseconds):

image generation 3.886
pattern generation 5.581
dither generation 1361.194

def dither_embiggen(image, patterns):
    # patterns is indexed as [shade, option, row, column]
    shades, pattern_options_per_shade, pattern_height, pattern_width = patterns.shape
    assert shades == 256  # TODO

    # mask of pattern option per source pixel
    pat_mask = np.random.randint(0, pattern_options_per_shade, size=image.shape)

    dithered = np.zeros((image.shape[0] * pattern_height, image.shape[1] * pattern_width))
    for (iy, ix), c in np.ndenumerate(image.round().astype(np.uint8)):
        pattern = patterns[c, pat_mask[iy, ix]]
        dithered[iy * pattern_height:(iy + 1) * pattern_height, ix * pattern_width:(ix + 1) * pattern_width] = pattern

    return (dithered * 255)
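Most of the time in dither_embiggen() goes into the per-pixel Python loop. One possible way around it, sketched here under the assumption that the whole output fits in memory, is to look up all tiles with a single fancy-indexing operation and then interleave the tile axes with the image axes. The name dither_embiggen_vectorized and the rng parameter are made up for this sketch, and it returns uint8 rather than the float array of the original.

```python
import numpy as np


def dither_embiggen_vectorized(image, patterns, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    shades, options, pattern_height, pattern_width = patterns.shape
    levels = image.round().astype(np.uint8)
    # One random pattern option per source pixel.
    pat_mask = rng.integers(0, options, size=levels.shape)

    # Shape (H, W, pattern_height, pattern_width): one tile per source pixel.
    tiles = patterns[levels, pat_mask]
    height, width = levels.shape
    # Move the tile-row axis next to the image-row axis, then flatten, so
    # that tile (iy, ix) lands at rows iy*ph.. and columns ix*pw..
    dithered = tiles.transpose(0, 2, 1, 3).reshape(
        height * pattern_height, width * pattern_width
    )
    return dithered.astype(np.uint8) * 255
```

The intermediate tiles array is as large as the output itself, so for very large images this would have to be applied to bands of rows rather than to the whole image at once.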

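Edit 2

This version writes the dithered rows directly to disk as a raw binary file, with each 8x8 pattern packed into 8 bytes. The reader will need to know how many pixels there are per row. Based on some empirical testing, this seems to do the trick...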

import time
from contextlib import contextmanager

import numpy as np


def generate_patterns(
    *,
    pattern_size: int = 8,
    pattern_options_per_shade: int = 16,
    shades: int = 256,
):
    patterns = []
    for shade in range(shades):
        shade_patterns = [
            np.packbits(
                np.random.random((pattern_size, pattern_size))
                < (shade / shades),
                axis=0,
            )[0]
            for i in range(pattern_options_per_shade)
        ]
        patterns.append(shade_patterns)
    return np.array(patterns)


def dither_to_disk(bio, image, patterns):
    assert image.dtype == np.uint8
    shades, pattern_options_per_shade, pattern_height = patterns.shape
    pat_mask = np.random.randint(0, pattern_options_per_shade, size=image.shape)
    for y in range(image.shape[0]):
        patterns[image[y, :], pat_mask[y, :]].tofile(bio)


@contextmanager
def stopwatch(title):
    t0 = time.perf_counter()
    yield
    t1 = time.perf_counter()
    print(title, round((t1 - t0) * 1000, 3))


def main():
    with stopwatch("image generation"):
        img_width = 25_000
        img_height = 5_000
        image = (
            np.linspace(0, 255, img_height)
            .repeat(img_width)
            .reshape((img_height, img_width))
        )
        image[200:280, 200:280] = 0
        image = image.round().astype(np.uint8)

    with stopwatch("pattern generation"):
        patterns = generate_patterns()

    with stopwatch(f"dither_to_disk {image.shape}"):
        with open("x.bin", "wb") as f:
            dither_to_disk(f, image, patterns)


if __name__ == "__main__":
    main()
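Reading the code above, each source row is written as width * 8 bytes: every source pixel contributes 8 consecutive bytes, and each byte holds one column of its 8x8 pattern with the top pattern row in the most significant bit. A reader therefore has to reassemble the tiles rather than treat the file as a plain raster. The following sketch shows one way to do that; unpack_dithered is a hypothetical helper, and it assumes the default pattern_size of 8.

```python
import numpy as np


def unpack_dithered(raw, height, width):
    """Rebuild the 0/1 image from the bytes written by dither_to_disk()."""
    # 8 packed bytes per source pixel, one byte per pattern column.
    packed = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 8)
    # Expand each byte into 8 bits along a new axis: (height, width, 8, 8).
    bits = np.unpackbits(packed[:, :, np.newaxis, :], axis=2)
    # bits[y, x, r, c] is row r, column c of the pattern for pixel (y, x).
    return bits.transpose(0, 2, 1, 3).reshape(height * 8, width * 8)
```

For a file on disk, the raw argument could be the result of open("x.bin", "rb").read(), with height and width being the shape of the source image. At the sizes discussed here the unpacked array would be far too large to hold at once, so in practice this would be applied to a few source rows at a time.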

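【Comments】:

Thanks a lot, this is almost what I want. The problem is that you first downsample the picture and then upscale it by the pattern size. I don't want to downsample it first. The image I want to use is 25,000 x 25,000 pixels, and it becomes 200,000 x 200,000 once the patterns are applied. So if my calculation is correct, your script would take about 15 minutes to reach that size.
So the result should be N x N times the size of the original image?
(I added a dither_embiggen() version that does not downsample the image.)
By the way, a 200,000 x 200,000-byte array would consume 40 GB of memory. Are you sure you can work with that?
artwork.com/raster/bigtiff.htm describes what I want to do and why I need this size. My beam is about 6 microns wide. For now we multiply the tiff by 255 so that we can easily change the TIFF metadata via Python and tell Windows it is a 1-bit file; the final tiff is 200,000 x 200,000 bits in size, since we don't need grayscale.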
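For reference, the memory figures in this exchange can be checked with a few lines of arithmetic. The variable names are only for illustration.

```python
side = 200_000                # output pixels per side
pixels = side * side          # 40,000,000,000 pixels in total

bytes_at_8_bits = pixels      # one byte per pixel
bytes_at_1_bit = pixels // 8  # eight pixels packed into each byte

print(bytes_at_8_bits / 1e9, "GB at 8 bits per pixel")
print(bytes_at_1_bit / 1e9, "GB at 1 bit per pixel")
```

This gives 40 GB for a byte-per-pixel array, as stated above, and 5 GB when the same image is stored at 1 bit per pixel.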