替代 numpy.argwhere 以加速 python 中的循环答案

【问题标题】：alternative to numpy.argwhere to speed up for loop in python替代 numpy.argwhere 以加速 python 中的循环
【发布时间】：2018-04-14 13:08:15
【问题描述】：

我有两个数据集如下：

ds1：一个 DEM（数字高程模型）文件，作为 2d numpy 数组和，

ds2：显示区域（像素），其中有一些多余的水。

我有一个 while 循环，负责根据其 8 个相邻像素及其自身的高度分布（和更改）每个像素中的多余体积，直到每个像素中的多余体积小于某个值 d = 0.05。因此，在每次迭代中，我需要找到 ds2 中多余体积大于 0.05 的像素索引，如果没有剩余像素，则退出 while 循环：

exit_code == "No"
while exit_code == "No":
    index_of_pixels_with_excess_volume = numpy.argwhere(ds2> 0.05) # find location of pixels where excess volume is greater than 0.05

    if not index_of_pixels_with_excess_volume.size:
        exit_code = "Yes"
    else:
        for pixel in index_of_pixels_with_excess_volume:
            # spread those excess volumes to the neighbours and
            # change the values of ds2

问题是 numpy.argwhere(ds2> 0.05) 非常慢。我正在寻找更快的替代解决方案。

【问题讨论】：

argwhere 只是 where 和 transpose 将数组元组转换为二维数组。
您确定这是argwhere，而不是ds2>0.05 步骤，或者更可能是所有pixel 的迭代？
我做了一个cProfile，并根据argwhere之外的累积秒数得出结论。
where 表达式必须对数组进行多次迭代。一个创建布尔数组。然后np.count_nonzero 快速统计True 值的数量。最后where（实际上是np.nonzero）收集这些值的索引。 transpose 到 argwhere 应该是操作的一小部分。如果 ds2 与 True 像素的数量相比较大，那么这个 where 操作可能会占主导地位。
这是一个非常好的主意。让我担心的是每次迭代中需要花费时间修改两个数组（ds2 和布尔值）而不是一个数组，从而降低性能？作为一个附带问题：使用 scipy 稀疏矩阵而不是布尔值怎么样？

标签： python performance numpy

【解决方案1】：

对于那些可能感兴趣的人来说，我的问题的另一个解决方案：我发现使用 numpy 技巧的“代码向量化”可以通过消除 for 或 while 循环和 numpy.where() 显着加快运行时间。我发现这两个网站对解释代码向量化很有帮助。

https://github.com/rougier/from-python-to-numpy/blob/master/04-code-vectorization.rst#id36

https://www.labri.fr/perso/nrougier/teaching/numpy/numpy.html

【讨论】：

【解决方案2】：

制作一个示例二维数组：

In [584]: arr = np.random.rand(1000,1000)

找出其中的一小部分：

In [587]: np.where(arr>.999)
Out[587]: 
(array([  1,   1,   1, ..., 997, 999, 999], dtype=int32),
 array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [588]: _[0].shape
Out[588]: (1034,)

时间各种argwhere：

In [589]: timeit arr>.999
2.65 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [590]: timeit np.count_nonzero(arr>.999)
2.79 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [591]: timeit np.nonzero(arr>.999)
6 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [592]: timeit np.argwhere(arr>.999)
6.06 ms ± 58.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

因此，大约 1/3 的时间用于进行 > 测试，其余时间用于查找 True 元素。将 where 元组转换为 2 列数组很快。

现在，如果目标只是找到第一个 > 值，argmax 很快。

In [593]: np.argmax(arr>.999)
Out[593]: 1273    # can unravel this to (1,273)
In [594]: timeit np.argmax(arr>.999)
2.76 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

argmax 短路，所以当它找到第一个值时，实际运行时间会有所不同。

flatnonzero 比where 快：

In [595]: np.flatnonzero(arr>.999)
Out[595]: array([  1273,   1471,   1584, ..., 997745, 999310, 999679], dtype=int32)
In [596]: timeit np.flatnonzero(arr>.999)
3.05 ms ± 26.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [599]: np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
Out[599]: 
(array([  1,   1,   1, ..., 997, 999, 999], dtype=int32),
 array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [600]: timeit np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
3.05 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [601]: timeit np.transpose(np.unravel_index(np.flatnonzero(arr>.999),arr.shap
     ...: e))
3.1 ms ± 5.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

这与np.argwhere(arr>.999)相同。

有趣的是，flatnonzero 方法将时间缩短了一半！没想到会有这么大的进步。

比较迭代速度：

从argwhere 迭代二维数组：

In [607]: pixels = np.argwhere(arr>.999)
In [608]: timeit [pixel for pixel in pixels]
347 µs ± 5.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

使用zip(*) 转置对来自where 的元组进行迭代：

In [609]: idx = np.where(arr>.999)
In [610]: timeit [pixel for pixel in zip(*idx)]
256 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

在数组上迭代通常比在列表上迭代慢一些，或者在这种情况下是压缩数组。

In [611]: [pixel for pixel in pixels][:5]
Out[611]: 
[array([  1, 273], dtype=int32),
 array([  1, 471], dtype=int32),
 array([  1, 584], dtype=int32),
 array([  1, 826], dtype=int32),
 array([  2, 169], dtype=int32)]
In [612]: [pixel for pixel in zip(*idx)][:5]
Out[612]: [(1, 273), (1, 471), (1, 584), (1, 826), (2, 169)]

一个是数组列表，另一个是元组列表。但是将这些元组（单独）转换为数组很慢：

In [614]: timeit [np.array(pixel) for pixel in zip(*idx)]
2.26 ms ± 4.94 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

在平坦的非零数组上迭代更快

In [617]: fdx = np.flatnonzero(arr>.999)
In [618]: fdx[:5]
Out[618]: array([1273, 1471, 1584, 1826, 2169], dtype=int32)
In [619]: timeit [i for i in fdx]
112 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

但将unravel 单独应用于这些值需要时间。

def foo(idx):    # a simplified unravel
    return idx//1000, idx%1000

In [628]: timeit [foo(i) for i in fdx]
1.12 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

将此 1 ms 添加到 3 ms 以生成 fdx，此 flatnonzero 可能仍会领先。但在最好的情况下，我们谈论的是 2 倍的速度提升。

【讨论】：

我保证会在 17 小时内投票（达到我当天的上限）（：
对于一维情况（扁平数组），我期待通过预先分配索引数组（例如r=np.arange(ds2.size)），然后简单地将布尔索引应用于该数组，可以实现进一步的加速：@ 987654350@。这在循环中可能会带来一些收益。奇怪的是，这种方法比flatnonzero(ds2 > threshold) 慢。任何想法为什么？
加1代表毅力和努力工作，比我的答案更好！

【解决方案3】：

np.where(arr> 0.05) 和 (arr > 0.05).nonzero() 在我的测试中快了 ~22-25%。

例如：

while exit_code == "No":
    index_of_pixels_with_excess_volume = numpy.where(ds2 > 0.05)

    if not index_of_pixels_with_excess_volume[0].size:
        exit_code = "Yes"
    else:
        for pixel in zip(*index_of_pixels_with_excess_volume):

但是，我担心where 与argwhere 带来的任何收益都会因为zip(*...) 而在最后一个循环中丢失。如果是这种情况，请告诉我，我很乐意删除此答案。

【讨论】：

如何在二维数组中使用 np.flatnonzero() 以及如果我们对大于零以外的数字的值感兴趣怎么办？
@BehzadJamali flatnonzero 不直接用于您的数据，而是用于比较结果 (arr > 0.05)。对于 2D 数据，请使用 nonzero()，这与 where() 几乎相同。
@BehzadJamali 哦，我刚刚重新阅读了您的问题，我看到您提到您正在处理二维数组。正在编辑我的答案...
我正在检查解决方案。
@BehzadJamali 我添加了一个示例，该示例还表明您需要在内部循环中使用zip(*...)，这可能会扼杀任何性能提升。