Changing something from iterating over a numpy array to vectorization

【Question title】: Changing something from iterating over a numpy array to vectorization
【Posted】: 2013-07-18 15:27:41
【Question】:

I am trying to speed up the code below by vectorizing it:

[rows,cols] = flow_direction_np.shape
elevation_gain = np.zeros((rows,cols), float)  # np.float was removed in NumPy 1.24

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            elevation_gain[i - 1, j - 1]  = elevation_gain[i - 1, j - 1] + sediment_transport_np[i, j]
        elif flow == 64:
            elevation_gain[i - 1, j]  = elevation_gain[i - 1, j] + sediment_transport_np[i, j]
        elif flow == 128:
            elevation_gain[i - 1, j + 1]  = elevation_gain[i - 1, j + 1] + sediment_transport_np[i, j]
        elif flow == 16:
            elevation_gain[i, j - 1]  = elevation_gain[i, j - 1] + sediment_transport_np[i, j]
        elif flow == 1:
            elevation_gain[i, j + 1]  = elevation_gain[i, j + 1] + sediment_transport_np[i, j]
        elif flow == 2:
            elevation_gain[i + 1, j + 1]  = elevation_gain[i + 1, j + 1] + sediment_transport_np[i, j]
        elif flow == 4:
            elevation_gain[i + 1, j]  = elevation_gain[i + 1, j] + sediment_transport_np[i, j]
        elif flow == 8:
            elevation_gain[i + 1, j - 1]  = elevation_gain[i + 1, j - 1] + sediment_transport_np[i, j]
    except IndexError:
        elevation_gain[i, j] = 0

This is what my code currently looks like:

elevation_gain = np.zeros_like(sediment_transport_np)
nrows, ncols = flow_direction_np.shape
lookup = {32: (-1, -1),
            16:  (0, -1), 
            8:   (+1, -1),
            4:   (+1,  0),
            64: (-1,  0),
            128:(-1,  +1),
            1:   (0,  +1),
            2:   (+1,  +1)}

# Initialize an array for the "shifted" mask
shifted = np.zeros((nrows+2, ncols+2), dtype=bool)

# Pad elevation gain with zeros
tmp = np.zeros((nrows+2, ncols+2), elevation_gain.dtype)
tmp[1:-1, 1:-1] = elevation_gain
elevation_gain = tmp

for value, (row, col) in lookup.items():
    mask = flow_direction_np == value

    # Reset the "shifted" mask
    shifted.fill(False)
    shifted[1:-1, 1:-1] = mask

    # Shift the mask by the right amount for the given value
    shifted = np.roll(shifted, row, 0)
    shifted = np.roll(shifted, col, 1)

    # Set the values in elevation change to the offset value in sed_trans
    elevation_gain[shifted] = elevation_gain[shifted] + sediment_transport_np[mask]

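The trouble I'm having is that the two versions don't give me the same result in the end. Any suggestions as to where I'm going wrong?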

【Question comments】:

Tags: python loops numpy iterator vectorization

【Solution 1】:

You can significantly improve performance by using np.where to get the indices at which each condition holds:

ind = np.where( flow_direction_np==32 )

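You will see that ind is a tuple with two elements: the first holds the indices along the first axis of the flow_direction_np array, and the second holds the indices along the second axis.

You can then apply the shift to these indices: i-1, j-1, and so on...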

ind_32 = (ind[0]-1, ind[1]-1)

Then you use fancy indexing to update the array:

elevation_gain[ ind_32 ] += sediment_transport_np[ ind ]
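The three steps above (find the indices, shift them, update with fancy indexing) can be tried on a tiny grid. This is only a minimal sketch: the 3x3 arrays below are made up for illustration and are not the asker's data; only np.where and integer-array indexing come from the answer.

```python
import numpy as np

# Hypothetical 3x3 inputs, invented for illustration only.
flow_direction_np = np.array([[0, 0, 0],
                              [0, 32, 0],
                              [0, 0, 32]])
sediment_transport_np = np.array([[0., 0., 0.],
                                  [0., 5., 0.],
                                  [0., 0., 7.]])
elevation_gain = np.zeros_like(sediment_transport_np)

# On a 2-D condition, np.where returns a tuple (row_indices, col_indices).
ind = np.where(flow_direction_np == 32)

# Flow code 32 sends material to the upper-left neighbour (i-1, j-1).
ind_32 = (ind[0] - 1, ind[1] - 1)

# Each source cell's sediment is added to its shifted target cell.
elevation_gain[ind_32] += sediment_transport_np[ind]
print(elevation_gain)
```

Here the cell at (1, 1) deposits into (0, 0) and the cell at (2, 2) deposits into (1, 1). None of the shifted indices leave the grid in this example, so boundary behaviour does not come into play.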

Edit: applying this concept to your case gives something like this:

lookup = {32: (-1, -1),
          16: ( 0, -1),
           8: (+1, -1),
           4: (+1,  0),
          64: (-1,  0),
         128: (-1, +1),
           1: ( 0, +1),
           2: (+1, +1)}

for num, shift in lookup.items():
    ind = np.where( flow_direction_np==num )
    ind_num = ind[0] + shift[0], ind[1] + shift[1]
    elevation_gain[ ind_num] += sediment_transport_np[ ind ]
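One caveat with shifting index arrays this way: a shifted index of -1 silently wraps to the opposite edge, while an index equal to the grid size raises IndexError for the whole assignment. A minimal sketch of one way to guard against both follows. It assumes that material routed off the grid should simply be discarded on every side, which is an assumption and not something the answer states; the helper name route_sediment is hypothetical.

```python
import numpy as np

# (row, col) offsets for each flow-direction code, as in the lookup above.
LOOKUP = {32: (-1, -1), 16: (0, -1), 8: (1, -1), 4: (1, 0),
          64: (-1, 0), 128: (-1, 1), 1: (0, 1), 2: (1, 1)}


def route_sediment(flow_direction, sediment_transport):
    """Hypothetical helper: add each cell's sediment to its downstream neighbour.

    Assumption: material routed off the grid is discarded on every side.
    """
    nrows, ncols = flow_direction.shape
    gain = np.zeros((nrows, ncols), dtype=float)
    for code, (drow, dcol) in LOOKUP.items():
        rows, cols = np.where(flow_direction == code)
        trows, tcols = rows + drow, cols + dcol
        # Keep only targets inside the grid: negative indices would wrap
        # around, and too-large ones would raise IndexError.
        inside = (trows >= 0) & (trows < nrows) & (tcols >= 0) & (tcols < ncols)
        # Within one direction every target cell is distinct, so a plain
        # fancy-indexed += does not lose any contributions.
        gain[trows[inside], tcols[inside]] += sediment_transport[rows[inside], cols[inside]]
    return gain


flow = np.array([[2, 16],
                 [64, 32]])
sediment = np.array([[1., 2.],
                     [3., 4.]])
print(route_sediment(flow, sediment))
```

In the usage example, three cells drain into (0, 0) from three different directions, so their contributions are accumulated across separate loop iterations.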

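【Comments】:

I wonder if you could clarify: is i - 1, j - 1 the same as ind[0]-1, ind[1]-1?
@SaulloCastro - There's no point in using where. It will only slow things down. The code already uses a boolean mask, which is equivalent but faster.
Hi Joe, thanks again for your help last time. The end product I'm trying to achieve has changed from: if flow == 32: elevation_change[i, j] = sediment_transport_np[i - 1, j - 1] to: if flow == 32: elevation_gain[i - 1, j - 1] = elevation_gain[i - 1, j - 1] + sediment_transport_np[i, j]. I have manipulated the code you gave me to reflect this change, but I get different results. Any suggestions?
@JoeKington Thank you very much for the comment. I didn't know that where is slower than a mask...
@NickJones I have updated the answer for your case... To address your question: ind is a tuple of two arrays. When you do ind[0] you get the array of indices along the first axis, and ind[0]-1 then subtracts one from every element of that array, which is equivalent to i-1.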

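【Solution 2】:

The reason you're getting different results is the way Python handles negative indices.

For anyone else reading, this question (and answer) is a follow-up to: Iterating through a numpy array and then indexing a value in another array

First off, I apologize that the "vectorized" code is so dense. There's a thorough explanation in my earlier answer, so I won't repeat it here.

Your original code (in the original question) is actually subtly different from the version you posted here.

Basically, before, you had: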

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            ...
        elif ...
            ...

and you were getting an index error whenever i+1 or j+1 was greater than the size of the grid.

Just doing:

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            ...
        elif ...
            ...
    except IndexError:
        elevation_change[i, j] = 0

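is actually incorrect, because it defines different boundary conditions on different sides of the grid.

In the second case, when j-1 or i-1 is negative, the value from the opposite side of the grid is returned. However, when j+1 or i+1 is greater than the size of the grid, 0 is returned. (Hence the "different boundary conditions".)

In the vectorized version of the code, 0 is returned both when the index is negative and when it is beyond the boundary of the grid.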
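The mismatch between the two sides of the grid can be reproduced on a small example. This is a minimal sketch using a made-up 3x3 grid and only two of the eight flow codes; it mirrors the loop-with-try/except pattern from the question rather than reproducing it exactly.

```python
import numpy as np

# Hypothetical 3x3 grid. One cell on the top edge flows up (code 64) and the
# bottom-right cell flows down-right (code 2); both targets are off the grid.
flow = np.zeros((3, 3), dtype=int)
flow[0, 1] = 64
flow[2, 2] = 2
sediment = np.ones((3, 3))

offsets = {64: (-1, 0), 2: (1, 1)}
gain = np.zeros((3, 3))
for (i, j), code in np.ndenumerate(flow):
    if code in offsets:
        di, dj = offsets[code]
        try:
            gain[i + di, j + dj] += sediment[i, j]
        except IndexError:
            gain[i, j] = 0

# The upward move from row 0 used index -1 and landed in the bottom row,
# while the move past the bottom-right corner raised IndexError instead.
print(gain)
```

After the loop, the only non-zero entry is in the bottom row at column 1, which is where the wrapped deposit ended up. The off-grid move on the opposite side left nothing behind, so the two edges behave differently.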

As a simple example, consider the following:

In [1]: x = [1, 2, 3]

In [2]: x[0]
Out[2]: 1

In [3]: x[1]
Out[3]: 2

In [4]: x[2]
Out[4]: 3

In [5]: x[3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-ed224ad0520d> in <module>()
----> 1 x[3]

IndexError: list index out of range

In [6]: x[-1]
Out[6]: 3

In [7]: x[-2]
Out[7]: 2

In [8]: x[-3]
Out[8]: 1

In [9]: x[-4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-f9c639f21256> in <module>()
----> 1 x[-4]

IndexError: list index out of range


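Notice that negative indices whose magnitude does not exceed the size of the sequence are valid, and return items from the "opposite end" of the sequence. So x[3] raises an error, while x[-1] just returns the other end.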
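The list example carries over directly to NumPy arrays, which is what matters for the grid code. A minimal sketch, using a made-up 3x3 array:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

# A negative index counts from the end, so this is the last row, first column.
wrapped = grid[-1, 0]  # the same element as grid[2, 0]

# An index past the end is an error rather than a wrap-around.
try:
    grid[3, 0]
    raised = False
except IndexError:
    raised = True

print(wrapped, raised)
```

So a shift of -1 from row 0 quietly reads or writes the last row, whereas a shift of +1 from the last row fails with IndexError.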

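Hopefully that makes things a bit clearer.

【Comments】:

Hi Joe, thanks again. I now understand how negative indexing works, but I'm still not sure whether I changed the code correctly. When I run a simulation to compare the two in a case where negative indexing has no effect, the answers are still different.
@NickJones - Sorry for the delay! I'll get back to you when things are a bit less busy (probably later tonight).
OK, thanks Joe! I really don't understand why I'm getting different answers!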