How can I assign values from one array to another according to the index more efficiently?

【Question title】: How can I assign values from one array to another according to the index more efficiently?
【Posted】: 2021-07-23 23:29:59
【Question】:

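I'm trying to replace the values of an array according to how many ones each row of the source array contains. Based on the row sum, I assign the value at the corresponding index of the replacement array: if a row contains two ones, the output gets l1[1]; if it contains a single one, the output gets l1[0].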

A concrete example makes this clearer:

import numpy as np

l1 = np.array([4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

array([[0, 0],
       [0, 1],
       [1, 1],
       [0, 0],
       [1, 0],
       [1, 1]])

Required output:

[[0]
 [4]
 [5]
 [0]
 [4]
 [5]]

I do this by counting the ones in each row and assigning the values accordingly with np.where:

x1x2 = np.array([0, 1, 2, 0, 1, 2])  # number of 1s in each row
x1x2 = np.where(x1x2 != 1, x1x2, l1[0])
x1x2 = np.where(x1x2 != 2, x1x2, l1[1])
print(x1x2)

Output:

[0 4 5 0 4 5]

Can this be done more efficiently?
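
For reference, a minimal sketch of the same mapping that derives the counts from x112 instead of typing them by hand. It swaps the chained np.where calls for np.select, which is not what the question uses: np.select tests every condition against the original counts, so a replacement value that happens to equal 1 or 2 cannot be overwritten by a later step. The names counts, flat and column are illustrative only.

```python
import numpy as np

l1 = np.array([4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

# Count the 1s in each row rather than hard-coding the counts.
counts = np.count_nonzero(x112 == 1, axis=1)

# Every condition is evaluated on the original counts, so the order of the
# replacements does not matter. Rows without any 1 fall back to 0.
flat = np.select([counts == 1, counts == 2], [l1[0], l1[1]], default=0)

# Reshape to one column to match the layout of the required output.
column = flat.reshape(-1, 1)
print(column)
```

This is about correctness and readability rather than speed; it still makes several passes over the data.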

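【Question comments】:

What if the number of ones in a row is greater than the length of l1, say 100?
This is only part of the whole code, and that situation won't occur; if l1 = [100], then x112 will adapt to its size as well.
You're doing fine. A Numba-JITed loop might be more efficient, but it would only be a small improvement over your vectorized code. P.S. In this case you can get x1x2 = x112[:,0] + x112[:,1].

Tags: python numpy indexing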

【Solution 1】:

OK, I actually tried de-vectorizing your code. First, the vectorized NumPy version you already have:

def op(x112, l1):
    # bit of cheating, adding instead of counting 1s
    x1x2 = x112[:,0] + x112[:,1]

    x1x2 = np.where(x1x2 != 1, x1x2, l1[0])
    x1x2 = np.where(x1x2 != 2, x1x2, l1[1])
    return x1x2

The most efficient alternative is to loop over x112 only once, so let's write a Numba loop.

import numba as nb

@nb.njit
def loop(x112, l1):
    d0, d1 = x112.shape
    x1x2 = np.zeros(d0, dtype = x112.dtype)
    for i in range(d0):
        # actually count the 1s
        num1s = 0
        for j in range(d1):
            if x112[i,j] == 1:
                num1s += 1
        
        if num1s == 1:
            x1x2[i] = l1[0]
        elif num1s == 2:
            x1x2[i] = l1[1]
    return x1x2

The Numba loop is about 9-10x faster on my laptop.

%timeit op(x112, l1)
8.05 µs ± 34.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit loop(x112, l1)
873 ns ± 5.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
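
The loop above hard-codes the two cases (one 1 or two 1s). A minimal sketch of how it might be generalized to any number of columns, assuming l1 has at least as many entries as x112 has columns. The name loop_general and the plain-Python fallback for a missing Numba install are illustrative assumptions, not part of the original answer.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba installed, run the same loop as plain Python.
    def njit(func):
        return func


@njit
def loop_general(x112, l1):
    n_rows, n_cols = x112.shape
    out = np.zeros(n_rows, dtype=l1.dtype)
    for i in range(n_rows):
        # Count the 1s in row i.
        num1s = 0
        for j in range(n_cols):
            if x112[i, j] == 1:
                num1s += 1
        # A row with k ones (k >= 1) takes l1[k - 1]; rows without ones stay 0.
        if num1s > 0:
            out[i] = l1[num1s - 1]
    return out


l1 = np.array([4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])
print(loop_general(x112, l1))  # [0 4 5 0 4 5]
```

It still reads x112 once and allocates only the output array.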

As requested by @Mad_Physicist, here are timings with a larger array. I've also included his advanced-indexing approach.

x112 = np.random.randint(0, 2, size = (100000, 2))
l1_v2 = np.array([0,4,5])

%timeit op(x112, l1)
1.35 ms ± 27.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit loop(x112, l1)
956 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit l1_v2[x112.sum(1)]
1.2 ms ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

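Edit: well, maybe take these timings with a grain of salt, because when I restarted the IPython kernel and reran everything, op(x112, l1) improved to 390 µs ± 22.1 µs per loop, while the other approaches kept the same performance (971 µs, 1.23 ms).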
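
Since the numbers shift between sessions, one way to make a comparison a bit more repeatable outside IPython is a small harness built on the standard timeit module. This is only a sketch: the function name lookup, the seed, and the repeat counts are arbitrary assumptions, and the Numba version is left out so the script runs with NumPy alone.

```python
import timeit

import numpy as np


def op(x112, l1):
    # Vectorized version from the question (valid for two 0/1 columns).
    counts = x112[:, 0] + x112[:, 1]
    counts = np.where(counts != 1, counts, l1[0])
    return np.where(counts != 2, counts, l1[1])


def lookup(x112, l1_v2):
    # Advanced indexing: the row sum is used directly as an index.
    return l1_v2[x112.sum(1)]


rng = np.random.default_rng(0)  # fixed seed so every run sees the same data
x112 = rng.integers(0, 2, size=(100_000, 2))
l1 = np.array([4, 5])
l1_v2 = np.array([0, 4, 5])

# Check that both approaches agree before comparing their speed.
assert np.array_equal(op(x112, l1), lookup(x112, l1_v2))

for name, fn, args in [("op", op, (x112, l1)), ("lookup", lookup, (x112, l1_v2))]:
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(lambda: fn(*args), number=20, repeat=5)) / 20
    print(f"{name}: {best * 1e6:.1f} us per call")
```

Taking the minimum over repeats reports the least-disturbed run, which tends to be more stable than a single measurement.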

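【Comments】:

Could you time mine as well? And on a larger array too? In this example you are mostly measuring the overhead of array access.
Sure, I'll post the timing as a comment. Do you have an easy way to generate a larger array? I can't think of one.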
np.random.randint(2, size=(10000, 3))

【Solution 2】:

You can use direct indexing:

l1 = np.array([0, 4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

result = l1[x112.sum(1)]

This works if you can prepend the zero to l1 when you create it. If not:

result = np.r_[0, l1][x112.sum(1)]
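
The indexing works because each row sum is itself the position to read from the lookup table: position 0 holds the value for rows with no ones, position k the value for rows with k ones. A small sketch with three columns, assuming the entries of the array are only 0 or 1 (otherwise the sum is no longer a count). The array x, the three-element l1, and the name table are hypothetical.

```python
import numpy as np

l1 = np.array([4, 5, 6])  # hypothetical values for rows with 1, 2 and 3 ones
x = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [1, 1, 1]])

# Prepend the value for "no ones"; equivalent to np.r_[0, l1].
table = np.concatenate(([0], l1))

# With 0/1 entries the row sum equals the number of ones in that row.
counts = x.sum(axis=1)

result = table[counts]
print(result)  # [0 4 5 6]
```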

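【Comments】:

As requested, %timeit l1[x112.sum(1)] gives 2.43 µs ± 28.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each). Honestly, I'm not sure why it isn't as fast as the Numba loop, since this version also loops over x112 only once. Maybe it's the one extra array allocation?
@BatWannaBe. Probably because Numba has less overhead in this case. Try a larger array (maybe add more columns and make l1=np.arange(10) or so).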
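
A sketch of how the wider test data suggested in the last comment might be set up. It assumes l1 = np.arange(10) is meant as the complete lookup table (entry 0 for rows with no ones), which limits the array to 9 columns. The seed and the row count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the data is the same on every run

l1 = np.arange(10)              # assumed to include the entry for "no ones"
n_cols = len(l1) - 1            # at most 9 ones per row keeps every count in range
x = rng.integers(0, 2, size=(100_000, n_cols))

counts = x.sum(axis=1)
assert counts.max() < len(l1)   # guard against an out-of-range lookup

result = l1[counts]
print(result[:10])
```

Because l1 is np.arange(10) here, the result simply equals the counts; any other table of length 10 would work the same way.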