【Question title】: Most efficient way to index a numpy array with a number of 1d boolean arrays
【Posted】: 2023-01-11 03:45:25
【Question description】:

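Suppose I have an n-dimensional numpy array A, which may be very large, and suppose I have k 1-dimensional boolean masks M1, ..., Mk.

I would like to extract from A an n-dimensional array B that contains all the elements of A located at the indices where the "outer-AND" of all the masks is True

...but I would like to do this without first forming the (possibly very large) "outer-AND" of all the masks, and without extracting the specified elements one axis at a time, which creates (possibly many) intermediate copies along the way.

The example below demonstrates the two ways of extracting the elements from A just described: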

from functools import reduce
import numpy as np


m = 100

for _ in range(m):
    n = np.random.randint(0, 10)
    k = np.random.randint(0, n + 1)  # note: k is not used below; every axis of A gets a mask

    A_shape = tuple(np.random.randint(0, 10, n))

    A = np.random.uniform(-1, 1, A_shape)
    M_lst = [np.random.randint(0, 2, dim).astype(bool) for dim in A_shape]

    # --- USING "OUTER-AND" OF ALL MASKS --- #
    # creating "outer-AND" of all masks:
    M = reduce(np.bitwise_and, (np.expand_dims(M, tuple(np.r_[:i, i+1:n])) for i, M in enumerate(M_lst)), True)
    # creating shape of B:
    B_shape = tuple(map(np.count_nonzero, M_lst)) + A_shape[len(M_lst):]
    # extracting elements from A and reshaping to the correct shape:
    B1 = A[M].reshape(B_shape)
    # checking that the correct number of elements was extracted
    assert B1.size == np.prod(B_shape)
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY VERY LARGE OUTER-AND OF ALL THE MASKS!

    # --- USING ONE MASK AT A TIME --- #
    B2 = A
    for i, M in enumerate(M_lst):
        B2 = B2[tuple(slice(None) for _ in range(i)) + (M,)]
    assert B2.size == np.prod(B_shape)
    assert B2.shape == B_shape
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY LARGE NUMBER OF POSSIBLY LARGE INTERMEDIATE COPIES!

    assert np.all(B1 == B2)

    # EDIT 1:
    # USING np.ix_ AS SUGGESTED BY Chrysophylaxs
    B3 = A[np.ix_(*M_lst)]
    assert B3.shape == B_shape
    assert B3.size == np.prod(B_shape)

print(f'All three methods worked all {m} times')

Is there a smarter (more efficient) way to do this, possibly using an existing numpy function?

Edit 1: I added the solution suggested by Chrysophylaxs.

【Question discussion】:

    Tags: python numpy


    【Solution 1】:

    IIUC, you are looking for np.ix_; an example:

    import numpy as np
    
    arr = np.arange(60).reshape(3, 4, 5)
    
    x = [True, False, True]
    y = [False, True, True, False]
    z = [False, True, False, True, False]
    
    out = arr[np.ix_(x, y, z)]
    

    out:

    array([[[ 6,  8],
            [11, 13]],
    
           [[46, 48],
            [51, 53]]])
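
As a rough illustration of why this works: for boolean inputs, np.ix_ replaces each mask with the integer positions of its True entries and reshapes them so they broadcast against one another. The sketch below rebuilds the same index tuple by hand with np.flatnonzero; the names `manual` and `out_manual` are only illustrative and not part of the answer above.

```python
import numpy as np

arr = np.arange(60).reshape(3, 4, 5)

x = np.array([True, False, True])
y = np.array([False, True, True, False])
z = np.array([False, True, False, True, False])

# np.ix_ converts each boolean mask to the positions of its True entries
# and gives every index array its own axis, so they broadcast together.
ix = np.ix_(x, y, z)
print([a.shape for a in ix])  # [(2, 1, 1), (1, 2, 1), (1, 1, 2)]

# The same open mesh built by hand (illustrative only).
manual = (
    np.flatnonzero(x)[:, None, None],
    np.flatnonzero(y)[None, :, None],
    np.flatnonzero(z)[None, None, :],
)

out = arr[ix]
out_manual = arr[manual]
print(np.array_equal(out, out_manual))  # True
```

The index arrays are small (one entry per True value in each mask), so no full-size n-dimensional mask is needed to describe the selection.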
    

    【Discussion】:

    • Yes, this seems to work, thanks a lot! I have added it to the list of methods in the loop...
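
The loop in the question only asserts the shape and size of the np.ix_ result, so here is a small sketch that also compares the values against the other two approaches and contrasts the size of the index arrays with the size of the full "outer-AND" mask. The array shape, the seed, and the names `masks`, `ref`, `outer_and`, and `index_elems` are arbitrary choices for illustration. It says nothing about what NumPy allocates internally while indexing.

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.uniform(-1, 1, (6, 7, 8))
masks = [rng.integers(0, 2, dim).astype(bool) for dim in A.shape]

# Reference result: filter one axis at a time (creates intermediate copies).
ref = A
for axis, mask in enumerate(masks):
    ref = np.compress(mask, ref, axis=axis)

# Single indexing step with np.ix_.
out = A[np.ix_(*masks)]

# Full outer-AND mask, built here only to compare results and sizes.
outer_and = np.ones(A.shape, dtype=bool)
for axis, mask in enumerate(masks):
    shape = [1] * A.ndim
    shape[axis] = mask.size
    outer_and &= mask.reshape(shape)

# Number of integers held by the np.ix_ index arrays: at most sum(A.shape).
index_elems = sum(a.size for a in np.ix_(*masks))

print(np.array_equal(out, ref))
print(outer_and.size, index_elems)
```

The outer-AND mask holds prod(A.shape) booleans, while the index arrays returned by np.ix_ together hold at most sum(A.shape) integers, which is the gap the question was worried about.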