您可以使用NumPy broadcasting 和boolean indexing -
DATA[ARRAY <= np.arange(DATA.shape[1])] = np.nan
说明
让我们以S = 5 和n=4 为例,创建DATA 和ARRAY。
In [288]: S = 5
...: n = 4
...: DATA = uniform(low=0, high=1, size=S*n).reshape((S, n))
...: ARRAY = poissonN(1, S).reshape((-1, 1))
...:
In [289]: DATA
Out[289]:
array([[ 0.54235747, 0.01309313, 0.62664698, 0.92081697],
[ 0.17877576, 0.36536259, 0.91874957, 0.81924979],
[ 0.7518459 , 0.73218436, 0.99685998, 0.26435871],
[ 0.73130257, 0.77123956, 0.10437601, 0.09296549],
[ 0.804398 , 0.78675381, 0.71066382, 0.87481544]])
In [290]: ARRAY
Out[290]:
array([[1],
[1],
[0],
[2],
[1]])
现在,运行循环代码,看看会发生什么 -
In [291]: for i, draw in enumerate(ARRAY):
...: DATA[i, draw:] = np.nan
...:
In [292]: DATA
Out[292]:
array([[ 0.54235747, nan, nan, nan],
[ 0.17877576, nan, nan, nan],
[ nan, nan, nan, nan],
[ 0.73130257, 0.77123956, nan, nan],
[ 0.804398 , nan, nan, nan]])
现在,通过建议的解决方案,我们正在创建一个与DATA 形状相同的布尔数组,以便它以True 覆盖所有NaN 元素,其余为False。
同样,我们使用broadcasting,如下所示 -
In [293]: ARRAY <= np.arange(DATA.shape[1])
Out[293]:
array([[False, True, True, True],
[False, True, True, True],
[ True, True, True, True],
[False, False, True, True],
[False, True, True, True]], dtype=bool)
因此,通过布尔索引,我们可以将所有这些位置设置为 DATA 中的 NaN。让我们创建另一个随机元素实例并使用我们提出的方法测试 NaN -
In [294]: DATA = uniform(low=0, high=1, size=S*n).reshape((S, n))
In [295]: DATA[ARRAY <= np.arange(DATA.shape[1])] = np.nan
In [296]: DATA
Out[296]:
array([[ 0.87061908, nan, nan, nan],
[ 0.69237094, nan, nan, nan],
[ nan, nan, nan, nan],
[ 0.04257803, 0.82311917, nan, nan],
[ 0.00723291, nan, nan, nan]])
请注意非 Nan 值是不同的,因为我们重新创建了 DATA。需要注意的重要一点是我们正确设置了NaNs。
运行时测试
In [297]: # Inputs
...: S = 1000
...: n = 8
...: DATA = uniform(low=0, high=1, size=S*n).reshape((S, n))
...: ARRAY = poissonN(1, S).reshape((-1, 1))
...:
In [298]: DATAc = DATA.copy() # Make copy for testing proposed ans
In [299]: def org_app(DATA,ARRAY):
...: for i, draw in enumerate(ARRAY):
...: DATA[i, draw:] = np.nan
...:
In [301]: %timeit org_app(DATA,ARRAY)
100 loops, best of 3: 4.99 ms per loop
In [302]: %timeit DATAc[ARRAY <= np.arange(DATAc.shape[1])] = np.nan
10000 loops, best of 3: 94.1 µs per loop
In [305]: np.allclose(np.isnan(DATA),np.isnan(DATAc))
Out[305]: True