Perform multiple comparisons (interval) in a NumPy array in a single pass: answers

【Question title】: Perform multiple comparisons (interval) in a NumPy array in a single pass
【Posted】: 2017-12-11 10:34:19
【Question description】:

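My situation is similar to the question "Count values in a certain range", but instead of a single column vector I have a matrix intervals with two columns, [lower, upper], and a separate column vector true_values.

I want to check, element-wise, whether each value in true_values falls inside the corresponding [lower, upper] range.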

The answer given in the linked question makes 4 passes over the data: ((true_values >= intervals[:, 0]) & (true_values <= intervals[:, 1])).sum()

That is one pass for each of the two comparisons, one pass for the and clause, and one pass for the sum.
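The four passes counted above can be made explicit by splitting the one-liner into separate steps. This is only an illustrative sketch: the small arrays and the names ge_lower, le_upper, inside and n_inside are made up here and do not come from the original post.

```python
import numpy as np

# Hypothetical data; column 0 is the lower bound and column 1 the upper bound,
# which is what the indexing in the expression above implies.
intervals = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
true_values = np.array([0.5, 3.5, 4.0, 6.5])

ge_lower = true_values >= intervals[:, 0]  # pass 1: allocates a boolean temporary
le_upper = true_values <= intervals[:, 1]  # pass 2: allocates a second boolean temporary
inside = ge_lower & le_upper               # pass 3: element-wise AND of the two temporaries
n_inside = inside.sum()                    # pass 4: reduction over the boolean result

# The split version gives the same count as the one-liner.
one_liner = ((true_values >= intervals[:, 0]) & (true_values <= intervals[:, 1])).sum()
assert n_inside == one_liner
```

Each intermediate result is a full-length boolean array, so the cost is mostly memory traffic rather than arithmetic.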

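Since these can be huge matrices, I would like to know whether the number of passes can be reduced, ideally to one pass for the interval check and one pass for the sum (which I assume is unavoidable). I was thinking of something like broadcasting a function over the rows of intervals.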

Here is a minimal example:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
n_samples = 2000
n_features = 10
rng = np.random.RandomState(0)
X = rng.normal(size=(n_samples, n_features))
w = rng.normal(size=n_features)
# simple linear function without noise
y = np.dot(X, w)

gbrt = GradientBoostingRegressor(loss='quantile', alpha=0.95)

gbrt.fit(X, y)
# Get upper interval
upper_interval = gbrt.predict(X)
# Get lower interval
gbrt.set_params(alpha=0.05)
gbrt.fit(X, y)
lower_interval = gbrt.predict(X)
intervals = np.concatenate((lower_interval[:, np.newaxis], upper_interval[:, np.newaxis]), axis=1)
# This is 4 passes:
perc_correct_intervals = ((y >= intervals[:, 0]) & (y <= intervals[:, 1])).sum() / y.shape[0]

【Comments】:

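Please consider adding a code sample, or revising the one you posted in this question. As it currently stands, its formatting and scope make it hard for us to help you; here is a great resource to get you started. Good luck with your code!
Could you use a loop and @njit it?
@ReblochonMasque Added a code sample.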

Tags: python performance numpy vectorization array-broadcasting

【Solution 1】:

np.count_nonzero saves some time compared with .sum(), and you save more still if you do not actually need the intervals matrix for anything else:

%%timeit
intervals = np.concatenate((lower_interval[:, np.newaxis], upper_interval[:, np.newaxis]), axis=1)
perc_correct_intervals = ((y >= intervals[:, 0]) & (y <= intervals[:, 1])).sum() / y.shape[0]

15.7 µs ± 78.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


%%timeit
np.count_nonzero(np.less(lower_interval, y)*np.less(y, upper_interval))/y.size

3.93 µs ± 28 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
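Note that np.less is a strict comparison, so the snippet above excludes values lying exactly on a bound, whereas the question uses >= and <=. Below is a minimal sketch of an inclusive variant, assuming the two bounds are available as separate 1-D arrays. The data is made up for illustration, and writing the AND back into the first mask with out= is just one possible way to avoid a third temporary; it does not change the number of passes.

```python
import numpy as np

# Hypothetical bounds and values, for illustration only
lower_interval = np.array([0.0, 2.0, 4.0, 6.0])
upper_interval = np.array([1.0, 3.0, 5.0, 7.0])
y = np.array([0.5, 3.5, 4.0, 6.5])

# Inclusive check: lower <= y <= upper
inside = np.less_equal(lower_interval, y)
# Write the AND result back into `inside` instead of allocating a new array
np.logical_and(inside, np.less_equal(y, upper_interval), out=inside)
frac_inclusive = np.count_nonzero(inside) / y.size

# Strict check, as in the timed snippet: lower < y < upper
frac_strict = np.count_nonzero(np.less(lower_interval, y) * np.less(y, upper_interval)) / y.size

# y[2] == 4.0 sits exactly on its lower bound, so the two variants disagree
print(frac_inclusive, frac_strict)
```

On continuous predictions such as the ones in the question, values exactly on a bound are unlikely, so the two variants would usually give the same fraction.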

【Discussion】:

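The intervals matrix has to be computed. Even so, count_nonzero still gives a small advantage (12-14%). As far as I can tell, the number of passes over the matrix stays the same.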
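One way to actually fuse both comparisons and the count into a single pass is an explicit loop, along the lines of the @njit suggestion in the comments. The following is a sketch only, assuming Numba is installed. The function name fraction_inside and the sample data are hypothetical, and the fallback decorator is there just so the snippet still runs (slowly, as plain Python) without Numba. No timings are claimed for it.

```python
import numpy as np

try:
    from numba import njit  # optional dependency; compiles the loop below
except ImportError:
    def njit(func):
        # Fallback assumed for this sketch: run the loop as plain Python
        return func


@njit
def fraction_inside(lower, upper, values):
    # Single loop over the data: both comparisons and the count happen
    # per element, so no boolean temporaries are allocated.
    count = 0
    for i in range(values.shape[0]):
        if lower[i] <= values[i] and values[i] <= upper[i]:
            count += 1
    return count / values.shape[0]


# Hypothetical bounds and values, for illustration only
lower_interval = np.array([0.0, 2.0, 4.0, 6.0])
upper_interval = np.array([1.0, 3.0, 5.0, 7.0])
y = np.array([0.5, 3.5, 4.0, 6.5])

result = fraction_inside(lower_interval, upper_interval, y)
print(result)
```

With the matrix from the question, the call would be fraction_inside(intervals[:, 0], intervals[:, 1], y).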