【Question title】: Why is numpy ma.average 24 times slower than arr.mean?
【Posted】: 2018-01-16 07:34:19
【Question】:

I noticed that in Python's numpy, ma.average is much slower than arr.mean (arr is an array):

>>> arr = np.full((3, 3), -9999, dtype=float)
array([[-9999., -9999., -9999.],
       [-9999., -9999., -9999.],
       [-9999., -9999., -9999.]])

%timeit np.ma.average(arr, axis=0)
The slowest run took 49.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop

%timeit arr.mean(axis=0)
The slowest run took 6.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.41 µs per loop

With random numbers:

arr = np.random.random((3,3))

%timeit arr.mean(axis=0)
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.78 µs per loop

%timeit np.ma.average(arr, axis=0)
1000 loops, best of 3: 186 µs per loop

That is almost 24 times slower.

Documentation:

numpy.ma.average(a, axis=None, weights=None, returned=False)

Return the weighted average of array over the given axis.

numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Compute the arithmetic mean along the specified axis.

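Why is ma.average so much slower than arr.mean? Mathematically they are the same (correct me if I'm wrong). My guess is that it has something to do with the weights option of ma.average, but shouldn't there be a fallback if no weights are passed?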
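To make the "mathematically the same" claim concrete, here is a minimal sketch. The small `data` array and the use of -9999 as a "no data" marker are assumptions chosen for illustration. It suggests the two calls agree when nothing is masked and differ only once a mask is actually present.

```python
import numpy as np

# No mask: both calls should give the same numbers.
arr = np.random.random((3, 3))
plain = arr.mean(axis=0)
masked = np.ma.average(arr, axis=0)
same_without_mask = np.allclose(plain, np.ma.getdata(masked))

# With a mask: np.ma.average skips the masked entries.
# The -9999 "no data" marker is an assumption for this example.
data = np.array([[1.0, -9999.0],
                 [3.0, 4.0]])
marr = np.ma.masked_equal(data, -9999.0)
masked_avg = np.ma.average(marr, axis=0)   # ignores the -9999 entry
plain_avg = data.mean(axis=0)              # includes the -9999 entry

print(same_without_mask)
print(masked_avg, plain_avg)
```

So the extra work in np.ma.average only changes the result when a mask exists; on an unmasked array it is overhead.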

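【Comments on the question】:

Masked operations (do you see the .ma.?) are slow!
Furthermore, testing on a small amount of data is not good practice: what is the difference between 7.78 µs and 186 µs? Not much. You need to use larger matrices.
;) Thanks, I didn't think of that. Anyway, there is no mask on the given array. I found it in old code that used to have a masked array there.
(Probably) doesn't matter. It is a different computation model. Compare np.mean and np.average (both non-masked functions) and use bigger data!
@mumbala: I know. I'm only saying that in the future you should post test results for large batches. If ma.average redirected to np.mean here (which it doesn't), even that redirection would have a huge impact.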

Tags: python performance numpy average mean

【Answer 1】:

A good way to find out why something is slower is to profile it. Here I'll use the third-party library line_profiler and the IPython command %lprun (see for example this blog):

%load_ext line_profiler

import numpy as np
arr = np.full((3, 3), -9999, dtype=float)

%lprun -f np.ma.average np.ma.average(arr, axis=0)

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   519                                           def average(a, axis=None, weights=None, returned=False):
   ...
   570         1         1810   1810.0     30.5      a = asarray(a)
   571         1           15     15.0      0.3      m = getmask(a)
   572                                           
   573                                               # inspired by 'average' in numpy/lib/function_base.py
   574                                           
   575         1            5      5.0      0.1      if weights is None:
   576         1         3500   3500.0     59.0          avg = a.mean(axis)
   577         1          591    591.0     10.0          scl = avg.dtype.type(a.count(axis))
   578                                               else: 
   ...
   608                                           
   609         1            7      7.0      0.1      if returned:
   610                                                   if scl.shape != avg.shape:
   611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   612                                                   return avg, scl
   613                                               else:
   614         1            5      5.0      0.1          return avg

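I removed some irrelevant lines.

So about 30% of the time is spent in np.ma.asarray (something that arr.mean doesn't have to do!).

However, the relative times change drastically if you use a bigger array: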

arr = np.full((1000, 1000), -9999, dtype=float)

%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   519                                           def average(a, axis=None, weights=None, returned=False):
   ...
   570         1          609    609.0      7.6      a = asarray(a)
   571         1           14     14.0      0.2      m = getmask(a)
   572                                           
   573                                               # inspired by 'average' in numpy/lib/function_base.py
   574                                           
   575         1            7      7.0      0.1      if weights is None:
   576         1         6924   6924.0     86.9          avg = a.mean(axis)
   577         1          404    404.0      5.1          scl = avg.dtype.type(a.count(axis))
   578                                               else:
   ...
   609         1            6      6.0      0.1      if returned:
   610                                                   if scl.shape != avg.shape:
   611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   612                                                   return avg, scl
   613                                               else:
   614         1            6      6.0      0.1          return avg

This time the np.ma.MaskedArray.mean function takes almost 90% of the time.

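Note: you could also dig deeper and look into np.ma.asarray, np.ma.MaskedArray.count or np.ma.MaskedArray.mean and check their line profiles. I just wanted to show that there are lots of called functions that add to the overhead.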
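If line_profiler is not installed, a rough view of the same call tree can be had with the standard library's cProfile. This is a sketch rather than a replacement for the line-by-line output above: it reports per-function totals, and the loop count of 1000 is an arbitrary choice to make the tiny call measurable.

```python
import cProfile
import io
import pstats

import numpy as np

arr = np.full((3, 3), -9999, dtype=float)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):          # repeat so the small call shows up in the stats
    np.ma.average(arr, axis=0)
profiler.disable()

# Collect the ten entries with the largest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

The exact functions listed depend on the installed NumPy version, so the report is best read as a guide to where to point %lprun next.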

So the next question is: did the relative timings of np.ndarray.mean and np.ma.average change as well? At least on my computer the difference is much smaller now:

%timeit np.ma.average(arr, axis=0)
# 2.96 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit arr.mean(axis=0)
# 1.84 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This time it's not even 2 times slower. I assume the difference gets even smaller for bigger arrays.
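Outside IPython, the same comparison can be sketched with the standard timeit module. The helper name `time_ratio`, the array sizes and the repeat counts are assumptions made for illustration; the ratios will vary with machine and NumPy version, so none are quoted here.

```python
import timeit

import numpy as np


def time_ratio(n, number=20):
    """Return (time of np.ma.average) / (time of ndarray.mean) for an n x n array."""
    arr = np.full((n, n), -9999, dtype=float)
    t_ma = min(timeit.repeat(lambda: np.ma.average(arr, axis=0),
                             number=number, repeat=3))
    t_nd = min(timeit.repeat(lambda: arr.mean(axis=0),
                             number=number, repeat=3))
    return t_ma / t_nd


ratios = {n: time_ratio(n) for n in (3, 100, 1000)}
for n, ratio in ratios.items():
    print(f"{n:>5} x {n:<5} np.ma.average / arr.mean = {ratio:.1f}")
```

If the assumption above holds, the printed ratio should shrink as n grows, because the fixed per-call overhead is spread over more elements.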

This is also generally true of NumPy:

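The constant factors are quite high even for plain numpy functions (see for example my answer to the question "Performance in different vectorization method in numpy"). For np.ma these constant factors are even bigger, especially if you don't use an np.ma.MaskedArray as input. But even though the constant factors may be high, these functions excel with big arrays.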
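The remark about the input type can be checked with a small sketch: convert the array to a MaskedArray once, outside the timed call, and compare against passing the plain ndarray each time. The loop count is arbitrary, and how much the conversion saves (if anything) is an assumption that depends on the NumPy version.

```python
import timeit

import numpy as np

arr = np.full((3, 3), -9999, dtype=float)
marr = np.ma.asarray(arr)   # convert once, outside the timed call

t_ndarray = timeit.timeit(lambda: np.ma.average(arr, axis=0), number=2000)
t_masked = timeit.timeit(lambda: np.ma.average(marr, axis=0), number=2000)

print(f"ndarray input:     {t_ndarray:.4f} s")
print(f"MaskedArray input: {t_masked:.4f} s")
```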

【Comments】:

Thank you very much. There couldn't be a better answer! Thanks also for the line_profiler example.

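【Answer 2】:

Thanks to @WillemVanOnsem and @sascha in the comments above.

Edit: this applies to small arrays; see the accepted answer for more information.

Masked operations are slow, so try to avoid them: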

mask = self.local_pos_history[:, 0] > -9
local_pos_hist_masked = self.local_pos_history[mask]
avg = local_pos_hist_masked.mean(axis=0)
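A self-contained sketch of the same idea follows. The `history` array is a hypothetical stand-in for `self.local_pos_history`, with rows of -9999 assumed to mark missing positions. It compares plain boolean indexing with an equivalent masked-array computation.

```python
import numpy as np

# Hypothetical stand-in for self.local_pos_history: rows of (x, y, z),
# where a row of -9999 is assumed to mean "no data".
history = np.array([
    [1.0, 2.0, 3.0],
    [-9999.0, -9999.0, -9999.0],
    [3.0, 4.0, 5.0],
])

# Boolean indexing: keep only valid rows, then use the plain ndarray mean.
valid_rows = history[:, 0] > -9
avg_plain = history[valid_rows].mean(axis=0)

# Masked-array equivalent: hide the invalid entries instead of dropping them.
masked = np.ma.masked_where(history <= -9, history)
avg_masked = masked.mean(axis=0)

print(avg_plain)
print(avg_masked)
```

Both approaches give the same averages here; the boolean-indexing version just stays on plain ndarrays and avoids the np.ma overhead.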

The old version with the mask:

mask = np.ma.masked_where(self.local_pos_history > -9, self.local_pos_history)
local_pos_hist_mask = self.local_pos_history[mask].reshape(len(self.local_pos_history) // 3, 3)
avg_pos = self.local_pos_history

np.average is almost as fast as arr.mean:

%timeit np.average(arr, axis=0)
The slowest run took 5.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 µs per loop

%timeit np.mean(arr, axis=0)
The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.74 µs per loop

Just to clarify: these are still tests on small batches.
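For reference, a short sketch of how np.average and np.mean relate in terms of results. The example array and the weights `[1, 3]` are made up for illustration. Without weights the two agree, and weights are what np.average adds.

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

unweighted = np.average(arr, axis=0)
mean = np.mean(arr, axis=0)

# Weights along axis 0: the second row counts three times as much as the first.
weighted = np.average(arr, axis=0, weights=[1, 3])

print(unweighted)   # same values as mean
print(weighted)     # [2.25 3.25 4.25]
```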
