将 python 函数广播到 numpy 数组答案

【问题标题】：Broadcasting a python function on to numpy arrays将 python 函数广播到 numpy 数组
【发布时间】：2011-07-09 00:47:03
【问题描述】：

假设我们有一个特别简单的函数，比如

import scipy as sp
def func(x, y):
   return x + y

这个函数显然适用于 x 和 y 的几种内置 python 数据类型，如 string、list、int、float、array 等。由于我们对数组特别感兴趣，我们考虑两个数组：

x = sp.array([-2, -1, 0, 1, 2])
y = sp.array([-2, -1, 0, 1, 2])

xx = x[:, sp.newaxis]
yy = y[sp.newaxis, :]

>>> func(xx, yy)

array([[-4, -3, -2, -1,  0],
  [-3, -2, -1,  0,  1],
  [-2, -1,  0,  1,  2],
  [-1,  0,  1,  2,  3],
  [ 0,  1,  2,  3,  4]])

正如我们所期望的那样。

现在，如果想将数组作为以下函数的输入，该怎么办？

def func2(x, y):
  if x > y:
     return x + y
  else:
     return x - y

执行>>>func(xx, yy) 会引发错误。

第一个显而易见的方法是 scipy/numpy 中的 sp.vectorize 函数。然而，这种方法已被证明不是很有效。谁能想到一种更强大的方式将任何函数广播到 numpy 数组？

如果以数组友好的方式重写代码是唯一的方法，那么如果您也可以在此处提及它会有所帮助。

【问题讨论】：

标签： python numpy scipy

【解决方案1】：

np.vectorize 是将对数字进行操作的 Python 函数转换为对 ndarray 进行操作的 numpy 函数的通用方法。

但是，正如您所指出的，它不是很快，因为它“在后台”使用 Python 循环。

为了获得更好的速度，您必须手工制作一个函数，该函数需要 numpy 数组作为输入并利用该 numpy 特性：

import numpy as np

def func2(x, y):
    return np.where(x>y,x+y,x-y)      

x = np.array([-2, -1, 0, 1, 2])
y = np.array([-2, -1, 0, 1, 2])

xx = x[:, np.newaxis]
yy = y[np.newaxis, :]

print(func2(xx, yy))
# [[ 0 -1 -2 -3 -4]
#  [-3  0 -1 -2 -3]
#  [-2 -1  0 -1 -2]
#  [-1  0  1  0 -1]
#  [ 0  1  2  3  0]]

关于性能：

test.py：

import numpy as np

def func2a(x, y):
    return np.where(x>y,x+y,x-y)      

def func2b(x, y):
    ind=x>y
    z=np.empty(ind.shape,dtype=x.dtype)
    z[ind]=(x+y)[ind]
    z[~ind]=(x-y)[~ind]
    return z

def func2c(x, y):
    # x, y= x[:, None], y[None, :]
    A, L= x+ y, x<= y
    A[L]= (x- y)[L]
    return A

N=40
x = np.random.random(N)
y = np.random.random(N)

xx = x[:, np.newaxis]
yy = y[np.newaxis, :]

跑步：

N=30：

% python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)'
1000 loops, best of 3: 219 usec per loop

% python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)'
1000 loops, best of 3: 488 usec per loop

% python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)'
1000 loops, best of 3: 248 usec per loop

N=1000：

% python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)'
10 loops, best of 3: 93.7 msec per loop

% python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)'
10 loops, best of 3: 367 msec per loop

% python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)'
10 loops, best of 3: 186 msec per loop

这似乎表明func2a 比func2c 稍快（而func2b 非常慢）。

【讨论】：

np.where 对于这样的事情也非常有用。
@matt：非常感谢您的建议！
@unutbu：使用where 看起来确实不错，但是您是否也考虑过使用where 实现时对性能的影响？谢谢
@unutbu：有趣的时间安排。关心 N 的时间，比如在 1e3...1e4 的范围内？到目前为止，where 的实现似乎是最合理的一种。谢谢
@eat：当我不假思索地设置N=10000时，我让我可怜的小电脑挂了:)。当x 的形状为 (10000,) 时，func2* 的返回值的形状为 (10000,10000)。使用 dtype=float64`，至少需要 760 MiB。这使我进入了缓冲区交换的领域。无论如何，我倾向于相信即使 N 增长，结果的顺序也不会改变。你觉得会吗？

【解决方案2】：

对于这种特殊情况，您还可以编写一个对 NumPy 数组和普通 Python 浮点数都进行操作的函数：

def func2d(x, y):
    z = 2.0 * (x > y) - 1.0
    z *= y
    return x + z

这个版本的速度也是unutbu's func2a() 的四倍以上（用N = 100 测试）。

【讨论】：

+1：干得好！看起来func2d 更快，因为它需要更少的内存分配。你同意吗？
@unutbu：不确定。我写的第一个版本使用的临时变量更少（例如在第一行省略- 1.0，使用z -= 1.0），但这更慢。
嗯，这很奇怪。对我（N=1000）来说，使用z -= 1.0 每次循环需要 40.7 毫秒，而func2d 需要 47.8 毫秒。
+1，不错！在我的机器中，N=10、100、1000 的性能比在 func2a/func2d 之间为 [1.144, 1.885, 1.624]。现在最好从 OP 获得反馈。谢谢

【解决方案3】：

只是为了了解基本概念，您可以修改您的功能，例如这种方式：

def func2(x, y):
    x, y= x[:, None], y[None, :]
    A= x+ y
    A[x<= y]= (x- y)[x<= y]
    return A

因此，对于您的情况，这样的事情应该是一个非常合理的起点：

In []: def func(x, y):
   ..:     x, y= x[:, None], y[None, :]
   ..:     return x+ y
   ..:
In []: def func2(x, y):
   ..:     x, y= x[:, None], y[None, :]
   ..:     A, L= x+ y, x<= y
   ..:     A[L]= (x- y)[L]
   ..:     return A
   ..:
In []: x, y= arange(-2, 3), arange(-2, 3)
In []: func(x, y)
Out[]:
array([[-4, -3, -2, -1,  0],
       [-3, -2, -1,  0,  1],
       [-2, -1,  0,  1,  2],
       [-1,  0,  1,  2,  3],
       [ 0,  1,  2,  3,  4]])
In []: func2(x, y)
Out[]:
array([[ 0, -1, -2, -3, -4],
       [-3,  0, -1, -2, -3],
       [-2, -1,  0, -1, -2],
       [-1,  0,  1,  0, -1],
       [ 0,  1,  2,  3,  0]])

虽然这种处理方式看起来会浪费资源，但事实并非如此。始终衡量您的程序的实际性能，然后（而不是更早）进行必要的更改。

恕我直言，还有一个额外的优势：这种“矢量化”最终使您的代码真正一致且可读。

【讨论】：