What would be the computationally faster way to implement this 2D numerical integration? Answers

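【Question title】: What would be the computationally faster way to implement this 2D numerical integration?
【Posted】: 2020-07-16 11:41:15
【Question description】:

I am interested in doing 2D numerical integration. Right now I am using scipy.integrate.dblquad, but it is very slow. Please see the code below. I need to evaluate this integral 100 times with completely different parameters, so I want to make the processing as fast and efficient as possible. The code is: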

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(q, z, t):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)


y = np.empty([len(q)])
for n in range(len(q)):
    y[n] = integrate.dblquad(lambda t, z: f(q[n], z, t), 0, 50, lambda z: 10, lambda z: 60)[0]

end = time.time()
print(end - start)

The time taken (in seconds) is

212.96751403808594

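This is far too long. Please suggest a better way to achieve what I want to do. I tried searching before coming here but did not find any solution. I have read that quadpy can do this job better and faster, but I do not know how to use it. Please help.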

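【Comments on the question】:

Your code appears to be working at the moment and you are looking to improve it. Generally these questions are too opinionated for this site, but you might have better luck at CodeReview.SE. Remember to read their requirements, as they are a bit more strict than this site.
@DavidBuck Thank you very much for the suggestion. If you feel that way, I will post it on CodeReview. I posted it here because I was hoping for suggestions as well as code improvements. If others feel the same, I will delete it. Cheers :)
@David, are you active on CodeReview and prepared to answer this question there? If not, do not recommend it, especially for numpy questions.
You already asked about quadpy in stackoverflow.com/questions/60905349/…. On SO, asking for recommendations of other packages to solve a problem is generally frowned upon.
You need to acknowledge, and build on, the help you got in your previous question. And try to apply the feedback you got from your earlier CR post.

Tags: python numpy scipy integration numerical-integration

【Solution 1】:

You can use Numba or a low-level callable

Almost your example

I simply pass the function directly to scipy.integrate.dblquad, instead of your approach of generating the function with lambdas.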

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner,args=(q[n],))[0]

end = time.time()
print(end - start)
#143.73969149589539

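This is already slightly faster (143 s vs. 151 s), but its only purpose is to provide a simple example to optimize.

Simply compile the function with Numba

To get this running you additionally need Numba and numba-scipy. The purpose of numba-scipy is to provide wrapped functions from scipy.special.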
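Since this version depends on how dblquad passes its arguments, here is a minimal sketch of the calling convention using a toy integrand (not the one from the question). The names g and a are made up for illustration. dblquad calls the integrand as func(y, x, *args), where y is the inner variable, and the two limit functions give the inner limits as functions of the outer variable x.

```python
from scipy import integrate

def g(y, x, a):
    # Toy integrand; `a` is an extra parameter forwarded through args.
    return a * x * y**2

# Outer variable x runs over [0, 2]; inner variable y runs over [0, x].
value, abserr = integrate.dblquad(
    g, 0, 2, lambda x: 0.0, lambda x: x, args=(3.0,)
)

# Analytically: 3 * integral_0^2 x * (x**3 / 3) dx = 32 / 5
print(value)
```

Passing the extra parameter through args, rather than closing over it in a lambda, is also what makes the compiled variants possible, because the integrand then has a fixed signature.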

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

#error_model="numpy" -> Don't check for division by zero
@nb.njit(error_model="numpy",fastmath=True)
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner,args=(q[n],))[0]

end = time.time()
print(end - start)
#8.636585235595703

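Using a low-level callable

The scipy.integrate functions also provide the possibility to pass a C callback function instead of a Python function. These functions can be written in C, Cython, or Numba, for example; I use Numba in this example. The main advantage is that no Python interpreter interaction is needed on a function call.

An excellent answer by @Jacques Gaudin shows an easy way to do this, including additional arguments.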

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb
from numba import cfunc
from numba.types import intc, CPointer, float64
from scipy import LowLevelCallable

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

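def jit_integrand_function(integrand_function):
    # njit already implies nopython mode, so no extra nopython flag is needed
    jitted_function = nb.njit(integrand_function)

    # C signature expected by scipy: double func(int n, double *xx)
    #error_model="numpy" -> Don't check for division by zero
    @cfunc(float64(intc, CPointer(float64)),error_model="numpy",fastmath=True)
    def wrapped(n, xx):
        # View the C array as a NumPy array: xx[0], xx[1] are the integration
        # variables, xx[2] is the extra argument passed via args
        ar = nb.carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2])
    return LowLevelCallable(wrapped.ctypes)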

@jit_integrand_function
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner,args=(q[n],))[0]

end = time.time()
print(end - start)
#3.2645838260650635

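【Comments】:

【Solution 2】:

In general, doing a summation via matrix operations is much faster than using scipy.integrate.quad (or dblquad). You could rewrite your f(q, z, t) to take in a q, z and t vector and return a 3D array of f values using np.tensordot, then multiply your area element (dt dz) with the function values and sum them using np.sum. If your area element is not constant, you have to make an array of area elements and use np.einsum. To take your integration limits into account, you can use a masked array to mask the function values outside your integration limits before summing. Note that np.einsum ignores the mask, so if you use einsum you can use np.where to set function values outside your integration limits to zero. Example (with a constant area element and simple integration limits):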

import numpy as np
import scipy.special as ss
import time

def f(q, t, z):

    # Making 3D arrays before computation for readability. You can save some time by
    # using tensordot directly when computing the output
    Mq = np.tensordot(q, np.ones((len(t), len(z))), axes=0)
    Mt = np.tensordot(np.ones(len(q)), np.tensordot(t, np.ones(len(z)), axes = 0), axes = 0)
    Mz = np.tensordot(np.ones((len(q), len(t))), z, axes = 0)

    # ss.j0(Mq * Mt) is the Bessel factor j0(q * t) from the integrand in the question
    return Mt * 0.5 * (ss.erf((Mt - Mz) / 3) - 1) * ss.j0(Mq * Mt) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
     -0.5 * ((Mz - 40) / 2) ** 2)

q = np.linspace(0.03, 1, 1000)
t = np.linspace(0, 50, 250)
z = np.linspace(10, 60, 250)

#if you have constant dA you can shave some time by computing dA without using np.diff
#if dA is variable, you have to make an array of dA values and use np.einsum instead of np.sum
t0 = time.process_time()
dA = np.diff(t)[0] * np.diff(z)[0]
func_vals = f(q, t, z)
I = np.sum(func_vals * dA, axis=(1, 2))
t1 = time.process_time()
print(t1 - t0)

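This took 18.5 seconds on my 2012 MacBook Pro (2.5 GHz i5), with dA = 0.04. Doing things this way also lets you easily trade off accuracy against efficiency, and set dA to a value that makes sense when you know how the function behaves.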
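For the case of a non-constant area element and non-rectangular limits described above, a rough sketch could look like the following. It uses a toy integrand, exp(-q t) * z over the region z <= t, rather than the function from the question. It also swaps in trapezoidal weights in place of a plain dA and uses broadcasting in place of np.tensordot. The helper trapezoid_weights is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def trapezoid_weights(x):
    """Trapezoidal-rule weights for a 1D grid that may be non-uniform."""
    w = np.zeros_like(x, dtype=float)
    dx = np.diff(x)
    w[:-1] += 0.5 * dx  # left half of every interval
    w[1:] += 0.5 * dx   # right half of every interval
    return w

q = np.linspace(0.5, 2.0, 4)
t = np.linspace(0.0, 1.0, 401) ** 2  # non-uniform grid, denser near t = 0
z = np.linspace(0.0, 1.0, 301)

# Broadcast the three 1D grids against each other to get a (q, t, z) array.
Q = q[:, None, None]
T = t[None, :, None]
Z = z[None, None, :]

vals = np.exp(-Q * T) * Z            # toy integrand, shape (4, 401, 301)
vals = np.where(Z <= T, vals, 0.0)   # zero out points outside the region z <= t

# Weighted sum over the t and z axes; einsum applies the per-axis weights.
result = np.einsum("qtz,t,z->q", vals, trapezoid_weights(t), trapezoid_weights(z))
print(result)
```

Because the mask cuts the region along grid cells, the result is only as accurate as the grid spacing near the boundary allows, so the grid should be refined there if the boundary contributes significantly.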

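It is worth noting, however, that with this grid-summation approach, if you want a larger number of points you have to split up your integral, or else you risk maxing out your memory: 1000 x 1000 x 1000 doubles require 8 GB of RAM. So if you are doing very big integrations with high precision, it can be worth doing a quick check of the memory required before running.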
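One way to do that check and split the work is sketched below, assuming a uniform t and z grid and an integrand that accepts broadcastable arrays. The function names required_bytes and integrate_in_chunks and the max_bytes parameter are hypothetical, introduced only for this illustration. The estimate covers a single float64 array of function values; temporaries created inside the integrand can multiply the real peak usage.

```python
import numpy as np

def required_bytes(nq, nt, nz):
    """Bytes needed for one float64 array of shape (nq, nt, nz)."""
    return nq * nt * nz * np.dtype(np.float64).itemsize

def integrate_in_chunks(integrand, q, t, z, max_bytes=256 * 1024**2):
    """Riemann sum of integrand(q, t, z) over a uniform t-z grid, chunked over q."""
    dA = (t[1] - t[0]) * (z[1] - z[0])
    # Number of q values whose (t, z) slab fits within the memory budget.
    chunk = max(1, max_bytes // required_bytes(1, len(t), len(z)))
    out = np.empty(len(q))
    for start in range(0, len(q), chunk):
        stop = start + chunk
        vals = integrand(q[start:stop, None, None], t[None, :, None], z[None, None, :])
        out[start:stop] = vals.sum(axis=(1, 2)) * dA
    return out

print(required_bytes(1000, 1000, 1000) / 1e9)  # size of the full grid in GB

# Toy usage with a simple product integrand.
q = np.linspace(1.0, 2.0, 7)
t = np.linspace(0.0, 1.0, 11)
z = np.linspace(0.0, 1.0, 13)
toy = lambda q, t, z: q * t * z
print(integrate_in_chunks(toy, q, t, z, max_bytes=2 * required_bytes(1, 11, 13)))
```

Chunking over q keeps the result identical to the unchunked sum, since each q value is integrated independently.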

【Comments】: