【Question title】: Very slow double integrals with built-in integration or cubature, wrong result with prac2d in R
【Posted】: 2018-08-12 08:53:39
【Question】:

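I have a question about computing double integrals in R. R may not be the best software for numerical integration, but we rely heavily on its stochastic optimisation packages (the function to be optimised is highly non-trivial and has many local minima), so we cannot switch to MATLAB or other packages.

The problem is as follows: computing a double integral with nested integrate calls takes a very long time, and the hcubature method from the cubature package takes even longer (!). I tried the first solution from this answer (hcubature from the cubature package), but it made the timing worse; besides, it does not support infinite integration limits, and the integration already chokes on the interval (-100, 100). With the second solution (quad2d from the pracma package) the timing is good, but the computed integrals are badly wrong!

Single integrals are computed very quickly (for example, with the double integrals commented out, evaluating the function takes only 0.2 seconds, which is tolerable).

Here is an MWE, a heavily simplified version of the function (just to illustrate the integration issue).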

library(cubature)
library(pracma)

# Generate some artificial data to try this function on
set.seed(100)
n <- 200
r <- rnorm(n, 0.0004, 0.01)

# Log-likelihood function accepts 3 parameters:
# [1] shape of positive shocks, [2] shape of negative shocks, [3] DoF of Student's distribution for jumps
parm <- c(6, 7, 10)

LL <- function(parm, cub = "default") {

  shapes <- parm[1:2]
  studdof <- parm[3]

  # For simplification, generate some dynamic series
  set.seed(101)
  sigmaeps <- rgamma(n, shape=shapes[1], rate=1000)
  sigmaeta <- rgamma(n, shape=shapes[2], rate=1000)
  lambdas <- rgamma(n, shape=10, rate=80)+1
  probs <- sapply(lambdas, function(x) dpois(0:2, lambda=x))
  probs <- sweep(probs, 2, colSums(probs), FUN="/") # Normalising the probabilities

  # Reserving memory for 3 series of density
  fw0 <- rep(NA, n)  
  fw1 <- rep(NA, n)
  fw2 <- rep(NA, n)

  for (t in 2:n) {
    integ0 <- function(e) { # First integrand for 0 jumps
      1/sigmaeta[t] * dgamma(-(r[t]-sigmaeps[t]*e)/sigmaeta[t], shape=shapes[2]) * # Density of negative shocks
        dgamma(e, shape=shapes[1]) # Density of positive shocks
    }

    integ1 <- function(e, g) { # Double integrand for 1 jump
      1/sigmaeta[t] * dgamma(-(r[t]-sigmaeps[t]*e-1*g)/sigmaeta[t], shape=shapes[2]) * # Density of negative shocks
        dgamma(e, shape=shapes[1]) * # Density of positive shocks
        dt(g, df = studdof)/1 # Density of jump intensity
    }

    integ2 <- function(e, g) { # Double integrand for 2 jumps
      1/sigmaeta[t] * dgamma(-(r[t]-sigmaeps[t]*e-2*g)/sigmaeta[t], shape=shapes[2]) * # Density of negative shocks
        dgamma(e, shape=shapes[1]) * # Density of positive shocks
        dt(g, df = studdof)/2 # Density of jump intensity
    }

    # Wrappers for cubature because they need vector inputs
    wrapper1 <- function(x) integ1(x[1], x[2])
    wrapper2 <- function(x) integ2(x[1], x[2])

    # Single integral that is not a problem
    fw0[t] <- integrate(integ0, 0, Inf)$value

    if (cub=="cubature") {
      # 2D CUBATURE FROM cubature PACKAGE
      fw1[t] <- hcubature(wrapper1, c(0, -20), c(20, 20))$integral
      fw2[t] <- hcubature(wrapper2, c(0, -20), c(20, 20))$integral
    } else if (cub=="prac2d") {
      # 2D CUBATURE FROM pracma PACKAGE
      fw1[t] <- quad2d(integ1, 0, 100, -100, 100)
      fw2[t] <- quad2d(integ2, 0, 100, -100, 100)
    } else if (cub=="default") {
      # DOUBLE INTEGRALS FROM BUILT-IN INTEGRATE
      fw1[t] <- integrate(function(g) { sapply(g, function(g) { integrate(function(e) integ1(e, g), 0, Inf)$value }) }, -Inf, Inf)$value
      fw2[t] <- integrate(function(g) { sapply(g, function(g) { integrate(function(e) integ2(e, g), 0, Inf)$value }) }, -Inf, Inf)$value 
    }

    if (!t%%10) print(t)
  }
  fw <- fw0*probs[1, ] + fw1*probs[2, ] + fw2*probs[3, ]
  fw <- log(fw[2:n])
  fw[is.nan(fw)] <- -Inf
  slfw <- sum(fw)
  print(paste0("Point: ", paste(formatC(parm, 4, format="e", digits=3), collapse=" "), ", LL: ", round(slfw, 2)))

  return(slfw)
}

system.time(LL(parm, cub="default"))
# 13 seconds
# "Point: 6.000e+00 7.000e+00 1.000e+01, LL: 247.78"
system.time(LL(parm, cub="cubature"))
# 29 seconds, the result is slightly off
# "Point: 6.000e+00 7.000e+00 1.000e+01, LL: 241.7"
system.time(LL(parm, cub="prac2d"))
# 0.5 seconds, the result is way off
# "Point: 6.000e+00 7.000e+00 1.000e+01, LL: 223.25"

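(Ideally, integ1(e, g) and integ2(e, g) should be integrated over [0, Inf) w.r.t. e and over (-Inf, Inf) w.r.t. g.)

Parallelisation is done at a higher level (i.e. the stochastic optimiser evaluates this likelihood function in parallel), so this function must run as fast as possible on a single core.

Is there any way to speed up the computation of these double integrals?

【Comments】: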

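Please use pracma::integral2 instead of the slow quad2d. It is a quite modern implementation of 2-dimensional integration in pure R. For your example it returns 240.16 and takes about 1.5 seconds. I do not know how accurate this result is. If that is not good enough for you, I am afraid you will need some external program.
@HansW. Just a quick question: why do you say quad2d is slow when it performs best here? Also, what do you mean by "external program"?
You will need to increase the number of nodes, n = 256 or so, to make the result more accurate, and that will slow quad2d down considerably. By "external" I mean, for example, that you could call Python or Julia integration routines from R, or a dedicated C library. What worries me a bit is that you ultimately want to integrate over an infinite domain.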

Tags: r performance integral numerical-integration

【Solution 1】:

Here is a wrapper for hcubature that I use to allow infinite limits:

hcubature.inf <- function() {
  cl <- match.call()
  cl[[1L]] <- quote(cubature::hcubature)
  if(all(is.finite(c(lowerLimit,upperLimit)))) return(eval.parent(cl))

  # convert limits to new coordinates to incorporate infinities
  cl[['upperLimit']] <- atan(upperLimit)
  cl[['lowerLimit']] <- atan(lowerLimit)
  # wrap the function with the coordinate transformation
  # update argument to hcubature with our function
  f <- match.fun(f)
  cl[['f']] <- if(!vectorInterface)
                 function(x, ...) f(tan(x), ...) / prod(cos(x))^2
               else
                 function(x, ...) f(tan(x), ...) / rep(apply(cos(x), 2, prod)^2, each=fDim)
  eval.parent(cl)
}
formals(hcubature.inf) <- formals(cubature::hcubature)
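
For reference, the change of variables that the wrapper applies can be written out as follows. This is a sketch reconstructed from the code above (the symbols $u$, $a$, $b$ and $d$ are introduced here for illustration only): each coordinate is mapped by $x = \tan u$, so an infinite limit becomes $\pm\pi/2$, and the integrand picks up the Jacobian factor $1/\cos^2 u$ per dimension, which is what `prod(cos(x))^2` divides by.

```latex
\int_a^b f(x)\,dx
  = \int_{\arctan a}^{\arctan b} f(\tan u)\,\frac{1}{\cos^2 u}\,du,
  \qquad x = \tan u,\quad dx = \frac{du}{\cos^2 u},

\int_{a_1}^{b_1}\!\cdots\!\int_{a_d}^{b_d} f(x_1,\dots,x_d)\,dx_1\cdots dx_d
  = \int_{\arctan a_1}^{\arctan b_1}\!\cdots\!\int_{\arctan a_d}^{\arctan b_d}
    \frac{f(\tan u_1,\dots,\tan u_d)}{\prod_{i=1}^{d}\cos^2 u_i}\,du_1\cdots du_d .
```

In R, `atan(Inf)` evaluates to `pi/2` and `atan(-Inf)` to `-pi/2`, so the transformed region is always a finite box.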

Then you should vectorise the integrand:

vwrapper1 <- function(x) as.matrix(integ1(x[1,], x[2,]))
vwrapper2 <- function(x) as.matrix(integ2(x[1,], x[2,]))
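
A minimal sketch of the shape convention behind these wrappers, assuming (as far as I understand the cubature documentation) that with `vectorInterface=TRUE` the integrand receives a matrix with one row per dimension and one column per evaluation point, and returns one row per component of the function value. The function `toy` is a hypothetical stand-in for `integ1`, and the sketch uses base R only, so it does not call hcubature itself.

```r
# Hypothetical two-argument integrand, vectorised over both arguments
# (stands in for integ1; not part of the original code)
toy <- function(e, g) dgamma(e, shape = 6) * dt(g, df = 10)

# Vector-interface wrapper: x[1, ] holds all e values, x[2, ] all g values
vtoy <- function(x) matrix(toy(x[1, ], x[2, ]), nrow = 1)

# Two dimensions (rows), three evaluation points (columns)
x <- rbind(c(1, 2, 3), c(-1, 0, 1))
out <- vtoy(x)
dim(out)  # one row (scalar-valued integrand), one column per point
```

The point of the vector interface is that the integrator can evaluate many points per R function call, which avoids most of the per-call overhead of the scalar wrappers.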

and integrate:

if (cub=="cubature.inf") {
  fw1[t] <- hcubature.inf(vwrapper1, c(0, -Inf), c(Inf, Inf), vectorInterface=TRUE)$integral
  fw2[t] <- hcubature.inf(vwrapper2, c(0, -Inf), c(Inf, Inf), vectorInterface=TRUE)$integral
} else if (cub=="cubature") {
 ...

This gives a value of 242.83 in about half the time of the default method.
