Understanding the maxit argument of optim() in R: answers

【Question title】: Understanding the maxit argument of optim() in R
【Posted】: 2019-05-07 12:15:35
【Question】:

In the following call to optim(), I would expect one evaluation of fn() and one evaluation of gr(), because maxit=1. However, fn() and gr() are each evaluated 7 times.

optim(par=1000, fn=function(x) x^2, gr=function(x) 2*x,
      method="L-BFGS-B", control=list(maxit=1))$counts
function gradient 
       7        7

Why is this? Is it a bug? Or why does optim() perform 7 evaluations for a single iteration?

More detailed output:

optim(par=1000,
      fn=function(x) { cat("f(", x, ")", sep="", fill=TRUE); x^2 },
      gr=function(x) { cat("g(", x, ")", sep="", fill=TRUE); 2*x },
      method="L-BFGS-B", control=list(maxit=1))$counts
f(1000)
g(1000)
f(999)
g(999)
f(995)
g(995)
f(979)
g(979)
f(915)
g(915)
f(659)
g(659)
f(1.136868e-13)
g(1.136868e-13)
function gradient 
       7        7

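(Tested with R version 3.5.0.)

【Comments】:

A similar observation can be found on the R mailing list. It received no replies. r.789695.n4.nabble.com/…
Testing with other values of method gives different counts.

Tags: r optimization mathematical-optimization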

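【Answer 1】:

An iteration is one iteration of the optimization algorithm. A function evaluation is a single call to the objective function. How many function evaluations are needed per iteration depends on:

which algorithm is being used (e.g. Nelder-Mead vs. BFGS vs. ...)
how one iteration step works
- e.g. for Nelder-Mead, an iteration comprises (1) reflection; (2) [maybe] expansion; (3) [maybe] contraction; (4) [maybe] shrinkage; there is always one evaluation (reflection), but the other steps are conditional on what happens in the first sub-step
- for L-BFGS-B, I think a line search is involved ...
whether derivatives need to be computed by finite differences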
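
As a rough illustration of the first point, the sketch below runs several optim() methods with maxit=1 and collects the reported counts. The two-dimensional quadratic and the starting point are assumptions chosen for illustration (they are not from the question), and the exact counts may differ between R versions.

```r
# Hypothetical 2-D quadratic test problem (an assumption, not from the question).
fn <- function(p) sum(p^2)
gr <- function(p) 2 * p

methods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B")

# Run each method with maxit = 1 and collect the reported call counts.
# Nelder-Mead does not use gr, so its gradient count is reported as NA.
counts_by_method <- sapply(methods, function(m) {
  optim(par = c(1000, -500), fn = fn, gr = gr,
        method = m, control = list(maxit = 1))$counts
})

# One column per method, rows "function" and "gradient".
print(counts_by_method)
```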

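In any case, nlminb allows separate control over the maximum number of iterations and the maximum number of evaluations:

'eval.max' Maximum number of evaluations of the objective function allowed. Defaults to 200.
'iter.max' Maximum number of iterations allowed. Defaults to 150.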
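
A minimal sketch of how these two controls might be used on the objective from the question. The limits iter.max = 1 and eval.max = 50 are arbitrary values chosen for illustration, and nlminb uses a different optimizer than L-BFGS-B, so the numbers are not comparable with those from optim().

```r
# Same objective and gradient as in the question, passed to nlminb().
# iter.max and eval.max are illustrative values, not recommendations.
res <- nlminb(start = 1000,
              objective = function(x) x^2,
              gradient = function(x) 2 * x,
              control = list(iter.max = 1, eval.max = 50))

res$iterations    # number of iterations performed
res$evaluations   # named vector: objective ("function") and gradient evaluations
res$message       # why the optimizer stopped
```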

【Comments】:

The steps involved in one iteration of L-BFGS-B are listed in the technical report by Byrd et al. (1995), p. 17: users.iems.northwestern.edu/~nocedal/PDFfiles/limited.pdf

【Answer 2】:

Documentation:

See https://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html for more information:

convergence 
An integer code. 0 indicates successful completion (which is always the case for "SANN" and "Brent"). Possible error codes are

1      indicates that the iteration limit maxit had been reached.

Running your code (but looking at convergence instead of counts), I get:

> optim(par=1000,
+       fn=function(x) { cat("f(", x, ")", sep="", fill=TRUE); x^2 },
+       gr=function(x) { cat("g(", x, ")", sep="", fill=TRUE); 2*x },
+       method="L-BFGS-B", control=list(maxit=1))$convergence
f(1000)
g(1000)
f(999)
g(999)
f(995)
g(995)
f(979)
g(979)
f(915)
g(915)
f(659)
g(659)
f(1.136868e-13)
g(1.136868e-13)
[1] 1

So it ran one iteration and stopped, returning convergence = 1. Another clue is in the description of counts, which says:

counts  
A two-element integer vector giving the number of calls to fn and gr respectively. 
This excludes those calls needed to compute the Hessian, if requested, and any calls 
to fn to compute a finite-difference approximation to the gradient.

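This implies that it calls the functions many times to work out what is going on. You can look at the C code to see how many times each method calls your functions.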
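
The exclusion described in the counts documentation can be observed with a small sketch like the one below. It omits gr so that optim() falls back to a finite-difference gradient, and counts the real calls in a separate environment. The names counter and fn_counted are hypothetical helpers introduced only for this illustration.

```r
# Hypothetical call counter kept in its own environment.
counter <- new.env()
counter$fn_calls <- 0

fn_counted <- function(x) {
  counter$fn_calls <- counter$fn_calls + 1
  x^2
}

# No gr is supplied, so the gradient is approximated by finite differences,
# which requires extra calls to fn_counted.
res <- optim(par = 1000, fn = fn_counted,
             method = "L-BFGS-B", control = list(maxit = 1))

res$counts         # calls as reported by optim()
counter$fn_calls   # calls actually made to fn_counted
```

The second number should be larger than the reported function count, since the calls made for the finite-difference approximation are not included in counts.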

【Comments】:

【Answer 3】:

You can find a good explanation here.

https://stat.ethz.ch/pipermail/r-devel/2010-August/058081.html

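The key point is that the function is evaluated multiple times during an iteration. You can see that increasing maxit to 2 results in another function evaluation.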
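
One way to check this is to run the call from the question with maxit=1 and maxit=2 and compare the counts. The helper name run_counts is a hypothetical wrapper introduced for this sketch, and the exact numbers may depend on the R version.

```r
# Hypothetical helper: run the question's problem with a given maxit
# and return the reported counts.
run_counts <- function(maxit) {
  optim(par = 1000, fn = function(x) x^2, gr = function(x) 2 * x,
        method = "L-BFGS-B", control = list(maxit = maxit))$counts
}

counts_1 <- run_counts(1)
counts_2 <- run_counts(2)

# Side-by-side comparison of the two runs.
rbind(maxit_1 = counts_1, maxit_2 = counts_2)
```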

【Comments】: