Matching functions in R: match.fun vs deparse(substitute()) vs supplying the function "directly"

【Question title】: Matching function in R: match.fun vs deparse(substitute()) vs supplying function "directly"
【Posted】: 2022-01-04 19:21:53
【Question description】:

fun1, fun2, and fun3 all seem to work as expected:

fun1 <- function(fun, x) {
  fun(x)
}

fun1(mean, 1:10)
fun1(as.character, 1:10)
fun1(notafun, 1:10)

fun2 <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}

fun2(mean, 1:10)
fun2(as.character, 1:10)
fun2(notafun, 1:10)

fun3 <- function(fun, x) {
  fun <- deparse(substitute(fun))
  do.call(fun, list(x))
}

fun3(mean, 1:10)
fun3(as.character, 1:10)
fun3(notafun, 1:10)

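Is one of these strategies generally preferred? So far, the only difference I have noticed is that match.fun also works when fun is given as a string.

My use case is non-exported functions in a package that is used locally (so it is not a limitation if fun cannot be given as a string). What is the benefit of using match.fun rather than supplying the function "directly" (as in fun1)?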

【Comments】:

Thanks. I edited my question to provide more information.

Tags: r function

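【Answer 1】:

First, the documentation! Here are the relevant parts of ?match.fun:

When called inside functions that take a function as argument, extract the desired function object while avoiding undesired matching to objects of other types.

If FUN is a function, it is returned. If it is a symbol (for example, enclosed in backquotes) or a character vector of length one, it will be looked up using get in the environment of the parent of the caller.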

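So match.fun has two main benefits:

1. It gives the user the option of passing strings and symbols instead of functions.
2. It provides type safety, because the return value is always a function. This makes your source code not only more robust but also more transparent: nobody needs to read the documentation of your fun2 to know that its argument fun must specify a function.

It provides these benefits at almost no performance cost: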
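Both benefits can be seen in a minimal sketch. The wrapper name apply_fun is hypothetical (it has the same shape as fun2 in the question), and the value 42 is just an arbitrary non-function argument chosen for illustration:

```r
# Hypothetical wrapper, same shape as fun2 in the question.
apply_fun <- function(fun, x) {
  fun <- match.fun(fun)  # returns a function or signals an error
  fun(x)
}

by_object <- apply_fun(mean, 1:10)    # function object passed directly
by_string <- apply_fun("mean", 1:10)  # length-one character vector

# A non-function argument is rejected by match.fun before any call is attempted.
rejected <- tryCatch(apply_fun(42, 1:10), error = function(e) e)
inherits(rejected, "error")
```

Both successful calls return the mean of 1:10, and the third one stops inside match.fun rather than at the later fun(x) call.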

x1 <- mean
x2 <- "mean"
x3 <- quote(mean)
microbenchmark::microbenchmark(match.fun(x1), match.fun(x2), match.fun(x3), times = 1000L)
# Unit: nanoseconds
#           expr  min   lq     mean median   uq  max neval
#  match.fun(x1)  287  328  362.481    328  328 1681  1000
#  match.fun(x2) 1599 1681 1820.892   1681 1763 7544  1000
#  match.fun(x3) 1599 1640 1783.049   1681 1722 7339  1000

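For these reasons, validating with match.fun before attempting to evaluate the function call (as in your fun2) is almost always better than waiting and hoping that the call can be evaluated (as in your fun1 and fun3). The principle holds even if your function is not exported and even if you never pass strings or symbols, because the transparency (the second benefit above) makes your source code easier to read and maintain.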

Your fun3 is unique in that it lets the user pass an unevaluated expression, but this approach is problematic for several reasons:

It does not work properly inside other functions; see @Hong Ooi's comment/answer.

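You cannot pass functions accessed with the double-colon or triple-colon operators, or anonymous functions, or more generally any expression that computes a function indirectly: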

fun3(base::mean, 1:10)
# Error in `base::mean`(1:10) : could not find function "base::mean"
fun3(function(x) mean(x), 1:10)
# Error in `function(x) mean(x)`(1:10) : 
#   could not find function "function(x) mean(x)"
fun3(match.fun(mean), 1:10)
# Error in `match.fun(mean)`(1:10) : 
#   could not find function "match.fun(mean)"

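Even where it does work as you expect, it is mostly smoke and mirrors: if the result of deparse(substitute(fun)) is a string naming a function that is accessible from the calling environment, then there was no need for deparse(substitute(fun)) in the first place, because fun would have evaluated to that function anyway. It just does extra work for nothing: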
```
microbenchmark::microbenchmark(fun1(mean, 1:10), fun3(mean, 1:10), times = 1000L)
# Unit: microseconds
#              expr   min     lq      mean median     uq    max neval
#  fun1(mean, 1:10) 2.009  2.378  2.700055  2.460  2.788 14.350  1000
#  fun3(mean, 1:10) 9.020 10.127 10.991813 10.701 11.480 52.398  1000
```
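For comparison, the kinds of argument that break fun3 (a namespace-qualified function, an anonymous function, an expression that computes a function) go through when the wrapper evaluates its argument normally. A minimal sketch, reusing the definition of fun2 from the question:

```r
# fun2 as defined in the question: validate with match.fun, then call.
fun2 <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}

r1 <- fun2(base::mean, 1:10)           # function accessed with the :: operator
r2 <- fun2(function(x) mean(x), 1:10)  # anonymous function
r3 <- fun2(match.fun(mean), 1:10)      # expression that computes a function
```

In each case the argument is evaluated to a function object before match.fun sees it, so nothing depends on how the argument was written at the call site.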

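In summary, when you expect a function as an argument, it is best to use match.fun. If you want to accept functions but not strings or symbols, then you might avoid match.fun, but in that case it is still good practice to test: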

function(FUN, ...) {
  if (!is.function(FUN)) {
    stop("oops")
  }
  ## do stuff
}
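One way that skeleton might be filled in, as a sketch only: the name apply_strict, the second argument x, and the wording of the error message are all assumptions made for illustration.

```r
# Hypothetical strict variant: accepts function objects only.
apply_strict <- function(FUN, x) {
  if (!is.function(FUN)) {
    # Report what was actually supplied (message wording is an assumption).
    stop("`FUN` must be a function, not an object of class ", class(FUN)[1L])
  }
  FUN(x)
}

ok <- apply_strict(mean, 1:10)

# A string naming a function is rejected, unlike with match.fun.
msg <- tryCatch(apply_strict("mean", 1:10),
                error = function(e) conditionMessage(e))
```

The trade-off is that callers lose the string and symbol shortcuts, but a wrong argument is reported at the top of the function rather than at the point where FUN is first called.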

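【Comments】:

fun3 will fail if it is called inside an enclosing function, e.g. g <- function(f, x) fun3(f, x); g(mean, 1:10)
In general, try to avoid nonstandard evaluation tricks unless they are absolutely necessary
Thanks - I referenced your answer and mentioned another problem: fun3(base::mean, x) fails.
Thank you very much for the detailed answer. Very clear!

【Answer 2】:

One key difference is that fun3 will fail if it is called inside an enclosing function, for example: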

g <- function(f, x)
{
    fun3(f, x)
}

g(mean, 1:10)
# Error in f(1:10) : could not find function "f"

In general, try to avoid nonstandard evaluation tricks unless they are absolutely necessary.
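The failure comes from what substitute() captures: the expression as written at the immediate call site, which inside a wrapper is the wrapper's own parameter name. A minimal sketch, where show_name, outer_name, and g2 are hypothetical helpers introduced only for illustration and fun2 is copied from the question:

```r
# Hypothetical helper: returns the text that fun3 would pass to do.call().
show_name <- function(fun) deparse(substitute(fun))
outer_name <- function(f) show_name(f)

direct <- show_name(mean)   # "mean": the symbol written by the caller
nested <- outer_name(mean)  # "f": the wrapper's parameter name, not "mean"

# fun2 from the question evaluates its argument, so nesting is harmless.
fun2 <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}
g2 <- function(f, x) fun2(f, x)
via_match <- g2(mean, 1:10)
```

Since do.call() is then asked for a function literally named "f", the nested fun3 call only succeeds if such a function happens to be visible from fun3, which is not what the caller meant.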

【Comments】: