记录当前函数名称答案

【问题标题】：Logging current function name记录当前函数名称
【发布时间】：2011-09-05 12:20:24
【问题描述】：

我有一些自定义日志函数是cat 的扩展。一个基本的例子是这样的：

catt<-function(..., file = "", sep = " ", fill = FALSE, labels = NULL,
    append = FALSE)
{
    cat(..., format(Sys.time(), "(%Y-%m-%d %H:%M:%S)"), "\n", file = file, 
        sep = sep, fill = fill, labels = labels, append = append)
}

现在，我经常使用（自制）函数，并使用其中一些日志函数来查看进度，效果很好。不过，我注意到的是，我几乎总是像这样使用这些函数：

somefunc<-function(blabla)
{
  catt("somefunc: start")
  #do some very useful stuff here
  catt("somefunc: some time later")
  #even more useful stuff
  catt("somefunc: the end")
}

注意对catt 的每次调用都以调用它的函数的名称开头。在我开始重构代码和重命名函数等之前非常整洁。

感谢 Brian Ripley 的一些旧 R-list 帖子，如果我没记错的话，我找到了这段代码来获取“当前函数名”：

catw<-function(..., file = "", sep = " ", fill = FALSE, labels = NULL,
    append = FALSE)
{
    curcall<-sys.call(sys.parent(n=1))
    prefix<-paste(match.call(call=curcall)[[1]], ":", sep="")
    cat(prefix, ..., format(Sys.time(), "(%Y-%m-%d %H:%M:%S)"), "\n",
        file = file, sep = sep, fill = fill, labels = labels, append = append)
}

这很好，但并不总是有效，因为：

我的函数分散在lapply 中使用的匿名函数函数类型，如下所示：

aFunc<-function(somedataframe)
{
  result<-lapply(seq_along(somedataframe), function(i){
  catw("working on col", i, "/", ncol(somedataframe))
  #do some more stuff here and return something
  return(sum(is.na(somedataframe[[i]])))
  }
}

-> 对于这些情况，显然（并且可以理解）我在 catw 函数中的 sys.parent 调用中需要 n=3。

我偶尔使用do.call：看来我目前的实现也不起作用（我再一次有点理解它，不过我还没有完全弄清楚。

所以，我的问题是：有没有办法在调用堆栈中找到第一个 named 函数（跳过日志记录函数本身，可能还有其他一些“众所周知的”异常），这将允许我要为所有情况编写一个单一版本的catw（这样我就可以愉快地重构而不用担心我的日志代码）？你会怎么处理这样的事情？

编辑：应该支持这些情况：

testa<-function(par1)
{
    catw("Hello from testa, par1=", par1)
    for(i in 1:2) catw("normal loop from testa, item", i)
    rv<-sapply(1:2, function(i){catw("sapply from testa, item", i);return(i)})
    return(rv)
}

testb<-function(par1, par2)
{
    catw("Hello from testb, par1=", par1)
    for(i in 1:2) catw("normal loop from testb, item", i)
    rv<-sapply(1:2, function(i){catw("sapply from testb, item", i);return(i)})

    catw("Will now call testa from testb")
    rv2<-testa(par1)
    catw("Back from testa call in testb")

    catw("Will now do.call testa from testb")
    rv2<-do.call(testa, list(par1))
    catw("Back from testa do.call in testb")

    return(list(rv, rv2))
}

testa(123)
testb(123,456)
do.call(testb, list(123,456))

【问题讨论】：

我经常在我的函数中使用message() 来向控制台输出一个注释，告诉我R 在函数中的哪个点。也许，message() 和 sink(...,type="message") 的某些实现对您有用？缺点是你必须把它放在你所有的函数中。
假设您对函数使用唯一的命名方案，是否可以将 grep 应用于 sys.call 工作？选择第一个匹配项应该是集合中最低的。
@Iterator：函数的命名方案现在不是一个选项。但我愿意接受相反的情况：排除某些方案（如“.*apply.*”）。
@尼克。我错了。我使用了一个命名方案，但是任何对您的集合（包）唯一的名称列表（好的，字符串向量）都应该足够了。获取基本 R 函数列表（或者可能是加载所有 nec. 库时的所有函数）并运行 setdiff() 应该会导致这样的列表，如果您不想手动执行或无权访问命名空间。比我更熟悉 R 的人会知道如何获取此列表。

标签： r function logging

【解决方案1】：

编辑：完全重写函数

此函数的新版本使用调用堆栈sys.calls()，而不是match.call。

调用栈包含完整的调用函数。所以现在的诀窍是只提取你真正想要的部分。我在clean_cs 函数中进行了一些手动清理。这会评估调用堆栈中的第一个单词，并为少数已知的边缘情况返回所需的参数，特别是 lapply、sapply 和 do.call。

这种方法的唯一缺点是它会将函数名一直返回到调用堆栈的顶部。也许合乎逻辑的下一步是将这些函数与指定的环境/命名空间进行比较，并在此基础上包含/排除函数名称......

我会在这里停下来。它回答了问题中的用例。

新功能：

catw <- function(..., callstack=sys.calls()){
  cs <- callstack
  cs <- clean_cs(cs)
  #browser()
  message(paste(cs, ...))
}

clean_cs <- function(x){
  val <- sapply(x, function(xt){
    z <- strsplit(paste(xt, collapse="\t"), "\t")[[1]]
    switch(z[1],
        "lapply" = z[3], 
        "sapply" = z[3],
        "do.call" = z[2], 
        "function" = "FUN",
        "source" = "###",
        "eval.with.vis" = "###",
        z[1]
        )
    })
  val[grepl("\\<function\\>", val)] <- "FUN"
  val <- val[!grepl("(###|FUN)", val)]
  val <- head(val, -1)
  paste(val, collapse="|")
}

测试结果：

testa Hello from testa, par1= 123
testa normal loop from testa, item 1
testa normal loop from testa, item 2
testa sapply from testa, item 1
testa sapply from testa, item 2


testb Hello from testb, par1= 123
testb normal loop from testb, item 1
testb normal loop from testb, item 2
testb sapply from testb, item 1
testb sapply from testb, item 2
testb Will now call testa from testb
testb|testa Hello from testa, par1= 123
testb|testa normal loop from testa, item 1
testb|testa normal loop from testa, item 2
testb|testa sapply from testa, item 1
testb|testa sapply from testa, item 2
testb Back from testa call in testb
testb Will now do.call testa from testb
testb|testa Hello from testa, par1= 123
testb|testa normal loop from testa, item 1
testb|testa normal loop from testa, item 2
testb|testa sapply from testa, item 1
testb|testa sapply from testa, item 2
testb Back from testa do.call in testb


testb Hello from testb, par1= 123
testb normal loop from testb, item 1
testb normal loop from testb, item 2
testb sapply from testb, item 1
testb sapply from testb, item 2
testb Will now call testa from testb
testb|testa Hello from testa, par1= 123
testb|testa normal loop from testa, item 1
testb|testa normal loop from testa, item 2
testb|testa sapply from testa, item 1
testb|testa sapply from testa, item 2
testb Back from testa call in testb
testb Will now do.call testa from testb
testb|testa Hello from testa, par1= 123
testb|testa normal loop from testa, item 1
testb|testa normal loop from testa, item 2
testb|testa sapply from testa, item 1
testb|testa sapply from testa, item 2
testb Back from testa do.call in testb

【讨论】：

如果我的函数包含带有匿名函数的嵌套 sapply 调用怎么办（承认，这有点做作）。您选择的nLevels<-3 不会涵盖这一点，对吧？我正在尝试使用sys.parents() 来避免这种情况，但是当我需要添加数字（或多少）时，我有点难过，例如当我在你的sapply 电话之外打电话时。文档在调用和堆栈帧方面非常简洁。
@NickSabbe，在我编辑的版本中，我使用sys.nframe 来获取调用堆栈深度，而不是指定一个固定的nlevels。我还使用grep 删除apply、lapply、sapply 和家人。
快到了。不过，我还有一个（讨厌的）挑战：如果我立即致电 do.call(my.col, list(df)) 会怎样？这种情况经常发生在我身上，因为我倾向于在调试期间将参数保存到列表中的函数中，因此我可以轻松（重新）调用它们。在这种情况下发生了一些奇怪的事情，因为现在sys.call(sys.parent(n=i))[[1]] 的结果似乎是一个函数（闭包），但不再保留原始函数的名称:-(
@NickSabbe 不错的挑战，但它打败了我。已尽最大努力编辑答案。
@NickSabbe Answer 用全新的方法再次编辑。

【解决方案2】：

我想我会添加到目前为止所取得的进展，完全基于 Andrie 的工作。很确定其他人会喜欢这个，所以它现在是我正在开发的包的一部分（不是在 CRAN 上，而是在 R-Forge 上），在夜间构建之后称为 addendum（包括文档）。

在调用堆栈上查找“当前最低命名函数”的函数，带有一些花里胡哨的功能：

curfnfinder<-function(skipframes=0, skipnames="(FUN)|(.+apply)|(replicate)",
    retIfNone="Not in function", retStack=FALSE, extraPrefPerLevel="\t")
{
    prefix<-sapply(3 + skipframes+1:sys.nframe(), function(i){
            currv<-sys.call(sys.parent(n=i))[[1]]
            return(currv)
        })
    prefix[grep(skipnames, prefix)] <- NULL
    prefix<-gsub("function \\(.*", "do.call", prefix)
    if(length(prefix)==0)
    {
        return(retIfNone)
    }
    else if(retStack)
    {
        return(paste(rev(prefix), collapse = "|"))
    }
    else
    {
        retval<-as.character(unlist(prefix[1]))
        if(length(prefix) > 1)
        {
            retval<-paste(paste(rep(extraPrefPerLevel, length(prefix) - 1), collapse=""), retval, sep="")
        }
        return(retval)
    }
}

这可以用在这样的日志记录函数中：

catw<-function(..., file = "", sep = " ", fill = FALSE, labels = NULL,
    append = FALSE, prefix=0)
{
    if(is.numeric(prefix))
    {
        prefix<-curfnfinder(skipframes=prefix+1) #note: the +1 is there to avoid returning catw itself
        prefix<-paste(prefix, ":", sep="")
    }
    cat(prefix, ..., format(Sys.time(), "(%Y-%m-%d %H:%M:%S)"), "\n",
        file = file, sep = sep, fill = fill, labels = labels, append = append)
}

到目前为止，正如 cmets 对 Andrie 的回答所提到的，关于do.call 仍然存在一些问题。我将暂时停止花时间在上面，但已在r-devel mailinglist 上发布了相关问题。如果/当我在那里得到响应并且它可用时，我将更新功能。

【讨论】：

嘿@Nick，addendum 还活着吗？我在任何地方都找不到它，在其他任何地方也找不到这个功能（search.r-project.org/cgi-bin/…）。如果addendum 已存档，您是否可以同意我将此功能添加到userfriendlyscience，当然要归功于您作为作者？