R: user-defined function issue with apply (answers)

[Question title]: R: user-defined function issue with apply
[Posted]: 2017-06-19 22:35:44
[Question]:

I have 2 files.

“increment.tab”

grp   increment
1   10
2   25
3   35
4   50

“input.tab”

I am trying to apply an increment to column 2 of “input.tab”, such that:

if grp=1, then increment=0
if grp=2, then increment=10
if grp=3, then increment=10+25=35
if grp=4, then increment=10+25+35=70
...

in order to get this output:

grp   pos   pos_adj
1   10   10
1   14   14
1   25   25
2   3   13
2   20   30
3   2   37
3   10   45

My plan was to use apply to process the input file row by row:

ref <- read.table("increment.tab", header=T, sep="\t")
input <- read.table("input.tab", header=T, sep="\t")

my_fun <- function(x, y){
   if(x==1){
      inc=0
   }
   else{
      inc=sum(ref[1:match(x, ref$grp)-1,2])
   }
   result = y + inc
   return(result)
}

input$pos_adj = apply(input, 1, my_fun(input$grp, input$pos))

But I get this error message, which I can't make sense of.

Error in match.fun(FUN) : 
  'my_fun(input$grp, input$pos)' is not a function, character or symbol
In addition: Warning message:
In if (x == 1) { :
  the condition has length > 1 and only the first element will be used

Why is 'my_fun' not treated as a function?

[Comments]:

Tags: r apply

[Solution 1]:

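Your call to apply fails because your third argument is the result of a function call, not the function itself. Additionally, although it would work with your data as given, it would fail if your data.frame contained any other data types, because apply converts the data.frame to a matrix, which forces type coercion. For this reason (and a few others), I recommend not using apply here.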
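To make the two points above concrete, here is a minimal sketch. The function `add_one` and the toy data frames are assumptions made up for illustration; they are not the asker's files.

```r
# Hypothetical toy function and data, for illustration only.
add_one <- function(row) row[["pos"]] + 1
df <- data.frame(grp = c(1, 2), pos = c(10, 3))

# Passing the function itself works: apply() calls it once per row.
ok <- apply(df, 1, add_one)

# Passing the *result* of a call fails: apply() receives a number, not a function.
err <- tryCatch(apply(df, 1, add_one(c(pos = 1))),
                error = function(e) conditionMessage(e))

# With a character column present, apply() coerces every value to character.
mixed <- data.frame(grp = c(1, 2), label = c("a", "b"))
row_types <- apply(mixed, 1, function(row) class(row[["grp"]]))
```

Here `err` holds the same kind of message as in the question, and `row_types` shows that `grp` is no longer numeric once it has passed through the matrix conversion.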

I think you can vectorize this quite easily. The trick of bringing in the grp-based addition can be handled with merge. (dplyr::left_join would work as well.)

Your data:

increment <- read.table(text = "grp   increment
1   10
2   25
3   35
4   50", header = TRUE)

input <- read.table(text = "grp   pos
1   10
1   14
1   25
2   3
2   20
3   2
3   10", header = TRUE)

I'll update this table to hold the adjustment derived from the $increment column. You could overwrite $increment instead of adding $add; that is up to you.

increment$add <- c(0, cumsum(increment$increment[-nrow(increment)]))
increment
#   grp increment add
# 1   1        10   0
# 2   2        25  10
# 3   3        35  35
# 4   4        50  70

x <- merge(input, increment[,c("grp", "add")], by = "grp")
x
#   grp pos add
# 1   1  10   0
# 2   1  14   0
# 3   1  25   0
# 4   2   3  10
# 5   2  20  10
# 6   3   2  35
# 7   3  10  35

From here, it is just a matter of making the adjustment:

x$pos_adj <- x$pos + x$add
x$add <- NULL # remove the now-unnecessary column
x
#   grp pos pos_adj
# 1   1  10      10
# 2   1  14      14
# 3   1  25      25
# 4   2   3      13
# 5   2  20      30
# 6   3   2      37
# 7   3  10      45

(I was a bit verbose with the columns and such. This can certainly be made more compact, but I wanted room to show what is being done and where.)
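As one possible compact form of the steps above, the sketch below swaps merge for match. This is an alternative technique, not the code used in this answer, and it assumes the increment table is sorted by grp. The name `offset` is introduced here for illustration.

```r
# Same data as above, built inline so the sketch is self-contained.
increment <- data.frame(grp = 1:4, increment = c(10, 25, 35, 50))
input <- data.frame(grp = c(1, 1, 1, 2, 2, 3, 3),
                    pos = c(10, 14, 25, 3, 20, 2, 10))

# Offset per group: the sum of the increments of all earlier groups.
# Assumes increment is sorted by grp.
offset <- cumsum(increment$increment) - increment$increment

# match() finds each row's group in the table and keeps the input's row order.
input$pos_adj <- input$pos + offset[match(input$grp, increment$grp)]
```

One design difference: merge sorts the result by the key column by default, whereas this lookup leaves the rows of `input` in their original order.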

[Comments]:

[Solution 2]:

Here is an approach using case_when from dplyr. I did not use your increment.tab, because those numbers do not match your example.

dplyr version 0.5.0

library(dplyr)
input.tab%>%
  mutate(pos_adj=case_when(.$grp==1 ~ .$pos+0,
                           .$grp==2 ~ .$pos+10,
                           .$grp==3 ~ .$pos+35,
                           .$grp==4 ~ .$pos+70))

  grp pos pos_adj
1   1  10      10
2   1  14      14
3   1  25      25
4   2   3      13
5   2  20      30
6   3   2      37
7   3  10      45

dplyr version 0.7.0

library(dplyr)
input.tab%>%
  mutate(pos_adj=case_when(grp==1 ~ pos+0,
                           grp==2 ~ pos+10,
                           grp==3 ~ pos+35,
                           grp==4 ~ pos+70))

Data

input.tab <- read.table(text="grp   pos
1   10
1   14
1   25
2   3
2   20
3   2
3   10",header=TRUE,stringsAsFactors=FALSE)

[Comments]:

[Solution 3]:

First, create a vector to look values up from

vec = setNames(object = c(0, 10, 35, 70), nm = c(1, 2, 3, 4))
vec
# 1  2  3  4 
# 0 10 35 70

Then look up the appropriate value in vec and add it to pos. Using P Lapointe's data (input.tab)

input.tab$pos + vec[match(input.tab$grp, names(vec))]
# 1  1  1  2  2  3  3 
#10 14 25 13 30 37 45
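A short sketch of how this lookup behaves when a group is missing from vec. The vectors `grp` and `pos` below are hypothetical; group 5 is deliberately absent from the lookup vector.

```r
vec <- setNames(c(0, 10, 35, 70), c(1, 2, 3, 4))

# Hypothetical input: group 5 has no entry in vec.
grp <- c(1, 3, 5)
pos <- c(10, 2, 7)

# Indexing by name and indexing via match() find the same entries.
by_name  <- pos + vec[as.character(grp)]
by_match <- pos + vec[match(grp, names(vec))]

# Rows whose group was not found come back as NA in both cases.
missing_rows <- which(is.na(by_match))
```

In both forms the unmatched row yields NA, so it may be worth checking `missing_rows` before relying on the adjusted positions.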

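[Comments]:

I think I prefer using match for the lookup, since it lets you set nomatch=-Inf (for example) for when the grp lookup fails. My merge answer would produce NA, which takes extra work to fix or change.
Thanks d.b! Works great.

[Solution 4]:

You were close, but as @r2evans explained, there is a problem with your function call, and apply works on matrices. Their solution is a good one, but if you still want to use your function, you only need to change how it is applied slightly and use adply from the plyr library. Using your example ref and input data frames as above, and without changing your function itself at all: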

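library(plyr)  # provides adply

# Each row arrives as a one-row data frame, so my_fun sees scalar arguments.
new_df <- adply(input, 1, function(df){
  c(pos_adj = my_fun(df$grp, df$pos))
})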

> new_df
  grp pos pos_adj
1   1  10      10
2   1  14      14
3   1  25      25
4   2   3      13
5   2  20      30
6   3   2      37
7   3  10      45

If you want to stick with apply, you can go this route (again, without changing your function):

input$pos_adj <- apply(input, 1, function(df){
  my_fun(df["grp"], df["pos"])
})

> input
  grp pos pos_adj
1   1  10      10
2   1  14      14
3   1  25      25
4   2   3      13
5   2  20      30
6   3   2      37
7   3  10      45
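Another way to keep the function untouched, sketched here as an alternative that does not appear in the original answers, is base R's mapply, which calls a function once per pair of elements. The `ref` and `input` data frames are rebuilt inline from the question's data so that the sketch runs on its own.

```r
ref <- data.frame(grp = 1:4, increment = c(10, 25, 35, 50))
input <- data.frame(grp = c(1, 1, 1, 2, 2, 3, 3),
                    pos = c(10, 14, 25, 3, 20, 2, 10))

# The asker's function, reproduced unchanged from the question.
my_fun <- function(x, y){
   if(x==1){
      inc=0
   }
   else{
      inc=sum(ref[1:match(x, ref$grp)-1,2])
   }
   result = y + inc
   return(result)
}

# mapply() passes one grp and one pos at a time, so if(x==1) sees a scalar.
input$pos_adj <- mapply(my_fun, input$grp, input$pos)
```

Because the columns are passed as separate vectors, no matrix conversion takes place, which avoids the coercion issue that comes with apply.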

[Comments]:

Thanks Luke C for the explanation and for keeping my function. I understand my mistake now.