dplyr: select a column by string and apply a base function (answers)

【Question title】: dplyr select column using string and apply base function
【Posted】: 2020-07-07 17:49:10
【Question】:

Suppose a math operation I need to perform is specified as a character vector:

math.operation <- 'mean' # this could be mean, sum or length
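Because the operation name is only a string, base R's `match.fun()` can turn it into the corresponding function object, which can then be called normally. A minimal sketch on a small vector `x` that is made up for illustration (it is not from the question):

```r
# Illustrative input vector (hypothetical, not from the question)
x <- c(2, 4, 6, 8)

# The candidate operations from the question, each given as a string
ops <- c("mean", "sum", "length")

# match.fun() looks up the function by name; the result is then called on x
results <- sapply(ops, function(op) match.fun(op)(x))
print(results)
```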

I want to apply this math.operation, inside dplyr, to a column whose name is also supplied as a string:

my.column <- 'col1'
 
dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)
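Outside dplyr, a column whose name is stored in a string can be extracted with `[[`, and the function returned by `match.fun()` applied to it directly. A base-R sketch on the question's `dat`; it is ungrouped, so it uses all 16 rows:

```r
math.operation <- "mean"
my.column <- "col1"
dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)

# [[ ]] accepts a character string, so this returns the col1 vector itself
column.values <- dat[[my.column]]

# Apply the operation named by the string to the whole (ungrouped) column
overall <- match.fun(math.operation)(column.values)
print(overall)
```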

I first selected the column based on my.column, then added my grouping variable id back, and then tried to perform the operation by group:

dat %>% dplyr::select(contains(my.column)) %>% 
dplyr::mutate(id = dat$id) %>%
dplyr::group_by(id) %>% 
dplyr::summarise(match.fun(math.operation)(my.column))

I am stuck on the last line, which produces NAs.
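The NA most likely arises because, inside `summarise()`, `my.column` is just the string `'col1'` rather than the column, so `mean()` receives a character value; `mean()` then warns "argument is not numeric or logical: returning NA" and returns NA. A minimal base-R reproduction of that behaviour:

```r
math.operation <- "mean"
my.column <- "col1"

# The string itself (not the column) is passed to mean(); R warns and returns NA.
# suppressWarnings() only hides the warning so the value can be inspected.
res <- suppressWarnings(match.fun(math.operation)(my.column))
print(res)
```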

【Question discussion】:

Tags: r function dplyr

【Solution 1】:

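Option 1: You can use do.call together with !! sym(). Note that I removed your first select and mutate calls, since they seem redundant for this example.

Option 2: You can use call instead of do.call. Here you don't need to wrap the arguments in list(), but you do need eval, so the statement is not any shorter.

Option 3: A third option is to keep your approach with match.fun and add the !! sym() that was missing from your example. Still, I think do.call is more straightforward.

Option 4: Finally, you could use eval(parse(...)), but the first approach, using do.call and !! sym(), is preferable.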
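The difference between the four calling styles can be seen without dplyr. In this base-R sketch (the vector `x` is illustrative only), `do.call()` takes the arguments as a list, `call()` takes them directly but only builds an unevaluated call that still needs `eval()`, `match.fun()` returns the function itself, and `parse()` builds the call from text:

```r
math.operation <- "mean"
x <- c(1, 2, 3, 4)  # illustrative data, not from the question

# Option 1 style: do.call() takes a function name and a *list* of arguments
r1 <- do.call(math.operation, list(x))

# Option 2 style: call() takes the arguments directly but returns an
# unevaluated call object, so eval() is needed to run it
cl <- call(math.operation, x)
r2 <- eval(cl)

# Option 3 style: match.fun() returns the function, which is called as usual
r3 <- match.fun(math.operation)(x)

# Option 4 style: build the code as a string, then parse() and eval() it;
# the text refers to the variable x in the current environment
r4 <- eval(parse(text = paste0(math.operation, "(x)")))

print(c(r1, r2, r3, r4))
```

Option 4 is the most fragile of the four, since any typo or unexpected content in the strings only surfaces when the text is parsed.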

library(dplyr)

math.operation <- 'mean' # this could be mean, sum or length

my.column <- 'col1'

dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)
# Option 1
dat %>% 
  dplyr::group_by(id) %>% 
  dplyr::summarise(newvar = do.call(math.operation, list(!! sym(my.column))))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

# Option 2
dat %>% 
  dplyr::group_by(id) %>%
  dplyr::summarise(newvar = eval(call(math.operation, !! sym(my.column))))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

# Option 3
dat %>% 
  dplyr::group_by(id) %>%
  dplyr::summarise(newvar = match.fun(math.operation)(!! sym(my.column)))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

# Option 4
dat %>% 
  dplyr::group_by(id) %>% 
  dplyr::summarise(newvar = eval(parse(text = paste0(math.operation, "(", my.column , ")"))))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5
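As a sanity check outside dplyr, the same per-group result can be reproduced in base R with `tapply()`, which splits a vector by a grouping variable and applies a function to each piece. This is a swapped-in base-R technique, not part of the original answer:

```r
math.operation <- "mean"
my.column <- "col1"
dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)

# Split the column named by my.column by id and apply the named function per group
by.group <- tapply(dat[[my.column]], dat$id, match.fun(math.operation))
print(by.group)
```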

Created on 2020-07-08 by the reprex package (v0.3.0)

【Discussion】: