【Question title】: dplyr: How can one mutate using variable references in a function?
【Posted】: 2019-06-21 13:42:38
【Question】:

Can someone tell me how to pass a vector of variable names to a function when using dplyr?

library("dplyr", quietly = TRUE, warn.conflicts = FALSE) # version 0.8.0.1

# Does not work
iris %>% rowwise() %>%  mutate(v1 = mean( as.name(names(iris)[-5]) ) )
iris %>% rowwise() %>%  mutate(v1 = mean( !!(names(iris)[-5]) ) )
iris %>% rowwise() %>%  mutate(v1 = mean( enquo(names(iris)[-5]) ) )
iris %>% rowwise() %>%  
mutate(v1 = mean( c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")  ) )

# This works and is the intended result
iris %>% rowwise() %>%  
mutate(v1 = mean( c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width )  ) )

The point is to make the function (mean, or any other function) work with names(iris)[-5], or with any vector that holds the variable names.

I have looked at these without success: dplyr mutate_each_ standard evaluation ; dplyr: Standard evaluation and enquo()

My session info:

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.1.0   visdat_0.5.3    lubridate_1.7.4 naniar_0.4.2   
[5] dplyr_0.8.0.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       rstudioapi_0.10  magrittr_1.5     tidyselect_0.2.5
 [5] munsell_0.5.0    colorspace_1.4-0 R6_2.4.0         rlang_0.3.4     
 [9] fansi_0.4.0      stringr_1.4.0    plyr_1.8.4       tools_3.5.3     
[13] grid_3.5.3       packrat_0.5.0    gtable_0.2.0     utf8_1.1.4      
[17] cli_1.1.0        withr_2.1.2      digest_0.6.18    lazyeval_0.2.2  
[21] assertthat_0.2.0 tibble_2.1.1     crayon_1.3.4     tidyr_0.8.3     
[25] purrr_0.3.2      glue_1.3.1       labeling_0.3     stringi_1.4.3   
[29] compiler_3.5.3   pillar_1.3.1     scales_1.0.0     pkgconfig_2.0.2 

Thanks in advance!

【Comments】:

    Tags: r dplyr


    【Solution 1】:

    Use map2_dbl:

    library(tidyverse)
    iris %>% mutate(v1 = map2_dbl(Sepal.Length, Sepal.Width, ~mean(c(.x, .y)))) %>% head
    
    #  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   v1
    #1          5.1         3.5          1.4         0.2  setosa 4.30
    #2          4.9         3.0          1.4         0.2  setosa 3.95
    #3          4.7         3.2          1.3         0.2  setosa 3.95
    #4          4.6         3.1          1.5         0.2  setosa 3.85
    #5          5.0         3.6          1.4         0.2  setosa 4.30
    #6          5.4         3.9          1.7         0.4  setosa 4.65
    

    Or, if you want the mean of selected columns:

    cols <- c("Sepal.Length", "Sepal.Width")
    
    iris %>% mutate(v1 = rowMeans(.[cols])) %>% head
    

    【Comments】:

    • Also, map2_dbl is preferable to rowwise, because the latter is no longer "actively developed".
    • Thanks for the answer, it is really close. How can I make it work with names(iris)[1:4]? iris %>% mutate(v1 = purrr::map_dbl(.x = as.list(names(iris)[1:4]), .f~mean(.x))) %>% head gives an error.
    • @cbo I updated the answer with a rowMeans option, which works with names. You can also use rlang::sym for the two values: iris %>% mutate(v1 = map2_dbl(!!sym(cols[1]), !!sym(cols[2]), ~mean(c(.x, .y))))
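
    The rlang::sym idea from the last comment can be extended to any number of columns. The following is a minimal sketch, not part of the original answers: it assumes rlang is installed and that every name in cols refers to a numeric column. syms() converts the strings to symbols and !!! splices them into the call, so mutate() ends up evaluating the same expression as the hand-written version in the question.

```r
library(dplyr)
library(rlang)

# Column names held as strings; assumed to all be numeric columns of iris.
cols <- names(iris)[-5]

# syms() turns the strings into symbols and !!! splices them into c(), so
# mutate() sees mean(c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)).
res <- iris %>%
  rowwise() %>%
  mutate(v1 = mean(c(!!!syms(cols)))) %>%
  ungroup()

head(res)
```

    This keeps the rowwise() form from the question; the rowMeans and pmap answers reach the same per-row means without it.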
    【Solution 2】:

    We can use rowMeans in base R:

    cols <-  c("Sepal.Length", "Sepal.Width")
    iris$v1 <- rowMeans(iris[cols])
    

    Or in the tidyverse:

    library(tidyverse)
    iris %>%
        mutate(v1 = select(., cols)  %>% reduce(`+`)/length(cols)) %>%
        head
    #  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   v1
    #1          5.1         3.5          1.4         0.2  setosa 4.30
    #2          4.9         3.0          1.4         0.2  setosa 3.95
    #3          4.7         3.2          1.3         0.2  setosa 3.95
    #4          4.6         3.1          1.5         0.2  setosa 3.85
    #5          5.0         3.6          1.4         0.2  setosa 4.30
    #6          5.4         3.9          1.7         0.4  setosa 4.65
    

    Another option is pmap (which should also work when there are more than two columns):

    iris %>%
          mutate(v1 = pmap_dbl(.[cols], ~ mean(c(...))))
    

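    【Comments】:

    • pmap is definitely the most concise answer, thanks! (reduce is nice too, but I am not sure it would work with a custom statistic.)
    • @cbo Yes, it works with custom functions. Make sure to check the arguments that are passed.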
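
    To see what "check the arguments that are passed" means in practice, here is a small sketch. The helper show_args and the range statistic are made up for illustration and do not appear in the answers. pmap() calls the function once per row and passes one named argument per column, so c(...) rebuilds that row as a named numeric vector.

```r
library(purrr)

cols <- names(iris)[-5]

# Hypothetical helper for illustration: returns whatever pmap() passes in.
show_args <- function(...) c(...)

# pmap() calls the function once per row, one named argument per column.
first_row <- pmap(iris[1, cols], show_args)[[1]]
first_row

# Any function of a numeric vector can then be applied row by row,
# here the spread (max minus min) of the four measurements.
row_range <- pmap_dbl(iris[cols], function(...) diff(range(c(...))))
head(row_range)
```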
    【Solution 3】:

    Thanks to @Ronak Shah and @akrun for their answers. My question may not have been well phrased from the start; pmap is what I was looking for:

    cols <- names(iris)[-5]
    
    library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
    
    iris %>% mutate(v1 = rowMeans(.[cols])) %>% head # ok with mean per rows
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    v1
    #> 1          5.1         3.5          1.4         0.2  setosa 2.550
    #> 2          4.9         3.0          1.4         0.2  setosa 2.375
    #> 3          4.7         3.2          1.3         0.2  setosa 2.350
    #> 4          4.6         3.1          1.5         0.2  setosa 2.350
    #> 5          5.0         3.6          1.4         0.2  setosa 2.550
    #> 6          5.4         3.9          1.7         0.4  setosa 2.850
    
    # Creating a custom stat function
    set.seed(123)
    w0 <- rnorm(n = 10)
    mystat <- function(x, w = w0[1:length(x)]) sum(x*w)/length(x)
    
    iris[1, cols] %>% mystat # test value
    #> [1] -0.3669384
    
    # Tests
    iris %>% mutate(v1 = mystat(.[cols])) %>% head # ko
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species       v1
    #> 1          5.1         3.5          1.4         0.2  setosa 109.1179
    #> 2          4.9         3.0          1.4         0.2  setosa 109.1179
    #> 3          4.7         3.2          1.3         0.2  setosa 109.1179
    #> 4          4.6         3.1          1.5         0.2  setosa 109.1179
    #> 5          5.0         3.6          1.4         0.2  setosa 109.1179
    #> 6          5.4         3.9          1.7         0.4  setosa 109.1179
    
    library(purrr, quietly = TRUE, warn.conflicts = FALSE)
    iris %>% mutate(v1 = map_dbl(list(.[cols]), mystat)) %>% head # ko
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species       v1
    #> 1          5.1         3.5          1.4         0.2  setosa 109.1179
    #> 2          4.9         3.0          1.4         0.2  setosa 109.1179
    #> 3          4.7         3.2          1.3         0.2  setosa 109.1179
    #> 4          4.6         3.1          1.5         0.2  setosa 109.1179
    #> 5          5.0         3.6          1.4         0.2  setosa 109.1179
    #> 6          5.4         3.9          1.7         0.4  setosa 109.1179
    
    iris %>% mutate(v1 = pmap_dbl(.[cols], ~ mystat(c(...)))) %>% head # OK mean
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species         v1
    #> 1          5.1         3.5          1.4         0.2  setosa -0.3669384
    #> 2          4.9         3.0          1.4         0.2  setosa -0.3101425
    #> 3          4.7         3.2          1.3         0.2  setosa -0.3325953
    #> 4          4.6         3.1          1.5         0.2  setosa -0.2348935
    #> 5          5.0         3.6          1.4         0.2  setosa -0.3586810
    #> 6          5.4         3.9          1.7         0.4  setosa -0.3115633
    

    【Comments】:
