【发布时间】:2018-05-22 16:35:11
【问题描述】:
我正在努力编写一个在dplyr::mutate() 中工作的函数。
由于rowwise() %>% sum() 在大型数据集上相当慢,建议的替代方法是返回到 baseR。我希望简化此过程,如下所示,但在 mutate 函数中传递数据时遇到问题。
require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)
cars %>%
mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
#Doesnt' work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#But alone it is creating a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Appears to not be getting the data passed. Specifying with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
所以问题变成了如何通过在函数内部传递数据来解决每次都包含一个点的需求?
rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(., na.rm = na.rm)
}
#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
由reprex package (v0.2.0) 于 2018 年 5 月 22 日创建。
下面 akrun 的回答(请点赞):
套用一句话:放弃 mutate() 并在新函数中执行所有操作。
这是我作为对他的更新的最后一个函数,如果需要,它还允许命名总和值列。
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
【问题讨论】: