[Posted]: 2021-11-30 01:17:13
[Question]:
I have a dataset with the following structure:
df <- data.frame(id = 1:5,
                 study = c("st1", "st2", "st3", "st4", "st5"),
                 a_var = c(10, 20, 30, 40, 50),
                 b_var = c(6, 5, 4, 3, 2),
                 c_var = c(3, 4, 5, 6, 7),
                 d_var = c(80, 70, 60, 50, 40))
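For each column whose name contains _var, I want to compute the difference between that column and the row-wise mean of all the other columns whose names contain _var, like this: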
mean_deviated_value <- function(data, variable) {
  # subtract the row-wise mean of every other column in `data` from `variable`
  md_value <- data[, variable] - rowMeans(data[, names(data) != variable])
  md_value
}
df$a_var_md <- mean_deviated_value(dplyr::select(df, contains("_var")), "a_var")
df$b_var_md <- mean_deviated_value(dplyr::select(df, contains("_var")), "b_var")
df$c_var_md <- mean_deviated_value(dplyr::select(df, contains("_var")), "c_var")
df$d_var_md <- mean_deviated_value(dplyr::select(df, contains("_var")), "d_var")
This gives me the output I want:
  id study a_var b_var c_var d_var   a_var_md  b_var_md c_var_md d_var_md
1  1   st1    10     6     3    80 -19.666667 -12.33333    -9.80 83.80000
2  2   st2    20     5     4    70  -6.333333 -16.91667   -10.35 70.76667
3  3   st3    30     4     5    60   7.000000 -21.50000   -10.90 57.73333
4  4   st4    40     3     6    50  20.333333 -26.08333   -11.45 44.70000
5  5   st5    50     2     7    40  33.666667 -30.66667   -12.00 31.66667
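One detail worth checking in the table above: `contains("_var")` also matches names such as `a_var_md`, so each later call appears to average over the `_md` columns added before it. A small sketch that redoes row 1 by hand under that reading (the names `row1`, `a_md`, `b_md_seq`, and `b_md_orig` are only for illustration):

```r
# Row 1 of the four original *_var columns
row1 <- c(a_var = 10, b_var = 6, c_var = 3, d_var = 80)

# First call: only the four original columns exist
a_md <- row1[["a_var"]] - mean(row1[c("b_var", "c_var", "d_var")])

# Second call: a_var_md now exists and is also matched by contains("_var")
b_md_seq <- row1[["b_var"]] - mean(c(row1[c("a_var", "c_var", "d_var")], a_md))

# The same deviation computed from the original columns only
b_md_orig <- row1[["b_var"]] - mean(row1[c("a_var", "c_var", "d_var")])

round(c(a_md = a_md, b_md_seq = b_md_seq, b_md_orig = b_md_orig), 5)
```

Here `b_md_seq` comes out at about -12.33, which is the value in the table, while `b_md_orig` is -25. Only `a_var_md` is unaffected, since it is computed before any `_md` column exists.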
How can I do this in one go, without repeating the code, preferably with dplyr/purrr?
I tried:
df %>%
  mutate(across(contains("_var"), ~ list(md = .x - rowMeans(select(., contains("_var") & !.x)))))
and got this error:
Error: Problem with `mutate()` input `..1`.
ℹ `..1 = across(...)`.
x no applicable method for 'select' applied to an object of class "c('double', 'numeric')"
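The error seems to come from the purrr-style lambda: inside `~ ...`, both `.x` and `.` stand for the current column (a numeric vector), so `select()` never receives the data frame. Below is a minimal sketch of one way around this, assuming dplyr 1.0.0 or later for `across()` and `cur_column()`. The names `var_cols` and `result` are made up for illustration, and the `_var$` pattern assumes every column of interest ends in `_var`.

```r
library(dplyr)

df <- data.frame(id = 1:5,
                 study = c("st1", "st2", "st3", "st4", "st5"),
                 a_var = c(10, 20, 30, 40, 50),
                 b_var = c(6, 5, 4, 3, 2),
                 c_var = c(3, 4, 5, 6, 7),
                 d_var = c(80, 70, 60, 50, 40))

# Fix the set of source columns up front so that newly created
# *_md columns can never be picked up by the selection.
var_cols <- grep("_var$", names(df), value = TRUE)

result <- df %>%
  mutate(across(all_of(var_cols),
                # cur_column() is the name of the column being processed;
                # drop it and take the row-wise mean of the remaining ones.
                ~ .x - rowMeans(df[setdiff(var_cols, cur_column())]),
                .names = "{.col}_md"))

result
```

Because `across()` is evaluated once over the four original columns, the new `_md` columns do not feed into each other's means. For row 1 this gives `b_var_md = 6 - mean(c(10, 3, 80)) = -25`, so values other than `a_var_md` will differ from any table produced by adding the `_md` columns one call at a time.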
[Discussion]:
Tags: purrr