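【Posted】: 2020-12-07 18:50:54
【Question】:
I am trying to generate a table of regression slopes produced by a custom function based on the mblm package (the function in the example here is a simplified version). The function takes a formula as an argument, and I would like to use dplyr summarise to apply it to grouped samples from a large data frame with many variables. The output should be the regression slopes by sample group and response variable, which I can then pass to a heatmap function.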
library(dplyr)
# Example data
test_data <-
  rbind(
    data.frame(ID = paste0("someName", 1:9), Sample_Type = "type1",
               A = seq(1, 17, length.out = 9),
               I = 0.1^seq(1, 1.8, length.out = 9),
               J = 1 - 0.1^seq(1, 1.8, length.out = 9)),
    data.frame(ID = paste0("someName", 10:15), Sample_Type = "type2",
               A = seq(1, 7, length.out = 6),
               I = 0.1^(1 - seq(1, 1.5, length.out = 6)),
               J = 1 - 0.1^(1 - seq(1, 1.5, length.out = 6))))
# Define an independent and the responding variables - I would like to be able to easily test different independent variables
idpVar <- "A"
respVar <- test_data %>% .[!names(.) %in% c("ID", "Sample_Type", idpVar)] %>% names()
# Custom function generating numeric value of median slopes (simplified from mblm)
medianSlope <-
  function (formula, dataframe)
  {
    if (missing(dataframe))
      dataframe <- environment(formula)
    # term[1] is the response, term[2] the independent variable
    term <- as.character(attr(terms(formula), "variables")[-1])
    x = dataframe[[term[2]]]
    y = dataframe[[term[1]]]
    if (length(term) > 2) {
      stop("Only linear models are accepted")
    }
    xx = sort(x)
    yy = y[order(x)]
    n = length(xx)
    slopes = c()
    smedians = c()
    # For each point, take the median slope to all points with a different x
    for (i in 1:n) {
      slopes = c()
      for (j in 1:n) {
        if (xx[j] != xx[i]) {
          slopes = c(slopes, (yy[j] - yy[i])/(xx[j] - xx[i]))
        }
      }
      smedians = c(smedians, median(slopes))
    }
    # The overall slope is the median of the per-point medians
    slope = median(smedians)
    slope
  }
# Custom function works with test dataframe and a single named dependent variable but "group_by" seems to be ignored:
test_data %>% group_by(Sample_Type) %>% medianSlope(formula(paste("J", "~", idpVar)), .)
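A possible reading of why the grouping appears to be ignored: `group_by()` only attaches grouping metadata to the data frame, while the function extracts whole columns with `[[`, so it still sees every row. The following is a minimal sketch, assuming dplyr's `group_modify()` is available (dplyr 0.8.1 or later), which calls a function once per group with only that group's rows. `median_slope()` is a hypothetical compact stand-in for the function above, not part of mblm, and `reformulate()` from base R builds the formula from the variable names.

```r
library(dplyr)

# Example data from the question (two sample types, three numeric columns).
test_data <- rbind(
  data.frame(ID = paste0("someName", 1:9), Sample_Type = "type1",
             A = seq(1, 17, length.out = 9),
             I = 0.1^seq(1, 1.8, length.out = 9),
             J = 1 - 0.1^seq(1, 1.8, length.out = 9)),
  data.frame(ID = paste0("someName", 10:15), Sample_Type = "type2",
             A = seq(1, 7, length.out = 6),
             I = 0.1^(1 - seq(1, 1.5, length.out = 6)),
             J = 1 - 0.1^(1 - seq(1, 1.5, length.out = 6)))
)

# Hypothetical compact stand-in for medianSlope(): for each point, the median
# slope to all points with a different x; then the median of those medians.
median_slope <- function(formula, dataframe) {
  term <- all.vars(formula)          # c(response, independent variable)
  stopifnot(length(term) == 2)
  x <- dataframe[[term[2]]]
  y <- dataframe[[term[1]]]
  per_point <- vapply(seq_along(x), function(i) {
    keep <- x != x[i]
    median((y[keep] - y[i]) / (x[keep] - x[i]))
  }, numeric(1))
  median(per_point)
}

idpVar <- "A"
f <- reformulate(idpVar, response = "J")   # builds the formula J ~ A

# group_modify() passes each group's rows (d) and its key to the function.
slopes_J <- test_data %>%
  group_by(Sample_Type) %>%
  group_modify(function(d, key) data.frame(slope = median_slope(f, d))) %>%
  ungroup()

print(slopes_J)
```

The result has one row per `Sample_Type` with a single `slope` column, which is the shape a grouped `summarise()` would normally return.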
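Setting the grouping problem aside for now, I tried to "summarise" by generating a list of multiple formulas:
paste(respVar, "~", idpVar)
[1] "B ~ A" "C ~ A" "D ~ A" "E ~ A" "F ~ A" "G ~ A" "H ~ A" "I ~ A" "J ~ A" "K ~ A" "L ~ A"
However: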
test_data %>% summarise_at (respVar, medianSlope(paste(respVar, "~", idpVar), .))
Error: $ operator is invalid for atomic vectors
test_data %>% summarise_at (respVar, medianSlope(paste(get(respVar), "~", get(idpVar)), .))
Error in get(idpVar) : object 'A' not found
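The first error is consistent with the function receiving a character vector instead of a formula: `paste()` returns plain strings, and calling `terms()` on a string fails with the `$` message. The second arises because `get("A")` looks for an object named `A` in the environment rather than a column of the data frame. The following is a minimal sketch, assuming each response variable is first turned into a real formula with base R's `reformulate()` and the groups are handled with `split()`, so it does not depend on a particular dplyr version. `median_slope()` is a hypothetical compact stand-in for `medianSlope()`.

```r
# Example data from the question.
test_data <- rbind(
  data.frame(ID = paste0("someName", 1:9), Sample_Type = "type1",
             A = seq(1, 17, length.out = 9),
             I = 0.1^seq(1, 1.8, length.out = 9),
             J = 1 - 0.1^seq(1, 1.8, length.out = 9)),
  data.frame(ID = paste0("someName", 10:15), Sample_Type = "type2",
             A = seq(1, 7, length.out = 6),
             I = 0.1^(1 - seq(1, 1.5, length.out = 6)),
             J = 1 - 0.1^(1 - seq(1, 1.5, length.out = 6)))
)

# Hypothetical compact stand-in for medianSlope() (median of per-point
# median pairwise slopes).
median_slope <- function(formula, dataframe) {
  term <- all.vars(formula)          # c(response, independent variable)
  stopifnot(length(term) == 2)
  x <- dataframe[[term[2]]]
  y <- dataframe[[term[1]]]
  per_point <- vapply(seq_along(x), function(i) {
    keep <- x != x[i]
    median((y[keep] - y[i]) / (x[keep] - x[i]))
  }, numeric(1))
  median(per_point)
}

idpVar  <- "A"
respVar <- setdiff(names(test_data), c("ID", "Sample_Type", idpVar))

# One formula object per response variable (here I ~ A and J ~ A).
formulas <- lapply(respVar, function(v) reformulate(idpVar, response = v))
names(formulas) <- respVar

# Apply every formula to every group.
# Rows: sample groups; columns: response variables.
groups <- split(test_data, test_data$Sample_Type)
slope_table <- sapply(formulas, function(f) {
  sapply(groups, function(d) median_slope(f, d))
})

print(slope_table)
```

Because `slope_table` is a numeric matrix with the group names as row names, it should be usable as input to matrix-based heatmap functions such as base R's `heatmap()`.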
I am relatively new to R and a bit lost. Can you help?
【Comments】: