【Posted】: 2019-11-25 22:13:52
【Question】:
I have a data frame like this:
tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x a Multiple Yes 1 2 3
x1 a Multiple No 2 3 4
x2 a Multiple No 1 4 3
y b Multiple Yes 22 20 12
y1 b Multiple No 98 7 97
y2 b Multiple No 8 76 6")
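I am trying to group by the gene_symbol variable, compute the correlation between the single row with keep == "Yes" and each of the other rows (keep == "No") within the group, and return the average correlation together with gene_symbol and gene_id. Here is the function: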
library(dplyr)  # provides %>%, filter() and select()

# function to calculate avg. correlation within one gene_symbol group
calc.mean.corr <- function(x){
  # id of the row that is kept
  gene.id <- x[which(x$keep == "Yes"), "gene_id"]
  # values of the kept row, as a numeric vector
  x1 <- x %>%
    filter(keep == "Yes") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep)) %>%
    as.numeric()
  # values of the discarded rows
  x2 <- x %>%
    filter(keep == "No") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep))
  # correlation of kept id with discarded ids
  avg.cor <- mean(apply(x2, 1, FUN = function(y) cor(x1, y)))
  avg.cor <- round(avg.cor, digits = 2)
  df <- data.frame(avg.cor = avg.cor, gene_id = gene.id)
  return(df)
}

# call using ddply
for.corr <- plyr::ddply(tmp, .variables = "gene_symbol", .fun = calc.mean.corr)
The final output looks like this:
> for.corr
  gene_symbol avg.cor gene_id
1           a    0.83       x
2           b    0.02       y
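I am using plyr::ddply for this, but I would like to switch to dplyr. However, I am not sure how to convert this to dplyr syntax. Any help would be much appreciated.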
【Comments】:
- Can the column names in the function be passed as multiple arguments?
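One possible way to express the ddply call from the question in dplyr is group_by() followed by group_modify(), which applies a function to each group's rows and binds the resulting data frames back together. The following is only a minimal sketch, not a definitive answer. It assumes dplyr 0.8.1 or later, that each gene_symbol has exactly one keep == "Yes" row, and that all value columns start with "val". The helper name calc_mean_corr is hypothetical and not part of the original post.

```r
library(dplyr)

tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x a Multiple Yes 1 2 3
x1 a Multiple No 2 3 4
x2 a Multiple No 1 4 3
y b Multiple Yes 22 20 12
y1 b Multiple No 98 7 97
y2 b Multiple No 8 76 6")

# Hypothetical helper: receives the rows of one group
# (group_modify() passes them without the grouping column gene_symbol).
calc_mean_corr <- function(x) {
  # Assumption: every value column is named val1, val2, ...
  vals <- as.matrix(select(x, starts_with("val")))
  # Assumption: exactly one kept row, so this is a plain numeric vector
  kept <- vals[x$keep == "Yes", ]
  dropped <- vals[x$keep == "No", , drop = FALSE]
  # correlation of the kept row with each discarded row, then the mean
  avg <- mean(apply(dropped, 1, function(y) cor(kept, y)))
  data.frame(avg.cor = round(avg, 2),
             gene_id = x$gene_id[x$keep == "Yes"])
}

for.corr <- tmp %>%
  group_by(gene_symbol) %>%
  group_modify(~ calc_mean_corr(.x)) %>%
  ungroup()

print(for.corr)
```

On the sample data this should give the same avg.cor values as the ddply version (0.83 for a and 0.02 for b), returned as a tibble rather than a plain data frame.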