【Question title】: Convert plyr::ddply to dplyr
【Posted】: 2019-11-25 22:13:52
【Question description】:

I have a data frame like this:

tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x   a  Multiple  Yes  1   2   3
x1  a  Multiple  No   2   3   4
x2  a  Multiple  No   1   4   3
y   b  Multiple  Yes  22  20  12
y1  b  Multiple  No   98  7   97
y2  b  Multiple  No   8   76  6")

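I am trying to group by the gene_symbol variable, compute the correlation between the row with keep == "Yes" and each of the other rows (keep == "No") in that group, and return the mean correlation together with gene_symbol and gene_id. Here is the function: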

# function to calculate avg. correlation
calc.mean.corr <- function(x){
  gene.id <- x[which(x$keep == "Yes"),"gene_id"]
  x1 <- x %>% 
    filter(keep == "Yes") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep)) %>%
    as.numeric()
  x2 <- x %>% 
    filter(keep == "No") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep))

  # correlation of kept id with discarded ids
  cor <- mean(apply(x2, 1, FUN = function(y) cor(x1, y)))
  cor <- round(cor, digits = 2)
  df <- data.frame(avg.cor = cor, gene_id = gene.id)
  return(df)
}

# call using ddply
for.corr <- plyr::ddply(tmp, .variables = "gene_symbol", .fun = function(x) calc.mean.corr(x))

The final output looks like this:

> for.corr
  gene_symbol avg.cor gene_id
1           a    0.83       x
2           b    0.02       y

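I am using plyr::ddply for this, but I would like to switch to dplyr. However, I am not sure how to convert it to dplyr. Any help would be much appreciated.

【Question discussion】:

  • Can the column names in the function be passed as multiple arguments?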

Tags: r dplyr plyr


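【Solution 1】:

If we don't want to change the function, one option is to use group_split and apply the function to each piece. Note that this result does not contain gene_symbol: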

library(dplyr)
library(purrr)
tmp %>%
   group_split(gene_symbol) %>%
   map_dfr(calc.mean.corr)

To include gene_symbol, split on that column instead and pass the list names through .id:

tmp %>%
    split(.$gene_symbol) %>%
    map_dfr(~ calc.mean.corr(.), .id = 'gene_symbol')
#    gene_symbol avg.cor gene_id
#1           a    0.83       x
#2           b    0.02       y
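
If changing the function is acceptable, the same grouped calculation can probably be done with dplyr alone using group_by() and group_modify(). The following is only a sketch under some assumptions: mean_corr is a hypothetical helper (not from the question), the value columns are hard-coded as val1, val2 and val3, each group is assumed to have exactly one row with keep == "Yes", and dplyr 0.8.1 or later is assumed for group_modify().

```r
library(dplyr)

tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x   a  Multiple  Yes  1   2   3
x1  a  Multiple  No   2   3   4
x2  a  Multiple  No   1   4   3
y   b  Multiple  Yes  22  20  12
y1  b  Multiple  No   98  7   97
y2  b  Multiple  No   8   76  6")

# Hypothetical helper: same calculation as calc.mean.corr, but written
# against a numeric matrix so it does not depend on the grouping column
# being present in the per-group data.
mean_corr <- function(x) {
  # Assumption: the value columns are exactly val1, val2, val3
  vals   <- as.matrix(x[, c("val1", "val2", "val3")])
  # Assumption: exactly one keep == "Yes" row, so this drops to a vector
  kept   <- vals[x$keep == "Yes", ]
  others <- vals[x$keep == "No", , drop = FALSE]
  # Correlate the kept row with every discarded row, then average
  avg <- mean(apply(others, 1, function(y) cor(kept, y)))
  data.frame(avg.cor = round(avg, digits = 2),
             gene_id = x$gene_id[x$keep == "Yes"])
}

for.corr <- tmp %>%
  group_by(gene_symbol) %>%
  group_modify(~ mean_corr(.x)) %>%  # .x is the group's rows without gene_symbol
  ungroup()

for.corr
```

group_modify() puts the grouping column back in front of whatever the helper returns, so the result should have the columns gene_symbol, avg.cor and gene_id, with the same avg.cor values as in the question (0.83 for a, 0.02 for b).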

【Discussion】:
