【Question title】: Count character occurrences based on multiple conditions in R
【Posted】: 2021-07-14 19:12:05
【Question】:

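I'm trying to count the occurrences of several different strings in a data frame based on multiple conditions. I have the following data frame (mut.total), which contains this information: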

   TYPE Sample    Genotype   Mutagen             Dose
1   DUP CD0001c   N2         MA                  20
2   DEL CD0001d   N2         MA                  20
3   DUP CD0030a   N2         MA                  20
4   DEL CD0035a   N2         Mechlorethamine     20
5   INV CD0035a   N2         Mechlorethamine     20
6   INV CD0035a   N2         Mechlorethamine     20
7   DUP CD0035a   N2         Mechlorethamine     20
8   DEL CD0035a   N2         Mechlorethamine     20
9   DEL CD0035c   N2         Mechlorethamine     20
10  DUP CD0035d   N2         Mechlorethamine     20

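I want to produce a data frame that shows, by mutagen and type, the total number of mutations and the number of samples those mutations came from (bear in mind that one sample can produce several mutations of the same type). An example of my expected output: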

          Mutagen Type N.Mut   N.Sample
1              MA  DEL 1       1
2 Mechlorethamine  DEL 3       2
3              MA  DUP 2       2
4 Mechlorethamine  DUP 2       2
5 Mechlorethamine  INV 2       1

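Using aggregate I was able to get the number of mutations by mutagen and type, but I can't figure out how to add the number of samples the mutations came from.

aggregate(x = list(N.Mut = mut.total$TYPE),
          by = list(Mutagen = mut.total$Mutagen, Type = mut.total$TYPE),
          FUN = length)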
          Mutagen Type N.Mut
1              MA  DEL 1
2 Mechlorethamine  DEL 3
3              MA  DUP 2
4 Mechlorethamine  DUP 2
5 Mechlorethamine  INV 2
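A minimal base-R sketch of one way to finish this aggregate() attempt (not taken from the answers below): run aggregate() twice over the Sample column, once with length() for the mutation count and once with length(unique()) for the distinct-sample count, then merge() the two summaries on the grouping columns. The data frame is a trimmed copy of mut.total with only the columns the summary needs; n_mut, n_samp and res are illustrative names.

```r
# Trimmed copy of mut.total from the question (only the columns used here)
mut.total <- data.frame(
  TYPE    = c("DUP", "DEL", "DUP", "DEL", "INV", "INV", "DUP", "DEL", "DEL", "DUP"),
  Sample  = c("CD0001c", "CD0001d", "CD0030a", "CD0035a", "CD0035a",
              "CD0035a", "CD0035a", "CD0035a", "CD0035c", "CD0035d"),
  Mutagen = rep(c("MA", "Mechlorethamine"), times = c(3, 7))
)

# Number of mutations (rows) per Mutagen x TYPE
n_mut <- aggregate(Sample ~ Mutagen + TYPE, data = mut.total, FUN = length)
names(n_mut)[3] <- "N.Mut"

# Number of distinct samples per Mutagen x TYPE
n_samp <- aggregate(Sample ~ Mutagen + TYPE, data = mut.total,
                    FUN = function(s) length(unique(s)))
names(n_samp)[3] <- "N.Sample"

# Combine the two summaries on the grouping columns
res <- merge(n_mut, n_samp, by = c("Mutagen", "TYPE"))
print(res)
```

merge() sorts the result by Mutagen and TYPE, so the rows come out in a different order from the aggregate() output above; the counts themselves should match the expected output.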

【Comments】:

    Tags: r dataframe conditional-statements


    【Solution 1】:

    A data.table approach:

    library(data.table)
    DT <- fread("   TYPE Sample    Genotype   Mutagen             Dose
       DUP CD0001c   N2         MA                  20
       DEL CD0001d   N2         MA                  20
       DUP CD0030a   N2         MA                  20
       DEL CD0035a   N2         Mechlorethamine     20
       INV CD0035a   N2         Mechlorethamine     20
       INV CD0035a   N2         Mechlorethamine     20
       DUP CD0035a   N2         Mechlorethamine     20
       DEL CD0035a   N2         Mechlorethamine     20
       DEL CD0035c   N2         Mechlorethamine     20
      DUP CD0035d   N2         Mechlorethamine     20")
    
    DT[, .(N.Mut    = .N, 
           N.Sample = uniqueN(Sample)),
       by = .(Mutagen, TYPE)]
    #            Mutagen TYPE N.Mut N.Sample
    # 1:              MA  DUP     2        2
    # 2:              MA  DEL     1        1
    # 3: Mechlorethamine  DEL     3        2
    # 4: Mechlorethamine  INV     2        1
    # 5: Mechlorethamine  DUP     2        2
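In the data.table answer, .N is the number of rows in each group and uniqueN(Sample) is the number of distinct samples. A rough base-R counterpart, offered as an illustrative sketch rather than part of the answer, splits Sample by a combined Mutagen/TYPE key and then takes lengths() and length(unique()) of each piece. key, groups and res are hypothetical names; drop = TRUE in interaction() leaves out combinations that never occur, such as MA with INV.

```r
# Trimmed copy of mut.total from the question (only the columns used here)
mut.total <- data.frame(
  TYPE    = c("DUP", "DEL", "DUP", "DEL", "INV", "INV", "DUP", "DEL", "DEL", "DUP"),
  Sample  = c("CD0001c", "CD0001d", "CD0030a", "CD0035a", "CD0035a",
              "CD0035a", "CD0035a", "CD0035a", "CD0035c", "CD0035d"),
  Mutagen = rep(c("MA", "Mechlorethamine"), times = c(3, 7))
)

# One factor level per Mutagen/TYPE combination that actually occurs
key <- interaction(mut.total$Mutagen, mut.total$TYPE, sep = " / ", drop = TRUE)

# One character vector of sample IDs per group
groups <- split(mut.total$Sample, key)

res <- data.frame(
  Group    = names(groups),
  # like .N: number of rows (mutations) per group
  N.Mut    = unname(lengths(groups)),
  # like uniqueN(Sample): number of distinct sample IDs per group
  N.Sample = unname(vapply(groups, function(s) length(unique(s)), integer(1)))
)
print(res)
```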
    

      【Solution 2】:

      Here is a dplyr version:

      library(dplyr)
      
      mut.total %>%
        group_by(Mutagen, TYPE) %>%
        summarise(N.Mut = n(), N.Sample = n_distinct(Sample), .groups = "drop")
      

      Output

        Mutagen         TYPE  N.Mut N.Sample
        <chr>           <chr> <int>    <int>
      1 MA              DEL       1        1
      2 MA              DUP       2        2
      3 Mechlorethamine DEL       3        2
      4 Mechlorethamine DUP       2        2
      5 Mechlorethamine INV       2        1
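Another base-R option, again a sketch that is not part of the answers: flag the first occurrence of each (Mutagen, TYPE, Sample) combination with duplicated(), then let a single aggregate() call sum both a constant 1 per row (mutations) and that flag (distinct samples). The helper columns are named N.Mut and N.Sample so that the summed columns come out with the desired names.

```r
# Trimmed copy of mut.total from the question (only the columns used here)
mut.total <- data.frame(
  TYPE    = c("DUP", "DEL", "DUP", "DEL", "INV", "INV", "DUP", "DEL", "DEL", "DUP"),
  Sample  = c("CD0001c", "CD0001d", "CD0030a", "CD0035a", "CD0035a",
              "CD0035a", "CD0035a", "CD0035a", "CD0035c", "CD0035d"),
  Mutagen = rep(c("MA", "Mechlorethamine"), times = c(3, 7))
)

# Helper columns: 1 for every mutation, and 1 only for the first row of each
# (Mutagen, TYPE, Sample) combination, i.e. once per distinct sample in a group
mut.total$N.Mut    <- 1L
mut.total$N.Sample <- as.integer(!duplicated(mut.total[, c("Mutagen", "TYPE", "Sample")]))

# A single aggregate() call sums both helper columns per Mutagen x TYPE
res <- aggregate(cbind(N.Mut, N.Sample) ~ Mutagen + TYPE, data = mut.total, FUN = sum)
print(res)
```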
      

        【Solution 3】:

        Using collapse:

        library(collapse)
        collap(mut.total, ~ Mutagen + TYPE, custom = list(fNobs = 1, fNdistinct = 2 ))
        

        Data

        mut.total <- structure(list(TYPE = c("DUP", "DEL", "DUP", "DEL", "INV", "INV", 
        "DUP", "DEL", "DEL", "DUP"), Sample = c("CD0001c", "CD0001d", 
        "CD0030a", "CD0035a", "CD0035a", "CD0035a", "CD0035a", "CD0035a", 
        "CD0035c", "CD0035d"), Genotype = c("N2", "N2", "N2", "N2", "N2", 
        "N2", "N2", "N2", "N2", "N2"), Mutagen = c("MA", "MA", "MA", 
        "Mechlorethamine", "Mechlorethamine", "Mechlorethamine", "Mechlorethamine", 
        "Mechlorethamine", "Mechlorethamine", "Mechlorethamine"), Dose = c(20L, 
        20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L)), row.names = c(NA, 
        -10L), class = "data.frame")
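To see both counts for every combination at once, including combinations with no mutations, a cross-tabulation also works. This base-R sketch is an alternative of my own, not part of the collapse answer: table() on all rows counts mutations, and table() on the unique (Mutagen, TYPE, Sample) rows counts each sample at most once per cell. n_mut, uniq and n_samp are illustrative names.

```r
# Trimmed copy of mut.total from the question (only the columns used here)
mut.total <- data.frame(
  TYPE    = c("DUP", "DEL", "DUP", "DEL", "INV", "INV", "DUP", "DEL", "DEL", "DUP"),
  Sample  = c("CD0001c", "CD0001d", "CD0030a", "CD0035a", "CD0035a",
              "CD0035a", "CD0035a", "CD0035a", "CD0035c", "CD0035d"),
  Mutagen = rep(c("MA", "Mechlorethamine"), times = c(3, 7))
)

# Every row is one mutation, so a plain cross-tabulation gives N.Mut
n_mut <- table(Mutagen = mut.total$Mutagen, TYPE = mut.total$TYPE)

# Keep one row per (Mutagen, TYPE, Sample) first, so each sample counts once per cell
uniq   <- unique(mut.total[, c("Mutagen", "TYPE", "Sample")])
n_samp <- table(Mutagen = uniq$Mutagen, TYPE = uniq$TYPE)

print(n_mut)
print(n_samp)
```

Unlike the grouped summaries, table() keeps the empty MA/INV cell as 0; as.data.frame(n_samp) turns a table into long format with a Freq column if one row per combination is needed.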
        

