【Question title】: Using aggregate/group_by in R to group data and give a count for each factor variable?
【Posted】: 2022-01-12 23:56:04
【Question description】:

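I have a data frame that looks like this. For simplicity I show only the first 6 rows, but it has 8236 rows in total. Grades range from 0 to 2; the example below shows only grades 0 and 1: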

 Telangiectasia_time      grade
  <chr>                    <int>
1 telangiectasia_tumour_0      0
2 telangiectasia_tumour_1      0
3 telangiectasia_tumour_12     0
4 telangiectasia_tumour_24     0
5 telangiectasia_tumour_0      1
6 telangiectasia_tumour_1      1

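I want to group by Telangiectasia_time (the first column) and then count how many times each grade occurs in each group. Using the first 6 rows as an example, the result should look like this: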

       Telangiectasia_time grade0    grade1    grade2 
1  telangiectasia_tumour_0    1      1          0
2  telangiectasia_tumour_1    1      1          0
3 telangiectasia_tumour_12    1      0          0
4 telangiectasia_tumour_24    1      0          0  

That is, three columns at the end, one per grade, each holding the count of that grade for each time point. I tried the aggregate function:

aggregate(grade ~ Telangiectasia_time, telangiectasia_tumour_data, sum)

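But I am not sure what to pass as the last argument so that it returns a count for each grade. When I pass sum, it simply adds the grade values together instead of treating 0, 1 and 2 as separate categories. With my full dataset I get the wrong output: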

      Telangiectasia_time grade
1  telangiectasia_tumour_0    18
2  telangiectasia_tumour_1    11
3 telangiectasia_tumour_12    38
4 telangiectasia_tumour_24    87

I also tried group_by(), but that only gives me the total number of rows per group:

telangiectasia_tumour_data %>% group_by(Telangiectasia_time) %>% summarize(count =n())
  Telangiectasia_time      count
* <chr>                    <int>
1 telangiectasia_tumour_0   2059
2 telangiectasia_tumour_1   2059
3 telangiectasia_tumour_12  2059
4 telangiectasia_tumour_24  2059

【Comments】:

    Tags: r dplyr group-by aggregate


    【Solution 1】:

    Using dplyr::count and tidyr::pivot_wider you can do this:

    library(dplyr)
    library(tidyr)
    
    telangiectasia_tumour_data %>% 
      count(Telangiectasia_time, grade) %>% 
      pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
    #> # A tibble: 4 × 3
    #>   Telangiectasia_time      grade0 grade1
    #>   <chr>                     <int>  <int>
    #> 1 telangiectasia_tumour_0       1      1
    #> 2 telangiectasia_tumour_1       1      1
    #> 3 telangiectasia_tumour_12      1      0
    #> 4 telangiectasia_tumour_24      1      0
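
The output above has no grade2 column because grade 2 does not occur in the six example rows. If all three grade columns are needed regardless of which grades are present, one possible approach (a sketch, not part of the original answer) is to add the missing combinations with tidyr::complete before pivoting. The name `grade_counts` and the hard-coded range `0:2` are assumptions taken from the question's statement that grades range from 0 to 2.

```r
library(dplyr)
library(tidyr)

# Same six example rows as in the question
telangiectasia_tumour_data <- data.frame(
  Telangiectasia_time = c(
    "telangiectasia_tumour_0", "telangiectasia_tumour_1",
    "telangiectasia_tumour_12", "telangiectasia_tumour_24",
    "telangiectasia_tumour_0", "telangiectasia_tumour_1"
  ),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

grade_counts <- telangiectasia_tumour_data %>%
  # One row per observed (time point, grade) pair, with its count in n
  count(Telangiectasia_time, grade) %>%
  # Add the unobserved pairs for grades 0 to 2 (assumed range), with n = 0
  complete(Telangiectasia_time, grade = 0:2, fill = list(n = 0L)) %>%
  # Spread the grades into columns grade0, grade1, grade2
  pivot_wider(names_from = grade, values_from = n, names_prefix = "grade")

print(grade_counts)
```

Because complete already fills the missing counts with 0, the values_fill argument is not needed in pivot_wider here.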
    

    Data

    telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
      "telangiectasia_tumour_0",
      "telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
      "telangiectasia_tumour_0", "telangiectasia_tumour_1"
    ), grade = c(
      0L,
      0L, 0L, 0L, 1L, 1L
    )), class = "data.frame", row.names = c(
      "1",
      "2", "3", "4", "5", "6"
    ))
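
For comparison, the same counts can be obtained in base R without aggregate. aggregate with sum adds up the numeric grade values, which is why the question's attempt returned totals rather than counts; a contingency table counts occurrences instead. The following is a minimal sketch under the assumption that grades are limited to 0, 1 and 2; the names `grade_f`, `counts` and `counts_df` are illustrative only.

```r
# Same six example rows as in the question
telangiectasia_tumour_data <- data.frame(
  Telangiectasia_time = c(
    "telangiectasia_tumour_0", "telangiectasia_tumour_1",
    "telangiectasia_tumour_12", "telangiectasia_tumour_24",
    "telangiectasia_tumour_0", "telangiectasia_tumour_1"
  ),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Treat grade as a categorical variable with all three levels declared,
# so that grades absent from the data still get a column of zeros
grade_f <- factor(
  telangiectasia_tumour_data$grade,
  levels = 0:2,
  labels = paste0("grade", 0:2)
)

# Cross-tabulate time point against grade: each cell is a count of rows
counts <- table(telangiectasia_tumour_data$Telangiectasia_time, grade_f)

# Convert the table to a data frame with one row per time point
counts_df <- as.data.frame.matrix(counts)
print(counts_df)
```

The time points end up as row names of `counts_df` rather than as a column, which differs from the tibble produced by the dplyr/tidyr approach.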
    

    【Comments】:
