Posted: 2022-01-12 23:56:04
Question:
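I have a data frame that looks like this. For simplicity I show only the first 6 rows, but there are 8236 rows in total. grade ranges from 0 to 2; the example below happens to show only grades 0 and 1: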
Telangiectasia_time grade
<chr> <int>
1 telangiectasia_tumour_0 0
2 telangiectasia_tumour_1 0
3 telangiectasia_tumour_12 0
4 telangiectasia_tumour_24 0
5 telangiectasia_tumour_0 1
6 telangiectasia_tumour_1 1
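I want to group by Telangiectasia_time (the first column) and then count how many rows of each grade fall into each group. Using the first 6 rows as an example, the result should look like this: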
Telangiectasia_time grade0 grade1 grade2
1 telangiectasia_tumour_0 1 1 0
2 telangiectasia_tumour_1 1 1 0
3 telangiectasia_tumour_12 1 0 0
4 telangiectasia_tumour_24 1 0 0
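That is, I want three columns at the end, one per grade, each holding the count of that grade for each value of Telangiectasia_time. I tried using the aggregate function: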
aggregate(grade ~ Telangiectasia_time, telangiectasia_tumour_data, sum)
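but I'm not sure what to pass as the last argument so that it returns the count for each grade. With sum, it just adds up the grade values instead of treating 0, 1 and 2 as separate categories. On my full dataset this gives the wrong output: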
Telangiectasia_time grade
1 telangiectasia_tumour_0 18
2 telangiectasia_tumour_1 11
3 telangiectasia_tumour_12 38
4 telangiectasia_tumour_24 87
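For reference, a minimal sketch of why sum cannot produce per-grade counts, using only the six sample rows shown above rebuilt as a hypothetical data frame `df`: within each group, sum adds up the grade values themselves, so grade-0 rows contribute nothing and each grade-2 row contributes 2. The result per group is therefore (number of grade-1 rows) + 2 × (number of grade-2 rows), not a count.

```r
# Hypothetical reconstruction of the six sample rows shown in the question
df <- data.frame(
  Telangiectasia_time = c("telangiectasia_tumour_0", "telangiectasia_tumour_1",
                          "telangiectasia_tumour_12", "telangiectasia_tumour_24",
                          "telangiectasia_tumour_0", "telangiectasia_tumour_1"),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# sum adds the grade values within each group; grade-0 rows add nothing,
# so the grade-0 rows in each group are invisible in this result
summed <- aggregate(grade ~ Telangiectasia_time, df, sum)
print(summed)
```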
I also tried group_by(), but that only gives me the total number of rows per group:
telangiectasia_tumour_data %>% group_by(Telangiectasia_time) %>% summarize(count = n())
Telangiectasia_time count
* <chr> <int>
1 telangiectasia_tumour_0 2059
2 telangiectasia_tumour_1 2059
3 telangiectasia_tumour_12 2059
4 telangiectasia_tumour_24 2059
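The desired output is a cross-tabulation of Telangiectasia_time against grade, so one possible base-R sketch (assuming grade only ever takes the values 0, 1 and 2, and again using the six sample rows as a hypothetical `df`) is to treat grade as a factor with all three levels and count with table(). Declaring the levels explicitly makes a grade2 column appear even when grade 2 never occurs in the data.

```r
# Hypothetical reconstruction of the six sample rows shown in the question
df <- data.frame(
  Telangiectasia_time = c("telangiectasia_tumour_0", "telangiectasia_tumour_1",
                          "telangiectasia_tumour_12", "telangiectasia_tumour_24",
                          "telangiectasia_tumour_0", "telangiectasia_tumour_1"),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Treat grade as categorical with all three assumed levels (0, 1, 2),
# labelled grade0/grade1/grade2 so they become the column names
grade_f <- factor(df$grade, levels = 0:2, labels = paste0("grade", 0:2))

# Cross-tabulate: one row per Telangiectasia_time, one column per grade
counts <- table(df$Telangiectasia_time, grade_f)

# Turn the contingency table into an ordinary data frame;
# row.names = NULL stops the table's row names being copied over
result <- data.frame(Telangiectasia_time = rownames(counts),
                     as.data.frame.matrix(counts),
                     row.names = NULL)
print(result)
```

A dplyr-style alternative along the same lines would be count(Telangiectasia_time, grade) followed by tidyr::pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0); note that this only creates columns for grades that actually appear in the data.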
Tags: r dplyr group-by aggregate