How to include the number of rows aggregated using aggregate() in R [duplicate]

【Question title】: How to include number of rows aggregated using aggregate() in R [duplicate]
【Posted】: 2021-09-17 04:39:06
【Question】:

My dataset contains a parentID variable and a childIQ variable, which holds the IQ of each child of a given parent:

df <- data.frame("parentID"=c(101,101,101,204,204,465,465),
  "childIQ"=c(98,90,81,96,87,71,65))

parentID, childIQ
101, 98
101, 90
101, 81
204, 96
204, 87
465, 71
465, 65

I ran aggregate() so that each parent has only one row, with childIQ becoming the mean IQ of that parent's children:

df_agg <- aggregate(childIQ ~ parentID , data = df, mean)

parentID, avg_childIQ
101, 89.67
204, 91.5
465, 68

However, I'd like to add another column with the number of children for each parent, like this:

parentID, avg_childIQ, num_children
101, 89.67, 3
204, 91.5, 2
465, 68, 2

Once I've already created df_agg, I'm not sure how to do this. Can it be done with data.table?

【Question discussion】:

Tags: r data.table aggregate transform

【Solution 1】:

You can have aggregate() compute several statistics in one call by passing a function that returns a named vector, function(x) c(...):

df_agg <- aggregate(childIQ ~ parentID , data = df,
                    function(x) c(mean = mean(x), 
                                  n = length(x)))

#>   parentID childIQ.mean childIQ.n
#> 1      101     89.66667   3.00000
#> 2      204     91.50000   2.00000
#> 3      465     68.00000   2.00000
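Because the function returns a named vector, aggregate() stores both statistics in a single matrix column named childIQ; childIQ.mean and childIQ.n are only how that matrix column prints. A commonly used base-R idiom for splitting it into ordinary columns is do.call(data.frame, ...). The sketch below assumes that is what you want; the name df_flat and the conversion of the count to integer are illustrative extras, not part of the original answer.

```r
df <- data.frame(parentID = c(101, 101, 101, 204, 204, 465, 465),
                 childIQ  = c(98, 90, 81, 96, 87, 71, 65))

df_agg <- aggregate(childIQ ~ parentID, data = df,
                    function(x) c(mean = mean(x), n = length(x)))

# df_agg has only 2 columns: parentID and a 2-column matrix called childIQ
str(df_agg)

# Passing the columns to data.frame() expands the matrix column into
# ordinary columns named childIQ.mean and childIQ.n (hypothetical name df_flat)
df_flat <- do.call(data.frame, df_agg)

# The matrix is numeric, so the count comes back as 3, 2, 2 (double); make it integer
df_flat$childIQ.n <- as.integer(df_flat$childIQ.n)
print(df_flat)
```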

【Discussion】:

【Solution 2】:

Using dplyr:

library(dplyr)
df %>%
  group_by(parentID) %>%
  summarise(avg_childIQ = mean(childIQ),  # mean IQ per parent
            num_children = n())           # n() = number of rows in the group
# A tibble: 3 x 3
  parentID avg_childIQ num_children
     <dbl>       <dbl>        <int>
1      101        89.7            3
2      204        91.5            2
3      465        68              2

Using data.table:

library(data.table)
# setDT() converts df to a data.table in place; .N is the number of rows in each group
setDT(df)[, list(avg_childIQ = mean(childIQ), num_children = .N), by = parentID]
   parentID avg_childIQ num_children
1:      101    89.66667            3
2:      204    91.50000            2
3:      465    68.00000            2
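The one-liner above computes the mean and the count from df in a single grouped call. If df_agg has already been built with aggregate(), as in the question, one option is a data.table update join that adds the counts to df_agg by reference. This is a sketch assuming the data.table package is installed; the intermediate name counts is illustrative.

```r
library(data.table)

df <- data.frame(parentID = c(101, 101, 101, 204, 204, 465, 465),
                 childIQ  = c(98, 90, 81, 96, 87, 71, 65))

# The existing mean-per-parent table from the question
df_agg <- aggregate(childIQ ~ parentID, data = df, mean)
setDT(df_agg)

# Rows per parent; .N is the group size (hypothetical name: counts)
counts <- as.data.table(df)[, .N, by = parentID]

# Update join: match on parentID and add num_children to df_agg in place;
# i.N refers to the N column of the joined table (counts)
df_agg[counts, on = "parentID", num_children := i.N]
setnames(df_agg, "childIQ", "avg_childIQ")
print(df_agg)
```

Joining on parentID, rather than binding a column of counts by position, keeps the result correct even if the two tables are not in the same row order.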

【Discussion】:

Looks good, thanks! Is it possible to produce the same table with data.table?
@codemachino, I've added the data.table code.