Aggregate function in R using column index numbers rather than names

【Question title】: Aggregate function in R using column index numbers rather than names
【Posted】: 2021-09-09 17:22:51
【Question description】:

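I want to use the aggregate function in R, identifying the data frame columns to aggregate by their column index numbers rather than by their names.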

Here is an example that uses column names:

df = data.frame(A = c("a", "a", "b", "b", "c", "c"), B = 1:3, C = 1:3, D = 1:3)
aggregate(cbind(B, C, D) ~ A, data = df, sum)

But instead of listing B, C, and D inside cbind, I want to tell it to use columns 2:4.

【Comments】:

You have to reduce the column indices by the number of group_by columns; then you can also do this in dplyr: df %>% group_by(A) %>% summarise(across(1:3, sum))

Tags: r aggregate

【Solution 1】:

We can simply use . to refer to all the remaining columns:

aggregate(. ~ A, data = df, sum)
  A B C D
1 a 3 3 3
2 b 4 4 4
3 c 5 5 5
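
One caveat with `.`: it stands for every column not otherwise named in the formula, so it corresponds to columns 2:4 here only because `df` has no other columns. A minimal sketch, assuming a hypothetical extra character column `E` (not part of the question), restricts `data` by position first so that `.` covers only the indexed columns:

```r
# Hypothetical variant of the question's data with an extra column E
df2 <- data.frame(A = c("a", "a", "b", "b", "c", "c"),
                  B = 1:3, C = 1:3, D = 1:3,
                  E = letters[1:6])

# Keep the grouping column (1) and the value columns (2:4) by position,
# so `.` no longer includes E, for which sum() is not defined
res_dot <- aggregate(. ~ A, data = df2[c(1, 2:4)], FUN = sum)
res_dot
```

Subsetting `data` keeps the formula short while still selecting the value columns by index.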

Or, if we specifically want positional indexes, subset the data and convert it to a matrix:

aggregate(as.matrix(df[2:4]) ~ A, data = df, sum)
  A B C D
1 a 3 3 3
2 b 4 4 4
3 c 5 5 5
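
If the original `cbind(B, C, D) ~ A` form is preferred, the formula itself can be assembled from positions. This is only a sketch: the variable names `value_idx`, `group_idx`, and `f` are illustrative, and it assumes the column names are syntactically valid R names (names containing spaces would need backticks).

```r
df <- data.frame(A = c("a", "a", "b", "b", "c", "c"), B = 1:3, C = 1:3, D = 1:3)

value_idx <- 2:4   # columns to aggregate, by position
group_idx <- 1     # grouping column, by position

# Assemble the text "cbind(B, C, D) ~ A" from the names at those positions
f <- as.formula(sprintf("cbind(%s) ~ %s",
                        paste(names(df)[value_idx], collapse = ", "),
                        names(df)[group_idx]))

res_formula <- aggregate(f, data = df, FUN = sum)
res_formula
```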

Or use dplyr:

library(dplyr)
df %>%
  group_by(A) %>%
  summarise(across(all_of(names(.)[2:4]), sum))


【Solution 2】:

Another way to use column numbers is:

aggregate(df[2:4], list(grp = df[[1]]), sum)
#Or using df$A
#aggregate(df[2:4], list(grp = df$A), sum)

#  grp B C D
#1   a 3 3 3
#2   b 4 4 4
#3   c 5 5 5
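
A small variation on the call above: a one-column data frame such as `df[1]` is itself a named list, so it can be passed as `by` directly and the grouping column keeps its original name `A` instead of `grp`. The wrapper `aggregate_by_index` below is a hypothetical helper written for illustration, not an existing function:

```r
df <- data.frame(A = c("a", "a", "b", "b", "c", "c"), B = 1:3, C = 1:3, D = 1:3)

# Hypothetical helper: aggregate columns chosen purely by position
aggregate_by_index <- function(data, value_idx, group_idx, FUN = sum) {
  # data[group_idx] is a named list, so the group column keeps its name
  aggregate(data[value_idx], by = data[group_idx], FUN = FUN)
}

res_idx <- aggregate_by_index(df, value_idx = 2:4, group_idx = 1)
res_idx
```

Because `group_idx` is used with single-bracket indexing, it could also hold several positions to group by more than one column.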
