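The development version 1.10.5 of the data.table package (see here for installation instructions) has three new functions, rollup(), cube(), and groupingsets(), for computing aggregates at various levels of grouping, which can be used here.
Note that the OP's expected result contains consecutive row numbers 1 to 15, which indicates that the OP expects a data.frame or data.table rather than a list as preferred by Frank. However, we will show below that a data.table can also be printed in an eye-friendly way.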
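The snippets in this answer assume a data.frame named df with the columns Reg, Res, and Pop. The OP's data is not reproduced here, so the following is only a minimal stand-in, assuming one pre-aggregated row per Reg/Res combination with the Pop sums taken from the output shown in this answer (the original data presumably has many more rows):

```r
# Hypothetical stand-in for the OP's data: one row per Reg/Res combination,
# with Pop values copied from the aggregated output shown in this answer.
df <- data.frame(
  Reg = rep(c("A", "B", "C", "D", "E"), each = 2),
  Res = rep(c("Urban", "Rural"), times = 5),
  Pop = c(500414L, 500501L, 499922L, 500016L, 501638L,
          499274L, 499804L, 499825L, 499917L, 500386L),
  stringsAsFactors = FALSE
)

# Sanity check: the overall sum should equal the grand total in the rollup() output
sum(df$Pop)
```

Because the per-group sums are unchanged, the aggregation calls in this answer should give the same totals on this stand-in as on the full data.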
rollup()
Using the new rollup() function and ordering by Reg
library(data.table) # development version 1.10.5 as of 2017-09-10
setDT(df)
rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))[order(Reg)]
we get
Reg Res Pop
1: A Urban 500414
2: A Rural 500501
3: A NA 1000915
4: B Urban 499922
5: B Rural 500016
6: B NA 999938
7: C Urban 501638
8: C Rural 499274
9: C NA 1000912
10: D Urban 499804
11: D Rural 499825
12: D NA 999629
13: E Urban 499917
14: E Rural 500386
15: E NA 1000303
16: NA NA 5001697
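The respective totals are indicated by NA (including the grand total). To reproduce the expected result more closely, the grand total can be dropped and NA replaced by Total: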
rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))[order(Reg)][
is.na(Res), Res := "Total"][!is.na(Reg)]
Reg Res Pop
1: A Urban 500414
2: A Rural 500501
3: A Total 1000915
4: B Urban 499922
5: B Rural 500016
6: B Total 999938
7: C Urban 501638
8: C Rural 499274
9: C Total 1000912
10: D Urban 499804
11: D Rural 499825
12: D Total 999629
13: E Urban 499917
14: E Rural 500386
15: E Total 1000303
Note that the Total rows appear below the detail rows, which is not fully in line with the OP's expected result.
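Spotting the total rows via is.na() only works as long as Reg and Res contain no genuine missing values. A possible alternative, sketched here on hypothetical stand-in data (Pop sums copied from the output above), is the id = TRUE argument of rollup(), which adds a grouping column holding a bit mask of the aggregation level: 0 for detail rows, and the largest value for the most aggregated level, i.e. the grand total.

```r
library(data.table)

# Hypothetical stand-in for the OP's data (values taken from the output above)
df <- data.table(
  Reg = rep(c("A", "B", "C", "D", "E"), each = 2),
  Res = rep(c("Urban", "Rural"), times = 5),
  Pop = c(500414L, 500501L, 499922L, 500016L, 501638L,
          499274L, 499804L, 499825L, 499917L, 500386L)
)

# id = TRUE adds a leading "grouping" column describing the aggregation level
res <- rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"), id = TRUE)

# grouping > 0 marks every aggregated row (subtotals and grand total)
res[grouping > 0L, Res := "Total"]

# Drop the most aggregated level (the grand total), then order by Reg
out <- res[grouping < max(grouping)][order(Reg), .(Reg, Res, Pop)]
out
```

This avoids confusing an NA that comes from the data with an NA that marks a total, at the cost of one extra column to carry around.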
groupingsets()
With the groupingsets() function, the aggregation can be controlled in great detail:
groupingsets(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"),
sets = list("Reg", c("Reg", "Res")))[order(Reg)][
is.na(Res), Res := "Total"][]
Reg Res Pop
1: A Total 1000915
2: A Urban 500414
3: A Rural 500501
4: B Total 999938
5: B Urban 499922
6: B Rural 500016
7: C Total 1000912
8: C Urban 501638
9: C Rural 499274
10: D Total 999629
11: D Urban 499804
12: D Rural 499825
13: E Total 1000303
14: E Urban 499917
15: E Rural 500386
Now, the Total rows appear above the detail rows, and no grand total is created at all.
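To make the relationship between the two functions more concrete: as far as I understand it, rollup() is a shorthand for groupingsets() with the grouping sets c("Reg", "Res"), "Reg", and the empty set character(0), where the empty set produces the grand total. The sketch below checks this on hypothetical stand-in data (Pop sums copied from the output above); leaving character(0) out of sets, as done above, is what suppresses the grand total.

```r
library(data.table)

# Hypothetical stand-in for the OP's data (values taken from the output above)
df <- data.table(
  Reg = rep(c("A", "B", "C", "D", "E"), each = 2),
  Res = rep(c("Urban", "Rural"), times = 5),
  Pop = c(500414L, 500501L, 499922L, 500016L, 501638L,
          499274L, 499804L, 499825L, 499917L, 500386L)
)

via_rollup <- rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))

# The same aggregation levels spelled out explicitly;
# character(0) is the empty grouping set, i.e. the grand total
via_sets <- groupingsets(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"),
                         sets = list(c("Reg", "Res"), "Reg", character(0)))

all.equal(via_rollup, via_sets)
```

The order of the elements in sets also determines the order of the blocks in the result, which is why listing "Reg" first puts the Total rows above the detail rows after the stable order(Reg).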
Eye-friendly printing of the "classic" data.table solution
So far, two "classic" data.table solutions have been posted, by Psidom and Hack-R.
Both can be rewritten more concisely as
rbind(df[, .(Res = "Total", Pop = sum(Pop)), by = Reg], df)[order(Reg)]
The result can be printed in an "eye-friendly" way, with a blank line between groups:
rbind(df[, .(Res = "Total", Pop = sum(Pop)), by = Reg], df)[
order(Reg), {print(data.table(Reg, .SD), row.names = FALSE); cat("\n")}, by = Reg]
Reg Res Pop
A Total 1000915
A Urban 500414
A Rural 500501

Reg Res Pop
B Total 999938
B Urban 499922
B Rural 500016

Reg Res Pop
C Total 1000912
C Urban 501638
C Rural 499274

Reg Res Pop
D Total 999629
D Urban 499804
D Rural 499825

Reg Res Pop
E Total 1000303
E Urban 499917
E Rural 500386