How to sum variables by groups in R

【Question title】: How to sum variables by groups in R
【Posted】: 2021-06-17 22:12:36
【Question】:

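I have a data frame called "SpatialKey" with three columns. The first column holds five categories representing population quintiles. The second column takes one of four values: 0, 400, 800, and 1200. The third column is the population.

For example: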

quintile	isocrona	total
4	1200	1674
1	400	1676
4	400	1723
5	800	1567
3	0	1531
3	1200	1370
2	1200	1925
1	400	1916
5	0	1776
2	800	1896
3	800	2143
5	400	2098
4	400	1496
1	0	961
4	800	1684

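I want to group the data by quintile and, within each quintile, sum the population for each of the four values in the second column. For example: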

	0	400	800	1200
1	961	3592	0	0
2	0	0	1896	1925
3	1531	0	2143	1370
4	0	3219	1684	1674
5	1776	2098	1567	0

This is my code.

library(dplyr)
po <- SpatialKey %>%
    group_by(quintile, isocrona) %>%
    summarise_at(vars(contains("total")), sum)
final_df <- as.data.frame(t(po))

But R gives me the following table:

	V1	V2	V3	V4	V5	V6	V7	V8	V9	V10	V11	V12	V13	V14	V15	V16	V17	V18	V19	V20
quintile	1	1	1	1	2	2	2	2	3	3	3	3	4	4	4	4	5	5	5	5
isocrona	0	400	800	1200	0	400	800	1200	0	400	800	1200	0	400	800	1200	0	400	800	1200
total	961	3592	0	0	0	0	1896	1925	1531	0	2143	1370	0	3219	1684	1674	1776	2098	1567	0

How can I produce the second table in R?

【Comments】:

Tags: r database dataframe

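【Solution 1】:

Use xtabs. Put the variable to be summed on the left-hand side of the formula and the grouping variables on the right-hand side. A dot can stand for all remaining columns. No packages are needed.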

xtabs(total ~ ., SpatialKey)

which gives this xtabs table:

        isocrona
quintile    0  400  800 1200
       1  961 3592    0    0
       2    0    0 1896 1925
       3 1531    0 2143 1370
       4    0 3219 1684 1674
       5 1776 2098 1567    0
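
The result above is a table object rather than a data frame. If a plain data frame in the same wide layout is needed, one possible approach is sketched below. It assumes the base R function as.data.frame.matrix(), and the name `wide` is only illustrative.

```r
# Sample data from the question (15 rows)
SpatialKey <- data.frame(
  quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 2L, 3L, 5L, 4L, 1L, 4L),
  isocrona = c(1200L, 400L, 400L, 800L, 0L, 1200L, 1200L, 400L, 0L, 800L,
               800L, 400L, 400L, 0L, 800L),
  total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 1916L, 1776L,
            1896L, 2143L, 2098L, 1496L, 961L, 1684L)
)

# Cross-tabulate, summing `total` in each quintile x isocrona cell
tab <- xtabs(total ~ quintile + isocrona, data = SpatialKey)

# as.data.frame.matrix() keeps the wide layout;
# as.data.frame() on a table would return long format instead
wide <- as.data.frame.matrix(tab)

# Move the quintile labels from the row names into a regular column
wide <- cbind(quintile = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

The columns are named "0", "400", "800" and "1200", so they need backticks or string indexing, for example wide[["400"]].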

Note

The input in reproducible form is:

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, -15L))

【Comments】:

【Solution 2】:

Here we need pivot_wider to reshape the data into "wide" format while computing the sum at the same time.

library(dplyr)
library(tidyr)
SpatialKey %>%
    arrange(quintile, isocrona) %>%
    pivot_wider(names_from = isocrona, values_from = total, 
        values_fn = sum, values_fill = 0)

Output:

# A tibble: 5 x 5
#  quintile   `0` `400` `800` `1200`
#     <int> <int> <int> <int>  <int>
#1        1   961  3592     0      0
#2        2     0     0  1896   1925
#3        3  1531     0  2143   1370
#4        4     0  3219  1684   1674
#5        5  1776  2098  1567      0
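
The pipeline from the question can also be kept largely as it is: sum per group first, then reshape, instead of transposing with t(). The following is a minimal sketch of that two-step variant, assuming dplyr 1.0 or later (for the `.groups` argument) and tidyr 1.0 or later. The names `po` and `final_df` follow the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (15 rows)
SpatialKey <- data.frame(
  quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 2L, 3L, 5L, 4L, 1L, 4L),
  isocrona = c(1200L, 400L, 400L, 800L, 0L, 1200L, 1200L, 400L, 0L, 800L,
               800L, 400L, 400L, 0L, 800L),
  total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 1916L, 1776L,
            1896L, 2143L, 2098L, 1496L, 961L, 1684L)
)

# Step 1: one row per (quintile, isocrona) pair with the summed population
po <- SpatialKey %>%
  group_by(quintile, isocrona) %>%
  summarise(total = sum(total), .groups = "drop")

# Step 2: spread the isocrona values into columns; missing pairs become 0
final_df <- po %>%
  pivot_wider(names_from = isocrona, values_from = total, values_fill = 0L)

final_df
```

Transposing with t() only swaps rows and columns of the long result, which is why the attempt in the question produced three rows and one column per group rather than a cross table.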

Or use xtabs from base R:

xtabs(total ~ quintile + isocrona, SpatialKey)
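
Naming the grouping variables explicitly matters once the data frame has more columns. The sketch below illustrates this with a made-up extra column, `region`, which is hypothetical and not part of the original data. With the dot shorthand every remaining column becomes a table dimension.

```r
# Sample data from the question (15 rows)
SpatialKey <- data.frame(
  quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 2L, 3L, 5L, 4L, 1L, 4L),
  isocrona = c(1200L, 400L, 400L, 800L, 0L, 1200L, 1200L, 400L, 0L, 800L,
               800L, 400L, 400L, 0L, 800L),
  total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 1916L, 1776L,
            1896L, 2143L, 2098L, 1496L, 961L, 1684L)
)

# Hypothetical extra column, added only for illustration
extended <- SpatialKey
extended$region <- rep(c("north", "south"), length.out = nrow(extended))

# The dot expands to quintile, isocrona AND region: a three-way table
dims_dot <- dim(xtabs(total ~ ., data = extended))

# The explicit formula still gives the two-way quintile x isocrona table
dims_explicit <- dim(xtabs(total ~ quintile + isocrona, data = extended))

dims_dot
dims_explicit
```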

Data

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, 
-15L))

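【Comments】:

【Solution 3】:

An approach based on grouping. The advantage is that the result stays in data frame format (here, a data.table).

Long-format result: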

library(data.table)
dt.long <- setDT(SpatialKey)[,sum(total),keyby = .(quintile,isocrona)]
dt.long

   quintile isocrona   V1
 1:        1        0  961
 2:        1      400 3592
 3:        2      800 1896
 4:        2     1200 1925
 5:        3        0 1531
 6:        3      800 2143
 7:        3     1200 1370
 8:        4      400 3219
 9:        4      800 1684
10:        4     1200 1674
11:        5        0 1776
12:        5      400 2098
13:        5      800 1567

Wide-format result:

dcast(dt.long,quintile ~ isocrona,fill = 0,value.var = "V1")

   quintile    0  400  800 1200
1:        1  961 3592    0    0
2:        2    0    0 1896 1925
3:        3 1531    0 2143 1370
4:        4    0 3219 1684 1674
5:        5 1776 2098 1567    0
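
The two steps above can also be folded into a single dcast call by passing an aggregation function. This is a sketch using data.table's `fun.aggregate` argument. It works on a copy made with as.data.table() so that the original data frame is not converted in place, and the names `dt` and `wide` are only illustrative.

```r
library(data.table)

# Sample data from the question (15 rows)
SpatialKey <- data.frame(
  quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 2L, 3L, 5L, 4L, 1L, 4L),
  isocrona = c(1200L, 400L, 400L, 800L, 0L, 1200L, 1200L, 400L, 0L, 800L,
               800L, 400L, 400L, 0L, 800L),
  total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 1916L, 1776L,
            1896L, 2143L, 2098L, 1496L, 961L, 1684L)
)

# Copy into a data.table; SpatialKey itself stays a plain data frame
dt <- as.data.table(SpatialKey)

# Reshape to wide and sum duplicates in one step; empty cells are filled with 0
wide <- dcast(dt, quintile ~ isocrona,
              value.var = "total", fun.aggregate = sum, fill = 0L)
wide
```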

Data:

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, 
-15L))

【Comments】: