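Consider a combination of aggregate and ave (an inline aggregate function that returns the same number of rows as its input). Specifically, use aggregate to count the rows for each likert value within each subgroup (cbind names the result column, and the formula interface is easier to read), then use ave to divide each count by its subgroup's total, which gives the proportion of each likert value within the subgroup.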
agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)
agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))
agg_df
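To make the difference between the two functions concrete, here is a minimal sketch on a made-up toy data frame (the names toy, grp, and group_totals and all values are hypothetical, not from the question). It shows that aggregate collapses to one row per group, while ave returns a vector aligned with the original rows, which is what allows the row-wise division.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  grp   = c("a", "a", "b", "b", "b"),
  count = c(2, 6, 1, 1, 2)
)

# aggregate() collapses the data to one row per group
group_totals <- aggregate(count ~ grp, toy, FUN = sum)

# ave() keeps the input length and repeats each group's total on every row
toy$total <- ave(toy$count, toy$grp, FUN = sum)

# Row-wise division now yields the within-group proportion
toy$prop <- toy$count / toy$total
toy
```

Because ave keeps the row order and length of its input, the result can be assigned straight back as a new column with no merge step.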
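Below is a demonstration with random, seeded data (to be replaced with the OP's data). It assumes likert is in long format; wide-format data can be reshaped to long first.
Data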
set.seed(8302019)
dataset <- data.frame(
  subgroup = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
  likert = sample(1:5, 500, replace=TRUE),
  some_num_col = 1
)
head(dataset, 20)
#    subgroup likert some_num_col
# 1     julia      5            1
# 2    python      1            1
# 3      spss      5            1
# 4       sas      1            1
# 5       sas      4            1
# 6      spss      2            1
# 7         r      5            1
# 8         r      5            1
# 9         r      1            1
# 10     spss      3            1
# 11     spss      4            1
# 12      sas      3            1
# 13     spss      5            1
# 14     spss      1            1
# 15     spss      2            1
# 16      sas      4            1
# 17        r      2            1
# 18      sas      4            1
# 19      sas      4            1
# 20     spss      1            1
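The demo data is already in long format (one row per response). If the real data instead holds one column per survey item, a reshape along these lines might work as a first step. This is only a sketch: the wide layout, the id column, and the item names q1 to q3 are assumptions made for illustration.

```r
# Hypothetical wide data: one row per respondent, one column per item
wide <- data.frame(
  id       = 1:3,
  subgroup = c("r", "python", "sas"),
  q1       = c(5, 3, 1),
  q2       = c(4, 2, 2),
  q3       = c(1, 5, 3)
)

# Stack the item columns into a single likert column (base R reshape)
long <- reshape(
  wide,
  direction = "long",
  varying   = c("q1", "q2", "q3"),  # columns to stack
  v.names   = "likert",             # name of the stacked value column
  timevar   = "question",           # records which item each row came from
  times     = c("q1", "q2", "q3"),
  idvar     = "id"
)

# Helper column to count with, mirroring some_num_col in the demo data
long$some_num_col <- 1
long
```

After this step, long has the same shape as dataset (plus the id and question columns), so the aggregate and ave calls apply unchanged.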
Proportion by group
agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)
agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))
agg_df
#    likert subgroup count      prop
# 1       1    julia    21 0.2359551
# 2       2    julia    16 0.1797753
# 3       3    julia    18 0.2022472
# 4       4    julia    17 0.1910112
# 5       5    julia    17 0.1910112
# 6       1   python    14 0.1891892
# 7       2   python    16 0.2162162
# 8       3   python    16 0.2162162
# 9       4   python    16 0.2162162
# 10      5   python    12 0.1621622
# 11      1        r    20 0.2061856
# 12      2        r    19 0.1958763
# 13      3        r    26 0.2680412
# 14      4        r    17 0.1752577
# 15      5        r    15 0.1546392
# 16      1      sas    18 0.1956522
# 17      2      sas    16 0.1739130
# 18      3      sas    24 0.2608696
# 19      4      sas    18 0.1956522
# 20      5      sas    16 0.1739130
# 21      1     spss    13 0.1688312
# 22      2     spss    22 0.2857143
# 23      3     spss    15 0.1948052
# 24      4     spss    16 0.2077922
# 25      5     spss    11 0.1428571
# 26      1    stata    17 0.2394366
# 27      2    stata     8 0.1126761
# 28      3    stata    16 0.2253521
# 29      4    stata    12 0.1690141
# 30      5    stata    18 0.2535211
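As a sanity check, the proportions within each subgroup should sum to 1, and they should agree with a row-wise prop.table over a contingency table. The sketch below rebuilds the demo data so it runs on its own; the names sums, tab, and idx are introduced here purely for illustration.

```r
# Rebuild the demo data and the aggregated result
set.seed(8302019)
dataset <- data.frame(
  subgroup = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
  likert = sample(1:5, 500, replace=TRUE),
  some_num_col = 1
)
agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)
agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))

# Check 1: proportions sum to 1 within every subgroup
sums <- tapply(agg_df$prop, agg_df$subgroup, sum)
stopifnot(isTRUE(all.equal(as.vector(sums), rep(1, length(sums)))))

# Check 2: cross-check against row-wise proportions of a contingency table
tab <- prop.table(table(dataset$subgroup, dataset$likert), margin = 1)
idx <- cbind(as.character(agg_df$subgroup), as.character(agg_df$likert))
stopifnot(isTRUE(all.equal(as.vector(tab[idx]), agg_df$prop)))
```

The table approach gives the same numbers in wide form; the aggregate and ave version keeps them in a long data frame, which is usually more convenient for plotting or further grouping.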