Adding variable counts via multiple grouping: answers

【Question title】: Adding variable counts via multiple grouping
【Posted】: 2016-07-11 09:12:06
【Question】:

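Disclaimer: the title may be misleading. I think part of the reason I haven't found a solution is that I don't quite know what to Google.

I have a group-level dataset in long format; the year and country code are repeated for each group (id), as follows (typed in by hand):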

year   country  id  v1  v2  v3
1991   20       1    1   0   0
1991   20       2    0   1   0
1991   20       3    0   0   1
1991   20       4    1   0   0
1991   20       5    1   0   0
1991   20       6    0   1   0

I would like to add the country-year counts as columns at the end, so that it looks like this:

year   country  id  v1  v2  v3   v1.count  v2.count  v3.count
1991   20       1    1   0   0       3         2         1
1991   20       2    0   1   0       3         2         1
1991   20       3    0   0   1       3         2         1
1991   20       4    1   0   0       3         2         1
1991   20       5    1   0   0       3         2         1
1991   20       6    0   1   0       3         2         1

I tried aggregate, count and dplyr, without success. I thought "Group by and conditionally count" or "Frequency count for a specific category" might do the trick, but I couldn't get either to work. How can I do this?

【Question discussion】:

df$v1.count <- sum(df$v1)?
Wouldn't that sum v1 over all years and countries in df?
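The comment above is right: `sum(df$v1)` returns one total for the whole column. A minimal base R sketch of the difference, on a small made-up data frame with two countries (the data are hypothetical), using `ave()` to compute the sum within each year-country group instead:

```r
# Hypothetical data: one year, two countries
df <- data.frame(
  year    = c(1991, 1991, 1991, 1991),
  country = c(20, 20, 30, 30),
  v1      = c(1, 0, 1, 1)
)

# Ungrouped: a single total over all rows, recycled into every row
df$v1.total <- sum(df$v1)

# Grouped: ave() applies FUN within each year-country combination
# and returns a vector aligned with the original rows
df$v1.count <- ave(df$v1, df$year, df$country, FUN = sum)

df
```

Here `v1.total` is 3 in every row, while `v1.count` is 1 for country 20 and 2 for country 30.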

Tags: r dplyr

【Solution 1】:

We can use mutate_each from dplyr after grouping by 'year' and 'country':

library(dplyr)

df1 %>%
   group_by(year, country) %>%
   # funs(count = sum) adds one <column>_count column per variable in v1:v3
   mutate_each(funs(count = sum), v1:v3)
 #  year country    id    v1    v2    v3 v1_count v2_count v3_count
 #  <int>   <int> <int> <int> <int> <int>    <int>    <int>    <int>
 #1  1991      20     1     1     0     0        3        2        1
 #2  1991      20     2     0     1     0        3        2        1
 #3  1991      20     3     0     0     1        3        2        1
 #4  1991      20     4     1     0     0        3        2        1
 #5  1991      20     5     1     0     0        3        2        1
 #6  1991      20     6     0     1     0        3        2        1
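`mutate_each()` and `funs()` have since been deprecated in dplyr, and from dplyr 1.0.0 onward `across()` is the recommended way to apply a function to several columns. A sketch of the same grouped count with `across()`, rebuilding the question's sample data inline; the `.names` pattern is chosen here to reproduce the `v1_count` style names from the output above:

```r
library(dplyr)

# The question's sample data
df1 <- data.frame(
  year    = 1991L,
  country = 20L,
  id      = 1:6,
  v1      = c(1L, 0L, 0L, 1L, 1L, 0L),
  v2      = c(0L, 1L, 0L, 0L, 0L, 1L),
  v3      = c(0L, 0L, 1L, 0L, 0L, 0L)
)

# across() applies sum() to each of v1:v3 within every year-country group;
# .names builds the new column names from the original column name
out <- df1 %>%
  group_by(year, country) %>%
  mutate(across(v1:v3, sum, .names = "{.col}_count")) %>%
  ungroup()

out
```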

【Discussion】:

Thanks, I had only tried it with mutate before; this does what I need.

【Solution 2】:

I guess you could also just use mutate.

df1 <- read.table(text="year   country  id  v1  v2  v3
1991   20       1    1   0   0
1991   20       2    0   1   0
1991   20       3    0   0   1
1991   20       4    1   0   0
1991   20       5    1   0   0
1991   20       6    0   1   0", header = TRUE, as.is = TRUE)

df1

library(dplyr)

df1 %>% group_by(year, country) %>% 
  mutate(v1.count=sum(v1), v2.count=sum(v2), v3.count=sum(v3))
# Source: local data frame [6 x 9]
# Groups: year, country [1]

#    year country    id    v1    v2    v3 v1.count v2.count v3.count
#   (int)   (int) (int) (int) (int) (int)    (int)    (int)    (int)
# 1  1991      20     1     1     0     0        3        2        1
# 2  1991      20     2     0     1     0        3        2        1
# 3  1991      20     3     0     0     1        3        2        1
# 4  1991      20     4     1     0     0        3        2        1
# 5  1991      20     5     1     0     0        3        2        1
# 6  1991      20     6     0     1     0        3        2        1
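Since the question mentions trying `aggregate()`, a base R sketch (no dplyr) of the same result: compute one row of sums per year-country with `aggregate()`, then join those totals back onto every row with `merge()`:

```r
df1 <- read.table(text = "year country id v1 v2 v3
1991 20 1 1 0 0
1991 20 2 0 1 0
1991 20 3 0 0 1
1991 20 4 1 0 0
1991 20 5 1 0 0
1991 20 6 0 1 0", header = TRUE)

# One row per year-country holding the column sums of v1, v2 and v3
counts <- aggregate(cbind(v1, v2, v3) ~ year + country, data = df1, FUN = sum)
names(counts)[3:5] <- paste0(c("v1", "v2", "v3"), ".count")

# Attach the group totals to every row of the original data
out <- merge(df1, counts, by = c("year", "country"))

out
```

`merge()` sorts the result by the key columns, so the row order can differ from the input; reorder by `id` afterwards if that matters.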

【Discussion】:

Hi, I think this works with sum, as the other commenters suggested, but I also want to use it for continuous variables (e.g. ineq), not just binary ones.
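Nothing in the grouped `sum()` approach is specific to 0/1 variables: `sum()` works on any numeric column, and other summaries can be added the same way. A sketch assuming a hypothetical continuous column named `ineq` with made-up values, computing both the group sum and the group mean with `across()`:

```r
library(dplyr)

# Hypothetical data: `ineq` stands in for a continuous variable
df2 <- data.frame(
  year    = c(1991L, 1991L, 1991L, 1992L),
  country = c(20L, 20L, 20L, 20L),
  ineq    = c(0.30, 0.40, 0.50, 0.20)
)

# A named list of functions gives one new column per function;
# "{.col}_{.fn}" yields ineq_sum and ineq_mean
out2 <- df2 %>%
  group_by(year, country) %>%
  mutate(across(ineq, list(sum = sum, mean = mean), .names = "{.col}_{.fn}")) %>%
  ungroup()

out2
```

If the variable contains missing values, a lambda such as `~ mean(.x, na.rm = TRUE)` can be used in place of `mean`.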