【Question title】: Count the number of times two values appear in a column based on the unique values of another column [duplicate]
【Posted】: 2018-12-11 19:25:01
【Question】:

I have the data frame below:

year<-c("2000","2000","2001","2002","2000")
gender<-c("M","F","M","F","M")
YG<-data.frame(year,gender)

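In this data frame I want to count the number of "M" and "F" values for each year, and then build a new data frame with one row per year and one count column per gender.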

I tried something like this:

library(dplyr)
ns<-YG %>%
  group_by(year) %>%
  count(YG$gender == "M")

【Comments】:

Maybe table(YG)?
Avoid using $ inside the pipe; write count(gender == "M") instead.
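
The table(YG) suggestion above can be taken all the way to the requested data frame in base R. The following is only a minimal sketch: the intermediate names tab and counts are made up for illustration, and as.data.frame.matrix is just one of several ways to turn a contingency table into wide form.

```r
# Rebuild the data frame from the question
year <- c("2000", "2000", "2001", "2002", "2000")
gender <- c("M", "F", "M", "F", "M")
YG <- data.frame(year, gender)

# Cross-tabulate year against gender; combinations that never occur count as 0
tab <- table(YG)

# Turn the 2-D table into a wide data frame, then move the years
# from the row names into an ordinary column
counts <- as.data.frame.matrix(tab)
counts <- data.frame(year = rownames(counts), counts, row.names = NULL)
print(counts)
```

Because table() counts every year/gender combination, years with no "F" or no "M" rows get an explicit 0 without any extra fill step.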

Tags: r dplyr

【Solution 1】:

A solution using reshape2:

library(reshape2)
dcast(YG, year ~ gender, fun.aggregate = length)  # count rows per year/gender pair

  year F M
1 2000 1 2
2 2001 0 1
3 2002 1 0

Or a different tidyverse solution:

YG %>%
 group_by(year) %>%
 summarise(M = length(gender[gender == "M"]),
           F = length(gender[gender == "F"]))

  year      M     F
  <fct> <int> <int>
1 2000      2     1
2 2001      1     0
3 2002      0     1

Or, as suggested by @zx8754:

YG %>%
 group_by(year) %>%
 summarise(M = sum(gender == "M"),
           F = sum(gender == "F"))
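
The sum() form works because R coerces TRUE to 1 and FALSE to 0 when summing a logical vector. The sketch below shows this in base R on the gender column from the question. The vector with_na is a hypothetical addition, not part of the original data, included only to show how the two forms differ when values are missing.

```r
gender <- c("M", "F", "M", "F", "M")

# Comparing a vector with a value yields a logical vector
is_m <- gender == "M"

# Summing a logical vector counts its TRUE entries
n_sum <- sum(is_m)

# Subsetting and taking the length gives the same count here
n_len <- length(gender[is_m])

# Hypothetical data with a missing value: the two forms no longer agree
with_na <- c("M", NA, "F")
sum(with_na == "M")                   # NA, because the comparison with NA is NA
sum(with_na == "M", na.rm = TRUE)     # counts only the definite matches
length(with_na[with_na == "M"])       # an NA index keeps an NA element, inflating the count
```

If the column may contain NA, passing na.rm = TRUE to sum() is probably the safer of the two forms.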

【Comments】:

Maybe just M = sum(gender == "M")?
@zx8754 Indeed, that is simpler; added to my post. Thanks.

【Solution 2】:

We can use count and spread to get the data frame format, passing fill = 0 to spread so that missing combinations are filled with zeros:

library(tidyverse)
YG %>%
  group_by(year) %>%
  count(gender) %>%
  spread(gender, n, fill = 0)

Output:

# A tibble: 3 x 3
# Groups:   year [3]
  year      F     M
  <fct> <dbl> <dbl>
1 2000      1     2
2 2001      0     1
3 2002      1     0

【Comments】:

No need for the final mutate_all; just use fill = 0 in spread.
@zx8754 Thanks! I always forget that option.
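
Since tidyr 1.0.0, spread has been superseded by pivot_wider. A roughly equivalent sketch of the count-and-reshape approach is shown below, assuming dplyr and tidyr (1.0.0 or later) are installed. The result name wide is made up for illustration, and the values_fill argument plays the role of fill = 0.

```r
library(dplyr)
library(tidyr)

# Rebuild the data frame from the question
YG <- data.frame(
  year = c("2000", "2000", "2001", "2002", "2000"),
  gender = c("M", "F", "M", "F", "M")
)

wide <- YG %>%
  count(year, gender) %>%              # one row per year/gender pair, count in column n
  pivot_wider(
    names_from = gender,               # one new column per gender value
    values_from = n,                   # filled with the counts
    values_fill = list(n = 0L)         # combinations that never occur become 0
  )

print(wide)
```

Passing both columns to count() makes the separate group_by() step unnecessary and returns an ungrouped result.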