【Question title】: Compute relative frequencies with group totals using dplyr
【Posted】: 2025-11-23 22:20:05
【Question description】:

I have the following toy data:

data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A", 
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA, 
-16L))

Using the commands:

data <- table(data$class, data$value)
data <- as.data.frame(data)
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq

I compute the correct relative frequency of each value within each class:

> data
  Var1 Var2 Freq  rel_freq
1    A    1    3 0.2727273
2    B    1    3 0.6000000
3    A    2    4 0.3636364
4    B    2    2 0.4000000
5    A    3    4 0.3636364
6    B    3    0 0.0000000
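
For reference, the same class-wise proportions can be sketched more compactly in base R with `prop.table()`. This is only a minimal sketch: the data frame name `toy` is hypothetical and simply rebuilds the toy data above.

```r
# Rebuild the toy data under a hypothetical name, `toy`
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(rep(c("A", "B", "A", "B", "A"), times = c(3, 3, 4, 2, 4)))
)

# Contingency table with classes as rows and values as columns
tab <- table(class = toy$class, value = toy$value)

# margin = 1 divides every cell by its row total, i.e. by the class total
rel <- prop.table(tab, margin = 1)

# Long format, one row per (class, value) pair
rel_df <- as.data.frame(rel, responseName = "rel_freq")
rel_df
```

Because `table()` keeps the empty (B, 3) cell, that combination appears in the result with a relative frequency of 0.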

I would like to know how to construct the equivalent dplyr pipeline. My attempt is pasted below:

library(dplyr)
library(tidyr)  # complete() comes from tidyr
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%
  complete(class, fill = list(n = 0)) %>%
  mutate(freq = n / sum(n))

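This does give me a relative frequency for every row, but unfortunately it is computed within each value (across the pair of classes) rather than against the class totals: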

Source: local data frame [6 x 4]
Groups: value [3]

  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.5000000
2     1      B     3 0.5000000
3     2      A     4 0.6666667
4     2      B     2 0.3333333
5     3      A     4 1.0000000
6     3      B     0 0.0000000

【Comments】:

    Tags: r dplyr


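    【Solution 1】:

    The frequencies need to be computed per class only. After summarise() the result is still grouped by value, so sum(n) is taken within each value; replace that grouping with group_by(class) before the mutate():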

    data %>%
        group_by(value, class) %>%
        summarise(n = n()) %>%
        complete(class, fill = list(n = 0)) %>%
        group_by(class) %>%
        mutate(freq = n / sum(n))
    # A tibble: 6 x 4
      value  class     n      freq
      <int> <fctr> <dbl>     <dbl>
    1     1      A     3 0.2727273
    2     1      B     3 0.6000000
    3     2      A     4 0.3636364
    4     2      B     2 0.4000000
    5     3      A     4 0.3636364
    6     3      B     0 0.0000000
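
A slightly shorter variant of the same idea can be sketched with `count()`, which leaves no grouping behind when the input is ungrouped. This is a minimal sketch under assumptions: the data frame name `toy` and the result name `res` are hypothetical, and `toy` simply rebuilds the toy data from the question.

```r
library(dplyr)
library(tidyr)

# Rebuild the toy data under a hypothetical name, `toy`
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(rep(c("A", "B", "A", "B", "A"), times = c(3, 3, 4, 2, 4)))
)

res <- toy %>%
  count(class, value) %>%                          # one row per observed (class, value) pair
  complete(class, value, fill = list(n = 0L)) %>%  # add unobserved pairs such as (B, 3) with n = 0
  group_by(class) %>%                              # the denominator is the class total
  mutate(freq = n / sum(n)) %>%
  ungroup()

res
```

The only grouping in effect when `freq` is computed is `class`, so each class's frequencies sum to 1.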
    

    【Comments】: