【Question title】: Creating two columns of cumulative sum based on the categories of one column
【Posted】: 2021-05-19 03:46:31
【Question】:

I would like to create two columns that hold the cumulative frequency of "A" and "B" in the assignment column.

df = data.frame(id = 1:10, assignment= c("B","A","B","B","B","A","B","B","A","B"))

            id assignment
        1   1          B
        2   2          A
        3   3          B
        4   4          B
        5   5          B
        6   6          A
        7   7          B
        8   8          B
        9   9          A
        10 10          B

The resulting table should look like this:

            id  assignment  A   B
        1   1   B           0   1
        2   2   A           1   1
        3   3   B           1   2
        4   4   B           1   3
        5   5   B           1   4
        6   6   A           2   4
        7   7   B           2   5
        8   8   B           2   6
        9   9   A           3   6
       10   10  B           3   7

How can the code be generalized to more than two categories (say "A", "B", "C")? Thanks.

【Comments】:

    Tags: r cumulative-frequency


    【Solution 1】:

    A base R option

    transform(
      df,
      A = cumsum(assignment == "A"),
      B = cumsum(assignment == "B")
    )
    

    which gives

       id assignment A B
    1   1          B 0 1
    2   2          A 1 1
    3   3          B 1 2
    4   4          B 1 3
    5   5          B 1 4
    6   6          A 2 4
    7   7          B 2 5
    8   8          B 2 6
    9   9          A 3 6
    10 10          B 3 7
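
The reason this works can be seen in a minimal sketch (the names `is_a` and `running_a` are only illustrative and not part of the answer): the comparison yields a logical vector, and `cumsum()` treats `TRUE` as 1 and `FALSE` as 0, so the result is a running count.

```r
# Same assignment values as in the question
assignment <- c("B", "A", "B", "B", "B", "A", "B", "B", "A", "B")

# One TRUE/FALSE per row: is this row an "A"?
is_a <- assignment == "A"

# cumsum() coerces the logicals to 0/1 and accumulates them
running_a <- cumsum(is_a)
running_a
```

The printed vector matches the `A` column shown above; swapping `"A"` for `"B"` gives the `B` column.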
    

    【Comments】:

      【Solution 2】:

      We can use model.matrix together with colCumsums from the matrixStats package

      library(matrixStats)
      cbind(df, colCumsums(model.matrix(~ assignment - 1, df[-1])))
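
For readers without matrixStats, the same idea can be sketched in base R. This is an illustrative variant rather than the answerer's code: it assumes `apply(..., 2, cumsum)` is an acceptable stand-in for `colCumsums`, and the names `mm`, `counts` and `res` are made up for the example. `model.matrix(~ assignment - 1, ...)` builds one 0/1 indicator column per category, so column-wise cumulative sums give the running counts for any number of categories.

```r
df <- data.frame(id = 1:10,
                 assignment = c("B", "A", "B", "B", "B", "A", "B", "B", "A", "B"))

# One indicator column per category; "- 1" drops the intercept
mm <- model.matrix(~ assignment - 1, data = df)

# Column-wise cumulative sums (base R stand-in for matrixStats::colCumsums)
counts <- apply(mm, 2, cumsum)

# Strip the "assignment" prefix so the columns are named after the categories
colnames(counts) <- sub("^assignment", "", colnames(counts))

res <- cbind(df, counts)
res
```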
      

      【Comments】:

      • The solution using model.matrix is really neat!
      【Solution 3】:

      Use lapply over the unique values in assignment to create the new columns.

      vals <- sort(unique(df$assignment))
      df[vals] <- lapply(vals, function(x) cumsum(df$assignment == x))
      df
      
      #   id assignment A B
      #1   1          B 0 1
      #2   2          A 1 1
      #3   3          B 1 2
      #4   4          B 1 3
      #5   5          B 1 4
      #6   6          A 2 4
      #7   7          B 2 5
      #8   8          B 2 6
      #9   9          A 3 6
      #10 10          B 3 7
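
To illustrate how this generalizes to more than two categories, here is a small sketch on made-up data (`df3` and its values are hypothetical, not from the question). It also wraps the column in `as.character()`, an added assumption that keeps the approach working when assignment is stored as a factor.

```r
# Hypothetical three-category data; the column is deliberately a factor
df3 <- data.frame(
  id = 1:8,
  assignment = c("B", "A", "C", "B", "C", "A", "B", "C"),
  stringsAsFactors = TRUE
)

# as.character() makes the category labels usable as column names
vals <- sort(unique(as.character(df3$assignment)))

# One cumulative count column per category
df3[vals] <- lapply(vals, function(x) cumsum(df3$assignment == x))
df3
```

Nothing in the code refers to a particular label, so a fourth or fifth category would simply add another column.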
      

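      【Comments】:

      • When I run the code I get the following error: "Error in Summary.factor(1:2, na.rm = FALSE) : 'min' not meaningful for factors"
      • @user15219127 It looks like you are using a version of R older than 4.0.0, where data.frame converts strings to factors by default. Add stringsAsFactors = FALSE when creating the data.frame: df = data.frame(id = 1:10, assignment= c("B","A","B","B","B","A","B","B","A","B"), stringsAsFactors = FALSE)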