【Question title】: Sum by condition in data.table in R
【Posted】: 2020-09-12 20:12:05
【Question description】:

Example data table:

example <- data.table(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3), 
                 cost = c(20,22,219,17,4))

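I want to summarise this data.table. For each name, I want to know how many items there are. I am also interested in the total cost of chairs, sofas, and plates for each colour. So I would get: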

example <- data.table(name=c('black','red'),
                      count=c(3,2),
                      chair = c(42,0), plate = c(0,4), NOsofa = c(219,17))

I can get the count:

example[,.(count = .N), by="name"]

But I am struggling with how to create the remaining columns.

【Discussion】:

    Tags: r count data.table


    【Solution 1】:

    You can first add a count for each name:

    library(data.table)
    example[,count := .N, name]
    

    Then sum the cost per name and type, and reshape to wide format:

    dcast(example[, .(cost  = sum(cost)), .(name, type, count)], 
              name + count~type, value.var = 'cost', fill = 0)
    
    
    #    name count chair plate sofa
    #1: black     3    42     0  219
    #2:   red     2     0     4   17
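
Note that `:=` adds `count` to `example` by reference, so the original table is modified. A minimal sketch of the same count, sum, and reshape steps that leaves `example` untouched is shown below; the intermediate names `counts`, `sums`, and `wide` are illustrative only and do not come from the answer.

```r
library(data.table)

example <- data.table(name = c('black','black','black','red','red'),
                      type = c('chair','chair','sofa','sofa','plate'),
                      num  = c(4,5,12,4,3),
                      cost = c(20,22,219,17,4))

# Row count per name, computed as a separate table (no := on `example`)
counts <- example[, .(count = .N), by = name]

# Total cost per (name, type) pair
sums <- example[, .(cost = sum(cost)), by = .(name, type)]

# Join the counts onto the sums, then reshape to one column per type;
# fill = 0 covers name/type pairs that do not occur in the data
wide <- dcast(sums[counts, on = "name"], name + count ~ type,
              value.var = "cost", fill = 0)
print(wide)
```

This trades one extra join for not having a `count` column left behind in `example`.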
    

    With tidyverse, this can be done as follows:

    library(dplyr)
    
    example %>%
      group_by(name) %>%
      mutate(count = n()) %>%
      group_by(type, count, .add = TRUE) %>%  # dplyr >= 1.0.0; older versions use add = TRUE
      summarise(cost = sum(cost)) %>%
      tidyr::pivot_wider(names_from = type, values_from = cost, 
             names_prefix = 'NO', values_fill = list(cost = 0))
    

    【Discussion】:

      【Solution 2】:

      We can use fun.aggregate in dcast to sum the cost per type, then join the per-name counts:

      library(data.table)
      dcast(example, name ~ type, value.var = 'cost', sum)[example[,
           .(count = .N), name], on = .(name)]
      #    name chair plate sofa count
      #1: black    42     0  219     3
      #2:   red     0     4   17     2
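
Since the question asks for a sum by condition, another option is to write each conditional sum directly in `j`, without reshaping. This is only a sketch and assumes the set of types (chair, plate, sofa) is known in advance; the name `res` is illustrative.

```r
library(data.table)

example <- data.table(name = c('black','black','black','red','red'),
                      type = c('chair','chair','sofa','sofa','plate'),
                      num  = c(4,5,12,4,3),
                      cost = c(20,22,219,17,4))

# One grouped pass: .N counts the rows in each group, and each sum() is
# restricted to a single type by logical subsetting. sum() over an empty
# vector is 0, so a type that is absent for a name yields 0.
res <- example[, .(count = .N,
                   chair = sum(cost[type == "chair"]),
                   plate = sum(cost[type == "plate"]),
                   sofa  = sum(cost[type == "sofa"])),
               by = name]
print(res)
```

The dcast approaches above scale better when there are many types or the types are not known beforehand, because the columns are created from the data rather than typed out by hand.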
      

      【Discussion】:
