【Question title】: How to apply a function to a data.table subset by multiple columns in R?
【Posted】: 2020-08-11 04:33:19
【Question】:

I have a data table containing change counts for multiple groups. For example:

library(data.table)
input <- data.table(from = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
                 to = c(letters[1:6], letters[1:6]),
                 from_N = c(100, 100, 100, 50, 50, 50, 60, 60 ,60, 80, 80, 80),
                 to_N = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
                 group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))

How can I calculate the total for each change across groups? I can do this with a for loop, for example:

out <- list()
for (i in 1:length(unique(input$from))){ 
  sub <- input[from == unique(input$from)[i]] 
  out2 <- list()
  for (j in 1:length(unique(sub$to))){
    sub2 <- sub[to == unique(sub$to)[j]]
    out2[[j]] <- data.table(from = sub2$from[1],
                  to = sub2$to[1],
                  from_N = sum(sub2$from_N),
                  to_N = sum(sub2$to_N))
    print(unique(sub$to)[j])
  }
  out[[i]] <- do.call("rbind", out2)
  print(unique(input$from)[i])
}
output <- do.call("rbind", out)

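However, the data table I need to apply this to is very large, so I need to maximise performance. Is there a data.table way to do this? Any help would be greatly appreciated!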

    Tags: r data.table subset apply


    【Solution 1】:

    An option with dplyr:

    library(dplyr)
    input %>%
     group_by(from, to) %>%
     summarise_at(vars(ends_with('_N')), sum)
    

    Or in data.table:

    library(data.table)
    setDT(input)[, lapply(.SD, sum),  by = .(from, to), .SDcols = patterns('_N$')]
    
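As a minimal sketch of what the `.SDcols` argument is doing in the data.table call above: the regex `_N$` selects the columns `from_N` and `to_N`, so listing those names explicitly should give the same result. The sample data is rebuilt from the question; the variable names `sum_cols`, `by_pattern` and `by_name` are assumptions made for illustration.

```r
library(data.table)

# Sample data from the question, rebuilt with rep() for brevity
input <- data.table(
  from   = rep(c("A", "B"), each = 3, times = 2),
  to     = rep(letters[1:6], times = 2),
  from_N = c(100, 100, 100, 50, 50, 50, 60, 60, 60, 80, 80, 80),
  to_N   = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
  group  = rep(c(1, 2), each = 6)
)

# Select the columns to sum with a regex, as in the answer
by_pattern <- input[, lapply(.SD, sum), by = .(from, to), .SDcols = patterns("_N$")]

# Select the same columns by name (hypothetical helper vector)
sum_cols <- c("from_N", "to_N")
by_name <- input[, lapply(.SD, sum), by = .(from, to), .SDcols = sum_cols]

# Both selections should produce the same grouped sums
all.equal(by_pattern, by_name)
```

Naming the columns is a little more verbose, but it avoids accidentally picking up any other column whose name happens to end in `_N`.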


      【Solution 2】:

      Perhaps I'm overlooking something, but it seems you are just after:

      library(data.table)
      
      setDT(input)[, .(from_N = sum(from_N), to_N = sum(to_N)), by = .(from, to)]
      

      Output:

         from to from_N to_N
      1:    A  a    160   20
      2:    A  b    160   25
      3:    A  c    160   50
      4:    B  d    130   25
      5:    B  e    130   10
      6:    B  f    130   25
      
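One way to gain confidence that the grouped call really replaces the nested loop from the question is to run both on the sample data and compare. The sketch below is only a check under stated assumptions: the loop is a condensed version of the question's code (progress prints removed, loop variables renamed), and the names `pieces`, `loop_result` and `grouped_result` are hypothetical.

```r
library(data.table)

# Sample data from the question
input <- data.table(
  from   = rep(c("A", "B"), each = 3, times = 2),
  to     = rep(letters[1:6], times = 2),
  from_N = c(100, 100, 100, 50, 50, 50, 60, 60, 60, 80, 80, 80),
  to_N   = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
  group  = rep(c(1, 2), each = 6)
)

# Condensed form of the question's nested loop: one row per (from, to) pair
pieces <- list()
for (f in unique(input$from)) {
  rows_f <- input[from == f]
  for (tt in unique(rows_f$to)) {
    rows_ft <- rows_f[to == tt]
    pieces[[length(pieces) + 1L]] <- data.table(
      from   = f,
      to     = tt,
      from_N = sum(rows_ft$from_N),
      to_N   = sum(rows_ft$to_N)
    )
  }
}
loop_result <- rbindlist(pieces)

# Grouped aggregation from the answer
grouped_result <- input[, .(from_N = sum(from_N), to_N = sum(to_N)), by = .(from, to)]

# The two results should match row for row
all.equal(loop_result, grouped_result)
```

The grouped form makes a single pass over the table rather than subsetting it once per `from` and `to` value, which is why it is the usual choice for large tables.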
