【Posted】: 2020-08-11 04:33:19
【Question】:
I have a data table containing counts of changes for several groups. For example:
library(data.table)

input <- data.table(from = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
                    to = c(letters[1:6], letters[1:6]),
                    from_N = c(100, 100, 100, 50, 50, 50, 60, 60, 60, 80, 80, 80),
                    to_N = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
                    group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
How can I calculate the total for each change (each from/to pair) across the groups? I can do this with for loops, for example:
out <- list()
for (i in seq_along(unique(input$from))) {
  # Rows for the i-th distinct `from` value
  sub <- input[from == unique(input$from)[i]]
  out2 <- list()
  for (j in seq_along(unique(sub$to))) {
    # Rows for the j-th distinct `to` value within this `from`
    sub2 <- sub[to == unique(sub$to)[j]]
    out2[[j]] <- data.table(from = sub2$from[1],
                            to = sub2$to[1],
                            from_N = sum(sub2$from_N),
                            to_N = sum(sub2$to_N))
    print(unique(sub$to)[j])
  }
  out[[i]] <- do.call("rbind", out2)
  print(unique(input$from)[i])
}
output <- do.call("rbind", out)
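However, the data table I need to apply this to is very large, so I need to maximise performance. Is there a data.table way to do this? Any help would be greatly appreciated!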
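For reference, the nested loops above amount to a grouped sum over each (from, to) pair. Below is a minimal sketch of that aggregation using data.table's `by` argument. It assumes the goal is exactly what the loop produces: `from_N` and `to_N` summed over all rows sharing the same `from` and `to`. The name `output_sd` is introduced here purely for illustration.

```r
library(data.table)

# Same example data as in the question
input <- data.table(from = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
                    to = c(letters[1:6], letters[1:6]),
                    from_N = c(100, 100, 100, 50, 50, 50, 60, 60, 60, 80, 80, 80),
                    to_N = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
                    group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))

# Sum both count columns within each (from, to) pair, pooling the groups
output <- input[, .(from_N = sum(from_N), to_N = sum(to_N)), by = .(from, to)]

# Equivalent form that scales to many count columns:
# .SD holds the columns named in .SDcols for the current group
output_sd <- input[, lapply(.SD, sum), by = .(from, to),
                   .SDcols = c("from_N", "to_N")]

print(output)
```

Both forms do the grouping in a single pass instead of subsetting repeatedly, which is likely to matter on a large table, though the actual speed-up would need to be measured on the real data.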
【Discussion】:
Tags: r data.table subset apply