Posted: 2015-08-25 08:40:11
Question:
I have a data.table of logical values, as follows:
library(data.table)
set.seed(1)
myDt <- data.table(id = paste0("id", 1:10))
myDt[, paste0(letters[1:3], sample(1:10, 9, replace = FALSE)) :=
lapply(1:9, function(i) sample(c(TRUE, FALSE), 10, replace = TRUE))]
myDt
id a3 b4 c5 a7 b2 c8 a9 b6 c10
1: id1 TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
2: id2 TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
3: id3 TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
4: id4 FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
5: id5 TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
6: id6 FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
7: id7 TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
8: id8 FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
9: id9 FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE
10: id10 TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
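The columns other than id belong to three categories (a, b and c), each with 3 replicates (the integer suffix in the column name). I need to count the TRUE values of each category for every id, without knowing the number of replicates in advance.
I can get the columns of category a like this: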
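A minimal sketch of one way to handle every category without knowing the replicate count, assuming each value column is named `<category letter><replicate number>` as in `myDt`. The table `dt`, the helper names `valCols` and `colsByCat`, and the `count_*` output columns are hypothetical; the values are small and fixed so the counts can be checked by hand (note that `set.seed(1)` with `sample()` may not reproduce the printed table on newer R versions).

```r
library(data.table)

# Hypothetical stand-in for myDt: same "<category><replicate>" column names,
# small fixed values, and an uneven number of replicates per category.
dt <- data.table(id = c("id1", "id2", "id3"),
                 a3 = c(TRUE, FALSE, TRUE),
                 b4 = c(FALSE, FALSE, TRUE),
                 a7 = c(TRUE, TRUE, FALSE),
                 c5 = c(TRUE, FALSE, FALSE),
                 b2 = c(TRUE, FALSE, TRUE))

# Category = column name with its trailing replicate number stripped,
# so the number of replicates never has to be known in advance.
valCols <- setdiff(names(dt), "id")
colsByCat <- split(valCols, sub("[0-9]+$", "", valCols))

# Add one count column per category (count_a, count_b, count_c).
# rowSums() accepts all-logical columns and counts TRUE as 1 in each row.
for (category in names(colsByCat)) {
  dt[, paste0("count_", category) := rowSums(.SD),
     .SDcols = colsByCat[[category]]]
}
dt
```

Because the column groups come from `split()` on the stripped names, a category with one replicate or ten is handled the same way.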
aCols <- grep("^a", names(myDt), value = TRUE)
myDt[, .SD, .SDcols = aCols, by = id]
id a3 a7 a9
1: id1 TRUE TRUE FALSE
2: id2 TRUE FALSE TRUE
3: id3 TRUE FALSE FALSE
4: id4 FALSE FALSE TRUE
5: id5 TRUE FALSE TRUE
6: id6 FALSE FALSE TRUE
7: id7 TRUE FALSE FALSE
8: id8 FALSE TRUE FALSE
9: id9 FALSE TRUE TRUE
10: id10 TRUE FALSE FALSE
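Since each id occupies exactly one row, a row-wise sum over the selected columns may be all that is needed, with no `by` at all. A sketch under that assumption, using a hypothetical three-row table with the same a-columns; `count_a` is an illustrative name chosen so it does not itself match `^a`.

```r
library(data.table)

# Hypothetical rows with the same a-columns as above
dt <- data.table(id = c("id1", "id2", "id3"),
                 a3 = c(TRUE, TRUE, TRUE),
                 a7 = c(TRUE, FALSE, FALSE),
                 a9 = c(FALSE, TRUE, FALSE))
aCols <- grep("^a", names(dt), value = TRUE)

# Reduce() adds the columns element-wise; `+` turns TRUE/FALSE into 1L/0L,
# and the 0L start value keeps the result integer even for a single column.
dt[, count_a := Reduce(`+`, .SD, 0L), .SDcols = aCols]
dt
```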
But I get stuck when I try to count the logical values. So far I have tried:
myDt[, sum(.SD), .SDcols = aCols, by = id]
Error in gsum(.SD) :
GForce sum can only be applied to columns, not .SD or similar. To sum all items in a list such as .SD, either add the prefix base::sum(.SD) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lappy(.SD,sum),by=,.SDcols=]'
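The `lapply(.SD, sum)` form suggested in the message sums each column separately within a group, which gives one value per column rather than one total per id. A possible workaround, sketched on a hypothetical table: flatten `.SD` with `unlist()` so that `sum()` receives a single logical vector instead of `.SD` itself. The output column name `aCount` is illustrative.

```r
library(data.table)

# Hypothetical rows with the same a-columns as above
dt <- data.table(id = c("id1", "id2", "id3"),
                 a3 = c(TRUE, TRUE, TRUE),
                 a7 = c(TRUE, FALSE, FALSE),
                 a9 = c(FALSE, TRUE, FALSE))
aCols <- c("a3", "a7", "a9")

# unlist(.SD) turns the group's columns into one logical vector,
# so sum() counts the TRUE values across all of them at once.
res <- dt[, .(aCount = sum(unlist(.SD))), by = id, .SDcols = aCols]
res
```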
and
myDt[, base::sum(.SD), .SDcols = aCols, by = id]
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
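I did try the latter code with numeric values instead of logicals, and it gave me the expected result.
Any suggestions would be greatly appreciated. Thanks for reading!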
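The second error message points at a data-frame check rather than at `sum()` itself: logical columns do not count as "numeric" variables, while `sum()` on a plain logical vector or matrix counts TRUE values without complaint. A small base-R illustration on a hypothetical data frame `df`, showing two ways to sidestep the check.

```r
# Hypothetical two-column logical data frame
df <- data.frame(a3 = c(TRUE, TRUE), a7 = c(TRUE, FALSE))

is.numeric(df$a3)                  # FALSE: logical is not "numeric"
sum(df$a3)                         # a plain logical vector sums fine
sum(as.matrix(df))                 # so does a logical matrix
sum(sapply(df, as.integer))        # or coerce to integer first, as in the numeric case
```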
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8
[4] LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.4
loaded via a namespace (and not attached):
[1] magrittr_1.5 plyr_1.8.3 tools_3.2.2 reshape2_1.4.1 Rcpp_0.12.0 stringi_0.5-5
[7] stringr_1.0.0 chron_2.3-47
Tags: r data.table