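【Posted】: 2013-10-24 18:49:07
【Question】:
Given a subset of columns and a condition in `i`, I want to get the unique rows from a data.table. What is the best way to do this? ("Best" here means computational speed as well as short or readable syntax.)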
set.seed(1)
jk <- data.table(c1 = sample(letters, 60, replace = TRUE),
                 c2 = sample(c(TRUE, FALSE), 60, replace = TRUE),
                 c3 = sample(letters, 60, replace = TRUE),
                 c4 = sample.int(10, 60, replace = TRUE)
)
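Suppose I want to find the unique combinations of c1 and c2 where c4 is 10 (because `sample.int(10, ...)` only draws values from 1 to 10, `c4 >= 10` below selects exactly the rows with `c4 == 10`). I can think of several ways to do this, but I'm not sure which one is optimal. It also matters whether the columns being extracted are keyed or not.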
## works but gives an extra column
jk[c4 >= 10, TRUE, keyby = list(c1,c2)]
## this removes the extra column
jk[c4 >= 10, TRUE, keyby = list(c1,c2)][,V1 := NULL]
## this seems like it could work,
## but omitting the j-expression while using keyby throws an error
jk[c4 >= 10, , keyby = list(c1,c2)]
## using unique with .SD
jk[c4 >= 10, unique(.SD), .SDcols = c("c1","c2")]
【Discussion】:
-
In terms of clarity, unique(jk[c4 >= 10, list(c1, c2)]) seems to come out on top.
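As a quick sanity check, the sketch below (assuming a data.table version recent enough to provide `fsetequal()`) runs the question's working approaches side by side and checks that they return the same set of (c1, c2) rows. The names `a`, `b` and `d` are just illustrative. One difference worth knowing: the `keyby` version returns a result sorted and keyed by c1, c2, while `unique()` keeps rows in order of first appearance, so the results are compared as sets rather than with `identical()`.

```r
library(data.table)

set.seed(1)
jk <- data.table(c1 = sample(letters, 60, replace = TRUE),
                 c2 = sample(c(TRUE, FALSE), 60, replace = TRUE),
                 c3 = sample(letters, 60, replace = TRUE),
                 c4 = sample.int(10, 60, replace = TRUE))

# Subset rows and columns first, then deduplicate (order of first appearance)
a <- unique(jk[c4 >= 10, list(c1, c2)])

# Group with a dummy j, then drop the dummy V1 column (result is keyed/sorted)
b <- jk[c4 >= 10, TRUE, keyby = list(c1, c2)][, V1 := NULL]

# Deduplicate the selected columns inside j via .SD
d <- jk[c4 >= 10, unique(.SD), .SDcols = c("c1", "c2")]

# Same rows regardless of ordering or key
print(fsetequal(a, b))
print(fsetequal(a, d))
```

Which one is fastest on large tables is not settled by this check; it only confirms the approaches agree, so timing them on realistic data (with and without a key on c1, c2) is still the way to decide.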
Tags: r unique data.table