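[Posted]: 2014-11-28 17:40:50
[Question]:
I am working with a data frame of 3 million rows and 10 columns, and I am subsetting it. The subsetting takes a very long time. Would it be faster if I converted the data to a data.table and subset that instead? Here is some toy code: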
s<-c(100,100,100,800,800,6662,33565,265653262,266532)
p<-c(5,5,5,10,10,10,8,9,10)
name<-c("bob","bob","bob","ed","ed","ed","joe","frank","ted")
time<- as.POSIXct(as.character(c("2014-10-27 18:11:36 PDT","2014-10-27 18:11:37 PDT","2014-10-27 18:11:38 PDT","2014-10-27 18:11:39 PDT","2014-10-27 18:11:40 PDT","2014-10-27 18:11:41 PDT","2014-10-27 19:11:36 PDT","2014-10-27 20:11:36 PDT","2014-10-27 21:11:36 PDT")))
dat<- data.frame(s,p,name,time)
dat
Here is the data frame:
          s  p  name                time
1       100  5   bob 2014-10-27 18:11:36
2       100  5   bob 2014-10-27 18:11:37
3       100  5   bob 2014-10-27 18:11:38
4       800 10    ed 2014-10-27 18:11:39
5       800 10    ed 2014-10-27 18:11:40
6      6662 10    ed 2014-10-27 18:11:41
7     33565  8   joe 2014-10-27 19:11:36
8 265653262  9 frank 2014-10-27 20:11:36
9    266532 10   ted 2014-10-27 21:11:36
Now I subset the data frame:
result <- subset(dat, as.numeric(s) == 100
                 & p == 5
                 & name == "bob"
                 & time >= "2014-10-27 18:11:36 PDT"
                 & time <= "2014-10-27 18:12:00 PDT"
)
result
    s p name                time
1 100 5  bob 2014-10-27 18:11:36
2 100 5  bob 2014-10-27 18:11:37
3 100 5  bob 2014-10-27 18:11:38
How can I do something similar with data.table?
Thanks.
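For reference, the data.table form of the same filter might look like the sketch below. It is a minimal illustration, not a benchmarked answer: the `tz = "UTC"` setting, the helper names `lo` and `hi`, and the choice of key columns are assumptions made here for reproducibility, and whether it is actually faster on 3 million rows has not been measured.

```r
library(data.table)

# Toy data from the question. tz = "UTC" is an assumption made here so the
# example behaves the same on every machine; the original relied on the local zone.
dat <- data.frame(
  s    = c(100, 100, 100, 800, 800, 6662, 33565, 265653262, 266532),
  p    = c(5, 5, 5, 10, 10, 10, 8, 9, 10),
  name = c("bob", "bob", "bob", "ed", "ed", "ed", "joe", "frank", "ted"),
  time = as.POSIXct(c("2014-10-27 18:11:36", "2014-10-27 18:11:37",
                      "2014-10-27 18:11:38", "2014-10-27 18:11:39",
                      "2014-10-27 18:11:40", "2014-10-27 18:11:41",
                      "2014-10-27 19:11:36", "2014-10-27 20:11:36",
                      "2014-10-27 21:11:36"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Copy into a data.table (setDT(dat) would convert in place without a copy).
dt <- as.data.table(dat)

# Build the time bounds once as POSIXct instead of comparing against strings.
# lo and hi are illustrative names, not part of the original code.
lo <- as.POSIXct("2014-10-27 18:11:36", tz = "UTC")
hi <- as.POSIXct("2014-10-27 18:12:00", tz = "UTC")

# Same filter as subset(), written in the i argument of [.data.table.
result <- dt[s == 100 & p == 5 & name == "bob" & time >= lo & time <= hi]

# Keyed variant: sort once by the equality columns, look those up by
# binary search, then apply the time range to the matched rows.
setkey(dt, s, p, name)
result_keyed <- dt[.(100, 5, "bob")][time >= lo & time <= hi]
```

Setting a key sorts the table once, so the keyed form tends to pay off mainly when the same columns are queried repeatedly.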
[Comments]:
Tags: r data.table