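【Posted】: 2014-04-04 04:34:31
【Question】:
I am trying to identify linked transactions. Everything from a row where FIRST is TRUE through the row where LAST is TRUE counts as one transaction, and within each transaction I need to determine whether tpt_mode (the mode column in the code below) is mixed or pure. The result is then stored in new columns. The for loop below works on a small amount of data, but it becomes very slow on a large dataset. How can I optimize the for loop to speed it up?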
firstid <- 1
currTpt <- 'NA'
count <- 0
n <- nrow(tnx)
for (i in seq_len(n)) {
  if (tnx$FIRST[i]) {
    # start of a new transaction: remember its first row and its mode
    firstid <- i
    currTpt <- tnx$mode[i]
    count <- 1
  } else {
    count <- count + 1
  }
  # any differing mode inside the transaction marks it as mixed
  if (as.character(tnx$mode[i]) != as.character(currTpt)) {
    currTpt <- 'both'
  }
  if (tnx$LAST[i]) {
    # end of the transaction: write the summary onto its first row
    tnx$final_end_loc[firstid] <- tnx$end_loc[i]
    tnx$final_end_date[firstid] <- as.character(tnx$end_date[i])
    tnx$final_end_time[firstid] <- as.character(tnx$end_time[i])
    tnx$final_mode[firstid] <- as.character(currTpt)
    tnx$final_count[firstid] <- count
  }
}
# keep one summary row per transaction
final_tnx <- subset(tnx, FIRST == TRUE, c("id", "start_date", "start_time", "final_end_date", "final_end_time", "start_loc", "final_end_loc", "final_mode", "final_count"))
Sample data (edited):
tnx<- data.frame(
id=c("A","A","A","A","C","C","D","D","E"),
mode=c("on","on","off","on","on","off","off","off","on"),
start_time=c("8:20:22","17:20:22","17:45:22","18:20:22","16:35:22","17:20:22","15:20:22","16:00:22","12:20:22"),
end_time=c("8:45:22","17:30:22","18:00:22","18:30:22","17:00:22","17:50:22","15:45:22","16:14:22","27:50:22"),
start_loc=c("12","12","207","12","11","65","222","32","12"),
end_loc=c(31,31,29,11,22,12,45,31,11),
start_date=c("6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012"),
end_date=c("6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012","6/3/2012"),
FIRST=c(T,T,F,F,T,F,T,F,T),
LAST=c(T,F,F,T,F,T,F,T,T)
)
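For illustration only, here is one way the loop could be expressed without row-by-row assignment. This is a sketch rather than a confirmed answer. It assumes the rows are already ordered, that the first row has FIRST == TRUE, and that every transaction has exactly one FIRST row and one LAST row. The names grp, first_rows, last_rows and n_modes are made up for this example, and only a subset of the sample columns is used.

```r
# Reduced copy of the sample data: only the columns this sketch needs
tnx <- data.frame(
  id      = c("A", "A", "A", "A", "C", "C", "D", "D", "E"),
  mode    = c("on", "on", "off", "on", "on", "off", "off", "off", "on"),
  end_loc = c(31, 31, 29, 11, 22, 12, 45, 31, 11),
  FIRST   = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  LAST    = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Every FIRST == TRUE opens a new transaction, so a running sum of FIRST
# gives each row the number of the transaction it belongs to
grp <- cumsum(tnx$FIRST)

first_rows <- which(tnx$FIRST)
last_rows  <- which(tnx$LAST)
# Assumption check: each transaction has one opening and one closing row
stopifnot(length(first_rows) == length(last_rows))

mode_chr <- as.character(tnx$mode)
# Number of distinct modes inside each transaction
n_modes <- unname(vapply(split(mode_chr, grp),
                         function(m) length(unique(m)),
                         integer(1)))

final_tnx <- data.frame(
  id            = tnx$id[first_rows],
  final_end_loc = tnx$end_loc[last_rows],
  # more than one distinct mode means the transaction is mixed
  final_mode    = ifelse(n_modes > 1, "both", mode_chr[first_rows]),
  final_count   = tabulate(grp),
  stringsAsFactors = FALSE
)
print(final_tnx)
```

On the sample rows this yields five transactions with final_mode on, both, both, off, on and final_count 1, 3, 2, 2, 1. The remaining columns (final_end_date, final_end_time and so on) could be picked up the same way by indexing with last_rows or first_rows.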
Sample dataset as an image:
Expected result:
Thanks in advance.
【Comments】:
-
What is f? It is missing.
-
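I'm pretty sure this can be done in one line, but I can't figure out what you are trying to do. Could you explain it, and pretend we don't know what a linked transaction is...?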
-
Sorry, that was a typo. I have fixed it.
-
Why did you delete the same question and post it again? Original question: stackoverflow.com/questions/22852046/optimize-r-for-loop
-
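Close votes or down votes mean you should improve the question, which you have done. Close votes expire. If you improve the question and closing is no longer appropriate, it probably won't accumulate more close votes. It is too late this time, because you have opened an exact duplicate, and if you undelete the other question one of the two will certainly be closed. Next time, though, improve the question instead of deleting it. You were doing fine before you deleted and reposted.
Tags: r performance loops optimization for-loop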