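【Posted】: 2014-06-04 01:01:34
【Question】:
I need sapply to return a list of booleans indicating whether a time difference exceeds some threshold (in my case, a number of days set by a for loop).
Sample data (the dates have already been converted with as.Date):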
#DF called "held"
ID Result Start_Date
123 0 12/5/2013
123 0 12/12/2013
123 0 12/31/2013
123 0 4/22/2014
123 1 4/23/2014
654 0 9/3/2013
654 0 9/17/2013
98 0 10/18/2013
98 0 10/19/2013
98 2 12/20/2013
555 0 2/1/2014
555 0 3/2/2014
555 0 3/3/2014
66 1 1/12/2013
Code:
#empty vectors to be populated for plotting
a <- c()
b <- c()
for (n in 1:60) {
  #all rows where ID is not duplicated and Result is either 1 or 2 are FALSE
  #all ID's where the difference between the min and max Start_Date (across multiple rows) exceeds the threshold are TRUE
  held$CHNS <- (
    (!(!(held$ID %in% held$ID[duplicated(held$ID) | duplicated(held$ID, fromLast = TRUE)]) &
         (held$Result %in% c(1, 2)))) &
      (sapply(held$ID, function(x) max(held$Start_Date[held$ID == x]) - min(held$Start_Date[held$ID == x]) > n))
  )
  #find percentage of Results 1 and 2 in entire CHNS population
  m <- length(held$Result[held$Result %in% c(1, 2) & held$CHNS == TRUE]) / nrow(held[held$CHNS == TRUE, ])
  #assign vector elements
  a[n] <- n
  b[n] <- m
}
The current code appears to be accurate, but it is extremely slow. Any tips on how to improve it? Should I even be using sapply? Thanks!
【Comments】:
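One possible direction, offered as a rough sketch rather than a definitive answer: the per-ID date span and the duplicated-ID check do not depend on n, so they could be computed once outside the loop instead of being recomputed by sapply for every row on every iteration. The sketch below rebuilds the sample data from the question (assuming the dates are in month/day/year format) and uses base R's ave() to get each ID's span. The names days, span, hit, single, and keep are made up for illustration and do not appear in the original code.

```r
# Sample data from the question; dates assumed to be month/day/year.
held <- data.frame(
  ID = c(123, 123, 123, 123, 123, 654, 654, 98, 98, 98, 555, 555, 555, 66),
  Result = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 1),
  Start_Date = as.Date(c("12/5/2013", "12/12/2013", "12/31/2013", "4/22/2014",
                         "4/23/2014", "9/3/2013", "9/17/2013", "10/18/2013",
                         "10/19/2013", "12/20/2013", "2/1/2014", "3/2/2014",
                         "3/3/2014", "1/12/2013"), format = "%m/%d/%Y")
)

# Span in days between the earliest and latest Start_Date of each row's ID.
# ave() applies FUN within each ID group and returns one value per row.
days <- as.numeric(held$Start_Date)
span <- ave(days, held$ID, FUN = function(d) max(d) - min(d))

# Neither of these depends on n, so they are computed only once.
hit <- held$Result %in% c(1, 2)
single <- !(duplicated(held$ID) | duplicated(held$ID, fromLast = TRUE))
keep <- !(single & hit)  # same exclusion rule as in the original expression

# For each threshold n, only the cheap comparison span > n is repeated.
a <- 1:60
b <- vapply(a, function(n) {
  chns <- keep & span > n
  sum(hit & chns) / sum(chns)  # share of Result 1 or 2 among CHNS rows
}, numeric(1))
```

With this layout the loop body is reduced to one vectorized comparison per threshold, and a and b can be passed to plot() as before. If no rows qualify for some n, the ratio is 0/0 and yields NaN, which matches what the original expression would produce.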