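We can also use `data.table`. Convert the 'data.frame' to a 'data.table' (`setDT(mydf)`) and change the class of 'date_follow_up' to `Date` with `as.Date`. Then group by 'id' together with a grouping variable created from the cumulative sum of the logical vector (`event == "healthy"`), so that every "healthy" row starts a new group. If there is any "sick" 'event' in a group, take the difference between the 'date_follow_up' of the first "sick" row and the first 'date_follow_up' of that group (which is the "healthy" row whenever the group contains one); otherwise return `NA`.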
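```r
library(data.table)

# convert to data.table and parse the "m/d/yy" strings as Date
setDT(mydf)[, date_follow_up := as.Date(date_follow_up, "%m/%d/%y")
  # per (grp, id) group: days from the first row to the first "sick" row
  ][, foo := if (any(event == "sick"))
      as.integer(date_follow_up[which(event == "sick")[1]] - date_follow_up[1])
    else NA_integer_,
    # a new grp starts at every "healthy" row
    by = .(grp = cumsum(event == "healthy"), id)]
```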
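Then we set 'foo' to `NA` for every row whose 'event' is not "healthy", so the value is kept only on the "healthy" row that starts each episode.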
```r
mydf[event != "healthy", foo := NA_integer_]
mydf
#    id   event date_follow_up foo
# 1:  1 healthy     2015-04-01   3
# 2:  1             2015-04-02  NA
# 3:  1             2015-04-03  NA
# 4:  1    sick     2015-04-04  NA
# 5:  1    sick     2015-04-05  NA
# 6:  2             2015-04-01  NA
# 7:  2 healthy     2015-04-02  NA
# 8:  2             2015-04-03  NA
# 9:  2             2015-04-04  NA
#10:  2             2015-04-05  NA
#11:  3             2015-04-01  NA
#12:  3 healthy     2015-04-02   1
#13:  3    sick     2015-04-03  NA
#14:  3             2015-04-04  NA
#15:  3             2015-04-05  NA
#16:  4             2015-04-01  NA
#17:  4 healthy     2015-04-02   3
#18:  4             2015-04-03  NA
#19:  4             2015-04-04  NA
#20:  4    sick     2015-04-05  NA
#21:  4    sick     2015-04-06  NA
#22:  4             2015-04-07  NA
#23:  4 healthy     2015-04-08   2
#24:  4             2015-04-09  NA
#25:  4    sick     2015-04-10  NA
```
Note: I prepared the example data so that a given 'id' can have more than one "healthy"/"sick" episode.
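To illustrate how `cumsum(event == "healthy")` splits the rows into one group per episode, here is a minimal sketch on a small hypothetical table (`dt` and its `day` column are made up for illustration, not part of the answer's data). Each `TRUE` in the logical vector bumps the running sum, so every "healthy" row opens a new group, and rows before the first "healthy" row fall into group 0.

```r
library(data.table)

# Hypothetical toy data for a single id (not the answer's mydf)
dt <- data.table(
  event = c("", "healthy", "", "sick", "healthy", "sick"),
  day   = as.Date("2015-04-01") + c(0, 1, 2, 3, 4, 7)
)

# cumsum over the logical vector: 0 1 1 1 2 2
dt[, grp := cumsum(event == "healthy")]

# Days from the first row of each group to its first "sick" row
res <- dt[, .(days = if (any(event == "sick"))
                as.integer(day[which(event == "sick")[1]] - day[1])
              else NA_integer_),
          by = grp]
print(res)
# grp 0 -> NA (no "sick" row), grp 1 -> 2, grp 2 -> 3
```

Group 0 has no "healthy" row at its start, which is why the answer afterwards blanks 'foo' on every non-"healthy" row.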
Data

```r
mydf <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), event = c("healthy", "",
"", "sick", "sick", "", "healthy", "", "", "", "", "healthy",
"sick", "", "", "", "healthy", "", "", "sick", "sick", "", "healthy",
"", "sick"), date_follow_up = c("4/1/15", "4/2/15", "4/3/15",
"4/4/15", "4/5/15", "4/1/15", "4/2/15", "4/3/15", "4/4/15", "4/5/15",
"4/1/15", "4/2/15", "4/3/15", "4/4/15", "4/5/15", "4/1/15", "4/2/15",
"4/3/15", "4/4/15", "4/5/15", "4/6/15", "4/7/15", "4/8/15", "4/9/15",
"4/10/15")), .Names = c("id", "event", "date_follow_up"), row.names = c(NA,
25L), class = "data.frame")
```