【Posted】: 2018-10-13 19:47:33
【Problem description】:
I have the following two data frames:
df1 <- data.frame(ID = c("A","A","B","B","C","D","D","D","E"),
                  Date = as.POSIXct(c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00","2018-04-17 14:21:00","2018-04-18 09:56:00","2018-05-02 07:49:00")))
df2 <- data.frame(ID = c("A","A","A","B","C","D","D","D","D","D","E"),
                  Date = as.POSIXct(c("2018-04-10 07:11:00","2018-04-11 18:59:00","2018-04-12 12:37:00","2018-04-15 01:43:00","2018-04-21 09:52:00","2018-04-15 20:25:00","2018-04-17 12:33:00","2018-04-17 14:21:00","2018-04-18 10:59:00","2018-04-20 14:11:00","2018-05-01 09:50:00")))
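For df1, I want to do two things. First, for each ID, I want to look up the nearest preceding date in df2. Second, again for each ID, I want to look up the nearest following date in df2, without repeating values. In both cases, I don't want a date from df2 to appear more than once in df1.
Using the roll = Inf feature of the data.table package, I can merge in the preceding dates by ID.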
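library(data.table)
setDT(df1)
setDT(df2)
setkey(df1, ID, Date)
# Copy Date so the matched df2 timestamp is kept after the join
setkey(df2, ID, Date)[, PrecedingDate := Date]
# Rolling join: for each df1 row, take the latest df2 row at or before Date
result <- df2[df1, roll = Inf]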
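To make the behavior of roll = Inf concrete, here is a minimal sketch on toy data. The names `lookup`, `query`, and `PrevT` are made up for illustration, and plain numbers stand in for the timestamps. Within each ID, the join returns the row of `lookup` with the latest `t` that is not after the query value, and an exact match counts as a match.

```r
library(data.table)

# Hypothetical toy data: 'lookup' plays the role of df2, 'query' the role of df1.
lookup <- data.table(ID = c("A", "A", "B"), t = c(1, 5, 3))
query  <- data.table(ID = c("A", "A", "B"), t = c(4, 5, 2))

# Copy the key column so the matched value survives the join
# (in the result, the join column 't' carries the values from 'query').
lookup[, PrevT := t]
setkey(lookup, ID, t)
setkey(query, ID, t)

# roll = Inf: last observation carried forward within each ID.
res <- lookup[query, roll = Inf]
print(res)
# PrevT is 1 for (A, 4), 5 for (A, 5) because of the exact match,
# and NA for (B, 2) because nothing earlier exists for B.
```

Note that this does nothing about repeated matches: two query rows can both roll back to the same `lookup` row.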
I'm not sure how to pull the nearest following date from df2 into df1, or how to make sure the dates are not repeated.
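For the first half of that, one possible direction is the mirror image of the rolling join above: roll = -Inf rolls the next observation backward instead of carrying the last one forward. The sketch below uses toy data again; `lookup`, `query`, and `NextT` are made-up names, and numbers stand in for timestamps. It is only a starting point under those assumptions: it does not remove repeated matches, and an exact match is returned in this direction too, so ties would need an extra step if they should count as preceding only.

```r
library(data.table)

# Hypothetical toy data: 'lookup' plays the role of df2, 'query' the role of df1.
lookup <- data.table(ID = c("A", "A", "B"), t = c(1, 5, 3))
query  <- data.table(ID = c("A", "A", "B"), t = c(4, 5, 2))

# Keep a copy of the key column so the matched value is visible in the result.
lookup[, NextT := t]
setkey(lookup, ID, t)
setkey(query, ID, t)

# roll = -Inf: next observation carried backward within each ID.
nxt <- lookup[query, roll = -Inf]
print(nxt)
# NextT is 5 for (A, 4), 5 for (A, 5) because of the exact match,
# and 3 for (B, 2).
```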
The result should look like this:
result <- data.frame(ID = c("A","A","B","B","C","D","D","D","E"),
                     Date = as.POSIXct(c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00","2018-04-17 14:21:00","2018-04-18 09:56:00","2018-05-02 07:49:00")),
                     PrecedingDate = as.POSIXct(c("2018-04-11 18:59:00","2018-04-12 02:37:00",NA,"2018-04-15 01:43:00",NA,"2018-04-15 20:25:00","2018-04-17 14:21:00",NA,"2018-05-01 09:50:00")),
                     FollowingDate = as.POSIXct(c("2018-04-12 02:37:00",NA,"2018-04-15 01:43:00",NA,"2018-04-21 09:52:00","2018-04-17 12:33:00","2018-04-17 14:21:00","2018-04-18 10:59:00",NA)))
Any help here would be much appreciated.
【Comments】:
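- What happens if df2 has the same date as df1? Is it classified as preceding, as following, or is it ignored?
- In those cases, it should be classified as preceding only.
- The second PrecedingDate and the first FollowingDate in result are incorrect, imo. They should both be 2018-04-12 12:37:00. I have corrected this in my answer.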
Tags: r data.table