【发布时间】:2025-12-13 00:45:01
【问题描述】:
我有两个数据框,trips 是具有唯一 id 的自行车所进行的独特行程,intervals 则每 10 分钟显示一次自行车 ID 的位置。如果time 介于start 和finish 和bike_ids 之间,我的目标是删除 记录intervals。时间为posixCT类,原始数据帧有数十万条记录
例如,下面这两个数据集的结果应该是:
> trips
bike_id start finish
1 1 2017-11-22 15:52:36 2017-11-22 17:47:53
2 2 2017-11-22 16:05:44 2017-11-22 16:23:25
3 3 2017-11-22 16:31:06 2017-11-22 17:11:20
> intervals
time bike_id
3 2017-11-22 16:00:03 1
4 2017-11-22 16:10:03 1
5 2017-11-22 16:20:02 1
6 2017-11-22 16:30:02 1
7 2017-11-22 16:40:03 1
8 2017-11-22 16:50:02 1
9 2017-11-22 17:00:02 1
10 2017-11-22 17:10:02 1
11 2017-11-22 17:20:03 1
12 2017-11-22 17:30:03 1
13 2017-11-22 16:00:03 2
14 2017-11-22 16:10:03 2
15 2017-11-22 16:20:02 2
16 2017-11-22 16:30:02 2
17 2017-11-22 16:40:03 2
18 2017-11-22 16:50:02 2
19 2017-11-22 17:00:02 2
20 2017-11-22 17:10:02 2
21 2017-11-22 17:20:03 2
22 2017-11-22 17:30:03 2
23 2017-11-22 16:30:02 3
24 2017-11-22 16:40:03 3
25 2017-11-22 16:50:02 3
26 2017-11-22 17:00:02 3
27 2017-11-22 17:10:02 3
28 2017-11-22 17:20:03 3
29 2017-11-22 17:30:03 3
结果
> outcome
time bike_id
13 2017-11-22 16:00:03 2
16 2017-11-22 16:30:02 2
17 2017-11-22 16:40:03 2
18 2017-11-22 16:50:02 2
19 2017-11-22 17:00:02 2
20 2017-11-22 17:10:02 2
21 2017-11-22 17:20:03 2
22 2017-11-22 17:30:03 2
23 2017-11-22 16:30:02 3
28 2017-11-22 17:20:03 3
29 2017-11-22 17:30:03 3
不确定从哪里开始。任何关于从dplyr 或apply 函数开始的建议将不胜感激!
这里是示例数据:
> dput(intervals)
structure(list(time = structure(c(1511384403.94561, 1511385003.17654,
1511385602.47887, 1511386202.99895, 1511386803.18361, 1511387402.98233,
1511388002.69461, 1511388602.5818, 1511389203.52712, 1511389803.652,
1511384403.94561, 1511385003.17654, 1511385602.47887, 1511386202.99895,
1511386803.18361, 1511387402.98233, 1511388002.69461, 1511388602.5818,
1511389203.52712, 1511389803.652, 1511386202.99895, 1511386803.18361,
1511387402.98233, 1511388002.69461, 1511388602.5818, 1511389203.52712,
1511389803.652), class = c("POSIXct", "POSIXt"), tzone = ""),
bike_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)), .Names = c("time",
"bike_id"), row.names = 3:29, class = "data.frame")
> dput(trips)
structure(list(bike_id = c(1, 2, 3), start = structure(c(1511383956,
1511384744, 1511386266), class = c("POSIXct", "POSIXt"), tzone = ""),
finish = structure(c(1511390873, 1511385805, 1511388680), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("bike_id", "start", "finish"
), row.names = c(NA, 3L), class = "data.frame")
【问题讨论】:
-
使用 %between% ..
-
您能否提供一个包含
%between%和两个单独数据框的简短示例? ?
标签: r dplyr data.table subset posixct