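【Question title】: Finding difference between timestamps in R based on common id in another column
【Posted】: 2017-10-26 11:10:31
【Question】:

The given dataset contains a timestamp in the third column, consisting of a date in mm/dd/yyyy format and a time in 24-hour format, all within the month of January. Using R, I would like to find the difference in minutes for column 3 by comparing each row with its previous row, but only when both rows share the same patient value. This also means that the first row of the dataset should give a value of 0 minutes, because there is nothing to compare it with. Thanks, and please help.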

patient  handling                time
1        Registration            1/2/2017 11:41
1        Triage and Assessment   1/2/2017 12:40
1        Registration            1/2/2017 12:40
1        Triage and Assessment   1/2/2017 22:32
1        Blood test              1/5/2017 8:59
1        Blood test              1/5/2017 14:34
1        MRI SCAN                1/5/2017 21:37
2        X-Ray                   1/7/2017  4:31
2        X-Ray                   1/7/2017  7:57
2        Discuss Results         1/7/2017 14:45
2        Discuss Results         1/7/2017 17:55
2        Check-out               1/9/2017 17:09
2        Check-out               1/9/2017 19:14
3        Registration            1/4/2017  1:34
3        Registration            1/4/2017  6:36
3        Triage and Assessment   1/4/2017 17:49
3        Triage and Assessment   1/5/2017 8:59
3        Blood test              1/5/2017 21:37
3        Blood test              1/6/2017 3:53

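【Comments on the question】:

  • Why did you tag this stringdist? Are you doing anything with the handling variable?
  • I did not know the exact function that works on timestamps, so I thought stringdist might help in some way; I was not being specific. Please help.
  • Does that column (handling) play any role in computing the difference?
  • It plays no role; only the first and last columns matter.

Tags: r dplyr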


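【Solution 1】:

If time is already of class POSIXct and the data frame is already ordered by patient and time, the time difference in minutes can be appended using a simplified version of SBista's answer: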

library(dplyr)
DF %>% 
  group_by(patient) %>% 
  mutate(delta = difftime(time, lag(time, default = first(time)), units = "mins")) 
 # A tibble: 19 x 4
 # Groups:   patient [3]
   patient              handling                time     delta
     <chr>                 <chr>              <dttm>    <time>
 1       1          Registration 2017-01-02 11:41:00    0 mins
 2       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3       1          Registration 2017-01-02 12:40:00    0 mins
 4       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5       1            Blood test 2017-01-05 08:59:00 3507 mins
 6       1            Blood test 2017-01-05 14:34:00  335 mins
 7       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9       2                 X-Ray 2017-01-07 07:57:00  206 mins
10       2       Discuss Results 2017-01-07 14:45:00  408 mins
11       2       Discuss Results 2017-01-07 17:55:00  190 mins
12       2             Check-out 2017-01-09 17:09:00 2834 mins
13       2             Check-out 2017-01-09 19:14:00  125 mins
14       3          Registration 2017-01-04 01:34:00    0 mins
15       3          Registration 2017-01-04 06:36:00  302 mins
16       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18       3            Blood test 2017-01-05 21:37:00  758 mins
19       3            Blood test 2017-01-06 03:53:00  376 mins
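
The snippet above assumes that time is already POSIXct and that the rows are sorted. As a minimal sketch of that preparation step, assuming time starts out as character in the question's mm/dd/yyyy HH:MM layout: the sample rows are copied from the question, while tz = "UTC" and the name result are illustrative choices, not part of the original answer.

```r
library(dplyr)

# A few rows copied from the question; `time` starts out as character.
DF <- data.frame(
  patient  = c(1, 1, 1, 2, 2),
  handling = c("Registration", "Triage and Assessment", "Registration",
               "X-Ray", "X-Ray"),
  time     = c("1/2/2017 11:41", "1/2/2017 12:40", "1/2/2017 12:40",
               "1/7/2017 4:31", "1/7/2017 7:57"),
  stringsAsFactors = FALSE
)

DF <- DF %>%
  # Parse mm/dd/yyyy HH:MM; tz = "UTC" is an assumption, not from the question.
  mutate(time = as.POSIXct(time, format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  # Sort so that lag() always refers to the previous event of the same patient.
  arrange(patient, time)

result <- DF %>%
  group_by(patient) %>%
  mutate(delta = difftime(time, lag(time, default = first(time)), units = "mins")) %>%
  ungroup()
```

Fixing the time zone avoids surprises around daylight-saving changes; if the timestamps are local times, the appropriate zone would need to be substituted.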

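Another approach is to compute delta for all rows, ignoring the grouping by patient, and then replace the first value of each patient with zero, as requested by the OP. Ignoring the groups at first might bring a performance gain (not verified).

Unfortunately, I am not fluent enough to implement this with dplyr syntax, so I use data.table and its update by reference: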

library(data.table)
setDT(DF)[, delta := difftime(time, shift(time), units = "mins")][]
DF[DF[, first(.I), by = patient]$V1, delta := 0][]
    patient              handling                time     delta
 1:       1          Registration 2017-01-02 11:41:00    0 mins
 2:       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3:       1          Registration 2017-01-02 12:40:00    0 mins
 4:       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5:       1            Blood test 2017-01-05 08:59:00 3507 mins
 6:       1            Blood test 2017-01-05 14:34:00  335 mins
 7:       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8:       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9:       2                 X-Ray 2017-01-07 07:57:00  206 mins
10:       2       Discuss Results 2017-01-07 14:45:00  408 mins
11:       2       Discuss Results 2017-01-07 17:55:00  190 mins
12:       2             Check-out 2017-01-09 17:09:00 2834 mins
13:       2             Check-out 2017-01-09 19:14:00  125 mins
14:       3          Registration 2017-01-04 01:34:00    0 mins
15:       3          Registration 2017-01-04 06:36:00  302 mins
16:       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17:       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18:       3            Blood test 2017-01-05 21:37:00  758 mins
19:       3            Blood test 2017-01-06 03:53:00  376 mins
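
For readers who prefer dplyr, the same "compute ungrouped, then zero out each patient's first row" idea could be sketched roughly as follows. This is not the answer author's code: the helper column is_first is hypothetical, the sample rows are a subset of the question's data, and the sketch assumes each patient's rows are contiguous and sorted by time. Note that delta ends up as plain numeric minutes rather than a difftime.

```r
library(dplyr)

# Subset of the question's data, already sorted by patient and time.
DF <- data.frame(
  patient = c(1, 1, 2, 2, 2),
  time = as.POSIXct(c("2017-01-02 11:41", "2017-01-02 12:40",
                      "2017-01-07 04:31", "2017-01-07 07:57",
                      "2017-01-07 14:45"), tz = "UTC")
)

result <- DF %>%
  mutate(
    # Difference to the previous row, ignoring patient boundaries (row 1 is NA).
    delta = as.numeric(difftime(time, lag(time), units = "mins")),
    # A row starts a new patient if there is no previous row or the id changed.
    is_first = is.na(lag(patient)) | patient != lag(patient),
    # Overwrite the first row of every patient with zero.
    delta = if_else(is_first, 0, delta)
  ) %>%
  select(-is_first)
```

Whether this is faster than the grouped version is just as unverified as for the data.table variant.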

【Comments】:

  • Thanks for the reply, Uwe. I appreciate your help.
【Solution 2】:

You can do the following:

 library(dplyr)
 data %>%
   group_by(patient) %>%
   mutate(diff_in_sec = as.POSIXct(time, format = "%m/%d/%Y %H:%M") -
            lag(as.POSIXct(time, format = "%m/%d/%Y %H:%M"),
                default = first(as.POSIXct(time, format = "%m/%d/%Y %H:%M")))) %>%
   mutate(diff_in_min = as.numeric(diff_in_sec / 60))

The output you get is:

 # A tibble: 19 x 5
# Groups:   patient [3]
   patient              handling           time diff_in_sec diff_in_min
     <int>                 <chr>          <chr>      <time>       <dbl>
 1       1          Registration 1/2/2017 11:41      0 secs           0
 2       1 Triage and Assessment 1/2/2017 12:40   3540 secs          59
 3       1          Registration 1/2/2017 12:40      0 secs           0
 4       1 Triage and Assessment 1/2/2017 22:32  35520 secs         592
 5       1            Blood test  1/5/2017 8:59 210420 secs        3507
 6       1            Blood test 1/5/2017 14:34  20100 secs         335
 7       1              MRI SCAN 1/5/2017 21:37  25380 secs         423
 8       2                 X-Ray  1/7/2017 4:31      0 secs           0
 9       2                 X-Ray  1/7/2017 7:57  12360 secs         206
10       2       Discuss Results 1/7/2017 14:45  24480 secs         408
11       2       Discuss Results 1/7/2017 17:55  11400 secs         190
12       2             Check-out 1/9/2017 17:09 170040 secs        2834
13       2             Check-out 1/9/2017 19:14   7500 secs         125
14       3          Registration  1/4/2017 1:34      0 secs           0
15       3          Registration  1/4/2017 6:36  18120 secs         302
16       3 Triage and Assessment 1/4/2017 17:49  40380 secs         673
17       3 Triage and Assessment  1/5/2017 8:59  54600 secs         910
18       3            Blood test 1/5/2017 21:37  45480 secs         758
19       3            Blood test  1/6/2017 3:53  22560 secs         376
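
The pipeline above parses the time string three times. As an alternative sketch (a different technique from the answer, using only base R), the string can be parsed once and the per-patient differences taken with ave() and diff(). The sample rows are a subset of the question's data; tz = "UTC" and the variable name parsed are assumptions made for illustration.

```r
# Subset of the question's data with `time` still stored as character.
data <- data.frame(
  patient = c(1, 1, 3, 3),
  time    = c("1/2/2017 11:41", "1/2/2017 12:40",
              "1/4/2017 1:34", "1/4/2017 6:36"),
  stringsAsFactors = FALSE
)

# Parse once; tz = "UTC" is an assumption, not from the question.
parsed <- as.POSIXct(data$time, format = "%m/%d/%Y %H:%M", tz = "UTC")

# as.numeric() yields seconds since the epoch; within each patient, take
# successive differences and prepend 0 for the first row, then convert to minutes.
data$diff_in_min <- ave(as.numeric(parsed), data$patient,
                        FUN = function(x) c(0, diff(x))) / 60
```

As with the other solutions, this assumes the rows of each patient are already in chronological order.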

【Comments】:

  • This makes me wonder, did you by any chance create Stack Overflow? :) Thanks a lot!