【Question title】: dplyr time difference between rows
【Posted】: 2017-08-15 14:06:47
【Question description】:

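I have a data frame in the following format, and I am trying to find the time difference between each "ASSIGNED" event and the last "CREATED" event that precedes it.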

**AccountID**              **TIME**                    **EVENT**
1                      2016-11-08T01:54:15.000Z        CREATED
1                      2016-11-09T01:54:15.000Z        ASSIGNED
1                      2016-11-10T01:54:15.000Z        CREATED
1                      2016-11-11T01:54:15.000Z        CALLED
1                      2016-11-12T01:54:15.000Z        ASSIGNED
1                      2016-11-12T01:54:15.000Z        SLEEP

My current code is below. My difficulty is selecting the CREATED event that comes immediately before each ASSIGNED event:

test <- timetable.filter %>%
  group_by(AccountID) %>%
  mutate(timeToAssign = ifelse(EVENT == 'ASSIGNED', 
                                interval(ymd_hms(TIME), max(ymd_hms(TIME[EVENT == 'CREATED']))) %/% hours(1), NA))

The output I am looking for is:

**AccountID**              **TIME**                    **EVENT**        **timeToAssign**
1                      2016-11-08T01:54:15.000Z        CREATED         NA
1                      2016-11-09T01:54:15.000Z        ASSIGNED         12
1                      2016-11-10T01:54:15.000Z        CREATED         NA
1                      2016-11-11T01:54:15.000Z        CALLED         NA
1                      2016-11-12T01:54:15.000Z        ASSIGNED         24
1                      2016-11-12T01:54:15.000Z        SLEEP         NA

【Comments】:

  • What is the expected output?
  • Can you show your desired output?
  • What is the unit of timeToAssign?
  • timetable.filter %>% group_by(AccountID, cumsum(EVENT == "CREATED")) %>% mutate(timeToAssign = ifelse(EVENT == 'ASSIGNED', TIME - first(TIME), NA)). This should get you started.
  • Shouldn't they be 24 hours and 48 hours, respectively?
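
The arithmetic behind the last comment can be checked directly. The following is a minimal base R sketch, not part of the original thread; the variable names `created`, `assigned`, and `hours_between` are made up for illustration, and the timestamps are copied from the sample data with the fractional seconds dropped.

```r
# Timestamps from the question: each ASSIGNED event paired with the
# CREATED event that most recently precedes it (illustrative names).
fmt <- "%Y-%m-%dT%H:%M:%S"
created  <- as.POSIXct(c("2016-11-08T01:54:15", "2016-11-10T01:54:15"),
                       format = fmt, tz = "UTC")
assigned <- as.POSIXct(c("2016-11-09T01:54:15", "2016-11-12T01:54:15"),
                       format = fmt, tz = "UTC")

# Elapsed time in hours between each pair.
hours_between <- as.numeric(difftime(assigned, created, units = "hours"))
hours_between
# [1] 24 48
```

With hours as the unit, the sample data gives 24 and 48 rather than the 12 and 24 shown in the question.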

Tags: r dplyr lubridate


【Solution 1】:

Using dplyr and tidyr (with anytime to parse the timestamps):

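library(dplyr); library(tidyr); library(anytime)

df %>% 
    group_by(AccountID) %>% 
    # Record the within-group row number of every CREATED row and parse TIME
    mutate(CREATED_INDEX = if_else(EVENT == 'CREATED', row_number(), NA_integer_),
           TIME = anytime(TIME)) %>% 
    # Carry the most recent CREATED row number down to the rows that follow it
    fill(CREATED_INDEX) %>% 
    # For ASSIGNED rows, take the difference in hours from that CREATED row
    mutate(TimeToAssign = if_else(EVENT == 'ASSIGNED', 
                                  as.numeric(TIME - TIME[CREATED_INDEX], units = 'hours'), 
                                  NA_real_)) %>% 
    select(-CREATED_INDEX)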

# A tibble: 6 x 4
# Groups:   AccountID [1]
#  AccountID                TIME    EVENT TimeToAssign
#      <int>              <dttm>   <fctr>        <dbl>
#1         1 2016-11-08 01:54:15  CREATED           NA
#2         1 2016-11-09 01:54:15 ASSIGNED           24
#3         1 2016-11-10 01:54:15  CREATED           NA
#4         1 2016-11-11 01:54:15   CALLED           NA
#5         1 2016-11-12 01:54:15 ASSIGNED           48
#6         1 2016-11-12 01:54:15    SLEEP           NA
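
The grouping idea suggested in the comments (start a new block at every CREATED row) can be sketched as follows. This is a hedged illustration, not the answerer's code: the helper column `block` and the object `result` are names introduced here, the data frame is rebuilt from the question's sample rows, and base R `as.POSIXct` stands in for `anytime` so that only dplyr is required.

```r
library(dplyr)

# Sample data copied from the question.
df <- data.frame(
  AccountID = 1L,
  TIME = c("2016-11-08T01:54:15.000Z", "2016-11-09T01:54:15.000Z",
           "2016-11-10T01:54:15.000Z", "2016-11-11T01:54:15.000Z",
           "2016-11-12T01:54:15.000Z", "2016-11-12T01:54:15.000Z"),
  EVENT = c("CREATED", "ASSIGNED", "CREATED", "CALLED", "ASSIGNED", "SLEEP"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Parse the ISO 8601 strings; %OS reads the fractional seconds.
  mutate(TIME = as.POSIXct(TIME, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")) %>%
  group_by(AccountID) %>%
  # Each CREATED row starts a new block within its account.
  mutate(block = cumsum(EVENT == "CREATED")) %>%
  group_by(AccountID, block) %>%
  # Within a block the first row is the CREATED event; block 0 holds any
  # rows that occur before the first CREATED and therefore stays NA.
  mutate(timeToAssign = if_else(
    EVENT == "ASSIGNED" & block > 0,
    as.numeric(difftime(TIME, first(TIME), units = "hours")),
    NA_real_
  )) %>%
  ungroup() %>%
  select(-block)

result$timeToAssign
```

On the sample data this should give the same 24 and 48 hour values as the output above. Compared with the `fill()` approach, it trades the helper index for a second `group_by()`, which may be easier to read when only the first row of each block matters.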

