【Question title】: R - merge 2 data frames where a timestamp falls between two times
【Posted】: 2018-12-09 17:16:07
【Question description】:

DF1:

DF1 <- structure(list(X = c(113.8577674, 113.8577537, 113.8577403), 
    Y = c(22.19537297, 22.19537222, 22.1953723), Date = c("7/1/2016", 
    "7/1/2016", "7/1/2016"), Time = structure(c(9474, 9484, 9494
    ), class = c("hms", "difftime"), units = "secs"), TrackTime = structure(c(38274, 
    38284, 38294), class = c("hms", "difftime"), units = "secs")), .Names = c("X", 
"Y", "Date", "Time", "TrackTime"), row.names = c(NA, -3L), class = "data.frame", spec = structure(list(
    cols = structure(list(X = structure(list(), class = c("collector_double", 
    "collector")), Y = structure(list(), class = c("collector_double", 
    "collector")), Date = structure(list(), class = c("collector_character", 
    "collector")), Time = structure(list(format = ""), .Names = "format", class = c("collector_time", 
    "collector")), TrackTime = structure(list(format = ""), .Names = "format", class = c("collector_time", 
    "collector"))), .Names = c("X", "Y", "Date", "Time", "TrackTime"
    )), default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"))

DF2:

DF2 <- structure(list(DATE_SIGHT = "7-Jan-16", GROUP_SIGHT = 1L, TIME_START = structure(38280, class = c("hms", 
"difftime"), units = "secs"), TIME_END = structure(39060, class = c("hms", 
"difftime"), units = "secs"), GRP_SIZE = 1L), .Names = c("DATE_SIGHT", 
"GROUP_SIGHT", "TIME_START", "TIME_END", "GRP_SIZE"), row.names = c(NA, 
-1L), class = "data.frame", spec = structure(list(cols = structure(list(
    DATE_SIGHT = structure(list(), class = c("collector_character", 
    "collector")), GROUP_SIGHT = structure(list(), class = c("collector_integer", 
    "collector")), TIME_START = structure(list(format = ""), .Names = "format", class = c("collector_time", 
    "collector")), TIME_END = structure(list(format = ""), .Names = "format", class = c("collector_time", 
    "collector")), GRP_SIZE = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("DATE_SIGHT", "GROUP_SIGHT", "TIME_START", 
"TIME_END", "GRP_SIZE")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

DF1 looks like this:

         X        Y     Date     Time TrackTime
1 113.8578 22.19537 7/1/2016 02:37:54  10:37:54
2 113.8578 22.19537 7/1/2016 02:38:04  10:38:04
3 113.8577 22.19537 7/1/2016 02:38:14  10:38:14

DF2 looks like this:

  DATE_SIGHT GROUP_SIGHT TIME_START TIME_END GRP_SIZE
1   7-Jan-16           1   10:38:00 10:51:00        1

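I want to match the rows of DF1 whose TrackTime falls within DF2's TIME_START and TIME_END on the same date, keep only those rows, and attach DF2's other columns (GROUP_SIGHT, GRP_SIZE) to DF1. So the final result should look like this: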

         X        Y     Date     Time TrackTime GROUP_SIGHT GRP_SIZE
1 113.8578 22.19537 7/1/2016 02:38:04  10:38:04           1        1
2 113.8577 22.19537 7/1/2016 02:38:14  10:38:14           1        1
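The matching rule above boils down to building one comparable date-time per row and testing start <= time <= end. Below is a minimal base-R sketch using the values from the printed tables. The helper name `to_timestamp` is hypothetical. The sketch assumes that DF1's `7/1/2016` is day/month/year, so it lines up with DF2's `7-Jan-16`, and that times are seconds since midnight, which is how the hms columns store them. UTC is chosen only to avoid daylight-saving shifts.

```r
# Hypothetical helper: parse a date string, then add the time of day (seconds).
# Assumption: UTC is used only so that no daylight-saving jump can interfere.
to_timestamp <- function(date_string, date_format, seconds) {
  as.POSIXct(date_string, format = date_format, tz = "UTC") + seconds
}

# DF1 TrackTime values 10:37:54, 10:38:04, 10:38:14 as seconds since midnight;
# assumption: "7/1/2016" is day/month/year (7 January 2016)
track_time <- to_timestamp(rep("7/1/2016", 3), "%d/%m/%Y", c(38274, 38284, 38294))

# DF2 window 10:38:00 to 10:51:00; %b matches "Jan" via the current locale
sight_start <- to_timestamp("7-Jan-16", "%d-%b-%y", 38280)
sight_end   <- to_timestamp("7-Jan-16", "%d-%b-%y", 39060)

# TRUE where the track point lies inside the sighting window (inclusive)
inside <- track_time >= sight_start & track_time <= sight_end
print(inside)
```

Only the second and third track points fall inside the window, which matches the two rows of the desired output.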

【Question discussion】:

    Tags: r tidyverse lubridate


    【Solution 1】:

    data.table's foverlaps() is my weapon of choice here. But first, you have to create proper (POSIXct) timestamps to join on.

    library( data.table )
    #create two data.tables
    dt1 <- as.data.table( DF1 )
    dt2 <- as.data.table( DF2 )
    #add suffixes to columns, to identify them after the join
    names( dt1 ) <- paste0( names( dt1 ), ".dt1" )
    names( dt2 ) <- paste0( names( dt2 ), ".dt2" )
    #save column order for later
    colorder <- c( names( dt1 ), names( dt2 ) )
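    #build POSIXct timestamps: parse the date part, then add the time of day in seconds
    #(as.numeric() on an hms/difftime column with units "secs" returns seconds, so hms need not be loaded)
    #%b matches month abbreviations such as "Jan" in the current locale
    #each dt1 row is a single point in time, so its start and end are identical
    dt1[, `:=`( start.join = as.POSIXct( Date.dt1, format = "%d/%m/%Y", tz = "UTC" ) + as.numeric( TrackTime.dt1 ),
                end.join   = as.POSIXct( Date.dt1, format = "%d/%m/%Y", tz = "UTC" ) + as.numeric( TrackTime.dt1 ) )]
    dt2[, `:=`( start.join = as.POSIXct( DATE_SIGHT.dt2, format = "%d-%b-%y", tz = "UTC" ) + as.numeric( TIME_START.dt2 ),
                end.join   = as.POSIXct( DATE_SIGHT.dt2, format = "%d-%b-%y", tz = "UTC" ) + as.numeric( TIME_END.dt2 ) )]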
    #set key on dt2
    setkey( dt2, start.join, end.join )
    #perform the overlap join
    result <- foverlaps( dt1, dt2, type = "within", nomatch = 0L )
    #drop the join columns
    result[, grep( "\\.join$", names( result ) ) := NULL]
    #set the order of columns right
    setcolorder( result, colorder )
    
    #       X.dt1    Y.dt1 Date.dt1 Time.dt1 TrackTime.dt1 DATE_SIGHT.dt2 GROUP_SIGHT.dt2 TIME_START.dt2 TIME_END.dt2 GRP_SIZE.dt2
    # 1: 113.8578 22.19537 7/1/2016 02:38:04      10:38:04       7-Jan-16               1       10:38:00     10:51:00            1
    # 2: 113.8577 22.19537 7/1/2016 02:38:14      10:38:14       7-Jan-16               1       10:38:00     10:51:00            1
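If data.table is not an option (the question is tagged tidyverse), the same "within" match can be sketched in base R as a cross join followed by a filter. This is a swapped-in technique, not the answer's method. `merge(..., by = NULL)` builds every DF1 x DF2 pair, so it only suits small tables, whereas foverlaps() scales better. The sketch rebuilds DF1/DF2 without the readr attributes. It assumes the time columns are plain seconds since midnight, for example after `as.numeric()` on the hms columns.

```r
# Assumption: time columns are plain numbers (seconds since midnight),
# e.g. obtained with as.numeric() on the original hms columns.
DF1 <- data.frame(
  X = c(113.8577674, 113.8577537, 113.8577403),
  Y = c(22.19537297, 22.19537222, 22.1953723),
  Date = "7/1/2016",
  Time = c(9474, 9484, 9494),
  TrackTime = c(38274, 38284, 38294),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  DATE_SIGHT = "7-Jan-16", GROUP_SIGHT = 1L,
  TIME_START = 38280, TIME_END = 39060, GRP_SIZE = 1L,
  stringsAsFactors = FALSE
)

# comparable timestamps; UTC is an assumption to avoid daylight-saving shifts
DF1$ts <- as.POSIXct(DF1$Date, format = "%d/%m/%Y", tz = "UTC") + DF1$TrackTime
DF2$ts_start <- as.POSIXct(DF2$DATE_SIGHT, format = "%d-%b-%y", tz = "UTC") + DF2$TIME_START
DF2$ts_end   <- as.POSIXct(DF2$DATE_SIGHT, format = "%d-%b-%y", tz = "UTC") + DF2$TIME_END

# by = NULL gives the Cartesian product: every DF1 row paired with every DF2 row
pairs <- merge(DF1, DF2[, c("GROUP_SIGHT", "GRP_SIZE", "ts_start", "ts_end")], by = NULL)

# keep only pairs where the track point lies inside the sighting window
result <- pairs[pairs$ts >= pairs$ts_start & pairs$ts <= pairs$ts_end,
                c("X", "Y", "Date", "Time", "TrackTime", "GROUP_SIGHT", "GRP_SIZE")]
rownames(result) <- NULL
print(result)
```

The column selection at the end mirrors the layout the question asks for (DF1's columns plus GROUP_SIGHT and GRP_SIZE). The foverlaps() answer above keeps every DF2 column instead.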
    

    【Discussion】:

    • > result
    • I tried na.omit but got another error: > result
    • Works perfectly on the sample data you provided... did you run all the lines of code?