【Question title】: Merge dataframes when timestamp of one is between another one's datetime intervals
【Posted】: 2019-10-31 03:16:55
【Question】:

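I have two data frames that contain time data in POSIXct format, plus a corresponding location that I need to match between them. The first dataset contains a series of 30-minute time bins along with the location data.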

location   datetimes        date       shark
SS04   2018-03-20 08:00:00 2018-03-20     A
Absent 2018-03-20 08:30:00 2018-03-20     A
Absent 2018-03-20 09:00:00 2018-03-20     A
Absent 2018-03-20 09:30:00 2018-03-20     A
SS04   2018-03-20 10:00:00 2018-03-20     A
Absent 2018-03-20 10:30:00 2018-03-20     A

The second dataset records time data every 2 minutes.

shark       depth     temperature   datetime       date
A            49.5        26.2   20/03/2018 08:00 20/03/2018
A            49.5        25.3   20/03/2018 08:02 20/03/2018
A            53.0        24.2   20/03/2018 08:04 20/03/2018
A            39.5        26.5   20/03/2018 08:28 20/03/2018
A            43.0        26.2   20/03/2018 09:10 20/03/2018
A            44.5        26.5   20/03/2018 10:34 20/03/2018

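I need to match the time bins (datetimes) in the first dataset against the timestamps (datetime) in the second dataset, so that every row in the second dataset is assigned the location value of the 30-minute bin it falls into.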

I think I could use data.table, but I am not confident about how to approach this.

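Ideally, I would like to end up with a dataset like this, where the location from the first dataset is added to the second dataset according to the corresponding time bin.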

shark depth temperature   datetime    date      location
A     49.5  26.2   20/03/2018 08:00 20/03/2018    SS04
A     49.5  25.3   20/03/2018 08:02 20/03/2018    SS04
A     53.0  24.2   20/03/2018 08:04 20/03/2018    SS04
A     39.5  26.5   20/03/2018 08:32 20/03/2018    Absent
A     43.0  26.2   20/03/2018 09:10 20/03/2018    Absent
A     44.5  26.5   20/03/2018 10:18 20/03/2018    SS04

Tags: r data.table lubridate


【Solution 1】:

Use a data.table non-equi join.

Sample data

library(data.table)

DT1 <- fread('
location   datetimes        date       shark
SS04   "2018-03-20 08:00:00" 2018-03-20     A
Absent "2018-03-20 08:30:00" 2018-03-20     A
Absent "2018-03-20 09:00:00" 2018-03-20     A
Absent "2018-03-20 09:30:00" 2018-03-20     A
SS04   "2018-03-20 10:00:00" 2018-03-20     A
Absent "2018-03-20 10:30:00" 2018-03-20     A')

DT2 <- fread('
shark       depth     temperature   datetime       date
A            49.5        26.2   "20/03/2018 08:00" 20/03/2018
A            49.5        25.3   "20/03/2018 08:02" 20/03/2018
A            53.0        24.2   "20/03/2018 08:04" 20/03/2018
A            39.5        26.5   "20/03/2018 08:28" 20/03/2018
A            43.0        26.2   "20/03/2018 09:10" 20/03/2018
A            44.5        26.5   "20/03/2018 10:34" 20/03/2018
')

DT1[, `:=`( datetimes = as.POSIXct( datetimes, format = "%Y-%m-%d %H:%M:%S" ))]
DT2[, `:=`( datetime = as.POSIXct( datetime, format = "%d/%m/%Y %H:%M" ) )]
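
The two tables store their timestamps in different text formats, and the join only compares them correctly if both are parsed into the same time zone. A minimal check, assuming UTC suits the data (the `tz = "UTC"` argument is an assumption and is not part of the answer, which relies on the session's local time zone):

```r
# Parse one timestamp in each of the two source formats, pinning the time zone
a <- as.POSIXct("2018-03-20 08:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
b <- as.POSIXct("20/03/2018 08:00", format = "%d/%m/%Y %H:%M", tz = "UTC")

# Both values represent the same instant, so the join compares like with like
same_instant <- as.numeric(a) == as.numeric(b)
print(same_instant)
```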

Code

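# Add each bin's exclusive end (start + 30 minutes) to a copy of DT1, then
# update DT2 by reference: a row matches a bin when datetimes <= datetime < end
DT2[ copy(DT1)[, end := datetimes + lubridate::minutes(30)], location := i.location, 
     on = .( datetime >= datetimes, datetime < end)][]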

Output

#    shark depth temperature            datetime       date location
# 1:     A  49.5        26.2 2018-03-20 08:00:00 20/03/2018     SS04
# 2:     A  49.5        25.3 2018-03-20 08:02:00 20/03/2018     SS04
# 3:     A  53.0        24.2 2018-03-20 08:04:00 20/03/2018     SS04
# 4:     A  39.5        26.5 2018-03-20 08:28:00 20/03/2018     SS04
# 5:     A  43.0        26.2 2018-03-20 09:10:00 20/03/2018   Absent
# 6:     A  44.5        26.5 2018-03-20 10:34:00 20/03/2018   Absent
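
Because the bins are contiguous, the same result can probably also be obtained with a rolling join, which assigns each observation the most recent bin start at or before it. The sketch below is not part of the answer: the tables `bins` and `obs` are illustrative rebuilds of the sample data in UTC, and including `shark` in the join keys is an assumption about how data with several animals would be handled.

```r
library(data.table)

# Illustrative rebuild of the sample data (UTC is an assumption)
t0 <- as.POSIXct("2018-03-20 08:00:00", tz = "UTC")
bins <- data.table(
  shark     = "A",
  location  = c("SS04", "Absent", "Absent", "Absent", "SS04", "Absent"),
  datetimes = t0 + 0:5 * 1800                 # 08:00, 08:30, ..., 10:30
)
obs <- data.table(
  shark    = "A",
  depth    = c(49.5, 49.5, 53.0, 39.5, 43.0, 44.5),
  datetime = t0 + c(0, 2, 4, 28, 70, 154) * 60  # 08:00 ... 10:34
)

# Rolling join: within each shark, roll the last bin start forward onto
# every observation time and pull that bin's location
obs[, location := bins[obs, x.location,
                       on = .(shark, datetimes = datetime),
                       roll = TRUE]]
print(obs)
```

Note that `roll = TRUE` carries the last bin forward without limit, so an observation long after the final bin start would still receive that bin's location. The non-equi join with an explicit `end` column does not have this behaviour.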

    【Solution 2】:
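    # End of each bin: start + 30 minutes, expressed in seconds
    data30min$datetimesE <- data30min$datetimes + 30 * 60
    
    library(sqldf)
    
    # Half-open bins [datetimes, datetimesE): a timestamp that falls exactly on a
    # bin boundary matches one bin only (BETWEEN is inclusive at both ends)
    sqldf('select d2.*, d30.location
               from data2min d2
               left join data30min d30
                 on d2.datetime >= d30.datetimes
                and d2.datetime <  d30.datetimesE
          ')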
    
    #>   shark depth temperature            datetime       date location
    #> 1     A  49.5        26.2 2018-03-20 08:00:00 20/03/2018     SS04
    #> 2     A  49.5        25.3 2018-03-20 08:02:00 20/03/2018     SS04
    #> 3     A  53.0        24.2 2018-03-20 08:04:00 20/03/2018     SS04
    #> 4     A  39.5        26.5 2018-03-20 08:28:00 20/03/2018     SS04
    #> 5     A  43.0        26.2 2018-03-20 09:10:00 20/03/2018   Absent
    #> 6     A  44.5        26.5 2018-03-20 10:34:00 20/03/2018   Absent
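
For a single animal, the same lookup can be sketched in base R with `findInterval()`, without SQL. This is only an illustration under stated assumptions: the sample data is rebuilt here in UTC, the bin starts are sorted in ascending order, and `bin_len`, `starts`, `times`, and `idx` are hypothetical helper names that do not appear in the answer.

```r
# Illustrative rebuild of the sample data (UTC is an assumption)
t0 <- as.POSIXct("2018-03-20 08:00:00", tz = "UTC")
data30min <- data.frame(
  location  = c("SS04", "Absent", "Absent", "Absent", "SS04", "Absent"),
  datetimes = t0 + 0:5 * 1800,
  stringsAsFactors = FALSE
)
data2min <- data.frame(
  depth    = c(49.5, 49.5, 53.0, 39.5, 43.0, 44.5),
  datetime = t0 + c(0, 2, 4, 28, 70, 154) * 60,
  stringsAsFactors = FALSE
)

bin_len <- 30 * 60                          # bin length in seconds
starts  <- as.numeric(data30min$datetimes)  # must be sorted ascending
times   <- as.numeric(data2min$datetime)

# Index of the last bin start <= each timestamp (0 if before the first bin)
idx <- findInterval(times, starts)
idx[idx == 0] <- NA

# Drop matches that fall at or after the exclusive end of the bin
idx[!is.na(idx) & times >= starts[idx] + bin_len] <- NA

data2min$location <- data30min$location[idx]
print(data2min)
```

With several sharks, this lookup would have to be applied per animal, for example after splitting both data frames by `shark`.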
    

    Data:

    data2min <- structure(list(shark = c("A", "A", "A", "A", "A", "A"), depth = c(49.5, 
    49.5, 53, 39.5, 43, 44.5), temperature = c(26.2, 25.3, 24.2, 
    26.5, 26.2, 26.5), datetime = structure(c(1521547200, 1521547320, 
    1521547440, 1521548880, 1521551400, 1521556440), class = c("POSIXct", 
    "POSIXt"), tzone = ""), date = c("20/03/2018", "20/03/2018", 
    "20/03/2018", "20/03/2018", "20/03/2018", "20/03/2018")), row.names = c(NA, 
    -6L), class = "data.frame")
    
    data30min <- structure(list(location = c("SS04", "Absent", "Absent", "Absent", 
    "SS04", "Absent"), datetimes = structure(c(1521547200, 1521549000, 
    1521550800, 1521552600, 1521554400, 1521556200), class = c("POSIXct", 
    "POSIXt"), tzone = ""), date = c("2018-03-20", "2018-03-20", 
    "2018-03-20", "2018-03-20", "2018-03-20", "2018-03-20"), shark = c("A", 
    "A", "A", "A", "A", "A"), datetimesE = structure(c(1521549000, 
    1521550800, 1521552600, 1521554400, 1521556200, 1521558000), class = c("POSIXct", 
    "POSIXt"))), row.names = c(NA, -6L), class = "data.frame")
    
