【Question title】: How to add variable `df1$DateTime_1` to `df2` when `df1$DateTime_1` match within a 5-seconds interval with `df2$DateTime_2`
【Posted】: 2020-07-27 01:43:17
【Question】:

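I have two data frames, df1 and df2. df1 lists the moments (df1$Theor.DateTime) at which a device should, in theory, emit information to a satellite. We know these moments thanks to the variable df1$Delay, which gives the interval in seconds between successive emissions from the device to the satellite. df2 lists the specific times at which the satellite actually received information from this device (df2$Real.DateTime). As you can see in the example below, nrow(df2) is smaller than nrow(df1), because some emissions did not reach the satellite for various reasons. You can also see that df2$Real.DateTime does not match df1$Theor.DateTime exactly: there is always some delay between the emission of a signal and its reception by the satellite.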

options("digits.secs" = 3)
df1 <- data.frame(Theor.DateTime= c("2018-03-22 12:00:00.000","2018-03-22 12:00:30.040","2018-03-22 12:01:15.800","2018-03-22 12:02:15.700","2018-03-22 12:02:45.350","2018-03-22 12:03:15.002","2018-03-22 12:04:00.065","2018-03-22 12:05:15.430","2018-03-22 12:06:00.060","2018-03-22 12:06:45.002"),
                  Delay= c(30,45,60,30,30,45,75,45,45,60))
df1$Theor.DateTime <- as.POSIXct(df1$Theor.DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")

head(df1)
           Theor.DateTime Delay
1 2018-03-22 12:00:00.000    30
2 2018-03-22 12:00:30.039    45
3 2018-03-22 12:01:15.799    60
4 2018-03-22 12:02:15.700    30
5 2018-03-22 12:02:45.349    30
6 2018-03-22 12:03:15.002    45


df2 <- data.frame(Real.DateTime= c("2018-03-22 12:00:02.000","2018-03-22 12:02:20.540","2018-03-22 12:02:42.800","2018-03-22 12:05:18.700","2018-03-22 12:06:33.700"))
df2$Real.DateTime <- as.POSIXct(df2$Real.DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")

df2
           Real.DateTime
1 2018-03-22 12:00:02.00
2 2018-03-22 12:02:20.53
3 2018-03-22 12:02:42.79
4 2018-03-22 12:05:18.70
5 2018-03-22 12:06:33.70

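What I want is a single data frame that contains the information from both df1 and df2. When df2$Real.DateTime falls within a 5-second interval (±5 seconds) of df1$Theor.DateTime, I want df1$Theor.DateTime and df2$Real.DateTime to be merged into the same row. I would also like to create a column called Reception.success that indicates (TRUE or FALSE) whether a given df1$Theor.DateTime was matched by a df2$Real.DateTime, that is, whether the emission was received.

I would expect: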

> df3
            Theor.DateTime Delay Reception.success           Real.DateTime
1  2018-03-22 12:00:00.000    30              TRUE 2018-03-22 12:00:02.000
2  2018-03-22 12:00:30.040    45             FALSE                    <NA>
3  2018-03-22 12:01:15.800    60             FALSE                    <NA>
4  2018-03-22 12:02:15.700    30              TRUE 2018-03-22 12:02:20.540
5  2018-03-22 12:02:45.350    30              TRUE 2018-03-22 12:02:42.800
6  2018-03-22 12:03:15.002    45             FALSE                    <NA>
7  2018-03-22 12:04:00.065    75             FALSE                    <NA>
8  2018-03-22 12:05:15.430    45              TRUE 2018-03-22 12:05:18.700
9  2018-03-22 12:06:00.060    45             FALSE                    <NA>
10 2018-03-22 12:06:45.002    60             FALSE                    <NA>

Does anyone know how to do this?

Thanks in advance

【Comments】:

    Tags: r data.table match tidyverse lubridate


    【Solution 1】:

    You can use a non-equi join in data.table:

    library(data.table)
    
    options("digits.secs" = 3)
    df1 <- data.table(Theor.DateTime= as.POSIXct(c("2018-03-22 12:00:00.000","2018-03-22 12:00:30.040","2018-03-22 12:01:15.800","2018-03-22 12:02:15.700","2018-03-22 12:02:45.350","2018-03-22 12:03:15.002","2018-03-22 12:04:00.065","2018-03-22 12:05:15.430","2018-03-22 12:06:00.060","2018-03-22 12:06:45.002"),format="%Y-%m-%d %H:%M:%OS",tz="UTC"),
                      Delay= c(30,45,60,30,30,45,75,45,45,60))
    df2 <- data.table(Real.DateTime= as.POSIXct(c("2018-03-22 12:00:02.000","2018-03-22 12:02:20.540","2018-03-22 12:02:42.800","2018-03-22 12:05:18.700","2018-03-22 12:06:33.700"),format="%Y-%m-%d %H:%M:%OS",tz="UTC"))
    
    
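    # Build a +-5 second window around each reception time
    # (adding a number to a POSIXct value shifts it by that many seconds)
    df2[,`:=`(minus_5=Real.DateTime-5,
              plus_5=Real.DateTime+5)]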
    
    
    df2
    #>             Real.DateTime                minus_5                 plus_5
    #> 1: 2018-03-22 12:00:02.00 2018-03-22 11:59:57.00 2018-03-22 12:00:07.00
    #> 2: 2018-03-22 12:02:20.53 2018-03-22 12:02:15.53 2018-03-22 12:02:25.53
    #> 3: 2018-03-22 12:02:42.79 2018-03-22 12:02:37.79 2018-03-22 12:02:47.79
    #> 4: 2018-03-22 12:05:18.70 2018-03-22 12:05:13.70 2018-03-22 12:05:23.70
    #> 5: 2018-03-22 12:06:33.70 2018-03-22 12:06:28.70 2018-03-22 12:06:38.70
    
    
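    # Non-equi update join: for each df1 row whose Theor.DateTime falls inside
    # a df2 window [minus_5, plus_5], copy that window's Real.DateTime into df1
    df1[df2, on = .(Theor.DateTime<=plus_5, Theor.DateTime>=minus_5),
        "Real.DateTime" := i.Real.DateTime]
    # Unmatched rows keep NA, so reception succeeded wherever the value is not NA
    df1[, "Reception.success" := !is.na(Real.DateTime)]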
    
    df1
    #>              Theor.DateTime Delay          Real.DateTime Reception.success
    #>  1: 2018-03-22 12:00:00.000    30 2018-03-22 12:00:02.00              TRUE
    #>  2: 2018-03-22 12:00:30.039    45                   <NA>             FALSE
    #>  3: 2018-03-22 12:01:15.799    60                   <NA>             FALSE
    #>  4: 2018-03-22 12:02:15.700    30 2018-03-22 12:02:20.53              TRUE
    #>  5: 2018-03-22 12:02:45.349    30 2018-03-22 12:02:42.79              TRUE
    #>  6: 2018-03-22 12:03:15.002    45                   <NA>             FALSE
    #>  7: 2018-03-22 12:04:00.065    75                   <NA>             FALSE
    #>  8: 2018-03-22 12:05:15.430    45 2018-03-22 12:05:18.70              TRUE
    #>  9: 2018-03-22 12:06:00.059    45                   <NA>             FALSE
    #> 10: 2018-03-22 12:06:45.002    60                   <NA>             FALSE
    

    Created on 2020-04-14 by the reprex package (v0.3.0)
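The join above modifies df1 by reference, whereas the question asks for a separate df3 with Reception.success placed before Real.DateTime. A minimal sketch of that variant follows, assuming a shortened three-row version of the data; the names dt1, dt2 and parse_utc are illustrative and do not come from the original post.

```r
library(data.table)
options(digits.secs = 3)

# Illustrative helper (not from the post): parse timestamps with fractional seconds.
parse_utc <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Shortened stand-ins for df1 and df2.
dt1 <- data.table(
  Theor.DateTime = parse_utc(c("2018-03-22 12:00:00.000",
                               "2018-03-22 12:00:30.040",
                               "2018-03-22 12:02:15.700")),
  Delay = c(30, 45, 30)
)
dt2 <- data.table(
  Real.DateTime = parse_utc(c("2018-03-22 12:00:02.000",
                              "2018-03-22 12:02:20.540"))
)

# +-5 second window around each reception time.
dt2[, `:=`(minus_5 = Real.DateTime - 5, plus_5 = Real.DateTime + 5)]

# Work on a copy so that dt1 itself is left unchanged.
df3 <- copy(dt1)
df3[dt2, on = .(Theor.DateTime >= minus_5, Theor.DateTime <= plus_5),
    Real.DateTime := i.Real.DateTime]
df3[, Reception.success := !is.na(Real.DateTime)]

# Reorder the columns to match the layout requested in the question.
setcolorder(df3, c("Theor.DateTime", "Delay", "Reception.success", "Real.DateTime"))
df3
```

If two reception windows ever covered the same theoretical time, the update join would keep only one of the candidate values, so this sketch relies on the windows (10 s wide) being narrower than the spacing between emissions (30 s or more in the example).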

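    【Comments】:

    • Thanks @Frank. Just one question: in my real case, where both data frames have hundreds of thousands of rows, should I use setDT instead of data.table? I don't know whether they do different things.
    • @Dekike In that case, setDT is better.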
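The difference raised in the comments can be illustrated with a small sketch: as.data.table() returns a new data.table and leaves the input untouched, while setDT() converts the existing data frame in place, which avoids copying large inputs. The object big_df below is a hypothetical stand-in for a data frame that already exists in memory.

```r
library(data.table)

# Hypothetical stand-in for a large data frame already loaded in memory.
big_df <- data.frame(
  Real.DateTime = as.POSIXct("2018-03-22 12:00:02", tz = "UTC") + 0:4
)

# as.data.table() builds a new object; big_df remains a plain data.frame.
copy_dt <- as.data.table(big_df)
still_plain <- !is.data.table(big_df)   # TRUE

# setDT() changes big_df itself into a data.table, without copying its columns.
setDT(big_df)
now_dt <- is.data.table(big_df)         # TRUE
```

After setDT(), big_df can be used directly in the non-equi join, with no second copy of the data held in memory.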