【Question title】: R: Subset a data frame based on times that are within a certain number of minutes of an observation window
【Posted】: 2013-08-08 03:53:41
【Question description】:

Suppose I have a data frame with start and end time columns, a measurement column, and a measurement time column, like this:

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:01:00     9:02:00     30.8  2013-03-25 9:15:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00
  10:54:00    10:59:00     13.8 2013-03-25 11:56:00

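How do I subset this data frame to include only rows where the time column falls between the start and end times, or within ten minutes before the start time or ten minutes after the end time? I chose ten minutes arbitrarily and would like to know how to do this for an arbitrary amount of time before and after the start and end times.

The resulting data frame would look like this: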

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00

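Is there a way to do this other than subtracting/adding x minutes from/to the start/end column entries and then subsetting based on whether the time column falls within these expanded windows?

Currently, I have converted the time columns to POSIXlt format. Unfortunately, this puts today's date in the start and end columns.

Here is the input for the first data frame: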
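
The date problem can be reproduced in isolation. The following is a minimal sketch, assuming the start and end columns were parsed from time-of-day strings with strptime (the question does not show the parsing call, and the UTC time zone is chosen here only to keep the example reproducible):

```r
# Parse a time-of-day string with no date component (UTC chosen arbitrarily).
start_lt <- strptime("09:01:00", format = "%H:%M:%S", tz = "UTC")

# The missing year, month and day are filled in with the current date,
# so this prints today's date rather than 2013-03-25.
print(as.Date(start_lt))

# The time of day itself is preserved.
print(format(start_lt, "%H:%M:%S"))
```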

structure(list(start = structure(list(sec = c(0, 0, 0, 0, 0, 
0), min = c(1L, 1L, 46L, 46L, 54L, 54L), hour = c(9L, 9L, 9L, 
9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), end = structure(list(sec = c(0, 
0, 0, 0, 0, 0), min = c(2L, 2L, 46L, 46L, 59L, 59L), hour = c(9L, 
9L, 9L, 9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), value = c(30.6, 30.8, 28.2, 
28.9, 13.4, 13.8), time = structure(list(sec = c(0, 0, 0, 0, 
0, 0), min = c(5L, 15L, 43L, 53L, 56L, 56L), hour = c(9L, 9L, 
9L, 9L, 10L, 11L), mday = c(25L, 25L, 25L, 25L, 25L, 25L), mon = c(2L, 
2L, 2L, 2L, 2L, 2L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(1L, 1L, 1L, 1L, 1L, 1L), yday = c(83L, 83L, 83L, 
83L, 83L, 83L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"))), .Names = c("start", "end", 
"value", "time"), row.names = c(NA, -6L), class = "data.frame")

Here is the input for the second data frame:

structure(list(start = structure(list(sec = c(0, 0, 0, 0), min = c(1L, 
46L, 46L, 54L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), end = structure(list(sec = c(0, 0, 0, 0), min = c(2L, 
46L, 46L, 59L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), value = c(30.6, 28.2, 28.9, 13.4), time = structure(list(
    sec = c(0, 0, 0, 0), min = c(5L, 43L, 53L, 56L), hour = c(9L, 
    9L, 9L, 10L), mday = c(25L, 25L, 25L, 25L), mon = c(2L, 2L, 
    2L, 2L), year = c(113L, 113L, 113L, 113L), wday = c(1L, 1L, 
    1L, 1L), yday = c(83L, 83L, 83L, 83L), isdst = c(1L, 1L, 
    1L, 1L)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"
))), .Names = c("start", "end", "value", "time"), row.names = c(NA, 
-4L), class = "data.frame")

【Comments】:

  • When asking about time/date data, it is very helpful to provide dput(datasetname) so that answerers do not need to recreate all the data. That is a real pain.

Tags: r date time subset


【Solution 1】:

Recreating the data is no fun, but the answer should be simple:

data[with(data, time > start - 10*60 & time < end + 10*60),]

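This assumes that the start, end and time objects are actually comparable (i.e. they refer to the same year and date); otherwise, just convert the substring holding the time of day to POSIX.

Update: OK, since your dates are off, you need to recreate them so that they are "in sync", for example: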

data$start <- as.POSIXct(substr(data$start,12,19), format="%H:%M:%S")
data$end <- as.POSIXct(substr(data$end,12,19), format="%H:%M:%S")
data$time <- as.POSIXct(substr(data$time,12,19), format="%H:%M:%S")

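Now the subsetting line above gives what you want. You should probably be careful about how you encode POSIX from the raw data. Also, for most applications POSIXct is probably preferable to POSIXlt, which stores date-times as a list of components. That can hinder (or slow down) some operations later on.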
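
To address the "arbitrary amount of time" part of the question, the same comparison can be wrapped in a small function. This is only a sketch under stated assumptions: the data frame is rebuilt from strings rather than taken from the dput output, the function name subset_window and its pad_mins argument are hypothetical, all times are treated as UTC, and each start/end is assumed to fall on the same calendar day as the corresponding time.

```r
# Hypothetical sample data mirroring the question's table (strings, zero-padded).
dat <- data.frame(
  start = c("09:01:00", "09:01:00", "09:46:00", "09:46:00", "10:54:00", "10:54:00"),
  end   = c("09:02:00", "09:02:00", "09:46:00", "09:46:00", "10:59:00", "10:59:00"),
  value = c(30.6, 30.8, 28.2, 28.9, 13.4, 13.8),
  time  = c("2013-03-25 09:05:00", "2013-03-25 09:15:00", "2013-03-25 09:43:00",
            "2013-03-25 09:53:00", "2013-03-25 10:56:00", "2013-03-25 11:56:00"),
  stringsAsFactors = FALSE
)

# Parse everything as POSIXct, attaching each row's measurement date to start/end.
fmt       <- "%Y-%m-%d %H:%M:%S"
dat$time  <- as.POSIXct(dat$time, format = fmt, tz = "UTC")
day       <- format(dat$time, "%Y-%m-%d")
dat$start <- as.POSIXct(paste(day, dat$start), format = fmt, tz = "UTC")
dat$end   <- as.POSIXct(paste(day, dat$end), format = fmt, tz = "UTC")

# Keep rows whose time lies within pad_mins minutes of the start/end window.
subset_window <- function(df, pad_mins = 10) {
  pad <- pad_mins * 60  # adding a number to POSIXct shifts it by that many seconds
  df[df$time > df$start - pad & df$time < df$end + pad, ]
}

res <- subset_window(dat, pad_mins = 10)
print(res$value)

# POSIXct is a plain numeric vector underneath; POSIXlt is a list of components.
print(is.numeric(unclass(dat$time)))
print(is.list(unclass(as.POSIXlt(dat$time))))
```

Calling subset_window(dat, pad_mins = 60) widens the window to an hour on each side. The strict inequalities mirror the one-liner above; use >= and <= if rows exactly on the boundary should be kept.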

【Comments】:

    【Solution 2】:

    Building on @EliGurarie's answer:

    #dat <- ....see original question
    

    Convert the times to a POSIX representation and compute:

    datestem <- as.character(as.Date(dat$time))
    dat$start <- as.POSIXct(paste(datestem,format(dat$start,"%H:%M:%S")))
    dat$end <- as.POSIXct(paste(datestem,format(dat$end,"%H:%M:%S")))
    
    dat[
         with(
          dat,
          difftime(time,start,units="mins") > -10 &   # no more than 10 min before start
          difftime(time,end,units="mins") < 10        # no more than 10 min after end
         ),
       ]
    

    Alternatively, using some rounding and a few intermediate variables:

    min10 <- 10/(60*24)
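    # offset of time relative to start, in days, with whole days stripped
    ds <- difftime(dat$time,dat$start,units="days")
    ds <- ds - round(ds) 
    # offset of time relative to end, in days, with whole days stripped
    de <- difftime(dat$time,dat$end,units="days")
    de <- de - round(de) 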
    
    dat[ds > -min10 & de < min10,]
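
The rounding step works because subtracting the rounded number of days leaves only the time-of-day offset, so mismatched dates drop out. A small worked example, assuming UTC so that no daylight-saving shift is involved (the two timestamps are taken from the first row of the question, with the start date as it appears in the dput output):

```r
# start carries the wrong date (2013-08-07); time carries the real one (2013-03-25).
start <- as.POSIXct("2013-08-07 09:01:00", tz = "UTC")
time  <- as.POSIXct("2013-03-25 09:05:00", tz = "UTC")

# Difference in days: 135 whole days back, plus a 4-minute time-of-day offset.
d <- as.numeric(difftime(time, start, units = "days"))

# Stripping the whole days leaves only the time-of-day offset.
frac <- d - round(d)

# Convert the fractional day to minutes.
print(frac * 24 * 60)
```

This only holds when the true time-of-day offset is under 12 hours, and a daylight-saving change between the two dates would shift the result by an hour in a non-UTC time zone.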
    

    【Comments】:
