【Question title】: Filter time series by hour
【Posted】: 2020-08-07 22:36:27
【Question】:

I have a data series that looks like this:

2020-01-02 09:30:00 1 gdss
2020-01-02 10:00:00 2 jojo
2020-01-02 10:30:00 3 hutr 
2020-01-02 11:00:00 2 uff
2020-01-02 11:30:00 4 wwe
2020-01-02 12:00:00 1 vev
2020-01-02 12:30:00 2 wow

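It has more columns, but they don't matter. The full set, however, covers more than ten years of 30-minute data.

I want to filter a specific time window for every day, but I can't get the filter right. I am using lubridate.

For example, to get this interval: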

2020-01-02 10:30:00 3 hutr 
2020-01-02 11:00:00 2 uff
2020-01-02 11:30:00 4 wwe
2020-01-02 12:00:00 1 vev

I tried the following:

with(load_dataset, load_dataset[ (hour(load_dataset$Date) == 10 & minute(load_dataset$Date) == 30) | (hour(load_dataset$Date) == 12 & minute(load_dataset$Date) < 30), ])

This returns only the first and the last row.

with(load_dataset, load_dataset[(hour(load_dataset$Date) == 10 & minute(load_dataset$Date) == 30) & (hour(load_dataset$Date) == 12 & minute(load_dataset$Date) < 30), ])

This returns zero rows.

with(load_dataset, load_dataset[(hour(load_dataset$Date) >= 10 & minute(load_dataset$Date) == 30) & (hour(load_dataset$Date) <= 12 & minute(load_dataset$Date) <= 30), ])

This returns only the rows that fall on the half hour:

2020-01-02 10:30:00 3 hutr
2020-01-02 11:30:00 4 wwe

How can I filter every row between 10:30 and 12:00 (12:00 included) for each day in the dataset?

【Question discussion】:

    Tags: r filter time-series lubridate


    【Solution 1】:

    You can format each time as HHMM, coerce it to "numeric", and check whether it falls within 1030:1200.

    load_dataset[as.numeric(strftime(load_dataset$date, "%H%M")) %in% 1030:1200, ]
    #                  date V3   V4
    # 3 2020-01-02 10:30:00  3 hutr
    # 4 2020-01-02 11:00:00  2  uff
    # 5 2020-01-02 11:30:00  4  wwe
    # 6 2020-01-02 12:00:00  1  vev
    

    Note: this solution assumes that your date column is of class "POSIXct"; if it is not yet, convert it first with:

    load_dataset$date <- as.POSIXct(load_dataset$date)
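
When the timestamps arrive as character strings, it can help to state the format and the time zone explicitly during conversion. The sketch below uses made-up sample strings, and "UTC" is only an assumption about how the data was recorded; substitute the zone that applies to your data.

```r
# Illustrative character timestamps, e.g. as read from a text file.
raw <- c("2020-01-02 09:30:00", "2020-01-02 10:30:00", "2020-01-02 12:00:00")

# Give both the format and the time zone so that parsing does not depend on
# session settings. "UTC" is an assumption about the data, not a requirement.
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(parsed)                          # "POSIXct" "POSIXt"
format(parsed, "%H%M", tz = "UTC")     # "0930" "1030" "1200"
```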
    

    The same principle also works on true time-series objects such as "xts":

    load_dataset.xts[
      as.numeric(strftime(as.POSIXct(attr(load_dataset.xts, "index"), 
                                     origin="1970-01-01"), "%H%M")) %in% 1030:1200, ]
    #                     V3  V4    
    # 2020-01-02 10:30:00 "3" "hutr"
    # 2020-01-02 11:00:00 "2" "uff" 
    # 2020-01-02 11:30:00 "4" "wwe" 
    # 2020-01-02 12:00:00 "1" "vev" 
    

    Data:

    load_dataset <- structure(list(date = structure(c(1577953800, 1577955600, 1577957400, 
    1577959200, 1577961000, 1577962800, 1577964600), class = c("POSIXct", 
    "POSIXt"), tzone = ""), V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L), V4 = c("gdss", 
    "jojo", "hutr", "uff", "wwe", "vev", "wow")), row.names = c(NA, 
    -7L), class = "data.frame")
    
    load_dataset.xts <- structure(c("1", "2", "3", "2", "4", "1", "2", "gdss", "jojo", 
    "hutr", "uff", "wwe", "vev", "wow"), .Dim = c(7L, 2L), .Dimnames = list(
        NULL, c("V3", "V4")), index = structure(c(1577953800, 1577955600, 
    1577957400, 1577959200, 1577961000, 1577962800, 1577964600), tzone = "", tclass = c("POSIXct", 
    "POSIXt")), class = c("xts", "zoo"))
    

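    【Discussion】:

    • Thanks. This seems to work, with one small exception: I get the rows from 0930 to 1100. Could that be because I am one hour off GMT and R assumes the dataset is normalized to GMT?
    • @Terjeja You could try the tz= argument of strftime, or add +100 to the numbers.
    【Solution 2】: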
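
If the window comes out shifted by an hour, one thing worth checking is which time zone is used when the clock time is formatted. The following is a minimal sketch of the tz= suggestion, using made-up timestamps; "Europe/Oslo" is only an assumed example of a zone one hour ahead of GMT in winter.

```r
# Illustrative timestamps: 09:30, 10:30 and 12:00 UTC on 2020-01-02.
x <- as.POSIXct(c("2020-01-02 09:30:00", "2020-01-02 10:30:00",
                  "2020-01-02 12:00:00"), tz = "UTC")

# Render the clock time in an explicitly chosen zone.
hm_utc  <- as.numeric(strftime(x, "%H%M", tz = "UTC"))
hm_oslo <- as.numeric(strftime(x, "%H%M", tz = "Europe/Oslo"))  # assumed zone

# The same instants fall inside or outside the window depending on the zone.
keep_utc  <- hm_utc  %in% 1030:1200
keep_oslo <- hm_oslo %in% 1030:1200

x[keep_utc]
x[keep_oslo]
```

Here the 09:30 UTC row reads as 10:30 in the assumed zone, so it is kept in the second filter but not in the first.
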

    I think what you want to do is:

    subset(transform(df, hour = as.integer(format(datetime, "%H")), 
                         minute = as.integer(format(datetime, "%M"))), 
          (hour == 10 & minute >= 30) | hour == 11 | hour == 12 & minute == 0)
    
    
    #  V3   V4            datetime hour minute
    #3  3 hutr 2020-01-02 10:30:00   10     30
    #4  2  uff 2020-01-02 11:00:00   11      0
    #5  4  wwe 2020-01-02 11:30:00   11     30
    #6  1  vev 2020-01-02 12:00:00   12      0
    

    With dplyr and lubridate this can be done as:

    library(dplyr)
    library(lubridate)
    
    df %>%
      mutate(hour = hour(datetime), minute = minute(datetime)) %>%
      filter((hour == 10 & minute >= 30) | hour == 11 | hour == 12 & minute == 0)
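
The hour and minute conditions above can also be folded into a single number, minutes since midnight, which turns the window into one range check. This is an alternative sketch, not the answer's own method: `filter_clock_window` and the `demo` data frame are hypothetical names introduced for illustration, and base R's `as.POSIXlt` is used here in place of lubridate's `hour()` and `minute()`, which would work equally well.

```r
# Hypothetical helper: keep rows whose clock time lies in [from, to],
# where both bounds are given as minutes since midnight.
filter_clock_window <- function(data, time_col, from, to) {
  lt <- as.POSIXlt(data[[time_col]])   # uses the column's own time zone
  mins <- lt$hour * 60 + lt$min        # minutes since midnight
  data[mins >= from & mins <= to, , drop = FALSE]
}

# Illustrative data: seven half-hourly rows starting at 09:30 UTC.
demo <- data.frame(
  datetime = as.POSIXct("2020-01-02 09:30:00", tz = "UTC") + 1800 * 0:6,
  V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L)
)

# 10:30 through 12:00, both ends included.
res <- filter_clock_window(demo, "datetime", from = 10 * 60 + 30, to = 12 * 60)
res
```

Because the comparison is on a single value, changing the window only means changing `from` and `to`, with no separate hour and minute cases to keep consistent.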
    

    Data:

    df <-  structure(list(V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L), V4 = structure(c(1L, 
    3L, 2L, 4L, 7L, 5L, 6L), .Label = c("gdss", "hutr", "jojo", "uff", 
    "vev", "wow", "wwe"), class = "factor"), datetime = structure(c(1577957400, 
    1577959200, 1577961000, 1577962800, 1577964600, 1577966400, 1577968200
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
    -7L), class = "data.frame")
    

    【Discussion】:
