【Question title】: Filling holes in intraday time series
【Posted】: 2020-10-26 11:02:44
【Question description】:

I have this time series (1-minute timeframe):

structure(list(V1 = c("01/04/2007", "01/04/2007", "01/04/2007", 
"01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", 
"01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", 
"01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", 
"01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", "01/04/2007", 
"01/04/2007", "01/04/2007", "01/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", "02/04/2007", 
"02/04/2007", "02/04/2007"), V2 = c("23:01", "23:03", "23:04", 
"23:05", "23:06", "23:07", "23:08", "23:09", "23:14", "23:15", 
"23:17", "23:19", "23:20", "23:25", "23:26", "23:28", "23:29", 
"23:31", "23:32", "23:34", "23:39", "23:43", "23:45", "23:46", 
"23:55", "23:56", "00:02", "00:03", "00:06", "00:09", "00:13", 
"00:15", "00:16", "00:17", "00:18", "00:20", "00:22", "00:23", 
"00:33", "00:41", "00:42", "00:43", "00:47", "00:48", "00:50", 
"00:51", "00:55", "00:56", "00:59", "01:00", "01:01", "01:02", 
"01:04", "01:05", "01:07", "01:09", "01:11", "01:12", "01:18", 
"01:19", "01:20", "01:21", "01:22", "01:26", "01:27", "01:28", 
"01:30", "01:32", "01:35", "01:40", "01:41", "01:44", "01:46", 
"01:47", "01:51", "02:07", "02:09", "02:11", "02:13", "02:15", 
"02:21", "02:22", "02:23", "02:24", "02:28", "02:30", "02:32", 
"02:39", "02:45", "03:14", "03:17", "03:22", "03:28", "03:32", 
"04:21", "04:28", "04:34", "04:39", "04:45", "04:47"), V3 = c(1791, 
1790.5, 1790.25, 1789.5, 1790, 1790.5, 1790.25, 1790, 1789.75, 
1789.25, 1789.25, 1788.75, 1789, 1789.25, 1789.25, 1789.5, 1790.25, 
1790.75, 1791, 1791.5, 1791.25, 1791.25, 1790.75, 1791.5, 1791, 
1790.75, 1790, 1790, 1789.75, 1789.75, 1789.5, 1789.75, 1790, 
1790.5, 1790.75, 1791, 1791, 1791, 1790.5, 1790.5, 1790.5, 1791, 
1791, 1791.25, 1791.25, 1791.25, 1791.25, 1791.25, 1791.25, 1791.25, 
1791, 1792, 1792, 1792, 1792.5, 1792.75, 1793, 1793.25, 1793, 
1793, 1793.25, 1793.75, 1793.75, 1793.5, 1793.5, 1793, 1793.25, 
1793.5, 1793.5, 1792.75, 1793.25, 1793, 1793, 1792.5, 1793.25, 
1793.5, 1792.75, 1793, 1792.75, 1793, 1792.5, 1792.5, 1793, 1793, 
1792.75, 1793.25, 1792.25, 1792.5, 1792.75, 1793, 1792.75, 1792.5, 
1792.75, 1793, 1793.25, 1793.5, 1793.5, 1793.25, 1793.25, 1793
), V4 = c(1791, 1790.5, 1790.25, 1790.25, 1790.5, 1790.5, 1790.25, 
1790, 1789.75, 1789.25, 1789.25, 1788.75, 1789, 1789.25, 1789.5, 
1790, 1790.25, 1791, 1791.25, 1792, 1791.25, 1791.25, 1790.75, 
1791.5, 1791.25, 1790.75, 1790.25, 1790, 1789.75, 1789.75, 1789.75, 
1789.75, 1790.25, 1790.5, 1791.25, 1791, 1791, 1791, 1790.5, 
1790.5, 1790.5, 1791, 1791, 1791.25, 1791.25, 1791.25, 1791.25, 
1791.25, 1791.75, 1791.25, 1792.5, 1792, 1792, 1792, 1793, 1792.75, 
1793.25, 1793.25, 1793, 1793.25, 1793.75, 1794, 1793.75, 1793.5, 
1793.5, 1793, 1793.5, 1793.5, 1793.5, 1792.75, 1793.25, 1793, 
1793, 1792.5, 1793.75, 1793.5, 1792.75, 1793, 1792.75, 1793, 
1792.5, 1792.5, 1793, 1793, 1792.75, 1793.75, 1792.25, 1792.5, 
1792.75, 1793, 1792.75, 1792.5, 1792.75, 1793.25, 1793.25, 1793.5, 
1793.5, 1793.25, 1793.25, 1793), V5 = c(1790.75, 1789.75, 1790.25, 
1789.5, 1790, 1790.5, 1790.25, 1790, 1789.75, 1788.75, 1789, 
1788.75, 1788.75, 1789.25, 1789.25, 1789.5, 1790.25, 1790.75, 
1791, 1791.5, 1791.25, 1791, 1790.75, 1791.5, 1791, 1790.5, 1790, 
1790, 1789.75, 1789.25, 1789.5, 1789.75, 1790, 1790.5, 1790.75, 
1791, 1791, 1791, 1790, 1790.5, 1790.5, 1791, 1791, 1791.25, 
1791.25, 1791.25, 1791.25, 1791.25, 1791.25, 1791.25, 1791, 1792, 
1792, 1792, 1792.25, 1792.75, 1792.75, 1793.25, 1793, 1793, 1793.25, 
1793.75, 1793.75, 1793.5, 1793.5, 1793, 1793.25, 1793.5, 1793.5, 
1792.5, 1793.25, 1793, 1793, 1792.5, 1793.25, 1793, 1792.75, 
1793, 1792.75, 1793, 1792.5, 1792.25, 1793, 1793, 1792.75, 1793.25, 
1792.25, 1792.5, 1792.75, 1793, 1792.75, 1792.25, 1792.75, 1793, 
1793.25, 1793.5, 1793.5, 1793.25, 1793.25, 1793), V6 = c(1790.75, 
1789.75, 1790.25, 1790.25, 1790.5, 1790.5, 1790.25, 1790, 1789.75, 
1788.75, 1789, 1788.75, 1788.75, 1789.25, 1789.5, 1790, 1790.25, 
1791, 1791.25, 1792, 1791.25, 1791, 1790.75, 1791.5, 1791.25, 
1790.5, 1790.25, 1790, 1789.75, 1789.25, 1789.75, 1789.75, 1790.25, 
1790.5, 1791.25, 1791, 1791, 1791, 1790, 1790.5, 1790.5, 1791, 
1791, 1791.25, 1791.25, 1791.25, 1791.25, 1791.25, 1791.75, 1791.25, 
1792.5, 1792, 1792, 1792, 1792.5, 1792.75, 1792.75, 1793.25, 
1793, 1793.25, 1793.75, 1794, 1793.75, 1793.5, 1793.5, 1793, 
1793.5, 1793.5, 1793.5, 1792.5, 1793.25, 1793, 1793, 1792.5, 
1793.75, 1793, 1792.75, 1793, 1792.75, 1793, 1792.5, 1792.25, 
1793, 1793, 1792.75, 1793.75, 1792.25, 1792.5, 1792.75, 1793, 
1792.75, 1792.25, 1792.75, 1793.25, 1793.25, 1793.5, 1793.5, 
1793.25, 1793.25, 1793), V7 = c(11L, 3L, 6L, 4L, 5L, 1L, 2L, 
2L, 2L, 8L, 9L, 1L, 5L, 2L, 5L, 8L, 3L, 11L, 2L, 3L, 1L, 4L, 
2L, 5L, 2L, 9L, 3L, 1L, 7L, 5L, 5L, 1L, 4L, 11L, 14L, 1L, 1L, 
1L, 4L, 20L, 2L, 1L, 8L, 5L, 2L, 2L, 1L, 1L, 15L, 1L, 26L, 2L, 
3L, 15L, 33L, 26L, 25L, 9L, 1L, 4L, 50L, 2L, 1L, 1L, 6L, 1L, 
2L, 1L, 1L, 11L, 10L, 12L, 3L, 3L, 56L, 2L, 21L, 1L, 2L, 1L, 
1L, 3L, 1L, 1L, 5L, 10L, 1L, 5L, 3L, 1L, 1L, 21L, 5L, 11L, 5L, 
1L, 1L, 1L, 4L, 1L)), row.names = c(NA, 100L), class = "data.frame")

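As you can see, there are some missing values. For example, between 01/04/2007 23:26 and 01/04/2007 23:28 there is no row for 01/04/2007 23:27.

All I want is to add a row with the time 23:27 in which all the other columns have the same values as the previous row.

In other words, each day should have exactly 60 (minutes) * 24 (hours) = 1440 rows, from 00:00 to 23:59.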
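To see which minutes are absent before filling them, one option is to build the full minute grid and compare it with the timestamps you have. This is a base-R sketch that assumes a small hypothetical data frame `df` laid out like the question's (V1 = dd/mm/yyyy, V2 = hh:mm); the values are made up.

```r
# Hypothetical toy data in the question's layout (V1 = dd/mm/yyyy, V2 = hh:mm)
df <- data.frame(
  V1 = c("01/04/2007", "01/04/2007", "01/04/2007", "02/04/2007"),
  V2 = c("23:26", "23:28", "23:59", "00:02"),
  V3 = c(1789.25, 1789.5, 1790, 1790)
)

# Combine date and time into POSIXct; UTC avoids daylight-saving gaps
stamps <- as.POSIXct(paste(df$V1, df$V2), format = "%d/%m/%Y %H:%M", tz = "UTC")

# Every minute from the first to the last observation
all_mins <- seq(min(stamps), max(stamps), by = "min")

# Minutes with no row in the data (compared as numeric seconds since the epoch)
missing <- all_mins[!as.numeric(all_mins) %in% as.numeric(stamps)]
print(format(missing, "%d/%m/%Y %H:%M"))

# A complete day at 1-minute resolution has 60 * 24 rows
minutes_per_day <- 60 * 24
```

Here the first missing minute reported is 01/04/2007 23:27, matching the example in the question.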

【Question comments】:

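  • In addition to Ronak's answer below, you might also want to look at some of the vignettes in the tsibble package. They cover filling in missing data and working with irregular time-series data.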
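A rough sketch of the tsibble route mentioned in this comment, assuming the date and time have already been combined into a POSIXct column named `datetime` (a hypothetical name, with toy values): `fill_gaps()` turns the implicit gaps into explicit rows holding NA, and `tidyr::fill()` then carries the previous values forward.

```r
library(dplyr)
library(tidyr)
library(tsibble)

# Hypothetical toy 1-minute series with 23:27 missing
df <- data.frame(
  datetime = as.POSIXct(c("2007-04-01 23:26", "2007-04-01 23:28", "2007-04-01 23:29"),
                        tz = "UTC"),
  V3 = c(1789.25, 1789.5, 1790.25)
)

filled <- df %>%
  as_tsibble(index = datetime) %>%  # interval is inferred from the timestamps (1 minute here)
  fill_gaps() %>%                   # add explicit rows, with NA values, for missing minutes
  as_tibble() %>%                   # back to a plain tibble
  arrange(datetime) %>%
  fill(V3, .direction = "down")     # copy the previous row's value into the new rows
```

By default `fill_gaps()` only fills between the first and last timestamps; extending to whole days would need its `.start`/`.end` arguments.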

Tags: r time-series missing-data


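【Solution 1】:

Combine V1 and V2 to create a datetime, use complete to add the missing minutes, and use fill to copy the previous row's values into the new rows.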

library(dplyr)
library(tidyr)

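df %>%
  unite(datetime, V1, V2) %>%                         # e.g. "01/04/2007_23:01"
  mutate(datetime = lubridate::dmy_hm(datetime)) %>%  # parse day/month/year hour:minute
  complete(datetime = seq(min(datetime), max(datetime), by = 'min')) %>%  # one row per minute
  fill(everything()) %>%                              # copy previous values into new rows
  mutate(V1 = format(datetime, "%d/%m/%Y"),           # rebuild the date column
         V2 = format(datetime, '%H:%M')) %>%          # rebuild the time column
  select(-datetime)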

#     V3    V4    V5    V6    V7 V1         V2   
#   <dbl> <dbl> <dbl> <dbl> <int> <chr>      <chr>
# 1 1791  1791  1791. 1791.    11 01/04/2007 23:01
# 2 1791  1791  1791. 1791.    11 01/04/2007 23:02
# 3 1790. 1790. 1790. 1790.     3 01/04/2007 23:03
# 4 1790. 1790. 1790. 1790.     6 01/04/2007 23:04
# 5 1790. 1790. 1790. 1790.     4 01/04/2007 23:05
# 6 1790  1790. 1790  1790.     5 01/04/2007 23:06
# 7 1790. 1790. 1790. 1790.     1 01/04/2007 23:07
# 8 1790. 1790. 1790. 1790.     2 01/04/2007 23:08
# 9 1790  1790  1790  1790      2 01/04/2007 23:09
#10 1790  1790  1790  1790      2 01/04/2007 23:10
# … with 337 more rows

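【Comments】:

  • Wow, very concise and effective code. Thanks! How can I get back the original separation between date (dd/mm/yyyy) and time (hh:mm)?
  • Add remove = FALSE to unite, i.e. unite(datetime, V1, V2, remove = FALSE), so that the two columns are kept as they are. Then you can use %>% select(-datetime) to drop the datetime column.
  • remove = FALSE works well. But the V2 variable (time) doesn't have the correct time: where there was a hole, V2 contains the same time value as the previous row.
  • This does not satisfy the last sentence of the question.
  • What do you mean, Grothendieck?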
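A minimal sketch of a Solution 1 variant that addresses the last few comments, assuming a small hypothetical `df` in the question's layout. The minute grid runs from 00:00 on the first day to 23:59 on the last day, so each day gets 1440 rows, and V1/V2 are rebuilt from `datetime` after filling, so inserted rows never carry a stale time. It swaps lubridate for base `as.POSIXct()`. Minutes before the first observation stay NA because they have no previous row.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the question's layout (V1 = dd/mm/yyyy, V2 = hh:mm)
df <- data.frame(
  V1 = c("01/04/2007", "01/04/2007", "02/04/2007"),
  V2 = c("23:26", "23:28", "00:02"),
  V3 = c(1789.25, 1789.5, 1790)
)

out <- df %>%
  # Parse date + time into one POSIXct column (UTC avoids DST gaps), then drop V1/V2
  mutate(datetime = as.POSIXct(paste(V1, V2), format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  select(-V1, -V2) %>%
  # One row per minute, from 00:00 on the first day to 23:59 on the last day
  complete(datetime = seq(
    as.POSIXct(paste(as.Date(min(datetime)), "00:00"), tz = "UTC"),
    as.POSIXct(paste(as.Date(max(datetime)), "23:59"), tz = "UTC"),
    by = "min"
  )) %>%
  arrange(datetime) %>%
  # Copy the previous row's values into the inserted rows
  fill(everything(), .direction = "down") %>%
  # Rebuild date and time strings from datetime, so every row shows its own minute
  mutate(V1 = format(datetime, "%d/%m/%Y"),
         V2 = format(datetime, "%H:%M")) %>%
  relocate(V1, V2) %>%
  select(-datetime)
```

If the leading NA rows are unwanted, `fill(..., .direction = "downup")` would back-fill them from the first observation instead. That choice departs from the "previous row" rule in the question.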
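【Solution 2】:

Assuming the input data frame is tdf, we convert it to a zoo object z and create the desired date/time range rng, from 00:00 on the first day to 23:59 on the last day. We generate every minute of that range as mins, merge it with z, and carry the last observation forward with na.locf to get zz. Finally, we convert zz back to a data frame tdf2.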

library(zoo)

z <- read.zoo(tdf, index = 1:2, tz = "UTC", format = "%d/%m/%Y %H:%M")
rng <- as.POSIXct(paste(range(as.Date(time(z))), c("00:00:00", "23:59:00")), tz = "UTC")
mins <- seq(rng[1], rng[2], by = "min")
zz <- na.locf(merge(z, zoo(, mins), all = TRUE), na.rm = FALSE)
tdf2 <- fortify.zoo(zz)

Depending on your needs, you can use the zoo object zz directly, in which case the last line can be omitted.
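If you need the original date/time layout back, the `Index` column of `tdf2` can be split into date and time strings. This sketch runs Solution 2 on a small hypothetical `tdf` (two data columns, made-up values) and passes `tz = "UTC"` to `as.POSIXct()` for `rng`, so the day boundaries line up with the UTC index of `z`.

```r
library(zoo)

# Hypothetical toy data in the question's layout, with two price columns
tdf <- data.frame(
  V1 = c("01/04/2007", "01/04/2007", "02/04/2007"),
  V2 = c("23:26", "23:28", "00:02"),
  V3 = c(1789.25, 1789.5, 1790),
  V4 = c(1789.5, 1790, 1790.25)
)

# Solution 2, with tz = "UTC" also used when building rng
z <- read.zoo(tdf, index = 1:2, tz = "UTC", format = "%d/%m/%Y %H:%M")
rng <- as.POSIXct(paste(range(as.Date(time(z))), c("00:00:00", "23:59:00")), tz = "UTC")
mins <- seq(rng[1], rng[2], by = "min")
zz <- na.locf(merge(z, zoo(, mins), all = TRUE), na.rm = FALSE)
tdf2 <- fortify.zoo(zz)

# Split the POSIXct Index back into dd/mm/yyyy and hh:mm strings
tdf2$V1 <- format(tdf2$Index, "%d/%m/%Y", tz = "UTC")
tdf2$V2 <- format(tdf2$Index, "%H:%M", tz = "UTC")

# Put V1 and V2 first and drop Index
tdf2 <- tdf2[c("V1", "V2", setdiff(names(tdf2), c("Index", "V1", "V2")))]
```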

【Comments】:
