【Question title】: Calculate moving average every n hours
【Posted】: 2015-11-05 19:32:55
【Question】:

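I have a data frame holding a time series with 4 observations per hour for each site.

I want to calculate a moving average over every 4 hours. With data.table I can compute a statistic per hour, but not per n hours.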

dput(df)
structure(list(time = structure(c(1414502100, 1414503000, 1414503900, 
1414504800, 1414505700, 1414506600, 1414507500, 1414508400, 1414509300, 
1414510200, 1414511100, 1414512000, 1414512900, 1414513800, 1414514700, 
1414515600, 1414516500, 1414517400, 1414518300, 1414519200, 1414520100, 
1414521000, 1414521900, 1414522800, 1414523700, 1414524600, 1414525500, 
1414526400, 1414527300, 1414528200, 1414529100, 1414530000, 1414530900, 
1414531800, 1414532700, 1414533600, 1414534500, 1414535400, 1414536300, 
1414537200), class = c("POSIXct", "POSIXt"), tzone = ""), site = c(2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L), val = c(38L, 38L, 35L, 35L, 35L, 35L, 37L, 
38L, 38L, 36L, 36L, 35L, 33L, 31L, 27L, 26L, 20L, 16L, 14L, 11L, 
7L, 5L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L), month = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10), 
    hour = c("14", "14", "14", "15", "15", "15", "15", "16", 
    "16", "16", "16", "17", "17", "17", "17", "18", "18", "18", 
    "18", "19", "19", "19", "19", "20", "20", "20", "20", "21", 
    "21", "21", "21", "22", "22", "22", "22", "23", "23", "23", 
    "23", "00"), min = c("15", "30", "45", "00", "15", "30", 
    "45", "00", "15", "30", "45", "00", "15", "30", "45", "00", 
    "15", "30", "45", "00", "15", "30", "45", "00", "15", "30", 
    "45", "00", "15", "30", "45", "00", "15", "30", "45", "00", 
    "15", "30", "45", "00"), day = c(28L, 28L, 28L, 28L, 28L, 
    28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 
    28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 
    28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 29L)), .Names = c("time", 
"site", "val", "month", "hour", "min", "day"), class = "data.frame", row.names = 191430:191469)



library(data.table)
dt <- data.table(df)
# Per-hour standard deviation of val for each site and day
dt[, hsd := sd(val), by = list(site, hour, day)]

 head(dt, 10)
                   time site val month hour min day      hsd
 1: 2014-10-28 14:15:00 2108  38    10   14  15  28 1.732051
 2: 2014-10-28 14:30:00 2108  38    10   14  30  28 1.732051
 3: 2014-10-28 14:45:00 2108  35    10   14  45  28 1.732051
 4: 2014-10-28 15:00:00 2108  35    10   15  00  28 1.000000
 5: 2014-10-28 15:15:00 2108  35    10   15  15  28 1.000000
 6: 2014-10-28 15:30:00 2108  35    10   15  30  28 1.000000
 7: 2014-10-28 15:45:00 2108  37    10   15  45  28 1.000000

Is this the right way to calculate a moving average? How do I calculate it over more than one hour?

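【Comments】:

  • I suspect data.table does not have the best rolling-window tools. You can use it together with RcppRoll or zoo, though. Maybe I am misunderstanding... showing your desired output would help clarify.
  • What do you mean by calculating every 4 hours? Do you want the average over 4-hour intervals?
  • Yes, that is exactly what I need.
  • See here or the many similar questions. You just need to divide by some n to turn the sum into a mean.
  • I don't understand why you ask about a moving average and then compute a standard deviation.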
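
The "divide the sum by n" idea from the comments can be sketched in base R with a cumulative sum. This is only an illustration: the helper name roll_mean_cumsum and the toy vector are made up for this example and are not from the linked question.

```r
# Hypothetical helper: trailing rolling mean of the last n observations,
# computed as (rolling sum) / n. The first n - 1 positions are NA.
roll_mean_cumsum <- function(x, n) {
  if (length(x) < n) return(rep(NA_real_, length(x)))
  cs <- cumsum(c(0, x))                                   # cs[i + 1] = sum(x[1:i])
  sums <- cs[(n + 1):length(cs)] - cs[1:(length(cs) - n)] # sum of each n-wide window
  c(rep(NA_real_, n - 1), sums / n)
}

# Toy data (not the asker's): 20 observations at 15-minute spacing
val <- 1:20
n <- 16  # 4 hours x 4 observations per hour
moving.avg <- roll_mean_cumsum(val, n)
```

This assumes evenly spaced observations with no gaps and no NA values; a single NA would propagate through the cumulative sum.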

Tags: r data.table


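【Solution 1】:

You can do this with dplyr and zoo. Although your sample data contain only one site, I group by site here because I am guessing your real data contain many sites. I am also assuming you want values for overlapping intervals rather than consecutive, non-overlapping ones.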

library(zoo)
library(dplyr)

new.df <- df %>%
  group_by(site) %>%  # This only matters if your actual data have multiple sites
  mutate(moving.avg = rollmean(x = val, width = 16,  # 16 is 4 hours x 4 obs per hour
    align = "right", fill = NA))

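If you only want the means of consecutive windows (that is, every four hours or, in this case, every 16 observations within each group), use rollapply and specify the by and align = "right" options, i.e.:

new.df <- df %>%
  group_by(site) %>%
  mutate(moving.avg = rollapply(data = val, FUN = mean, width = 16, by = 16,
    align = "right", fill = NA))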
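
Since the question is tagged data.table, here is a rough sketch of the same two calculations without zoo. It assumes data.table 1.12.0 or later (for frollmean), evenly spaced rows with no gaps, and uses made-up toy data with two sites rather than the asker's df; the column names block and block.avg are illustrative.

```r
library(data.table)

# Toy data (hypothetical): 2 sites, 32 observations each at 15-minute spacing
dt <- data.table(
  site = rep(c(1L, 2L), each = 32),
  time = rep(seq(as.POSIXct("2014-10-28 00:00:00", tz = "UTC"),
                 by = "15 min", length.out = 32), times = 2),
  val  = as.numeric(c(1:32, 101:132))
)
n <- 16L  # 4 hours x 4 observations per hour

# Overlapping windows: mean of the current and previous 15 observations per site
dt[, moving.avg := frollmean(val, n = n, align = "right"), by = site]

# Non-overlapping windows: label each run of 16 rows within a site, then aggregate
dt[, block := (seq_len(.N) - 1L) %/% n, by = site]
block.avg <- dt[, .(start = min(time), end = max(time), avg = mean(val)),
                by = .(site, block)]
```

The overlapping version keeps one value per row, with NA for the first 15 rows of each site. The non-overlapping version returns one row per site and 4-hour block, which may be easier to read than a column that is mostly NA.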

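【Comments】:

  • If some periods are missing from the data, this solution will compute incorrect values. The data must be expanded to a complete time grid first.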
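
One way to address the missing-period concern is to join the data onto a complete 15-minute grid before rolling, so that a 16-row window always spans exactly 4 hours. The following is a minimal sketch on made-up data, assuming data.table 1.12.0 or later and observations that fall exactly on 15-minute marks; the names grid and full are illustrative.

```r
library(data.table)

# Toy data (hypothetical): one site, 6 hours of 15-minute observations
dt <- data.table(
  site = 1L,
  time = seq(as.POSIXct("2014-10-28 00:00:00", tz = "UTC"),
             by = "15 min", length.out = 24),
  val  = as.numeric(1:24)
)
dt <- dt[-(5:8)]  # drop one hour to simulate a gap

# Complete 15-minute grid between each site's first and last timestamp
grid <- dt[, .(time = seq(min(time), max(time), by = "15 min")), by = site]

# Join the data onto the grid: timestamps with no observation get val = NA
full <- dt[grid, on = .(site, time)]

# Rolling mean over 16 grid rows (4 hours), averaging only the values present
n <- 16L
full[, moving.avg := frollmean(val, n = n, align = "right", na.rm = TRUE), by = site]
```

With na.rm = TRUE each window is averaged over the observations it actually contains. Leaving na.rm at its default of FALSE would instead return NA for any window that overlaps a gap, which may be preferable if partial windows should not be reported.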