【Question title】: R - How to compute the mean of the n previous values, excluding the current observation (rolling average)
【Posted】: 2021-02-08 06:41:55
【Question】:

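Can anyone suggest the best way to create a new column in a data frame where each value is the mean of the previous 12 observations, excluding the current one? I have not found a similar answer here so far, so any help would be appreciated!

My data frame: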

LateCounts <- 

    Date    Count
1   Jan-19  7
2   Feb-19  4
3   Mar-19  9
4   Apr-19  8
5   May-19  7
6   Jun-19  4
7   Jul-19  4
8   Aug-19  5
9   Sep-19  2
10  Oct-19  5
11  Nov-19  7
12  Dec-19  4
13  Jan-20  3
14  Feb-20  4
15  Mar-20  5
16  Apr-20  2
17  May-20  3
18  Jun-20  2
19  Jul-20  3
20  Aug-20  4
21  Sep-20  3
22  Oct-20  2

I am currently using the following code:

LateCounts <- LateCounts %>% mutate(RollAvge=rollapplyr(Count, 12, mean, partial = TRUE))

This produces the following 12-month rolling average, which includes the current month:

    Date    Count   RollAvge
1   Jan-19   7      7
2   Feb-19   4      5.5
3   Mar-19   9      6.666667
4   Apr-19   8      7
5   May-19   7      7
6   Jun-19   4      6.5
7   Jul-19   4      6.142857
8   Aug-19   5      6
9   Sep-19   2      5.555556
10  Oct-19   5      5.5
11  Nov-19   7      5.636364
12  Dec-19   4      5.5
13  Jan-20   3      5.166667
14  Feb-20   4      5.166667
15  Mar-20   5      4.833333
16  Apr-20   2      4.333333
17  May-20   3      4
18  Jun-20   2      3.833333
19  Jul-20   3      3.75
20  Aug-20   4      3.666667
21  Sep-20   3      3.75
22  Oct-20   2      3.5

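What I actually need is the following: a 12-month trailing (rolling) average in which each value in the "RollAvge" column is the mean of the previous values in the "Count" column, excluding the current month.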

    Date    Count   RollAvge
1   Jan-19  7   
2   Feb-19  4       7
3   Mar-19  9       5.5
4   Apr-19  8       6.666667
5   May-19  7       7
6   Jun-19  4       7
7   Jul-19  4       6.5
8   Aug-19  5       6.142857
9   Sep-19  2       6
10  Oct-19  5       5.555556
11  Nov-19  7       5.5
12  Dec-19  4       5.636364
13  Jan-20  3       5.5
14  Feb-20  4       5.166667
15  Mar-20  5       5.166667
16  Apr-20  2       4.833333
17  May-20  3       4.333333
18  Jun-20  2       4
19  Jul-20  3       3.833333
20  Aug-20  4       3.75
21  Sep-20  3       3.666667
22  Oct-20  2       3.75

Thanks,

【Comments】:

    Tags: r time-series average mean rollapply


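    【Solution 1】:

    We need to take the lag of the output returned by rollapply, so that each row receives the trailing mean that ended on the previous row.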

    library(dplyr)
    library(zoo)
    LateCounts %>%
          mutate(RollAvge= lag(rollapplyr(Count, 12, mean, partial = TRUE)))
    

    -output

    #      Date Count RollAvge
    #1  Jan-19     7       NA
    #2  Feb-19     4 7.000000
    #3  Mar-19     9 5.500000
    #4  Apr-19     8 6.666667
    #5  May-19     7 7.000000
    #6  Jun-19     4 7.000000
    #7  Jul-19     4 6.500000
    #8  Aug-19     5 6.142857
    #9  Sep-19     2 6.000000
    #10 Oct-19     5 5.555556
    #11 Nov-19     7 5.500000
    #12 Dec-19     4 5.636364
    #13 Jan-20     3 5.500000
    #14 Feb-20     4 5.166667
    #15 Mar-20     5 5.166667
    #16 Apr-20     2 4.833333
    #17 May-20     3 4.333333
    #18 Jun-20     2 4.000000
    #19 Jul-20     3 3.833333
    #20 Aug-20     4 3.750000
    #21 Sep-20     3 3.666667
    #22 Oct-20     2 3.750000
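
To see why the lag gives the requested result, here is a minimal base R sketch that computes the same quantity directly: for row i it averages the values in rows max(1, i - 12) through i - 1. The helper name prev_mean is hypothetical and is not part of dplyr or zoo; it is only meant as a cross-check of the output above.

```r
Count <- c(7L, 4L, 9L, 8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L,
           3L, 4L, 5L, 2L, 3L, 2L, 3L, 4L, 3L, 2L)

# prev_mean() is a hypothetical helper written for this illustration only.
# For each position i it averages up to n values that come before i.
prev_mean <- function(x, n = 12) {
  vapply(seq_along(x), function(i) {
    if (i == 1) return(NA_real_)       # the first row has no earlier values
    mean(x[max(1, i - n):(i - 1)])     # rows i - n .. i - 1, clipped at row 1
  }, numeric(1))
}

RollAvge <- prev_mean(Count)
print(data.frame(Count, RollAvge))
```

The explicit loop is slower than rollapplyr on long series, but it makes the window boundaries visible, which helps when checking the first 12 rows where the window is still partial.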
    

    data

    LateCounts <- structure(list(Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19", 
    "May-19", "Jun-19", "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19", 
    "Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20", 
    "Jul-20", "Aug-20", "Sep-20", "Oct-20"), Count = c(7L, 4L, 9L, 
    8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L, 3L, 4L, 5L, 2L, 3L, 2L, 3L, 
    4L, 3L, 2L)), class = "data.frame", row.names = c("1", "2", "3", 
    "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
    "16", "17", "18", "19", "20", "21", "22"))
    

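    【Comments】:

    • You can specify the 12 preceding values explicitly in rollapply and avoid lag: LateCounts %>% mutate(RollAvge = rollapplyr(Count, list(-(1:12)), mean, partial = TRUE, fill = NA)). Here -(1:12) means use the offsets -1, -2, ..., -12. A separate offset vector can be given for each row; if only one is given, as here, it is recycled.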
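    【Solution 2】:

    @NW320d, with dplyr and zoo you can also use the data.frame function with the same rolling-mean call, without mutate or the pipe.

    library(dplyr)

    library(zoo)

    Using @akrun's code to build LateCounts (thanks for the code snippet!):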

    > LateCounts <- structure(list(Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19", 
    + "May-19", "Jun-19", "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19", 
    + "Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20", 
    + "Jul-20", "Aug-20", "Sep-20", "Oct-20"), Count = c(7L, 4L, 9L, 
    + 8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L, 3L, 4L, 5L, 2L, 3L, 2L, 3L, 
    + 4L, 3L, 2L)), class = "data.frame", row.names = c("1", "2", "3", 
    + "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
    + "16", "17", "18", "19", "20", "21", "22"))
    
    > data.frame(LateCounts$Count, rollavg=dplyr::lag(rollapplyr(LateCounts$Count, 12, mean, partial = TRUE)))
    
    Output:
       LateCounts.Count  rollavg
    1                 7       NA
    2                 4 7.000000
    3                 9 5.500000
    4                 8 6.666667
    5                 7 7.000000
    6                 4 7.000000
    7                 4 6.500000
    8                 5 6.142857
    9                 2 6.000000
    10                5 5.555556
    11                7 5.500000
    12                4 5.636364
    13                3 5.500000
    14                4 5.166667
    15                5 5.166667
    16                2 4.833333
    17                3 4.333333
    18                2 4.000000
    19                3 3.833333
    20                4 3.750000
    21                3 3.666667
    22                2 3.750000
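
Note that the data.frame call above returns only the counts and the rolling mean, so the Date column is not carried over. As a sketch of one way to keep it, assuming the same LateCounts data, the shifted rolling mean can be assigned as a new column instead. The shift here uses base R (c(NA, head(x, -1))) in place of dplyr::lag; that substitution is my own choice for the illustration, not something from the answers.

```r
library(zoo)

LateCounts <- data.frame(
  Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19", "May-19", "Jun-19",
           "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19", "Dec-19",
           "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20",
           "Jul-20", "Aug-20", "Sep-20", "Oct-20"),
  Count = c(7L, 4L, 9L, 8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L,
            3L, 4L, 5L, 2L, 3L, 2L, 3L, 4L, 3L, 2L)
)

# Trailing mean that still includes the current row (window of up to 12).
trailing <- rollapplyr(LateCounts$Count, 12, mean, partial = TRUE)

# Shift down by one row so each row sees only earlier values;
# this is the base R equivalent of dplyr::lag(trailing).
LateCounts$RollAvge <- c(NA, head(trailing, -1))

print(LateCounts)
```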
    
