【Question title】: Sum of positive events over a 12-month rolling window
【Posted】: 2019-07-23 15:41:29
【Question description】:

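I am trying to count the number of positive events within a 12-month rolling window.

I can create 365 rows of missing data per year and use zoo::rollapply to count the events in every 365 rows, but my data frame is very large and I want to do this for a bunch of variables, so it takes forever to run.

I can get the correct output with this: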

data <- data.frame(id = c("a","a","a","a","a","b","b","b","b","b"),
                   date = c("20-01-2011","20-04-2011","20-10-2011","20-02-2012",
                            "20-05-2012","20-01-2013","20-04-2013","20-10-2013",
                            "20-02-2014","20-05-2014"),
                   event = c(0,1,1,1,0,1,0,0,1,1))
library(lubridate)
library(dplyr)
library(tidyr)
library(zoo)

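data %>%
  group_by(id) %>%
  # parse the dates; cumsum is non-NA only on the original rows
  mutate(date = dmy(date),
         cumsum = cumsum(event)) %>%
  # insert one row per missing day, with event = 0
  complete(date = full_seq(date, period = 1), fill = list(event = 0)) %>%
  # right-aligned 365-row window, i.e. 365 days on the daily rows
  mutate(event12 = rollapplyr(event, width = 365, FUN = sum, partial = TRUE)) %>%
  # filled-in days have NA cumsum, so this keeps only the original rows
  drop_na(cumsum)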

Which gives:

 id     date       event cumsum event12
 <fct>  <date>     <dbl>  <dbl>   <dbl>
 a      2011-01-20     0      0       0
 a      2011-04-20     1      1       1
 a      2011-10-20     1      2       2
 a      2012-02-20     1      3       3
 a      2012-05-20     0      3       2
 b      2013-01-20     1      1       1
 b      2013-04-20     0      1       1
 b      2013-10-20     0      1       1
 b      2014-02-20     1      2       1
 b      2014-05-20     1      3       2

But I'd like to know whether there is a more efficient way, e.g. how to make the width in rollapply count dates rather than rows.
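Because `rollapplyr(width = 365)` counts rows, the daily expansion is what turns rows into days. A minimal base R sketch of a window that counts days directly, without expanding, is below. It assumes dates are already `Date` objects and that the window is (d - 365, d], which is the current day plus the 364 previous days that `rollapplyr` covers on the expanded data. The helper name `rolling_event_sum` is hypothetical; it subtracts a running total taken at the window's left edge, located with `findInterval`.

```r
# Sketch: trailing 365-day event count per id, without expanding to daily rows.
# `rolling_event_sum` is a hypothetical helper, not a function from any package.
rolling_event_sum <- function(dates, events, window_days = 365) {
  ord <- order(dates)                  # work on date-sorted values
  d   <- as.numeric(dates[ord])        # days since 1970-01-01
  cs  <- cumsum(events[ord])           # running total of events
  # number of observations dated on or before d - window_days,
  # i.e. those that fall outside the window (d - window_days, d]
  n_old <- findInterval(d - window_days, d)
  out <- numeric(length(d))
  out[ord] <- cs - c(0, cs)[n_old + 1] # drop events that left the window
  out
}

data <- data.frame(id = c("a","a","a","a","a","b","b","b","b","b"),
                   date = c("20-01-2011","20-04-2011","20-10-2011","20-02-2012",
                            "20-05-2012","20-01-2013","20-04-2013","20-10-2013",
                            "20-02-2014","20-05-2014"),
                   event = c(0,1,1,1,0,1,0,0,1,1))
data$date <- as.Date(data$date, format = "%d-%m-%Y")

# apply the helper separately within each id
data$event12 <- NA_real_
for (g in unique(data$id)) {
  idx <- which(data$id == g)
  data$event12[idx] <- rolling_event_sum(data$date[idx], data$event[idx])
}
data
```

Each id is sorted once and `findInterval` does a binary search per row, so the cost grows with the number of observations rather than the number of calendar days.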

【Question discussion】:

    Tags: r dplyr tidyverse zoo rollapply


    【Solution 1】:

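    After converting the dates to Date class, this can be done without filling in the missing dates by using a complex self-join in a single SQL statement: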

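    library(sqldf)
    
    # convert the dd-mm-yyyy strings to Date so that date arithmetic works
    data2 <- transform(data, date = as.Date(date, "%d-%m-%Y"))
    
    # for each row a, sum the events of rows b with the same id whose date
    # lies between a.date - 365 and a.date (BETWEEN is inclusive at both ends)
    sqldf("select a.*, sum(b.event) as event12
      from data2 as a
      left join data2 as b on a.id = b.id and b.date between a.date - 365 and a.date
      group by a.rowid
      order by a.rowid")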
    

    giving:

       id       date event event12
    1   a 2011-01-20     0       0
    2   a 2011-04-20     1       1
    3   a 2011-10-20     1       2
    4   a 2012-02-20     1       3
    5   a 2012-05-20     0       2
    6   b 2013-01-20     1       1
    7   b 2013-04-20     0       1
    8   b 2013-10-20     0       1
    9   b 2014-02-20     1       1
    10  b 2014-05-20     1       2
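For readers without sqldf, the same self-join can be sketched in base R with `merge()`. This is a minimal illustration, not the answer's method: the helper columns `row`, `date_b` and `event_b` are hypothetical names, `row` stands in for SQLite's `rowid`, and the window bounds copy the SQL's inclusive `between a.date - 365 and a.date`.

```r
# Sketch: the SQL self-join above, rewritten with base R merge().
data <- data.frame(id = c("a","a","a","a","a","b","b","b","b","b"),
                   date = c("20-01-2011","20-04-2011","20-10-2011","20-02-2012",
                            "20-05-2012","20-01-2013","20-04-2013","20-10-2013",
                            "20-02-2014","20-05-2014"),
                   event = c(0,1,1,1,0,1,0,0,1,1))
data2 <- transform(data, date = as.Date(date, format = "%d-%m-%Y"))
data2$row <- seq_len(nrow(data2))      # hypothetical stand-in for SQLite's rowid

# the "b" side of the join, renamed to avoid clashing column names
b <- data2[, c("id", "date", "event")]
names(b) <- c("id", "date_b", "event_b")

pairs <- merge(data2, b, by = "id")    # every a/b pair within the same id
in_window <- pairs$date_b >= pairs$date - 365 & pairs$date_b <= pairs$date

# sum b's events that fall inside a's window, per original row a
sums <- tapply(pairs$event_b * in_window, pairs$row, sum)
data2$event12 <- as.numeric(sums[as.character(data2$row)])
data2
```

Like the SQL, this pairs every row with every other row of the same id, so its size grows quadratically with the length of each id's history.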
    

    【Discussion】:
