【Question title】: Calculating the sum of irregular time-interval data in R
【Posted】: 2021-07-20 23:37:32
【Question】:

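I have two data frames. Data frame 1 holds irregular time-interval data with categorical values, and data frame 2 holds regularly spaced data with integer values.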

Data frame 1

Start Date   End Date     Category
1980-01-05   1983-02-17   A
1983-02-17   1987-01-02   B
1987-01-02   1989-11-10   C
1989-11-10   1992-03-20   D

Data frame 2

Date         variable 1   variable 2   variable 3   ...
1980-01-01   0            0            2            ...
1980-02-01   0            0            0            ...
1980-03-01   0            0            0            ...
1980-04-01   0            1            2            ...
1980-05-01   0            1            0            ...
1980-06-01   -1           0            1            ...
1980-07-01   -2           0            1            ...
1980-08-01   -1           0            2            ...
1980-09-01   0            2            1            ...
1980-10-01   0            0            2            ...
...          ...          ...          ...          ...

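Using these data frames, I want to merge the observations in data frame 2 into data frame 1 by summing, for each row of data frame 1, the values of data frame 2 that fall between that row's start date and end date.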

So the output should look like this:

Start Date   End Date     Category   variable 1   variable 2   variable 3   ...
1980-01-05   1983-02-17   A          sum          sum          sum          ...
1983-02-17   1987-01-02   B          sum          sum          sum          ...
1987-01-02   1989-11-10   C          sum          sum          sum          ...
1989-11-10   1992-03-20   D          sum          sum          sum          ...

Each "sum" is the total of that variable over the rows of data frame 2 dated from the row's start date to its end date.
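To make the question reproducible, the two tables can be typed in as R data frames. This is a minimal sketch that assumes the column names are written with underscores (Start_Date, End_Date, Category, Date, Variable_1, and so on, the names the answer uses) and keeps only the ten rows and three variables of data frame 2 that the question actually shows.

```r
# Reproducible version of the question's sample data.
# Assumption: underscore column names instead of "Start Date", "variable 1", ...
df1 <- data.frame(
  Start_Date = as.Date(c("1980-01-05", "1983-02-17", "1987-01-02", "1989-11-10")),
  End_Date   = as.Date(c("1983-02-17", "1987-01-02", "1989-11-10", "1992-03-20")),
  Category   = c("A", "B", "C", "D")
)

# Only the ten monthly rows and three variables shown in the question
df2 <- data.frame(
  Date       = seq(as.Date("1980-01-01"), by = "month", length.out = 10),
  Variable_1 = c(0, 0, 0, 0, 0, -1, -2, -1, 0, 0),
  Variable_2 = c(0, 0, 0, 1, 1, 0, 0, 0, 2, 0),
  Variable_3 = c(2, 0, 0, 2, 0, 1, 1, 2, 1, 2)
)

str(df1)
str(df2)
```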

【Comments】:

  • Is it always guaranteed that the end date in row n equals the start date in row n+1?
  • Yes, that is always guaranteed. @dash2
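That guarantee means the intervals are contiguous, so they can be described by one sorted vector of break points: every start date plus the final end date. A small sketch, assuming data frame 1 is stored as df1 with Date columns named Start_Date and End_Date (hypothetical names), that checks the guarantee before relying on it and then builds that vector:

```r
df1 <- data.frame(
  Start_Date = as.Date(c("1980-01-05", "1983-02-17", "1987-01-02", "1989-11-10")),
  End_Date   = as.Date(c("1983-02-17", "1987-01-02", "1989-11-10", "1992-03-20")),
  Category   = c("A", "B", "C", "D")
)

# End date of row n must equal start date of row n + 1
stopifnot(all(head(df1$End_Date, -1) == df1$Start_Date[-1]))

# All start dates plus the last end date describe every interval
breaks <- c(df1$Start_Date, tail(df1$End_Date, 1))
print(breaks)
```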

Tags: r time merge sum intervals


【Solution 1】:

Here is a solution using my santoku package. You could also use cut.Date().

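```r
library(santoku)
library(dplyr)

# Assumes the columns have been renamed without spaces (Start_Date, End_Date,
# Category, Date, Variable_1, ...) and that df1$Start_Date, df1$End_Date and
# df2$Date are Date objects.
start_dates <- df1$Start_Date

# Label each df2 row with the start date of the interval it falls into
df2$Start_Date <- santoku::chop(df2$Date, start_dates, labels = lbl_endpoint())
# An explicit format turns any label that is not a date into NA instead of an error
df2$Start_Date <- as.Date(as.character(df2$Start_Date), format = "%Y-%m-%d")

df_summ <- df2 %>%
  inner_join(df1, by = "Start_Date") %>%  # keep only rows that match an interval in df1
  filter(Date < End_Date) %>%             # drop rows after the last interval's end date
  group_by(Start_Date) %>%
  summarize(
    End_Date   = End_Date[1],
    Category   = Category[1],
    Variable_1 = sum(Variable_1),
    Variable_2 = sum(Variable_2),
    Variable_3 = sum(Variable_3),
    .groups    = "drop"
  )
```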

You could also use across() for variables 1 to 3; the code above is simple and clear, if a little repetitive.
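As a hedged alternative that needs neither santoku nor dplyr, the cut.Date() route mentioned in the answer can be sketched in base R, summing every variable column at once (the job across() would do in dplyr). This is not the answer author's code: it assumes underscore column names, Date columns, and the ten sample rows from the question, and it treats each interval as half-open [Start_Date, End_Date), which relies on the guarantee that each end date equals the next start date.

```r
df1 <- data.frame(
  Start_Date = as.Date(c("1980-01-05", "1983-02-17", "1987-01-02", "1989-11-10")),
  End_Date   = as.Date(c("1983-02-17", "1987-01-02", "1989-11-10", "1992-03-20")),
  Category   = c("A", "B", "C", "D")
)
df2 <- data.frame(
  Date       = seq(as.Date("1980-01-01"), by = "month", length.out = 10),
  Variable_1 = c(0, 0, 0, 0, 0, -1, -2, -1, 0, 0),
  Variable_2 = c(0, 0, 0, 1, 1, 0, 0, 0, 2, 0),
  Variable_3 = c(2, 0, 0, 2, 0, 1, 1, 2, 1, 2)
)

# Break points: all start dates plus the final end date
breaks <- c(df1$Start_Date, tail(df1$End_Date, 1))

# right = FALSE gives intervals [start, end); labels = FALSE returns the
# interval's row number in df1, or NA for dates outside every interval
df2$interval <- cut(df2$Date, breaks = breaks, labels = FALSE, right = FALSE)

vars <- grep("^Variable_", names(df2), value = TRUE)
in_range <- !is.na(df2$interval)

# rowsum() adds up each column within each interval that has data
sums <- rowsum(df2[in_range, vars], group = df2$interval[in_range])

# Intervals without any observations keep a sum of 0
result <- df1
result[vars] <- 0
result[as.integer(rownames(sums)), vars] <- sums
print(result)
```

With these ten sample rows only interval A receives data (Variable_1 = -4, Variable_2 = 4, Variable_3 = 9): the 1980-01-01 row is dropped because it falls before A's start date, and B to D stay at 0.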
