【Question title】: r collapse timeline separated by 31 days by ID
【Posted】: 2022-01-18 02:01:48
【Question description】:

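This question is similar to the one here: r collapse by year by ID

However, I would like to collapse the timeline by ID and State, provided the gap between consecutive timelines is at most 31 days. If the gap is more than 31 days, the rows are not collapsed and a new row starts. For example, if this is my dataset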

ID     From           To           State
1      2004-04-05     2005-02-05   MD
1      2005-03-05     2005-03-05   MD
1      2005-04-05     2005-10-05   DC
1      2006-03-05     2006-10-05   DC
1      2006-11-05     2007-03-05   DC
1      2007-04-05     2007-06-05   MD
1      2008-03-05     2008-11-05   MD
1      2008-12-05     2010-08-05   MD
1      2010-09-05     2012-11-05   MD
2      2003-05-05     2004-08-05   OR
2      2004-09-05     2009-03-05   OR
2      2010-06-05     2010-08-05   AZ
2      2013-06-05     2015-06-05   AZ
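
To make the example reproducible, the table above could be loaded into R roughly as follows. This is only a sketch: the object name `df` is an assumption, and the dates are parsed with `as.Date` on the assumption that they are always in the `YYYY-MM-DD` format shown.

```r
# Build the example data frame from the table in the question.
# `df` is an assumed name; columns are read as text first, then converted.
df <- read.table(text = "ID From To State
1 2004-04-05 2005-02-05 MD
1 2005-03-05 2005-03-05 MD
1 2005-04-05 2005-10-05 DC
1 2006-03-05 2006-10-05 DC
1 2006-11-05 2007-03-05 DC
1 2007-04-05 2007-06-05 MD
1 2008-03-05 2008-11-05 MD
1 2008-12-05 2010-08-05 MD
1 2010-09-05 2012-11-05 MD
2 2003-05-05 2004-08-05 OR
2 2004-09-05 2009-03-05 OR
2 2010-06-05 2010-08-05 AZ
2 2013-06-05 2015-06-05 AZ",
  header = TRUE, stringsAsFactors = FALSE)

# Convert the two date columns from character to Date.
df$From <- as.Date(df$From)
df$To   <- as.Date(df$To)

str(df)
```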

The final dataset after collapsing would look like this

ID     From           To           State
1      2004-04-05     2005-03-05   MD
1      2005-04-05     2005-10-05   DC
1      2006-03-05     2007-03-05   DC
1      2007-04-05     2007-06-05   MD
1      2008-03-05     2012-11-05   MD
2      2003-05-05     2009-03-05   OR
2      2010-06-05     2010-08-05   AZ
2      2013-06-05     2015-06-05   AZ

Any suggestions on this are much appreciated.

Test case 2:

ID     From           To           State
1      2003-09-05     2003-11-05   MD
1      2004-09-05     2007-05-05   TX
1      2007-06-05     2007-07-05   DC
1      2007-08-05     2009-07-05   DC
1      2011-11-05     2014-03-05   MD
1      2014-05-05     2017-06-05   MD

Expected result

ID     From           To           State
1      2003-09-05     2003-11-05   MD
1      2004-09-05     2007-05-05   TX
1      2007-06-05     2009-07-05   DC 
1      2011-11-05     2017-06-05   MD

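【Question discussion】:

  • But why would you collapse `2003-05-05 2004-08-05 OR` and `2004-09-05 2009-03-05 OR`? There are 31 days between 2004-08-05 and 2004-09-05.
  • @ekoam, good catch. I have updated my question to use 31 days instead of 30. Thanks for spotting this.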
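
The boundary case raised in the comments can be checked directly with Date arithmetic in base R. A small sketch (the variable name `gap` is only illustrative):

```r
# Subtracting two Date objects gives a difftime measured in days.
gap <- as.Date("2004-09-05") - as.Date("2004-08-05")

as.numeric(gap)       # 31
as.numeric(gap) > 31  # FALSE, so a "more than 31 days" rule still collapses these rows
as.numeric(gap) > 30  # TRUE, so a "more than 30 days" rule would have split them
```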

Tags: r dplyr time collapse


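【Solution 1】:

Subtract the previous `To` date from the current `From` date, use the result to create a new grouping column, and within each group select the `first` `From` value and the `last` `To` value.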

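library(dplyr)

df %>%
  # Convert the date columns from character to Date
  mutate(across(c(From, To), as.Date)) %>%
  # The counter increases whenever From is more than 31 days after the
  # previous row's To. lag() runs over the whole data frame here (not per ID),
  # so the rows are expected to be sorted by ID and From.
  group_by(ID, State, 
           group = cumsum(From - dplyr::lag(To, default = as.Date('1970-01-01')) > 31)) %>%
  # Keep the first From and the last To of each group
  summarise(From = first(From), 
            To = last(To), .groups = 'drop') %>%
  select(-group) %>%
  arrange(ID, From)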

#     ID State From       To        
#  <int> <chr> <date>     <date>    
#1     1 MD    2004-04-05 2005-03-05
#2     1 DC    2005-04-05 2005-10-05
#3     1 DC    2006-03-05 2007-03-05
#4     1 MD    2007-04-05 2007-06-05
#5     1 MD    2008-03-05 2012-11-05
#6     2 OR    2003-05-05 2009-03-05
#7     2 AZ    2010-06-05 2010-08-05
#8     2 AZ    2013-06-05 2015-06-05
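
For comparison, here is a base R sketch of a stricter variant of the same idea, in which a new run starts whenever the ID changes, the State changes, or the gap to the previous row exceeds the limit. It is not the answer's method, only an illustration under stated assumptions: the helper name `collapse_runs` and its `max_gap` argument are hypothetical, the data is the first data set from the question, and the input is assumed to have at least one row.

```r
# First data set from the question (assumed object name: df).
df <- read.table(text = "ID From To State
1 2004-04-05 2005-02-05 MD
1 2005-03-05 2005-03-05 MD
1 2005-04-05 2005-10-05 DC
1 2006-03-05 2006-10-05 DC
1 2006-11-05 2007-03-05 DC
1 2007-04-05 2007-06-05 MD
1 2008-03-05 2008-11-05 MD
1 2008-12-05 2010-08-05 MD
1 2010-09-05 2012-11-05 MD
2 2003-05-05 2004-08-05 OR
2 2004-09-05 2009-03-05 OR
2 2010-06-05 2010-08-05 AZ
2 2013-06-05 2015-06-05 AZ",
  header = TRUE, stringsAsFactors = FALSE)
df$From <- as.Date(df$From)
df$To   <- as.Date(df$To)

# Hypothetical helper: collapse consecutive rows into runs.
collapse_runs <- function(df, max_gap = 31) {
  # Sort so that consecutive rows of one ID are in time order.
  df <- df[order(df$ID, df$From), ]
  n <- nrow(df)
  # Days between each row's From and the previous row's To.
  gap <- as.numeric(df$From[-1] - df$To[-n])
  # A run starts at the first row, or when ID or State changes,
  # or when the gap is larger than max_gap days.
  new_run <- c(TRUE,
               df$ID[-1] != df$ID[-n] |
               df$State[-1] != df$State[-n] |
               gap > max_gap)
  run <- cumsum(new_run)
  # First and last row of every run.
  first <- !duplicated(run)
  last  <- !duplicated(run, fromLast = TRUE)
  data.frame(ID = df$ID[first],
             From = df$From[first],
             To = df$To[last],
             State = df$State[first],
             stringsAsFactors = FALSE)
}

res <- collapse_runs(df)
print(res)
```

On this data set the sketch yields the same eight rows as the output above. Because a State change always starts a new run, it would also keep a state separate if it reappears after a short spell in another state. Under this strict rule the last two MD rows of test case 2 (2014-03-05 to 2014-05-05, a gap of 61 days) would stay separate.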

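【Discussion】:

  • The suggested solution works for test case 1, but it does not work when a state is separated by a gap of more than 31 days and then repeats; it does not work for test case 2.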