[Question title]: Classifying rows based on whether a date falls within a given date range
[Posted]: 2021-08-13 23:55:12
[Question]:

I want to classify existing event dates according to whether they fall between two dates held in other columns.

My raw data looks like this:

study_id Event Event_Date Event_Result Pre1 CheckUp Post1 Pre2 CheckUp2 Post2 Pre3 CheckUp3 Post3 Pre4 CheckUp4 Post4 Pre5 CheckUp5 Post5
1 event1 5/4/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
1 event1 5/15/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
1 event1 6/5/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
1 event1 7/3/2012 0.8 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
2 event2 8/14/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
2 event2 9/11/2012 1.2 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
2 event1 9/21/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
3 event1 10/9/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
3 event1 10/23/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
3 event2 10/25/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
4 event2 11/2/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
4 event1 11/13/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017

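Expected result: I would like one column per interval rule (see the "My attempt" section below for the interval details; there would be 10 additional columns in total), marked TRUE if Event_Date falls within the interval rule and FALSE otherwise. I can do this in Excel, but I am looking for a solution in R.

See the example below.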

personID Event_Date Event_Result Pre1 CheckUp Post1 Pre2 CheckUp2 Post2 Pre3 CheckUp3 Post3 Pre4 CheckUp4 Post4 Pre5 CheckUp5 Post5 Interval1 Interval2 Interval3 Interval4 Interval5 Interval6 Interval7 Interval8 Interval9 Interval10
1 5/4/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1 5/15/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1 6/5/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 7/3/2012 0.8 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 8/14/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 9/11/2012 1.2 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 9/21/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 10/9/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 10/23/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 10/25/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 11/2/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
5 11/13/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

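My attempt: I am trying to create intervals that look at Event_Result and check whether it falls within the ranges based on the Post, Pre and CheckUp dates. The 10 interval rules I care about are: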

Interval1 = if EventDate >= Pre1 AND EventDate <= CheckUp

Interval2 = if EventDate > CheckUp AND EventDate <= Post1

Interval3 = if EventDate >= Pre2 AND EventDate <= CheckUp2

Interval4 = if EventDate > CheckUp2 AND EventDate <= Post2

*

*

Interval9 = if EventDate >= Pre5 AND EventDate <= CheckUp5

Interval10 = if EventDate > CheckUp5 AND EventDate <= Post5

I tried using lubridate, without success. As mentioned, I can do this in Excel, but I am looking for a way to do it in R (I am open to lubridate, dplyr or other libraries).


[Comments]:

    Tags: r date dplyr


    [Solution 1]:

    Here is another approach to try.

    First, make sure your dates are of class Date. Rename CheckUp to CheckUp1 so that it is consistent with the other columns.
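
The date-conversion step can be checked on its own before touching the full data. The following is a minimal sketch using base R only; `raw_dates` is a hypothetical vector holding a few values in the same month/day/year layout as the question's columns.

```r
# Hypothetical sample values in the same month/day/year layout as the question's data
raw_dates <- c("7/23/2012", "10/23/2012", "1/23/2013")

# as.Date() needs an explicit format because these strings are not in ISO 8601 form
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

class(parsed)          # "Date"
parsed[1] < parsed[2]  # TRUE: comparisons are chronological once the values are parsed
```

Leaving the columns as character would make `>=` and `<=` compare the text rather than the dates, so the conversion has to happen before any interval test.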

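    Then you can use pivot_longer to put the data into long format, and group_by to look at each event date. Here, one column is created to evaluate whether the event date falls within the interval between pre and checkup, and a second column does the same check for post.

    After that, you can convert the result back to wide format and join it with the original data.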

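    library(tidyverse)
    
    # raw_data is the data frame from the question.
    # Parse all date columns and give CheckUp the same numeric suffix as the others.
    prep_data <- raw_data %>%
      mutate(across(starts_with(c("Pre", "CheckUp", "Post")), ~ as.Date(.x, format = "%m/%d/%Y")),
             Event_Date = as.Date(Event_Date, format = "%m/%d/%Y")) %>%
      rename(CheckUp1 = CheckUp)
    
    prep_data %>%
      # Long format: one row per event and date column, split into type ("Pre") and number ("1")
      pivot_longer(cols = starts_with(c("Pre", "CheckUp", "Post")), names_to = c("Event_Type", "Number"), names_pattern = "(\\w+)(\\d+)") %>%
      # Within each event and number, compare the event date with that set's Pre/CheckUp/Post dates
      group_by(study_id, Event, Event_Date, Event_Result, Number) %>%
      mutate(interval_pre = Event_Date >= value[Event_Type == "Pre"] & Event_Date <= value[Event_Type == "CheckUp"],
             interval_post = Event_Date > value[Event_Type == "CheckUp"] & Event_Date <= value[Event_Type == "Post"]) %>%
      ungroup() %>%
      # Back to wide format: one logical column per interval
      pivot_wider(id_cols = c(study_id, Event, Event_Date, Event_Result), names_from = Number, values_from = c("interval_pre", "interval_post"), values_fn = first) %>%
      right_join(prep_data %>% select(study_id, Event, Event_Date, Event_Result),
                 by = c("study_id", "Event", "Event_Date", "Event_Result"))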
    

    Output

       study_id Event  Event_Date Event_Result interval_pre_1 interval_pre_2 interval_pre_3 interval_pre_4 interval_pre_5 interval_post_1 interval_post_2 interval_post_3 interval_post_4 interval_post_5
          <int> <chr>  <date>            <dbl> <lgl>          <lgl>          <lgl>          <lgl>          <lgl>          <lgl>           <lgl>           <lgl>           <lgl>           <lgl>          
     1        1 event1 2012-05-04          1   FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     2        1 event1 2012-05-15          1   FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     3        1 event1 2012-06-05          1   FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     4        1 event1 2012-07-03          0.8 FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     5        2 event2 2012-08-14          1   TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     6        2 event2 2012-09-11          1.2 TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     7        2 event1 2012-09-21          1   TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     8        3 event1 2012-10-09          1.1 TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
     9        3 event1 2012-10-23          1.1 TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
    10        3 event2 2012-10-25          1   FALSE          FALSE          FALSE          FALSE          FALSE          TRUE            FALSE           FALSE           FALSE           FALSE          
    11        4 event2 2012-11-02          1   FALSE          FALSE          FALSE          FALSE          FALSE          TRUE            FALSE           FALSE           FALSE           FALSE          
    12        4 event1 2012-11-13          1   FALSE          FALSE          FALSE          FALSE          FALSE          TRUE            FALSE           FALSE           FALSE           FALSE 
    

    [Comments]:

      [Solution 2]:
      library(lubridate)
      library(dplyr)
      df <- data.frame(
        person_id = c(1, 1, 1, 2, 2, 3, 3, 3),
        event_date = ymd(c("20120504",
                           "20120515",
                           "20120605",
                           "20120703",
                           "20120814",
                           "20120911",
                           "20120921",
                           "20121009"))
      )
      
      pre1 <- ymd("20120723")
      checkup <- ymd("20121023")
      post1 <- ymd("20130123")
      
      interval1 <- interval(pre1, checkup)
      interval2 <- interval(checkup, post1)
      
      df %>%
        mutate(
          int1 = event_date %within% interval1,
          int2 = event_date %within% interval2
        )
      
        person_id event_date  int1  int2
      1         1 2012-05-04 FALSE FALSE
      2         1 2012-05-15 FALSE FALSE
      3         1 2012-06-05 FALSE FALSE
      4         2 2012-07-03 FALSE FALSE
      5         2 2012-08-14  TRUE FALSE
      6         3 2012-09-11  TRUE FALSE
      7         3 2012-09-21  TRUE FALSE
      8         3 2012-10-09  TRUE FALSE
      

      Another option

      library(lubridate)
      library(dplyr)
      df <- data.frame(
        person_id = c(1, 1, 1, 2, 2, 3, 3, 3),
        event_date = ymd(c("20120504",
                           "20120515",
                           "20120605",
                           "20120703",
                           "20120814",
                           "20120911",
                           "20120921",
                           "20121009")),
        pre1 = ymd("20120723"),
        checkup = ymd("20121023"),
        post1 = ymd("20130123")
      )
      
      df %>%
        mutate(
          int1 = event_date %within% interval(pre1, checkup),
          int2 = event_date %within% interval(checkup, post1)
        )
      
        person_id event_date       pre1    checkup      post1  int1  int2
      1         1 2012-05-04 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
      2         1 2012-05-15 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
      3         1 2012-06-05 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
      4         2 2012-07-03 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
      5         2 2012-08-14 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
      6         3 2012-09-11 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
      7         3 2012-09-21 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
      8         3 2012-10-09 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
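
One detail worth checking with this approach is the boundary date. lubridate's `%within%` counts both endpoints as inside the interval, whereas the question's Interval2 rule uses a strict `>` on the check-up date. The sketch below reuses the dates from the answer; `event_date` is a hypothetical event that lands exactly on the check-up date.

```r
library(lubridate)

pre1    <- ymd("20120723")
checkup <- ymd("20121023")
post1   <- ymd("20130123")

# Hypothetical event that falls exactly on the check-up date
event_date <- ymd("20121023")

# %within% includes both endpoints, so the boundary date matches both intervals
event_date %within% interval(pre1, checkup)   # TRUE
event_date %within% interval(checkup, post1)  # TRUE

# Plain comparisons keep the second interval open on its left side
int1 <- event_date >= pre1 & event_date <= checkup   # TRUE
int2 <- event_date >  checkup & event_date <= post1  # FALSE
```

If an event on the check-up date should count for the first interval only, the plain comparisons are the safer choice.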
      

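      [Comments]:

      • Thanks! That makes sense. I actually have more than 1000 rows and at least 15 columns. Is there a way to create pre1, checkup, post1 automatically rather than declaring them by hand?
      • Perhaps you could assign them in the data frame, as you have. I've edited my answer.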
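
One way to avoid declaring each date by hand is to loop over the numeric suffix of the column names. This is only a sketch in base R, not the answerer's method: `df` is a hypothetical stand-in with two of the question's five Pre/CheckUp/Post sets, and `n_sets` would be 5 for the full data.

```r
# Hypothetical stand-in for the question's data, with two Pre/CheckUp/Post sets
df <- data.frame(
  study_id   = c(1, 2, 3, 3),
  Event_Date = c("5/4/2012", "8/14/2012", "10/23/2012", "10/25/2012"),
  Pre1 = "7/23/2012",  CheckUp  = "10/23/2012", Post1 = "1/23/2013",
  Pre2 = "11/25/2013", CheckUp2 = "2/25/2014",  Post2 = "5/25/2014"
)

# Parse every date column (everything except the id)
date_cols <- setdiff(names(df), "study_id")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%m/%d/%Y")

# The first check-up column has no numeric suffix; rename it for consistency
names(df)[names(df) == "CheckUp"] <- "CheckUp1"

n_sets <- 2  # would be 5 for the full data
for (i in seq_len(n_sets)) {
  pre     <- df[[paste0("Pre", i)]]
  checkup <- df[[paste0("CheckUp", i)]]
  post    <- df[[paste0("Post", i)]]

  # Odd-numbered interval: Pre <= Event_Date <= CheckUp
  df[[paste0("Interval", 2 * i - 1)]] <- df$Event_Date >= pre & df$Event_Date <= checkup
  # Even-numbered interval: CheckUp < Event_Date <= Post
  df[[paste0("Interval", 2 * i)]] <- df$Event_Date > checkup & df$Event_Date <= post
}

df[, c("Event_Date", paste0("Interval", 1:4))]
```

Because the comparisons are vectorised over whole columns, the loop runs once per set of columns rather than once per row, so the row count does not affect how the code is written.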