【Question title】: R - Identify duplicate rows based on multiple columns and remove them based on date
【Posted】: 2020-11-15 06:17:47
【Question】:
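  1. I want to identify duplicates in the ID column, but only where Wave == 2 (in the example below, only 'C' is duplicated in Wave 2).

  2. I then want to select the most recent duplicate based on Date and remove it from the data frame df.

How can I do this?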

df <- structure(list(ID = c("E", "G", "C", "B", "D", "E", "A", "D", 
"F", "F", "C", "A", "B", "C", "A"), Wave = c(2L, 1L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L), Date = c("25/02/2020", 
"18/02/2020", "14/02/2020", "21/02/2020", "24/02/2020", "16/02/2020", 
"12/02/2020", "15/02/2020", "17/02/2020", "26/02/2020", "22/02/2020", 
"20/02/2020", "13/02/2020", "23/02/2020", "11/02/2020")), class = "data.frame", row.names = c(NA, 
-15L))

    Tags: r duplicates


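    【Solution 1】:

    You can use slice to keep only the most recent row in each ID group where Wave == 2; rows from other waves are left untouched. Note that which.max(Date) keeps the latest row and discards the earlier one. If the goal is to drop the most recent duplicate instead, as the question asks, which.min(Date) keeps the earliest row.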

    library(dplyr)
    
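    df %>%
      # Parse the day/month/year strings into Date objects
      mutate(Date = lubridate::dmy(Date)) %>%
      group_by(ID, Wave) %>%
      # Wave 2 groups: keep only the row with the latest date; other groups: keep every row
      slice(if (first(Wave) == 2) which.max(Date) else seq_len(n()))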
    
    #   ID     Wave Date      
    #   <chr> <int> <date>    
    # 1 A         1 2020-02-12
    # 2 A         1 2020-02-11
    # 3 A         2 2020-02-20
    # 4 B         1 2020-02-13
    # 5 B         2 2020-02-21
    # 6 C         1 2020-02-14
    # 7 C         2 2020-02-23
    # 8 D         1 2020-02-15
    # 9 D         2 2020-02-24
    #10 E         1 2020-02-16
    #11 E         2 2020-02-25
    #12 F         1 2020-02-17
    #13 F         2 2020-02-26
    #14 G         1 2020-02-18
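
For comparison, here is a minimal base R sketch that needs neither dplyr nor lubridate. It assumes the goal stated in the question (drop the most recent row of each ID that is duplicated in Wave 2 and keep the earlier one). The names key, parsed, group_max, group_size, drop, and result are illustrative and do not come from the original answer.

```r
# Sample data from the question
df <- data.frame(
  ID = c("E", "G", "C", "B", "D", "E", "A", "D",
         "F", "F", "C", "A", "B", "C", "A"),
  Wave = c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L),
  Date = c("25/02/2020", "18/02/2020", "14/02/2020", "21/02/2020",
           "24/02/2020", "16/02/2020", "12/02/2020", "15/02/2020",
           "17/02/2020", "26/02/2020", "22/02/2020", "20/02/2020",
           "13/02/2020", "23/02/2020", "11/02/2020"),
  stringsAsFactors = FALSE
)

# One grouping key per ID/Wave combination
key <- paste(df$ID, df$Wave)

# Dates as numbers so they can be compared within groups
parsed <- as.numeric(as.Date(df$Date, format = "%d/%m/%Y"))

# Latest date and row count of each ID/Wave group, repeated per row
group_max  <- ave(parsed, key, FUN = max)
group_size <- ave(parsed, key, FUN = length)

# Drop a row only if it is in Wave 2, its group has more than one row,
# and it carries the latest date of that group
drop <- df$Wave == 2 & group_size > 1 & parsed == group_max
result <- df[!drop, ]
```

On the sample data this removes only the row C / Wave 2 / 23/02/2020 and keeps the original row order. If two rows in a group tie for the latest date, this sketch drops both of them.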
    


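      【Solution 2】:

      Here is an option using filter. The rows are sorted by date within each ID and Wave, and only the first (earliest) row of each Wave 2 group is kept, which drops the most recent duplicate. Rows from other waves are kept unchanged.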

      library(dplyr)
      library(lubridate)
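      df %>%
          # Sort so the earliest date comes first within each ID/Wave group
          arrange(ID, Wave, dmy(Date)) %>%
          group_by(ID, Wave) %>%
          # Wave 2 groups: keep only the first (earliest) row; other groups: keep every row
          filter((row_number() == 1 & first(Wave) == 2) | first(Wave) != 2)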
      # A tibble: 14 x 3
      # Groups:   ID, Wave [13]
      #   ID     Wave Date      
      #   <chr> <int> <chr>     
      # 1 A         1 11/02/2020
      # 2 A         1 12/02/2020
      # 3 A         2 20/02/2020
      # 4 B         1 13/02/2020
      # 5 B         2 21/02/2020
      # 6 C         1 14/02/2020
      # 7 C         2 22/02/2020
      # 8 D         1 15/02/2020
      # 9 D         2 24/02/2020
      #10 E         1 16/02/2020
      #11 E         2 25/02/2020
      #12 F         1 17/02/2020
      #13 F         2 26/02/2020
      #14 G         1 18/02/2020
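
Both answers go straight to removal. Step 1 of the question, identifying which IDs are duplicated within Wave 2, can also be checked on its own. The following is a small base R sketch of that step; the names wave2, dup_ids, and is_dup are illustrative and not part of either answer.

```r
# Sample data from the question
df <- data.frame(
  ID = c("E", "G", "C", "B", "D", "E", "A", "D",
         "F", "F", "C", "A", "B", "C", "A"),
  Wave = c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L),
  Date = c("25/02/2020", "18/02/2020", "14/02/2020", "21/02/2020",
           "24/02/2020", "16/02/2020", "12/02/2020", "15/02/2020",
           "17/02/2020", "26/02/2020", "22/02/2020", "20/02/2020",
           "13/02/2020", "23/02/2020", "11/02/2020"),
  stringsAsFactors = FALSE
)

# Restrict to Wave 2, then collect the IDs that occur more than once there
wave2 <- df[df$Wave == 2, ]
dup_ids <- unique(wave2$ID[duplicated(wave2$ID)])

# Flag every Wave 2 row in the full data frame that belongs to a duplicated ID
is_dup <- df$Wave == 2 & df$ID %in% dup_ids

# Show the flagged rows
print(df[is_dup, ])
```

For the sample data, dup_ids contains only "C", and the flagged rows are the two Wave 2 entries for C (22/02/2020 and 23/02/2020).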
      
