【Title】: R: remove duplicates based on other columns
【Posted】: 2018-06-29 15:22:20
【Question】:

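I want to remove duplicates depending on whether the values in other columns match or differ.

All rows for a duplicated ID should be removed completely, but only if that ID has more than one colour. Whether the subgroups also differ does not matter. If rows share the same ID and the same colour, the first one should be kept.

In the end I want a list of all IDs that have a single colour (regardless of subgroup). Every multi-coloured ID should be dropped.

Here is an example: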

   id colour   subgroup
1   1    red   lightred
2   2   blue  lightblue
3   2   blue   darkblue
4   3    red   lightred
5   4    red    darkred
6   4    red    darkred
7   4   blue  lightblue
8   5  green  darkgreen
9   5  green  darkgreen
10  5  green lightgreen
11  6    red    darkred
12  6   blue   darkblue
13  6  green lightgreen

The result should look like this:

  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen

The data I used for this example:

id = c(1,2,2,3,4,4,4,5,5,5,6,6,6)
colour = c("red","blue","blue","red","red","red","blue","green","green","green","red","blue","green")
subgroup = c("lightred","lightblue","darkblue","lightred","darkred","darkred","lightblue","darkgreen","darkgreen","lightgreen","darkred","darkblue","lightgreen")
data = data.frame(cbind(id,colour,subgroup))

Thanks for your help!

【Comments】:

    Tags: r duplicates


    【Solution 1】:
    library(tidyverse)
    data %>%
      group_by(id) %>%
      # keep ids with exactly one distinct colour, and only their first row
      filter(1 == length(unique(colour)), !duplicated(colour))
    # A tibble: 4 x 3
    # Groups:   id [4]
      id    colour subgroup 
      <fct> <fct>  <fct>    
    1 1     red    lightred 
    2 2     blue   lightblue
    3 3     red    lightred 
    4 5     green  darkgreen
    

    Using base R:

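     # as.character() keeps ave() from writing logical values into a factor column
     subset(data, as.logical(ave(as.character(colour), id, FUN = function(x) length(unique(x)) == 1 & !duplicated(x))))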
      id colour  subgroup
    1  1    red  lightred
    2  2   blue lightblue
    4  3    red  lightred
    8  5  green darkgreen
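
The one-liner above does two things at once, so here is a minimal step-by-step sketch of the same idea in base R. It assumes the example data from the question, rebuilt here with explicit column types; the helper names `n_colours`, `single_ids`, `keep` and `result` are only illustrative.

```r
# Example data from the question, built without cbind() so id stays numeric.
data <- data.frame(
  id = c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  colour = c("red", "blue", "blue", "red", "red", "red", "blue",
             "green", "green", "green", "red", "blue", "green"),
  subgroup = c("lightred", "lightblue", "darkblue", "lightred", "darkred",
               "darkred", "lightblue", "darkgreen", "darkgreen", "lightgreen",
               "darkred", "darkblue", "lightgreen"),
  stringsAsFactors = FALSE
)

# Step 1: count the distinct colours for each id.
n_colours <- tapply(data$colour, data$id, function(x) length(unique(x)))

# Step 2: ids that have exactly one colour (tapply names are character).
single_ids <- names(n_colours)[n_colours == 1]

# Step 3: keep rows of those ids, and only the first row per id/colour pair.
keep <- as.character(data$id) %in% single_ids &
  !duplicated(data[c("id", "colour")])
result <- data[keep, ]
result
```

Splitting the filter this way makes it easy to inspect `n_colours` on its own if the result is not what you expect.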
    

    【Comments】:

      【Solution 2】:

      I have a small data.table solution. It first drops repeated id/colour combinations, then keeps only the ids for which exactly one id/colour combination remains.

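      library(data.table)
      dt.data <- data.table(data)
      # 1) drop repeated id/colour pairs, 2) count remaining rows per id,
      # 3) keep the ids with a single remaining row
      dt.data[!duplicated(dt.data, by = c("id", "colour")),
              .(colour, subgroup, .N),
              by = list(id)][N == 1, .(id, colour, subgroup)]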
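
As a possible alternative, the same filter can be sketched in a single grouped step with data.table's `uniqueN()`. This is not the answerer's code, just a minimal sketch that assumes the example data from the question; the names `dt` and `result` are illustrative.

```r
library(data.table)

# Example data from the question as a data.table.
dt <- data.table(
  id = c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  colour = c("red", "blue", "blue", "red", "red", "red", "blue",
             "green", "green", "green", "red", "blue", "green"),
  subgroup = c("lightred", "lightblue", "darkblue", "lightred", "darkred",
               "darkred", "lightblue", "darkgreen", "darkgreen", "lightgreen",
               "darkred", "darkblue", "lightgreen")
)

# For each id: if it has exactly one distinct colour, return its first row;
# otherwise return nothing, which drops the whole group.
result <- dt[, if (uniqueN(colour) == 1L) .SD[1L], by = id]
result
```

Because every kept id has a single colour, taking the first row per id is the same as taking the first row per id/colour pair.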
      

      【Comments】:
