Remove rows in one column based on duplicated values in another column in R (remove specific rows): answers

【Question title】: Remove rows in one column based on duplicated values in another column in R (remove specific rows)
【Posted】: 2021-06-29 08:08:00
【Question description】:

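My dataset has two columns. POINT contains only two categorical values, "Random" and "Current", which repeat throughout the dataset. ID contains 5-digit numeric values associated with the values in POINT. Some of the values in ID are duplicated.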

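I can't figure out the R code to remove only those rows whose ID value is duplicated, and only where the POINT value is "Random" as opposed to "Current". So I want the following dataset: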

POINT	ID
Current	45905
Current	40817
Current	55936
Current	66608
Current	66608
Random	45905
Random	40817
Random	55936
Random	66608
Random	44456

to look like this:

POINT	ID
Current	45905
Current	40817
Current	55936
Current	66608
Current	66608
Random	44456

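【Question comments】:

Sorry, I missed something in the formatting of the second table; I hope this makes sense.
Welcome! I don't quite understand this part of your question: "when the POINT value is 'Random' as opposed to 'Current'". Do you mean the case where a duplicated ID has both POINT values, Random and Current?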

Tags: r duplicates

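【Solution 1】:

With dplyr this can be done as follows:

Split the data by POINT.
Use anti_join to keep only those rows of the Random part whose ID does not occur in the Current part.
Bind the filtered Random data to the Current data.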

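# Example data
d <- data.frame(
  POINT = c("Current", "Current", "Current", "Current", "Current",
            "Random", "Random", "Random", "Random", "Random"),
  ID = c(45905L, 40817L, 55936L, 66608L, 66608L,
         45905L, 40817L, 55936L, 66608L, 44456L),
  stringsAsFactors = FALSE
)

library(dplyr)

# 1. Split the data by POINT into a list with elements $Current and $Random
d_split <- split(d, d$POINT)

# 2. Keep only the Random rows whose ID has no match among the Current rows
random_keep <- dplyr::anti_join(d_split$Random, d_split$Current, by = "ID")

# 3. Bind the filtered Random rows below the Current rows
d_final <- dplyr::bind_rows(d_split$Current, random_keep)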

head(d_final)
#>     POINT    ID
#> 1 Current 45905
#> 2 Current 40817
#> 3 Current 55936
#> 4 Current 66608
#> 5 Current 66608
#> 6  Random 44456
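
As a quick sanity check on the approach above, here is a minimal sketch that is not part of the original answer. `dplyr::semi_join` is the complement of `anti_join`, so it returns the Random rows that were dropped. The name `random_drop` is introduced here only for illustration.

```r
library(dplyr)

# Same example data as in the answer above
d <- data.frame(
  POINT = rep(c("Current", "Random"), each = 5),
  ID = c(45905L, 40817L, 55936L, 66608L, 66608L,
         45905L, 40817L, 55936L, 66608L, 44456L),
  stringsAsFactors = FALSE
)
d_split <- split(d, d$POINT)

# Random rows whose ID does NOT occur among the Current rows (kept)
random_keep <- anti_join(d_split$Random, d_split$Current, by = "ID")

# Random rows whose ID DOES occur among the Current rows (dropped);
# random_drop is a hypothetical name, not from the original answer
random_drop <- semi_join(d_split$Random, d_split$Current, by = "ID")

# Every Random row ends up in exactly one of the two sets
stopifnot(nrow(random_keep) + nrow(random_drop) == nrow(d_split$Random))

random_drop$ID
#> [1] 45905 40817 55936 66608
```

Looking at the dropped rows is an easy way to confirm that only Random rows with an ID shared with a Current row were removed.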

【Comments】:

Hi, thanks, this code is indeed the solution.

【Solution 2】:

If I understand correctly, you can do this with dplyr:

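library(dplyr)

# Split into a list with elements $Current and $Random
# (the formula interface of split() needs R >= 4.1.0)
split_data <- split(your_data, ~ POINT)

# IDs that occur only in the Random part get POINT.x = NA after the join,
# so coalesce() labels exactly those rows "Random"
full_join(split_data$Current, split_data$Random, by = "ID") %>%
  transmute(POINT = coalesce(POINT.x, "Random"), ID)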

# A tibble: 6 x 2
  POINT      ID
  <chr>   <int>
1 Current 45905
2 Current 40817
3 Current 55936
4 Current 66608
5 Current 66608
6 Random  44456

(Data used:)

your_data <- structure(list(POINT = c("Current", "Current", "Current", "Current", "Current", "Random", "Random", "Random", "Random", "Random"), ID = c(45905L, 40817L, 55936L, 66608L, 66608L, 45905L, 40817L, 55936L, 66608L, 44456L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
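
To see why the join-and-coalesce idea works, here is a rough base-R analogue, offered as a sketch rather than as the answerer's method. It swaps in `merge(..., all = TRUE)` for `full_join` and `ifelse` for `coalesce`. The names `joined`, `only_random`, and `result` are illustrative. Note that `merge` sorts by the key, so the row order differs from the dplyr output.

```r
# Same values as the data above, built as a plain data frame
your_data <- data.frame(
  POINT = rep(c("Current", "Random"), each = 5),
  ID = c(45905L, 40817L, 55936L, 66608L, 66608L,
         45905L, 40817L, 55936L, 66608L, 44456L),
  stringsAsFactors = FALSE
)
split_data <- split(your_data, your_data$POINT)

# Full outer join on ID; merge() renames the clashing POINT columns
# to POINT.x (from Current) and POINT.y (from Random)
joined <- merge(split_data$Current, split_data$Random, by = "ID", all = TRUE)

# POINT.x is NA exactly for IDs that occur only in the Random part
only_random <- is.na(joined$POINT.x)

result <- data.frame(
  POINT = ifelse(only_random, "Random", joined$POINT.x),
  ID = joined$ID,
  stringsAsFactors = FALSE
)
result
```

One caveat that applies to any join of this kind: if an ID were duplicated in both parts, the join would return one row per matching pair, so the row count would grow. That does not happen with the example data, where 66608 is duplicated only among the Current rows.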

【Comments】:

【Solution 3】:

You can use a negated %in% to exclude the rows where POINT == "Random" and the ID also occurs among the Current rows.

i <- D$POINT=="Current"
D[i | !D$ID %in% D$ID[i],]
#     POINT    ID
#1  Current 45905
#2  Current 40817
#3  Current 55936
#4  Current 66608
#5  Current 66608
#10  Random 44456

Data:

D <- data.frame(
  POINT = c("Current", "Current", "Current", "Current", "Current",
            "Random", "Random", "Random", "Random", "Random"),
  ID = c(45905L, 40817L, 55936L, 66608L, 66608L,
         45905L, 40817L, 55936L, 66608L, 44456L)
)
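
The one-line index in this answer can be unpacked into named steps. The sketch below is only an illustration of the same logic. The names `in_current` and `keep` are not from the original answer. In R, `%in%` binds more tightly than `!`, which in turn binds more tightly than `|`, so the original expression parses as `i | (!(D$ID %in% D$ID[i]))`.

```r
D <- data.frame(
  POINT = rep(c("Current", "Random"), each = 5),
  ID = c(45905L, 40817L, 55936L, 66608L, 66608L,
         45905L, 40817L, 55936L, 66608L, 44456L)
)

# TRUE for the Current rows
i <- D$POINT == "Current"

# TRUE for every row whose ID appears in at least one Current row
in_current <- D$ID %in% D$ID[i]

# Keep a row if it is Current, or if its ID never appears among the Current rows
keep <- i | !in_current

which(keep)
#> [1]  1  2  3  4  5 10

D[keep, ]
```

Because the rows are selected by a logical index, the original row names are preserved, which is why the last row is labelled 10 in the output shown in the answer.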

【Comments】: