How to identify mismatching IDs between two data frames prior to a join?

【Question title】: How to identify mismatching ids between two dataframes prior to a join?
【Posted】: 2023-03-10 19:20:01
【Question】:

I am writing a function that identifies the IDs missing between two data frames before I join them.

So far, my function looks like this:

match_check <- function(df1,var1,df2,var2){
  df1ids <- unique(df1$var1)
  matchs <- c()
  no_matchs <- c() 
  for (id in df1ids){
    if (id %in% df2$var2 == TRUE){
      match <- append(match, id)}
     else{
       no_matchs <- append(no_match,id)
     }
  }
  print(no_matchs)
  match2 <- c()
   no_match2 <- c()
  df2ids <- unique(df2$var2)
  for (id in df2ids){
    if (id %in% df1$var1 == TRUE){
      match2 <- append(match2, id)}
     else{
      no_match2 <- append(no_match2,id)
    }
  }
  print(no_match2)
 }

test1 <- data.frame(id=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
test2 <- data.frame(id=c(0,-2,-4,-6,-1,1,2,3,4,5,5,6,7,8))

match_check(test1,id,test2,id)

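When I run the function, both printed vectors come out as NULL. I want it to print the IDs that are not found in the other data frame, so that I know which IDs are missing from each side, giving vectors like these: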

no_matchs = c(9,10,11,12,13,14,15)
no_match2 = c(0,-2,-4,-6)

【Comments】:

Why not setdiff(test1$id, test2$id) and setdiff(test2$id, test1$id)?

Tags: r dataframe join missing-data

【Solution 1】:

test1 <- data.frame(id=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
test2 <- data.frame(id=c(0,-2,-4,-6,-1,1,2,3,4,5,5,6,7,8))

(no_matchs <- setdiff(test1$id, test2$id))
#> [1]  9 10 11 12 13 14 15
(no_match2 <- setdiff(test2$id, test1$id))
#> [1]  0 -2 -4 -6 -1
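
If a reusable function is still wanted, the two setdiff() calls can be wrapped as in the sketch below. It rests on two assumptions that are not part of the original post: the column names are passed as strings, and the result is returned as a list rather than printed. The list element names only_in_df1 and only_in_df2 are hypothetical. The sketch also shows why the function in the question printed NULL: df1$var1 looks for a column literally named "var1" instead of using the value of the argument var1, so unique() receives NULL and the for loops never run.

```r
# Sketch of a wrapper around setdiff(); column names are passed as strings (assumption).
match_check <- function(df1, var1, df2, var2) {
  # [[ ]] looks the column up by the *value* of var1 / var2,
  # whereas df1$var1 would look for a column literally named "var1".
  ids1 <- unique(df1[[var1]])
  ids2 <- unique(df2[[var2]])
  list(
    only_in_df1 = setdiff(ids1, ids2),  # IDs in df1 with no match in df2
    only_in_df2 = setdiff(ids2, ids1)   # IDs in df2 with no match in df1
  )
}

test1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))
test2 <- data.frame(id = c(0, -2, -4, -6, -1, 1, 2, 3, 4, 5, 5, 6, 7, 8))

# The root cause in the question: no column is called "var1", so this is NULL.
is.null(test1$var1)
#> [1] TRUE

result <- match_check(test1, "id", test2, "id")
result$only_in_df1
#> [1]  9 10 11 12 13 14 15
result$only_in_df2
#> [1]  0 -2 -4 -6 -1
```

Note that -1 appears in the second vector even though the expected output in the question omits it: -1 is present in test2 and absent from test1, so it is a genuine mismatch.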

【Comments】: