【Question title】: Comparing two data frames in R
【Posted】: 2017-04-26 14:29:47
【Question】:

I'm new to R, and I have 2 data frames as follows:

df1
T_id U_id  U_code  score  
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm  543
C_9_0 TRFDA NBV_lp  80
D_9_1 KOIUA TRE_po  212
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 JKOPA TOZ_po  79

df2
T_id U_id  U_code  score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm  520
C_9_0 TRFDG NBJ_po  90
D_9_1 KOIUA TRE_po  250
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 LOLPO JUZ_ic  90

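I want to compare the scores in df1 and df2 for those df1 entries that have exactly the same T_id, U_id and U_code in df2, and split them into 3 groups according to the conditions (df1$score > df2$score, df1$score == df2$score, df1$score < df2$score), as shown below:

df1$score == df2$score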
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
df1$score > df2$score
B_1_3 NBVFR LKJ_mm  543
df1$score < df2$score
A_0_1 UHJKI XPOS_hp 150
D_9_1 KOIUA TRE_po  250 

I also want to store the df1 entries that have no match in df2:

No matches
C_9_0 TRFDA NBV_lp  80
F_0_1 JKOPA TOZ_po  79

I tried the following R code:

comparison <- function(df1, df2)
{
  df1_equal_df2 <- NULL
  df1_greater_than_df2 <- NULL
  df1_smaller_than_df2 <- NULL
  no_match <- NULL
  # Note: if() and && each expect a single TRUE/FALSE, but these comparisons
  # return one value per row (and pair rows by position, not by key)
  if (df1$T_id == df2$T_id && df1$U_id == df2$U_id && df1$U_code == df2$U_code && df1$score > df2$score)
  {
    df1_greater_than_df2 <- df1$T_id
  }
  else if (df1$T_id == df2$T_id && df1$U_id == df2$U_id && df1$U_code == df2$U_code && df1$score < df2$score)
  {
    df1_smaller_than_df2 <- df1$T_id
  }
  else if (df1$T_id == df2$T_id && df1$U_id == df2$U_id && df1$U_code == df2$U_code && df1$score == df2$score)
  {
    df1_equal_df2 <- df1
  }
  else
  {
    no_match <- df1
  }
}

But the above doesn't work. How can I get the output I want? Please help.

【Comments】:

  • Why not merge the data first? Then everything else is straightforward.

Tags: r dataframe


【Answer 1】:

We can use dplyr:

library(dplyr)
res <- df1 %>% left_join(df2, by=c("T_id","U_id","U_code")) %>%
               mutate(comp=ifelse(score.x > score.y,"df1$score > df2$score",ifelse(score.x < score.y,"df1$score < df2$score","df1$score == df2$score"))) %>%
               rename(score=score.x) %>% select(-score.y)
##   T_id  U_id  U_code score                   comp
##1 A_0_1 UHJKI XPOS_hp   134  df1$score < df2$score
##2 B_1_3 NBVFR  LKJ_mm   543  df1$score > df2$score
##3 C_9_0 TRFDA  NBV_lp    80                   <NA>
##4 D_9_1 KOIUA  TRE_po   212  df1$score < df2$score
##5 E_0_1 SDFRQ  QAS_np   300 df1$score == df2$score
##6 E_0_1 SDKIJ  JIT_mx   160 df1$score == df2$score
##7 F_0_1 JKOPA  TOZ_po    79                   <NA>

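We perform a left outer join of df1 and df2 by T_id, U_id, and U_code. This merges the two tables, with the score from df1 becoming score.x and the score from df2 becoming score.y. We then use mutate to create a column comp that records whether score.x is greater than, less than, or equal to score.y. Rows of df1 with no match in df2 get NA for score.y, and therefore NA for comp. Finally, we rename score.x to score and drop score.y so the result is cleaner to present.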
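The nested ifelse() can also be written as a lookup on the sign of the score difference. A minimal base-R sketch; `compare_label` is a hypothetical helper name, and the labels are the same strings used above:

```r
# Hypothetical helper: map each (df1 score, df2 score) pair to a label.
# sign(x - y) is -1, 0 or 1, so adding 2 gives an index into `labels`;
# when y is NA (no match in df2) the index is NA, and so is the label.
compare_label <- function(x, y) {
  labels <- c("df1$score < df2$score",
              "df1$score == df2$score",
              "df1$score > df2$score")
  labels[sign(x - y) + 2]
}

# Scores from the question's rows A_0_1, B_1_3, E_0_1 (SDFRQ) and C_9_0
compare_label(c(134, 543, 300, 80), c(150, 520, 300, NA))
```

Inside the dplyr pipeline this would replace the nested ifelse() as `mutate(comp = compare_label(score.x, score.y))`.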

An equivalent implementation of the dplyr code in base R is:

res <- merge(df1,df2,by=c("T_id","U_id","U_code"), all.x=TRUE)
res$comp <- ifelse(res$score.x > res$score.y,"df1$score > df2$score",ifelse(res$score.x < res$score.y,"df1$score < df2$score","df1$score == df2$score"))
res <- res[,c(1:4,6)]
colnames(res) <- sub("score.x","score",colnames(res))

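This gives the same result. To split the resulting data frame by comp (note that split() drops the rows whose comp is NA, i.e. the unmatched rows):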

split(res[,-5],res$comp)
##$`df1$score < df2$score`
##   T_id  U_id  U_code score
##1 A_0_1 UHJKI XPOS_hp   134
##4 D_9_1 KOIUA  TRE_po   212
##
##$`df1$score == df2$score`
##   T_id  U_id U_code score
##5 E_0_1 SDFRQ QAS_np   300
##6 E_0_1 SDKIJ JIT_mx   160
##
##$`df1$score > df2$score`
##   T_id  U_id U_code score
##2 B_1_3 NBVFR LKJ_mm   543
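The question also asks for the df1 rows with no match in df2, which the split above drops because their comp is NA. One way, sketched here in base R assuming the data exactly as posted, is to give those rows their own label before splitting; the label "No matches" is taken from the question, and `groups` is just an illustrative variable name.

```r
# Data as posted in the question
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id U_id U_code score
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm 543
C_9_0 TRFDA NBV_lp 80
D_9_1 KOIUA TRE_po 212
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
F_0_1 JKOPA TOZ_po 79")
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id U_id U_code score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm 520
C_9_0 TRFDG NBJ_po 90
D_9_1 KOIUA TRE_po 250
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
F_0_1 LOLPO JUZ_ic 90")

# Same left join as the base-R version; unmatched df1 rows get score.y = NA
res <- merge(df1, df2, by = c("T_id", "U_id", "U_code"), all.x = TRUE)
res$comp <- ifelse(res$score.x > res$score.y, "df1$score > df2$score",
            ifelse(res$score.x < res$score.y, "df1$score < df2$score",
                   "df1$score == df2$score"))

# split() would silently drop rows whose comp is NA, so label them first
res$comp[is.na(res$comp)] <- "No matches"
groups <- split(res[, c("T_id", "U_id", "U_code", "score.x")], res$comp)
groups[["No matches"]]
```

This yields four groups, and `groups[["No matches"]]` holds the C_9_0 and F_0_1 rows from df1.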
