【Question title】: Validation column of two data frames [duplicate]
【Posted】: 2020-09-22 05:14:00
【Question description】:

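I have two data frames, df8 and df9. I want to check whether the "name" column in df8 is blank, whitespace-only, or NA, and mutate a new column "blank_name" that is 1 in those cases and 0 otherwise. However, I do not want to change the original "name" column for rows where the city is CY. I am trying the approach below, but it is not working for me.

I also want to check whether each id from df8 exists in df9, and then check whether the name matches the name in df9.

This is my attempt so far. The common columns are df9[id] and df8[code].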

df9 <- data.frame(id=c(3109,2357,4339,8927,9143,4285,2683,8217,3702,7857,3255,4262,8501,7111,2681,6970),
                  name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","","sgyu,hytb","vdti,kula","mftyu,huta","","cday,bhsue","ajtu,nudj"))
                  
df8 <- data.frame(code=c(3109,2357,4339,8927,9143,4285,2683,8217,3702,7857,3255,4262,8501,7111,2681,6970),
                  city = c("CY","NY","DA","CY","MN","GA","MN","CY","NY","DA","CY","CY","GA","CY","LA","DA"),
                  name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","   ","kull,nnhu","hula,njam","mund,jiha","htfy,ntha",NA,"sgyu,hytb","vdti,kula","mftyu,huta","","cday,bhsue","ajtu,nudj"))



library(dplyr)
library(stringr)

df8 <- df8 %>%
  mutate(
    # Trim whitespace from name, except for rows where city is "CY"
    name = if_else(city == "CY", name, str_trim(name)),
    blank = case_when(
      is.na(name) ~ 1,
      str_length(name) == 0 ~ 1,
      TRUE ~ 0
    )
  )

df9 %>%
  rename(name.9 = name) %>%
  right_join(df8, by = c("id" = "code")) %>%
  mutate(if_name_ok = case_when(
    is.na(name) & is.na(name.9) ~ 0,
    is.na(name) & blank == 1 ~ 0,
    name == name.9 ~ 0,
    name != name.9 ~ 1,
    TRUE ~ NA_real_
  ))

The output should be two mutated columns, with True and False represented by the values 0 and 1.

【Comments】:

  • The code you show for creating blank_nodename looks fine... is it giving you trouble? What is the problem?
  • I just updated the function. "name" works fine, but I have a problem with if_name_ok.
  • Are you looking for df9 %>% rename(name.9=name) %>% right_join(df8, by=c("id"="code")) %>% mutate(if_name_ok= case_when(is.na(name) & is.na(name.9)~ 0, is.na(name) & blank==1 ~0, name == name.9 ~0, name != name.9~1, TRUE ~ NA_real_)) ?
  • Yes, that works, but for NA values it should show 1 for if_name_ok.

Tags: r


【Solution 1】:

You can try the following:

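library(dplyr)

# Assumes df8 already has the `blank` column from the question's mutate() step
df9 %>%
  rename(name.9 = name) %>%
  right_join(df8, by = c("id" = "code")) %>%
  mutate(if_name_ok = case_when(
    (is.na(name) & is.na(name.9)) |
      (is.na(name) & blank == 1) |
      (name == name.9) ~ 0,
    name != name.9 ~ 1,
    TRUE ~ 1  # any remaining row, e.g. an NA comparison, is flagged 1
  ))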
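
In the code above, a row whose name is NA also has blank equal to 1, so it appears to land in the first branch and get 0, whereas the asker's comment says NA values should give 1. The following is only a sketch of one possible variant, not the answerer's method. It assumes that any NA on either side should count as a mismatch. The data frames `df8_small` and `df9_small` are hypothetical miniatures, and `trimws()` with `left_join()` are substitutions chosen here so that `name` is never overwritten.

```r
library(dplyr)

# Hypothetical miniature versions of df8 and df9 (not the question's full data)
df8_small <- data.frame(code = c(1, 2, 3, 4),
                        name = c("a,b", "x,y", NA, "   "))
df9_small <- data.frame(id = c(1, 2, 3, 4),
                        name = c("a,b", "c,d", "", "e,f"))

res <- df8_small %>%
  # Flag NA, empty, or whitespace-only names without modifying `name`
  mutate(blank = if_else(is.na(name) | trimws(name) == "", 1, 0)) %>%
  # Bring in df9's name under a different column name
  left_join(rename(df9_small, name.9 = name), by = c("code" = "id")) %>%
  mutate(if_name_ok = case_when(
    is.na(name) | is.na(name.9) ~ 1,  # assumption: any NA counts as a mismatch
    name == name.9              ~ 0,  # identical names
    TRUE                        ~ 1   # everything else differs
  ))

print(res)
```

Putting the NA check first matters because `case_when()` uses the first matching branch, and a comparison such as `name == name.9` evaluates to NA when either side is NA.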

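【Comments】:

  • But what if I do not want any of the columns from df8, and only want the if_name_ok and blank columns?
  • After this, you can select the columns you want to keep: %>% select(if_name_ok, blank).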
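
The column selection suggested in the last comment could look like the sketch below. The `joined` data frame is a hypothetical stand-in for the result of the join and mutate steps. Keeping `id` in the second variant is an assumption added here so each flag can still be traced back to its row.

```r
library(dplyr)

# Hypothetical joined result with the columns produced by the earlier steps
joined <- data.frame(
  id = c(1, 2),
  name.9 = c("a,b", "c,d"),
  city = c("CY", "NY"),
  name = c("a,b", "x,y"),
  blank = c(0, 0),
  if_name_ok = c(0, 1)
)

# Keep only the two flag columns, as in the comment
flags <- joined %>% select(if_name_ok, blank)

# Variant that also keeps the key column
flags_with_id <- joined %>% select(id, if_name_ok, blank)

print(flags)
print(flags_with_id)
```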