Multiple operations on two data frames using R

【Question title】: Multiple operations on two dataframes using R
【Posted】: 2020-08-20 17:44:21
【Question description】:

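I have two data frames of different lengths, df1 and df2, which share two key columns. I want to perform several operations on these data frames, as follows:

1. Replace the blank (NA) cells in df1 with the corresponding values from df2, matched on the key columns.
2. For each pair of key values, report the cells whose values conflict between the two data frames in a new data frame.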

df1

id_col1   id_col2   name    age    sex
---------------------------------------
101         1M              21  
101         3M              21      M
102         1M      Mark    25

df2

id_col1    id_col2    name     age     sex
-------------------------------------------
101          1M       Steve             M
101          2M                         M
101          3M       Steve    25   
102          1M       Ria      25       M
102          2M       Anie     22       F

After operation 1, i.e. replacing the NAs in df1 with the corresponding values from df2, I should get the following:

result_1

id_col1    id_col2    name     age     sex
-------------------------------------------
101         1M        Steve    21      M
101         3M        Steve    25      M
102         1M        Mark     25      M

After operation 2, i.e. finding the cells that conflict between df1 and df2 for the same key columns, I should get the following:

result_2

id_col1    id_col2    name     age     sex
-------------------------------------------
101          3M                21   
101          3M                25   
102          1M        Mark     
102          1M        Ria

Can anyone help me solve these?

【Comments】:

Try left_join(df1, select(df2, -age), by = c('id_col1', 'id_col2')) %>% mutate(name = coalesce(name.x, name.y), sex = coalesce(sex.x, sex.y)) %>% select(names(df1))
result2 could probably be an anti_join
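
The first comment's suggestion can be expanded into a runnable form. This is only a sketch under a few assumptions: blanks are stored as empty strings (as in the data given in the answer) rather than as NA, so na_if() is used to convert them before coalesce(); age is kept from df2 instead of being dropped; and the variable `keys` is a name introduced here for illustration.

```r
library(dplyr)

# Example data, copied from the answer's "Data" section
df1 <- data.frame(
  id_col1 = c(101L, 101L, 102L),
  id_col2 = c("1M", "3M", "1M"),
  name    = c("", "", "Mark"),
  age     = c(21L, 21L, 25L),
  sex     = c("", "M", ""),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id_col1 = c(101L, 101L, 101L, 102L, 102L),
  id_col2 = c("1M", "2M", "3M", "1M", "2M"),
  name    = c("Steve", "", "Steve", "Ria", "Anie"),
  age     = c(NA, NA, 25L, 25L, 22L),
  sex     = c("M", "M", "", "M", "F"),
  stringsAsFactors = FALSE
)

keys <- c("id_col1", "id_col2")  # hypothetical helper variable

result_1 <- df1 %>%
  # Non-key columns present in both inputs get the suffixes .x (df1) and .y (df2)
  left_join(df2, by = keys, suffix = c(".x", ".y")) %>%
  mutate(
    # Treat "" in df1 as missing, then take the first non-missing value
    name = coalesce(na_if(name.x, ""), name.y),
    age  = coalesce(age.x, age.y),
    sex  = coalesce(na_if(sex.x, ""), sex.y)
  ) %>%
  select(all_of(names(df1)))  # restore df1's original columns and order
```

Each column has to be listed by hand here, so this form is practical only for a small, known set of columns. A cell that is blank in both data frames ends up holding whatever df2 contains (an empty string or NA).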

Tags: r dataframe

【Solution 1】:

For result_1, you can reshape both data frames to long format and then use left_join and case_when:

library(dplyr)
library(tidyr)

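# Reshape both data frames to long format, coercing every value to character
# so that columns of different types can share a single value column.
# values_transform (available since tidyr 1.1.0) does the coercion that
# values_ptypes was used for here.
left_join(pivot_longer(df1, -starts_with('id_col'),
                       values_transform = list(value = as.character)),
          pivot_longer(df2, -starts_with('id_col'), values_to = "value2",
                       values_transform = list(value2 = as.character)),
          by = c("id_col1", "id_col2", "name")) %>%
  # Fill blank cells of df1 with the matching value from df2
  mutate(value = case_when(value == '' ~ value2,
                           TRUE ~ value)) %>%
  select(-value2) %>%
  pivot_wider() %>%   # back to one column per original variable
  type.convert()      # restore column types from the character values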

#   id_col1 id_col2 name    age sex  
#     <int> <fct>   <fct> <int> <fct>
# 1     101 1M      Steve    21 M    
# 2     101 3M      Steve    21 M    
# 3     102 1M      Mark     25 M

For result_2, the code is similar, except that instead of mutating we filter and add one extra reshape.

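# Long format again, keeping the df1 and df2 values side by side as characters
left_join(pivot_longer(df1, -starts_with('id_'), values_to = "value1",
                       values_transform = list(value1 = as.character)),
          pivot_longer(df2, -starts_with('id_'), values_to = "value2",
                       values_transform = list(value2 = as.character)),
          by = c("id_col1", "id_col2", "name")) %>%
  # Keep only cells that are filled in both data frames but disagree
  filter(value1 != '' & value2 != '' & value1 != value2) %>%
  # Stack the two conflicting values so each gets its own row
  pivot_longer(cols = value1:value2, names_prefix = "value", names_to = "df") %>%
  pivot_wider() %>%
  type.convert() %>%
  select(intersect(names(df1), names(.))) # to retain original colname ordering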

#   id_col1 id_col2 name    age
#     <int> <fct>   <fct> <int>
# 1     101 3M      NA       21
# 2     101 3M      NA       25
# 3     102 1M      Mark     NA
# 4     102 1M      Ria      NA

Data:

df1 <- structure(list(id_col1 = c(101L, 101L, 102L), id_col2 = c("1M", 
"3M", "1M"), name = c("", "", "Mark"), age = c(21L, 21L, 25L), 
    sex = c("", "M", "")), class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(id_col1 = c(101L, 101L, 101L, 102L, 102L), id_col2 = c("1M", 
"2M", "3M", "1M", "2M"), name = c("Steve", "", "Steve", "Ria", 
"Anie"), age = c(NA, NA, 25L, 25L, 22L), sex = c("M", "M", "", 
"M", "F")), class = "data.frame", row.names = c(NA, -5L))

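【Discussion】:

What if I have more than 10 or 20 columns and don't know the data type of each one? In that case your solution would be somewhat complicated to implement.
Good point, although you didn't mention that in the question. ;) See my updated answer.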
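
The updated answer is not reproduced here. As one possible way to handle the many-columns case raised above, the following base R sketch loops over every non-key column without needing to know its type. It is not the answerer's method. The helper names `is_blank` and `key_of`, the "|" key separator, and the long-format layout of the conflict report are all assumptions made for illustration, and the sketch assumes each key pair occurs at most once in df2.

```r
# Example data, copied from the answer's "Data" section
df1 <- data.frame(
  id_col1 = c(101L, 101L, 102L),
  id_col2 = c("1M", "3M", "1M"),
  name    = c("", "", "Mark"),
  age     = c(21L, 21L, 25L),
  sex     = c("", "M", ""),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id_col1 = c(101L, 101L, 101L, 102L, 102L),
  id_col2 = c("1M", "2M", "3M", "1M", "2M"),
  name    = c("Steve", "", "Steve", "Ria", "Anie"),
  age     = c(NA, NA, 25L, 25L, 22L),
  sex     = c("M", "M", "", "M", "F"),
  stringsAsFactors = FALSE
)

keys       <- c("id_col1", "id_col2")
value_cols <- setdiff(names(df1), keys)   # every non-key column, whatever its type

# Hypothetical helper: a cell is blank if it is NA or an empty string
is_blank <- function(x) is.na(x) | as.character(x) == ""

# Hypothetical helper: one string per row built from the key columns
# (assumes the key values themselves never contain "|")
key_of <- function(df) do.call(paste, c(df[keys], sep = "|"))

# For each row of df1, the matching row number in df2 (NA if there is none)
idx <- match(key_of(df1), key_of(df2))

# Operation 1: fill blank cells of df1 from the matching df2 row
result_1 <- df1
for (col in value_cols) {
  from_df2 <- df2[[col]][idx]
  fill <- is_blank(result_1[[col]]) & !is_blank(from_df2)
  result_1[[col]][fill] <- from_df2[fill]
}

# Operation 2: report cells filled in both data frames that disagree,
# one row per conflicting cell, with values compared and stored as text
result_2 <- do.call(rbind, lapply(value_cols, function(col) {
  a <- df1[[col]]
  b <- df2[[col]][idx]
  clash <- !is_blank(a) & !is_blank(b) & as.character(a) != as.character(b)
  data.frame(df1[clash, keys, drop = FALSE],
             column    = rep(col, sum(clash)),
             df1_value = as.character(a[clash]),
             df2_value = as.character(b[clash]),
             stringsAsFactors = FALSE)
}))
rownames(result_2) <- NULL
```

Reporting conflicts in long format (key columns, column name, value in df1, value in df2) sidesteps the type problem, because values from differently typed columns never share a column except as text. The report can be reshaped to the wide layout shown in the question afterwards if needed.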