【Question title】: Joining two data frames together on their common columns
【Posted】: 2021-10-22 22:27:42
【Question description】:

I have two different data frames that share some common columns, as shown below:

Data frame A

structure(list(Firm = c("Alex", NA, NA), Postal.Code = c("V0N 1B4", 
"V0N 1B4", "V0N 1B4"), sold.month = c(NA_real_, NA_real_, NA_real_
), sold.year = c(NA_real_, NA_real_, NA_real_), sold.qtr = c(NA_real_, 
NA_real_, NA_real_), List.year = c(2018, 2018, 2018), List.Date.Year.quarter = c("2018 Q2", 
"2018 Q2", "2018 Q2")), row.names = c(NA, 3L), class = "data.frame")

  Firm Postal.Code sold.month sold.year sold.qtr List.year List.Date.Year.quarter
1 Alex     V0N 1B4         NA        NA       NA      2018                2018 Q2
2 <NA>     V0N 1B4         NA        NA       NA      2018                2018 Q2
3 <NA>     V0N 1B4         NA        NA       NA      2018                2018 Q2

Data frame B

structure(list(sold.month = c(NA, 1L, 1L), sold.year = c(NA, 
2020L, 2020L), sold.qtr = c(NA, 1L, 1L), List.Date.Year.quarter = structure(c(2019.75, 
2019.75, 2019.75), class = "yearqtr"), List.Date.Year.month = structure(c(2019.75, 
2019.91666666667, 2019.91666666667), class = "yearmon"), Sold.Date.Year.month = structure(c(NA, 
2020, 2020), class = "yearmon")), row.names = c(NA, 3L), class = "data.frame")

  sold.month sold.year sold.qtr List.Date.Year.quarter List.Date.Year.month Sold.Date.Year.month
1         NA        NA       NA                2019 Q4             Oct 2019                 <NA>
2          1      2020        1                2019 Q4             Dec 2019             Jan 2020
3          1      2020        1                2019 Q4             Dec 2019             Jan 2020

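What I want to do: append the rows of data frame B to data frame A, keeping only those columns of B that also exist in A. If a column of A does not exist in B, its value should be NA in the appended rows.

The expected result is: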

  Firm Postal.Code sold.month sold.year sold.qtr List.year List.Date.Year.quarter
1 Alex     V0N 1B4         NA        NA       NA      2018                2018 Q2
2 <NA>     V0N 1B4         NA        NA       NA      2018                2018 Q2
3 <NA>     V0N 1B4         NA        NA       NA      2018                2018 Q2
4 <NA>        <NA>         NA        NA       NA        NA                2019 Q4
5 <NA>        <NA>          1      2020        1        NA                2019 Q4
6 <NA>        <NA>          1      2020        1        NA                2019 Q4

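【Question discussion】:

    Tags: r dataframe join merge


    【Solution 1】:

    You can convert all columns to character and then combine the two data frames with bind_rows. The conversion is needed because List.Date.Year.quarter is character in A but yearqtr in B, and bind_rows cannot combine columns of incompatible types.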

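    library(dplyr)
    library(zoo)  # supplies the as.character() methods for yearqtr/yearmon columns
    
    # Convert every column to character, keep only the columns of B that also
    # exist in A, stack the rows, then let type.convert() restore numeric types.
    A %>% 
      mutate(across(everything(), as.character)) %>%
      bind_rows(B %>%
                select(intersect(names(.), names(A))) %>%
                mutate(across(everything(), as.character))) %>%
      type.convert(as.is = TRUE)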
    
    #  Firm Postal.Code sold.month sold.year sold.qtr List.year List.Date.Year.quarter
    #1 Alex     V0N 1B4         NA        NA       NA      2018                2018 Q2
    #2 <NA>     V0N 1B4         NA        NA       NA      2018                2018 Q2
    #3 <NA>     V0N 1B4         NA        NA       NA      2018                2018 Q2
    #4 <NA>        <NA>         NA        NA       NA        NA                2019 Q4
    #5 <NA>        <NA>          1      2020        1        NA                2019 Q4
    #6 <NA>        <NA>          1      2020        1        NA                2019 Q4
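
For readers who want to avoid dplyr altogether (and with it any masking of mutate), the same idea can be sketched in base R. This is not the answerer's code, only an illustration under assumptions: the frames `A` and `B` below are trimmed stand-ins built from the question's values, the quarter column of `B` is stored as plain text rather than `yearqtr`, and the names `common`, `missing_cols`, `A_chr`, `B_chr`, and `out` are made up for this example.

```r
# Trimmed stand-ins for the question's data frames (assumption: the quarter
# column in B is already plain text here, not a zoo yearqtr).
A <- data.frame(
  Firm = c("Alex", NA, NA),
  Postal.Code = "V0N 1B4",
  sold.month = NA_real_,
  List.year = 2018,
  List.Date.Year.quarter = "2018 Q2",
  stringsAsFactors = FALSE
)
B <- data.frame(
  sold.month = c(NA, 1L, 1L),
  List.Date.Year.quarter = "2019 Q4",
  List.Date.Year.month = c("Oct 2019", "Dec 2019", "Dec 2019"),
  stringsAsFactors = FALSE
)

# Keep only the columns of B that also exist in A.
common <- intersect(names(B), names(A))
B_chr <- B[common]

# Convert every column to character so the types cannot clash.
A_chr <- A
A_chr[] <- lapply(A_chr, as.character)
B_chr[] <- lapply(B_chr, as.character)

# Columns of A that B lacks are filled with NA.
missing_cols <- setdiff(names(A), common)
B_chr[missing_cols] <- NA_character_

# Stack the rows in A's column order, then let R re-infer column types.
out <- rbind(A_chr, B_chr[names(A)])
out <- type.convert(out, as.is = TRUE)
out
```

With the real `B`, zoo would presumably still need to be loaded so that `as.character()` formats the `yearqtr` column as `2019 Q4` rather than as its underlying number.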
    

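    【Discussion】:

    • Thanks, Ronak Shah. Although I am running exactly the code you wrote, I get this error message: Error: Can't combine ..1$sold.month` ..2$sold.month .` I'm not sure why this happens.
    • Interesting! When I changed mutate in the code to dplyr::mutate, it worked for me! I'm confused, because I had already loaded dplyr. But thanks, it works now.
    • That is probably because you also loaded plyr, which masks the mutate function.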
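
The masking suspected in the last comment can be checked directly. The sketch below uses `utils::find()`, which lists the attached environments that define a given name in search-path order; the wrapper `where_defined` is a hypothetical helper named for this example only.

```r
# Hypothetical helper: list every attached environment that defines `fn_name`.
# The first entry is the one a bare call such as mutate(...) resolves to.
where_defined <- function(fn_name) {
  utils::find(fn_name, mode = "function")
}

# Run this in the session that raised the error. If "package:plyr" is listed
# before "package:dplyr", plyr's mutate() is masking dplyr's.
where_defined("mutate")
```

If plyr does come first, either attach plyr before dplyr or keep writing `dplyr::mutate()` explicitly, as the commenter did.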