【Question title】: Merging in R based on conditions
【Posted】: 2018-05-01 13:31:40
【Question description】:

For two example data frames:

df1 <- structure(list(name = c("Katie", "Eve", "James", "Alexander", 
"Mary", "Barrie", "Harry", "Sam"), postcode = c("CB12FR", "CB12FR", 
"NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR")), .Names = c("name", 
"postcode"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("name", "postcode")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df2 <-structure(list(name = c("Katie", "James", "Alexander", "Lucie", 
"Mary", "Barrie", "Claire", "Harry", "Clare", "Hannah", "Rob", 
"Eve", "Sarah"), postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", 
"PE46YH", "IL57DS", "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", 
"CB12FR", "IL45TR"), rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 
2L, 3L, 1L, 4L, 2L)), .Names = c("name", "postcode", "rating"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-13L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector")), rating = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("name", "postcode", "rating")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

I want to merge these two data frames so that the rating from df2 is added to df1. I would normally use:

ratings.df
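The call above is truncated in this copy, so the following is only an assumed, typical form of an unconditional base R `merge()`, shown for comparison. The `plain` name and the `data.frame()` copies of the example data are introduced here for illustration.

```r
# Plain data.frame copies of the example data from the question
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS",
               "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS",
               "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR",
               "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Assumed form of an unconditional left merge on postcode
plain <- merge(df1, df2, by = "postcode", all.x = TRUE)

# Every df1 row is paired with every df2 row sharing its postcode,
# so repeated postcodes such as CB12FR multiply the rows
nrow(plain)  # 12 rows rather than the 8 in df1
```

A merge like this matches on postcode alone, which is why the two extra conditions below are needed.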

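But... I only want to merge when:
1. The postcode in df2 is unique (i.e. if a postcode appears more than once, whether under the same name or different names, there should be no merging).
2. And the first three letters of the name are the same in both data frames.

(I am happy for there to be blanks for the postcodes that have no rating; I can then fill these in manually.)

Is this possible?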

【Question discussion】:

    Tags: r


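    【Solution 1】:

    Why not use the sqldf package? It lets you merge data.frames in R with SQL, using a JOIN statement.

    As for the conditional part of the merge, that can be done with a CASE statement in SQL.

    So for your first condition, you can use CASE where COUNT(postcode) = 1 and you GROUP BY name, so that for every name that has exactly one postcode assigned to it you can JOIN.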
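    A rough sketch of the sqldf route, assuming the sqldf package and its default SQLite backend are installed. It swaps the CASE idea for a subquery using GROUP BY postcode with HAVING COUNT(*) = 1, which is one way to express "postcode is unique in df2", and it puts the three-letter name check into the join condition. The `ratings_sql` name and the `data.frame()` copies of the data are introduced here for illustration.

```r
library(sqldf)

# Plain data.frame copies of the example data from the question
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS",
               "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS",
               "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR",
               "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

ratings_sql <- sqldf("
  SELECT a.name, a.postcode, b.rating
  FROM df1 AS a
  LEFT JOIN (
    -- df2 rows whose postcode occurs exactly once in df2
    SELECT name, postcode, rating
    FROM df2
    WHERE postcode IN (
      SELECT postcode FROM df2 GROUP BY postcode HAVING COUNT(*) = 1
    )
  ) AS b
    ON a.postcode = b.postcode
   AND substr(a.name, 1, 3) = substr(b.name, 1, 3)
")
```

    Because this is a LEFT JOIN, every row of df1 is kept, and rows that fail either condition get a missing rating that can be filled in by hand.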

    Another option is to use gather from tidyr.

    【Discussion】:

      【Solution 2】:

      With a dplyr solution, we can first eliminate the duplicates in df2$postcode and then join the resulting data frame to df1:

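      library(dplyr)

      # Keep one row per postcode in df2 (the first occurrence of each)
      df3 <- df2 %>%
        distinct(postcode, .keep_all = TRUE)
      
      df1 %>%
        left_join(df3, by = c("postcode")) %>%
        # keep only rows where the first three letters of both names agree
        filter(substr(name.x, 1, 3) == substr(name.y, 1, 3)) %>%
        rename(name = name.x) %>%
        # drop the name column that came from df2
        mutate(name.y = NULL)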
      


      This yields
      # A tibble: 5 x 3
        name      postcode rating
        <chr>     <chr>     <int>
      1 Katie     CB12FR        1
      2 James     NE34TR        1
      3 Alexander DH34RL        1
      4 Mary      PE46YH        3
      5 Barrie    IL57DS        1
      

      Is this what you are trying to achieve?
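      Two details of the code above may matter for the question as asked: `distinct()` keeps the first row for each postcode rather than dropping postcodes that repeat, and `filter()` removes the df1 rows that have no match. The following is a hedged variant, not the answer's own method, that reads the conditions strictly: it drops every postcode occurring more than once in df2 and keeps all eight df1 rows, leaving `NA` where no rating applies. The names `df2_unique`, `prefix`, and `result` are introduced here for illustration.

```r
library(dplyr)

# Plain data.frame copies of the example data from the question
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS",
               "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS",
               "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR",
               "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Condition 1: keep only postcodes that occur exactly once in df2
df2_unique <- df2 %>%
  group_by(postcode) %>%
  filter(n() == 1) %>%
  ungroup() %>%
  mutate(prefix = substr(name, 1, 3)) %>%
  select(postcode, prefix, rating)

# Condition 2: join on postcode AND the first three letters of the name;
# left_join keeps every df1 row and fills unmatched ratings with NA
result <- df1 %>%
  mutate(prefix = substr(name, 1, 3)) %>%
  left_join(df2_unique, by = c("postcode", "prefix")) %>%
  select(-prefix)

result
```

      Under this reading Katie and Eve get no rating, because CB12FR appears three times in df2, while Harry and Sam stay in the result with a blank rating to be filled in manually.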

      【Discussion】:
