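【Question title】: Merge dataframes in R with different size and condition
【Posted】: 2020-10-07 05:53:31
【Question】:

I am trying to merge two CSV files into one. They share a common ID but have different numbers of rows. I used merge(), but the result contains duplicated rows. I have the following data frames: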

SR <- c("SR1", "SR2", "SR2", "SR2", "SR3", "SR4", "SR4")
school <- c("S-1", "S-1", "S-2", "S-4", "S-2", "S-1", "S-5")
Y <- c(3,4,1,2,5,2,3)
data1 <- data.frame(SR, school, Y)


SR <- c("SR1", "SR1", "SR1", "SR2", "SR2", "SR2", "SR2", "SR2", "SR2", "SR2", "SR3", "SR3", "SR4", "SR4", "SR4")
class <- c("S-1.02", "S-1.05", "S-1.07", "S-1.01", "S-1.02", "S-1.03", "S-1.06", "S-2.03", "S-2.15", "S-4.02", "S-2.01", "S-2.03", "S-1.05", "S-1.06", "S-5.01")
data2 <- data.frame(SR, class)
data1
  SR     school     Y
  SR1     S-1       3
  SR2     S-1       4
  SR2     S-2       1
  SR2     S-4       2
  SR3     S-2       5
  SR4     S-1       2
  SR4     S-5       3

data2
  SR      class
  SR1     S-1.02 
  SR1     S-1.05
  SR1     S-1.07
  SR2     S-1.01
  SR2     S-1.02
  SR2     S-1.03
  SR2     S-1.06
  SR2     S-2.03
  SR2     S-2.15
  SR2     S-4.02
  SR3     S-2.01
  SR3     S-2.03
  SR4     S-1.05
  SR4     S-1.06
  SR4     S-5.01
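
The duplication described above can be reproduced with the sample data. This is a minimal sketch, assuming the merge was done on SR alone (the question does not show the exact call): every data1 row is then paired with every data2 row that has the same SR, regardless of school.

```r
# Rebuild the question's two data frames; SR is the shared ID column.
data1 <- data.frame(
  SR     = c("SR1", "SR2", "SR2", "SR2", "SR3", "SR4", "SR4"),
  school = c("S-1", "S-1", "S-2", "S-4", "S-2", "S-1", "S-5"),
  Y      = c(3, 4, 1, 2, 5, 2, 3)
)
data2 <- data.frame(
  SR    = c("SR1", "SR1", "SR1", "SR2", "SR2", "SR2", "SR2", "SR2",
            "SR2", "SR2", "SR3", "SR3", "SR4", "SR4", "SR4"),
  class = c("S-1.02", "S-1.05", "S-1.07", "S-1.01", "S-1.02", "S-1.03",
            "S-1.06", "S-2.03", "S-2.15", "S-4.02", "S-2.01", "S-2.03",
            "S-1.05", "S-1.06", "S-5.01")
)

# Assumed call: joining on SR only (many-to-many on that key).
naive <- merge(data1, data2, by = "SR")

nrow(data2)  # 15 rows wanted
nrow(naive)  # 32 rows: 1*3 (SR1) + 3*7 (SR2) + 1*2 (SR3) + 2*3 (SR4)
```

The extra rows are pairs such as school S-2 with class S-1.01, which is why the join also has to take the school into account.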

Each class should be matched to its school, so the result should look like this:

  SR      school     class      Y
  SR1      S-1       S-1.02     3
  SR1      S-1       S-1.05     3
  SR1      S-1       S-1.07     3
  SR2      S-1       S-1.01     4
  SR2      S-1       S-1.02     4
  SR2      S-1       S-1.03     4
  SR2      S-1       S-1.06     4
  SR2      S-2       S-2.03     1
  SR2      S-2       S-2.15     1
  SR2      S-4       S-4.02     2
  SR3      S-2       S-2.01     5
  SR3      S-2       S-2.03     5
  SR4      S-1       S-1.05     2
  SR4      S-1       S-1.06     2
  SR4      S-5       S-5.01     3

Thanks for your help.

【Comments】:

    Tags: r


    【Solution 1】:

    One option is regex_left_join from the fuzzyjoin package:

    library(fuzzyjoin)
    library(dplyr)
    regex_left_join(data2, data1, by = c("SR", "class" = "school")) %>%
          select(SR = SR.x, school, class, Y)
    
    #    SR    school   class     Y
    # 1  SR1    S-1     S-1.02    3
    # 2  SR1    S-1     S-1.05    3
    # 3  SR1    S-1     S-1.07    3
    # 4  SR2    S-1     S-1.01    4
    # 5  SR2    S-1     S-1.02    4
    # 6  SR2    S-1     S-1.03    4
    # 7  SR2    S-1     S-1.06    4
    # 8  SR2    S-2     S-2.03    1
    # 9  SR2    S-2     S-2.15    1
    # 10 SR2    S-4     S-4.02    2
    # 11 SR3    S-2     S-2.01    5
    # 12 SR3    S-2     S-2.03    5
    # 13 SR4    S-1     S-1.05    2
    # 14 SR4    S-1     S-1.06    2
    # 15 SR4    S-5     S-5.01    3
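
One caveat with matching school against class as a regular expression: as far as I know, fuzzyjoin's regex joins test whether the pattern occurs anywhere in the string, so a school code that is a prefix of another code would also match. The sketch below illustrates this in base R; the class value "S-10.01" is made up for illustration and does not appear in the question's data.

```r
school  <- "S-1"
classes <- c("S-1.02", "S-10.01")  # "S-10.01" is a hypothetical value

# Unanchored regex: "S-1" occurs inside both strings.
regex_hit <- grepl(school, classes)
regex_hit
# [1] TRUE TRUE

# Literal prefix check that includes the "." separator:
# only "S-1.02" belongs to school "S-1".
prefix_hit <- startsWith(classes, paste0(school, "."))
prefix_hit
# [1]  TRUE FALSE
```

With the sample data in the question this does not change the result, but it may matter on a larger file with two-digit school numbers.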
    

    【Comments】:

    • Thank you very much, I will try it on the full data.
    • @Me28 Make sure the two by variables are of the same class in each dataset
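
A quick way to check the point about column classes, as a sketch: the two small data frames below are hypothetical and deliberately built so that one side holds factors and the other holds character vectors (which can happen, for example, with stringsAsFactors = TRUE or in R versions before 4.0).

```r
# Hypothetical illustration: keys stored as factors on one side only.
data1 <- data.frame(SR = c("SR1", "SR2"), school = c("S-1", "S-2"),
                    stringsAsFactors = TRUE)
data2 <- data.frame(SR = c("SR1", "SR2"), class = c("S-1.02", "S-2.03"),
                    stringsAsFactors = FALSE)

# Inspect the class of each key column.
sapply(data1[c("SR", "school")], class)  # both "factor"
sapply(data2[c("SR", "class")], class)   # both "character"

# Convert the factor keys to character so both sides agree.
data1[c("SR", "school")] <- lapply(data1[c("SR", "school")], as.character)
```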
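    【Solution 2】:

    Could you edit your question and use dput to post your two data frames in a form that is easier for us to copy?

    That said, you need to do something like this: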

    # NOT RUN
    library(tidyverse)
    RESULT <- data2 %>%
      # derive the school key from class, e.g. "S-1.02" -> "S-1"
      mutate(school = str_extract(class, "^[^.]+")) %>%
      # exact join on both keys
      inner_join(data1, by = c("SR", "school"))
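
The same idea (derive the school from class, then join exactly on both keys) can also be sketched in base R without extra packages. This assumes that the school code is always the part of class before the first ".", which holds for the sample data but is an assumption about the real files.

```r
# Rebuild the question's two data frames.
data1 <- data.frame(
  SR     = c("SR1", "SR2", "SR2", "SR2", "SR3", "SR4", "SR4"),
  school = c("S-1", "S-1", "S-2", "S-4", "S-2", "S-1", "S-5"),
  Y      = c(3, 4, 1, 2, 5, 2, 3)
)
data2 <- data.frame(
  SR    = c("SR1", "SR1", "SR1", "SR2", "SR2", "SR2", "SR2", "SR2",
            "SR2", "SR2", "SR3", "SR3", "SR4", "SR4", "SR4"),
  class = c("S-1.02", "S-1.05", "S-1.07", "S-1.01", "S-1.02", "S-1.03",
            "S-1.06", "S-2.03", "S-2.15", "S-4.02", "S-2.01", "S-2.03",
            "S-1.05", "S-1.06", "S-5.01")
)

# Drop everything from the first "." onward: "S-1.02" -> "S-1".
data2$school <- sub("\\..*$", "", data2$class)

# Exact merge on both keys, so each class row picks up exactly one Y.
result <- merge(data2, data1, by = c("SR", "school"))

nrow(result)  # 15, one row per class
```

Because the join keys are compared for equality rather than by pattern, this avoids partial matches between similar school codes.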
    

    【Comments】:

    • I have edited the question, so you can use the data now. Thanks.