【Question Title】: Avoiding duplicates while merging two data frames in R
【Posted】: 2021-10-05 13:14:17
【Question】:

I have two data frames that share the same IDs:

DF 1

ID <- c(20, 21, 22, 23, 24, 25)
Town <- c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi")
Name <- c("John", "Joseph", "Agnes","Steph","Brian","Jayden")
Customer <- data.frame(ID, Town, Name)

DF 2

ID <- c(20, 20,22, 22, 23, 25, 24, 20)
Town <- c("Nairobi", "Nairobi", "Mombasa", "Mombasa", "Nairobi", "Nairobi","Mombasa", "Nairobi")
Amount <- c(100, 300, 500, 400, 300, 1000, 300, 170)
TownSales <- data.frame(ID, Town, Amount)

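I want the final merged data frame to have no duplicates in the Name column from DF1, because DF1 has fewer rows than DF2. I'm avoiding duplicates because the data frame I have with fewer rows contains continuous variables that I want to compute on.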

【Question Comments】:

    Tags: r join dplyr


    【Solution 1】:

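    I don't think the problem is removing duplicates during the merge; it is how to adjust the table after the merge. Let me explain with a reproducible example that answers your question.

    I'm using the data.table package, which works very well for this kind of problem.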

    # Load library
    library(data.table)
    
    # Initialize dataframes
    ID <- c(20, 20,22, 22, 23, 25, 24, 20)
    Town <- c("Nairobi", "Nairobi", "Mombasa", "Mombasa", "Nairobi", "Nairobi","Mombasa", "Nairobi")
    Amount <- c(100, 300, 500, 400, 300, 1000, 300, 170)
    TownSales <- data.frame(ID, Town, Amount)
    
    ID <- c(20, 21, 22, 23, 24, 25)
    Town <- c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi")
    Name <- c("John", "Joseph", "Agnes","Steph","Brian","Jayden")
    Customer <- data.frame(ID, Town, Name)
    
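    # Perform the merge on the shared key columns. Converting to data.table
    # first matters: merge() on two plain data.frames returns a data.frame,
    # and the := update used below only works on a data.table.
    results_dt <- merge(as.data.table(Customer), as.data.table(TownSales), by = c("ID", "Town"))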
    

    After these steps, we end up with the following table:

    ID  Town     Name    Amount
    20  Nairobi  John       100
    20  Nairobi  John       300
    20  Nairobi  John       170
    22  Mombasa  Agnes      500
    22  Mombasa  Agnes      400
    23  Nairobi  Steph      300
    24  Mombasa  Brian      300
    25  Nairobi  Jayden    1000

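The repetition in this table comes straight from the join: an inner join keeps one output row per matching sale, so each customer's Name is repeated once per sale, and Joseph (ID 21, Kisumu), who has no sales, drops out entirely. A minimal base-R sketch that confirms this on the question's example data (the variable name `merged` is only illustrative):

```r
# Example data from the question
Customer <- data.frame(
  ID   = c(20, 21, 22, 23, 24, 25),
  Town = c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi"),
  Name = c("John", "Joseph", "Agnes", "Steph", "Brian", "Jayden")
)
TownSales <- data.frame(
  ID     = c(20, 20, 22, 22, 23, 25, 24, 20),
  Town   = c("Nairobi", "Nairobi", "Mombasa", "Mombasa", "Nairobi",
             "Nairobi", "Mombasa", "Nairobi"),
  Amount = c(100, 300, 500, 400, 300, 1000, 300, 170)
)

# Inner join (base R default) on the shared key columns
merged <- merge(Customer, TownSales, by = c("ID", "Town"))

nrow(merged)        # 8: one row per matching sale
table(merged$Name)  # how many times each customer's Name is repeated
```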
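    Then we only need to handle the duplicates the way you intended: for every row that repeats an earlier row's ID, Town and Name, we set the Name column to NA. The following line does exactly that: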

    # Adjust table for duplicated rows
    results_dt[duplicated(results_dt, by = c("ID","Town","Name")),Name:=NA]
    

    Finally, the resulting table looks like this:

    ID  Town     Name    Amount
    20  Nairobi  John       100
    20  Nairobi  <NA>       300
    20  Nairobi  <NA>       170
    22  Mombasa  Agnes      500
    22  Mombasa  <NA>       400
    23  Nairobi  Steph      300
    24  Mombasa  Brian      300
    25  Nairobi  Jayden    1000

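If you would rather not depend on data.table, the NA-blanking step can also be sketched in plain base R. This is an illustrative alternative, not the answer's code: it assumes the merged result is an ordinary data.frame (here called `merged`, a hypothetical name) and calls base `duplicated()` on just the ID, Town and Name columns, which flags every row whose combination already appeared higher up in the merged order.

```r
# Example data from the question
Customer <- data.frame(
  ID   = c(20, 21, 22, 23, 24, 25),
  Town = c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi"),
  Name = c("John", "Joseph", "Agnes", "Steph", "Brian", "Jayden")
)
TownSales <- data.frame(
  ID     = c(20, 20, 22, 22, 23, 25, 24, 20),
  Town   = c("Nairobi", "Nairobi", "Mombasa", "Mombasa", "Nairobi",
             "Nairobi", "Mombasa", "Nairobi"),
  Amount = c(100, 300, 500, 400, 300, 1000, 300, 170)
)

merged <- merge(Customer, TownSales, by = c("ID", "Town"))

# TRUE for every row whose (ID, Town, Name) combination was already seen above it
dup <- duplicated(merged[, c("ID", "Town", "Name")])

# Keep the Name only on the first occurrence; Amount values are untouched
merged$Name[dup] <- NA
merged
```

Only the Name labels are blanked, so every Amount value is still present for later calculations.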
    You can always reorder the rows afterwards if needed.
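Since the question is tagged dplyr, a roughly equivalent dplyr pipeline is sketched below as an alternative, not as the answer's method. It sorts by ID before blanking names, so the name stays on the first row of each customer, which covers the reordering mentioned above. Grouping by ID and Town assumes, as in the example data, that each (ID, Town) pair belongs to exactly one customer; the name `result` is illustrative.

```r
library(dplyr)

# Example data from the question; Name kept as character (the default since R 4.0)
Customer <- data.frame(
  ID   = c(20, 21, 22, 23, 24, 25),
  Town = c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi"),
  Name = c("John", "Joseph", "Agnes", "Steph", "Brian", "Jayden"),
  stringsAsFactors = FALSE
)
TownSales <- data.frame(
  ID     = c(20, 20, 22, 22, 23, 25, 24, 20),
  Town   = c("Nairobi", "Nairobi", "Mombasa", "Mombasa", "Nairobi",
             "Nairobi", "Mombasa", "Nairobi"),
  Amount = c(100, 300, 500, 400, 300, 1000, 300, 170)
)

result <- Customer %>%
  inner_join(TownSales, by = c("ID", "Town")) %>%  # one row per matching sale
  arrange(ID) %>%                                  # stable sort: sales keep their order within an ID
  group_by(ID, Town) %>%
  # keep the Name only on the first row of each customer's group
  mutate(Name = if_else(row_number() == 1L, Name, NA_character_)) %>%
  ungroup()

result
```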
