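I think the problem is not how to remove duplicates during the merge, but how to adjust the table after the merge. Let me explain with the reproducible example below.
I am using the data.table package, which works very well for this kind of problem.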
# Load library
library(data.table)
# Initialize dataframes
ID <- c(20, 20, 22, 22, 23, 25, 24, 20)
Town <- c("Nairobi", "Nairobi", "Mombasa", "Mombasa", "Nairobi", "Nairobi", "Mombasa", "Nairobi")
Amount <- c(100, 300, 500, 400, 300, 1000, 300, 170)
TownSales <- data.frame(ID, Town, Amount)
ID <- c(20, 21, 22, 23, 24, 25)
Town <- c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi")
Name <- c("John", "Joseph", "Agnes","Steph","Brian","Jayden")
Customer <- data.frame(ID, Town, Name)
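# Perform the merge (joins on the shared columns ID and Town)
# and convert the result to a data.table so that := can be used on it
results_dt <- as.data.table(merge(Customer, TownSales))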
After these steps, we end up with the following table:
| ID | Town | Name | Amount |
|----|---------|--------|--------|
| 20 | Nairobi | John | 100 |
| 20 | Nairobi | John | 300 |
| 20 | Nairobi | John | 170 |
| 22 | Mombasa | Agnes | 500 |
| 22 | Mombasa | Agnes | 400 |
| 23 | Nairobi | Steph | 300 |
| 24 | Mombasa | Brian | 300 |
| 25 | Nairobi | Jayden | 1000 |
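The merge relies on `merge()` defaulting to the columns the two tables have in common (ID and Town), and by default it keeps only rows that match on both sides, which is why customer 21 (Joseph, Kisumu) is missing from the table. Here is a minimal sketch that spells this out, using the same data built directly as data.tables. The names `explicit_dt` and `with_unmatched_dt` are only illustrative, and the `all.x = TRUE` variant is an optional extra, not something the question asked for.

```r
library(data.table)

# Same data as above, created directly as data.tables
Customer <- data.table(
  ID   = c(20, 21, 22, 23, 24, 25),
  Town = c("Nairobi", "Kisumu", "Mombasa", "Nairobi", "Mombasa", "Nairobi"),
  Name = c("John", "Joseph", "Agnes", "Steph", "Brian", "Jayden")
)
TownSales <- data.table(
  ID     = c(20, 20, 22, 22, 23, 25, 24, 20),
  Town   = c("Nairobi", "Nairobi", "Mombasa", "Mombasa",
             "Nairobi", "Nairobi", "Mombasa", "Nairobi"),
  Amount = c(100, 300, 500, 400, 300, 1000, 300, 170)
)

# Same join as before, with the key columns written out explicitly
explicit_dt <- merge(Customer, TownSales, by = c("ID", "Town"))

# Optional variant: all.x = TRUE also keeps customers without any sales,
# filling their Amount with NA
with_unmatched_dt <- merge(Customer, TownSales, by = c("ID", "Town"), all.x = TRUE)
```

Writing `by` explicitly is mostly a matter of readability here, but it protects the join if either table later gains another column with a shared name.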
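Then we only need to adjust the duplicated rows as you expect: for every row that repeats an earlier combination of ID, Town and Name, we set the Name column to NA.
The following line does what you are looking for: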
# Adjust table for duplicated rows (the first row of each group keeps its name)
results_dt[duplicated(results_dt, by = c("ID", "Town", "Name")), Name := NA_character_]
Finally, the resulting table looks like this:
| ID | Town | Name | Amount |
|----|---------|--------|--------|
| 20 | Nairobi | John | 100 |
| 20 | Nairobi | NA | 300 |
| 20 | Nairobi | NA | 170 |
| 22 | Mombasa | Agnes | 500 |
| 22 | Mombasa | NA | 400 |
| 23 | Nairobi | Steph | 300 |
| 24 | Mombasa | Brian | 300 |
| 25 | Nairobi | Jayden | 1000 |
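To see why only the later rows of each group lose their name, it can help to look at what `duplicated()` returns before assigning anything. This is a small sketch on a cut-down table; `demo_dt` and `dup_flags` are illustrative names, not part of the code above.

```r
library(data.table)

# Cut-down version of the merged table (illustrative data)
demo_dt <- data.table(
  ID     = c(20, 20, 20, 22),
  Town   = c("Nairobi", "Nairobi", "Nairobi", "Mombasa"),
  Name   = c("John", "John", "John", "Agnes"),
  Amount = c(100, 300, 170, 500)
)

# TRUE for every row that repeats an earlier (ID, Town, Name) combination,
# FALSE for the first occurrence of each combination
dup_flags <- duplicated(demo_dt, by = c("ID", "Town", "Name"))

# Set the name to NA only on the flagged rows;
# NA_character_ matches the type of the Name column
demo_dt[dup_flags, Name := NA_character_]
```

Because `:=` modifies the table by reference, no reassignment is needed after the last line.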
If needed, you can always reorder the rows afterwards.
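One way to do the reordering is data.table's `setorder()`, which sorts in place. The sketch below uses a small stand-in for the adjusted table (`adjusted_dt` is an illustrative name) and an arbitrary sort order, ID ascending and then Amount descending, chosen only as an example.

```r
library(data.table)

# Small stand-in for the adjusted table (illustrative data)
adjusted_dt <- data.table(
  ID     = c(20, 20, 20, 22, 22),
  Town   = c("Nairobi", "Nairobi", "Nairobi", "Mombasa", "Mombasa"),
  Name   = c("John", NA, NA, "Agnes", NA),
  Amount = c(100, 300, 170, 500, 400)
)

# Sort in place: ID ascending, then Amount descending within each ID
setorder(adjusted_dt, ID, -Amount)
```

Note that each name travels with its row, so after this sort John sits on the last row of the ID 20 group rather than the first. If the name should appear on the first row of each group in the final order, sort first and run the `duplicated()` step afterwards.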