How to take data in one R file and use it to replace part of the data in a different one: answers

[Question title]: How to take data in one r file and use it to replace part of the data in a different one
[Posted]: 2020-06-20 10:21:50
[Question]:

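I have two datasets in R. One has a column containing information that I want to add to the other, where that column is currently filled with NA.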

To give you an idea, this is what the two datasets look like:

Dataset 1:

id || location || city name
1 || 54.234 || name1
2 || NA || name2
3 || NA || name3
4 || 55.2345 || name4

Dataset 2:

id || location || city name
2 || 57.234 || name2
3 || 58.234 || name3

I would like to get the following result:

id || location || city name
1 || 54.234 || name1
2 || 57.234 || name2
3 || 58.234 || name3
4 || 55.2345 || name4

Right now, I am using this:

dataSetFinal <- rbind(dataSet1, dataSet2, by="id")

But this duplicates the rows that share an ID and puts the corresponding location in the copies. How can I get the desired result? Thanks.

[Comments]:

Tags: r merge duplicates rbind

[Solution 1]:

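We can join the two datasets by id and city.name, then use coalesce to take the first non-NA value of location from the two joined columns (location.x from df1, location.y from df2).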

library(dplyr)

left_join(df1, df2, by = c('id', 'city.name')) %>%
  mutate(location = coalesce(location.x, location.y)) %>%
  select(names(df1))

#  id location city.name
#1  1  54.2340     name1
#2  2  57.2340     name2
#3  3  58.2340     name3
#4  4  55.2345     name4
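To see what coalesce contributes on its own, here is a minimal sketch. The vectors x and y are hypothetical stand-ins for location.x and location.y after the join; they are not part of the original answer.

```r
library(dplyr)

# Hypothetical stand-ins for location.x and location.y after the join
x <- c(54.234, NA, NA, 55.2345)
y <- c(NA, 57.234, 58.234, NA)

# coalesce() returns, position by position, the first value that is not NA
filled <- coalesce(x, y)
print(filled)
```

Where x already has a value it is kept, so existing locations in df1 are not overwritten by df2.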

Or in base R:

transform(merge(df1, df2, by = c('id', 'city.name'), all.x = TRUE), 
      location = ifelse(is.na(location.x), location.y, location.x))[names(df1)]

Data

df1 <- structure(list(id = 1:4, location = c(54.234, NA, NA, 55.2345
), city.name = structure(1:4, .Label = c("name1", "name2", "name3", 
"name4"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(id = 2:3, location = c(57.234, 58.234), 
city.name = structure(1:2, .Label = c("name2", "name3"), class = "factor")), 
class = "data.frame", row.names = c(NA, -2L))

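[Discussion]:

How can I make sure I get a dataset with the same initial column names, rather than having them duplicated with .x and .y suffixes?
@VictorDanielCardenas In both approaches, I make sure you get a dataset with the original columns by subsetting the result with names(df1): select(names(df1)) in the dplyr version and [names(df1)] in the base R version.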
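
If the suffixed columns are a concern, one possible alternative is to skip the join and fill the NA values in place with match. This is only a sketch under the assumption that id alone identifies a row (the answer above joins on both id and city.name); the names idx, fill and result are made up for illustration, and city.name is built as plain character rather than factor for brevity.

```r
# Same example values as in the question
df1 <- data.frame(id = 1:4,
                  location = c(54.234, NA, NA, 55.2345),
                  city.name = paste0("name", 1:4))
df2 <- data.frame(id = 2:3,
                  location = c(57.234, 58.234),
                  city.name = c("name2", "name3"))

# Position of each df1 id within df2 (NA where df2 has no such id)
idx <- match(df1$id, df2$id)

# Rows that are missing a location in df1 and have a match in df2
fill <- is.na(df1$location) & !is.na(idx)

# Copy df1 and overwrite only those rows; no .x/.y columns are ever created
result <- df1
result$location[fill] <- df2$location[idx[fill]]
print(result)
```

Because no join takes place, the column names and row order of df1 are left untouched.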