【Question title】: "merge" in R replaces real values with NAs
【Posted】: 2018-12-06 00:44:33
【Question】:

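I have two tables:

  • data_id
  • data_id_1

I need to merge them. Most of the columns are the same, so in effect only the information in the column named "FLatSchool" should be added from data_id_1 to data_id: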

all_data = merge(data_id, data_id_1,
                 by = c("ID_w2", "ID_w3", "ID_w4", "ID_w5",
                        "ID_MC_w3", "ID_MC_w5",
                        "School", "Class"),
                 all.x = TRUE)

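Before merging, I checked data_id_1 and saw that the "FLatSchool" column contains various (numeric) values. However, after the two tables are merged, this column contains only NAs in the resulting table (the other columns are fine, only this one is affected). What could be the problem?

Data: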

> dput(data_id)
structure(list(School = c(3L, 3L, 3L), Class = c(10L, 10L, 10L
), ID_w2 = structure(1:3, .Label = c("RU8_800", "RU8_801", "RU8_802"
), class = "factor"), ID_all = 71163901:71163903, ID_w3 = 427748:427750, 
ID_MC_w3 = structure(1:3, .Label = c("stp94660", "stp94661", 
"stp94662"), class = "factor"), ID_w4 = 428617:428619, ID_w5 = 
428725:428727, 
ID_MC_w5 = structure(1:3, .Label = c("STP114890", "STP114891", 
"STP114892"), class = "factor")), .Names = c("School", "Class", 
"ID_w2", "ID_all", "ID_w3", "ID_MC_w3", "ID_w4", "ID_w5", "ID_MC_w5"
), row.names = c(NA, 3L), class = "data.frame")


> dput(data_id_1)
structure(list(ID_w2 = structure(c(NA, 2L, 1L), .Label = c("RU8_235", 
"RU8_239"), class = "factor"), ID_w3 = 427521:427523, ID_MC_w3 = 
structure(1:3, .Label = c("stp94433", 
"stp94434", "stp94435"), class = "factor"), ID_w4 = 428390:428392, 
ID_w5 = 428781:428783, ID_MC_w5 = structure(1:3, .Label = c("stp114946", 
"stp114947", "stp114948"), class = "factor"), School = c(1L, 
1L, 1L), Class = c(5L, 5L, 5L), FLatSchool = c(1L, 1L, 1L
)), .Names = c("ID_w2", "ID_w3", "ID_MC_w3", "ID_w4", "ID_w5", 
"ID_MC_w5", "School", "Class", "FLatSchool"), row.names = c(NA, 
3L), class = "data.frame")

What I get after running the script above is:

> dput(all_data)
structure(list(ID_w2 = structure(1:3, .Label = c("RU8_800", "RU8_801", 
"RU8_802"), class = "factor"), ID_w3 = 427748:427750, ID_w4 = 
428617:428619, 
ID_w5 = 428725:428727, ID_MC_w3 = structure(1:3, .Label = c("stp94660", 
"stp94661", "stp94662"), class = "factor"), ID_MC_w5 = structure(1:3, 
.Label = c("STP114890", 
"STP114891", "STP114892"), class = "factor"), School = c(3L, 
3L, 3L), Class = c(10L, 10L, 10L), ID_all = 71163901:71163903, 
FLatSchool = c(NA_integer_, NA_integer_, NA_integer_)), .Names = 
c("ID_w2", 
"ID_w3", "ID_w4", "ID_w5", "ID_MC_w3", "ID_MC_w5", "School", 
"Class", "ID_all", "FLatSchool"), row.names = c(NA, -3L), class = 
"data.frame")

What I expected is:

> dput(all_data)
structure(list(ID_w2 = structure(1:3, .Label = c("RU8_800", "RU8_801", 
"RU8_802"), class = "factor"), ID_w3 = 427748:427750, ID_w4 = 
428617:428619, 
ID_w5 = 428725:428727, ID_MC_w3 = structure(1:3, .Label = c("stp94660", 
"stp94661", "stp94662"), class = "factor"), ID_MC_w5 = structure(1:3, 
.Label = c("STP114890", 
"STP114891", "STP114892"), class = "factor"), School = c(3L, 
3L, 3L), Class = c(10L, 10L, 10L), ID_all = 71163901:71163903, 
FLatSchool = c(1, 1, 1)), .Names = c("ID_w2", 
"ID_w3", "ID_w4", "ID_w5", "ID_MC_w3", "ID_MC_w5", "School", 
"Class", "ID_all", "FLatSchool"), row.names = c(NA, -3L), class = 
"data.frame")

【Comments】:

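  • Can you give us reproducible data for data_id and data_id_1? I also suspect there may be a problem with your "by" argument, but reproducible data is needed to say for sure.
  • Please add a reproducible example along with the expected output.
  • Sorry, I really don't know how to give you the data. Should I put it on some host and provide a link?
  • The data you provided have no rows in common. For example, in column ID_w2, data_id_1 has the values <NA>, RU8_239, RU8_235, while data_id has RU8_800, RU8_801, RU8_802.
  • If you want the rows from both tables, either use the argument all = TRUE in merge (instead of all.x = TRUE) or the dplyr function full_join().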
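
The point made in the comments can be checked directly. The following is a minimal sketch, not the asker's actual code: `left` and `right` are hypothetical stand-ins for data_id and data_id_1, reduced to two of the key columns, with values copied from the posted dput output. An inner join shows how many key combinations really match, and all.x = TRUE shows where the NAs come from.

```r
# Hypothetical, reduced stand-ins for data_id and data_id_1
left <- data.frame(ID_w2  = c("RU8_800", "RU8_801", "RU8_802"),
                   ID_w3  = 427748:427750,
                   ID_all = 71163901:71163903,
                   stringsAsFactors = FALSE)
right <- data.frame(ID_w2      = c(NA, "RU8_239", "RU8_235"),
                    ID_w3      = 427521:427523,
                    FLatSchool = c(1L, 1L, 1L),
                    stringsAsFactors = FALSE)

keys <- c("ID_w2", "ID_w3")

# Inner join: keeps only rows whose key combination occurs in both tables
matched <- merge(left, right, by = keys)
nrow(matched)            # 0: no key combination is shared

# Left join: every row of `left` is kept; rows without a match in
# `right` get NA in the columns that come from `right`
left_joined <- merge(left, right, by = keys, all.x = TRUE)
left_joined$FLatSchool   # all NA, as in the question
```

If the inner join returns zero rows, the NAs are not a bug in merge(): there is simply nothing in the second table to attach to the rows of the first.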

Tags: r merge


【Answer 1】:

Thanks everyone for the replies! The problem is solved. I just needed to add "all = TRUE" to the merge call:

# all = TRUE sets both all.x and all.y, so a separate all.x is redundant
all_data = merge(data_id, data_id_1,
                 by = c("ID_w2", "ID_w3", "ID_w4", "ID_w5",
                        "ID_MC_w3", "ID_MC_w5",
                        "School", "Class"),
                 all = TRUE)
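
For readers comparing this with the expected output in the question, here is a small sketch of what all = TRUE does when no keys match. It uses hypothetical stand-ins (`left`, `right`) reduced to two key columns, with values taken from the posted dput output. all = TRUE turns merge() into a full outer join: unmatched rows of the second table are appended as extra rows, and the rows from the first table still carry NA in FLatSchool.

```r
# Hypothetical, reduced stand-ins for data_id and data_id_1
left <- data.frame(ID_w2  = c("RU8_800", "RU8_801", "RU8_802"),
                   ID_w3  = 427748:427750,
                   ID_all = 71163901:71163903,
                   stringsAsFactors = FALSE)
right <- data.frame(ID_w2      = c(NA, "RU8_239", "RU8_235"),
                    ID_w3      = 427521:427523,
                    FLatSchool = c(1L, 1L, 1L),
                    stringsAsFactors = FALSE)

# Full outer join: keep unmatched rows from both tables
full <- merge(left, right, by = c("ID_w2", "ID_w3"), all = TRUE)

nrow(full)                     # 6: three rows from each table
sum(!is.na(full$FLatSchool))   # 3: only the rows that came from `right`

# Rows that originate from `left` still have no FLatSchool value
full[!is.na(full$ID_all), "FLatSchool"]
```

So all = TRUE makes the FLatSchool values visible in the result, but it does not attach them to the rows of the first table; that would require key values that actually match.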

【Comments】:

    【Answer 2】:

    Are you trying to add observations? I would also suggest using library(dplyr).

    left_join(x, y, by = c("a" = "b"), copy = FALSE, suffix = c(".x", ".y"), ...) 
    

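    Make sure the table containing FLatSchool is on the "left" side so that those values are kept. You can list several variables in the by argument.

    suffix: if x and y contain duplicated variables that are not join keys, these suffixes are appended in the output to disambiguate them. It should be a character vector of length 2.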
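
For readers who prefer to stay with base R, merge() has counterparts to both ideas: by.x / by.y for key columns with different names, and suffixes for duplicated non-key columns. A minimal sketch with hypothetical tables `x` and `y` (the column names a, b and score are illustrative only):

```r
# Hypothetical tables: the key is called `a` in x and `b` in y,
# and both tables have a non-key column named `score`
x <- data.frame(a = c(1L, 2L, 3L), score = c(10, 20, 30))
y <- data.frame(b = c(2L, 3L, 4L), score = c(200, 300, 400))

# Left join on x$a == y$b; the duplicated `score` columns are
# disambiguated with the given suffixes
joined <- merge(x, y, by.x = "a", by.y = "b", all.x = TRUE,
                suffixes = c(".x", ".y"))

names(joined)    # "a" "score.x" "score.y"
joined$score.y   # NA for a == 1, which has no partner in y
```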

    Take a look at this site: https://rpubs.com/williamsurles/293454

    【Comments】:

    • Thanks! I know about dplyr, but I wanted to solve this with the merge function. In fact I found the problem: I had to put "all = TRUE" into the merge call...