【Posted】: 2019-10-11 06:26:03
【Question】:
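I have a dataset that contains some unreferenced values I want to replace with NA. In the example below, if a value in columns rep1 through rep4 does not match any value in the ID column, I want to replace it with NA. Here the values x, y, and z are not listed in the ID column, so they should be replaced.
This is somewhat similar to a question I asked here earlier: If data present, replace with data from another column based on row ID
I think the solution will resemble the one given for that earlier question, but I don't know how to change the second part, ~ value[match(., ID)], so that it returns NA for values that are not listed in the ID column.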
df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])
ID  rep1  rep2  rep3  rep4
a
b   a
c   a     b
d   a     b     c
e   a     b     c     d
f
g   x
h
i
j   y     z
k   z
l
m
The result should look like this:
ID  rep1  rep2  rep3  rep4
a
b   a
c   a     b
d   a     b     c
e   a     b     c     d
f
g   NA
h
i
j   NA    NA
k   NA
l
m
Here is the data, produced with dput():
structure(list(ID = structure(1:13, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"),
rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L,
5L, 1L, 1L), .Label = c("", "a", "x", "y", "z"), class = "factor"),
rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
1L, 1L, 1L), .Label = c("", "b", "z"), class = "factor"),
rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"d"), class = "factor")), class = "data.frame", row.names = c(NA, -13L))
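For illustration, here is a minimal base R sketch of the replacement described above. It is one possible approach, not necessarily the one from the earlier question. It assumes the data frame is named df, that empty strings should stay as they are, and that the columns can be handled as character vectors rather than factors. The names rep_cols and col are made up for this example, and df is rebuilt as plain character columns so the block runs on its own.

```r
# Rebuild the example data as character columns (assumption: factors are not required).
df <- data.frame(
  ID   = letters[1:13],
  rep1 = c("", "a", "a", "a", "a", "", "x", "", "", "y", "z", "", ""),
  rep2 = c("", "", "b", "b", "b", "", "", "", "", "z", "", "", ""),
  rep3 = c("", "", "", "c", "c", "", "", "", "", "", "", "", ""),
  rep4 = c("", "", "", "", "d", "", "", "", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Columns to check against ID (hypothetical helper name).
rep_cols <- c("rep1", "rep2", "rep3", "rep4")

df[rep_cols] <- lapply(df[rep_cols], function(col) {
  col <- as.character(col)                  # also covers factor input
  # Non-empty values that do not appear in ID become NA; blanks are kept.
  col[col != "" & !(col %in% df$ID)] <- NA
  col
})

print(df)
```

With the original factor columns, the same condition could probably be used inside the dplyr call from the question, for example ~ replace(., !(. %in% c("", as.character(ID))), NA), though that variant is only a sketch as well.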
【Comments】: