R: replace values in dataframe X based on values from dataframe Y, using a condition

【Question title】: R: replace values in dataframe X based on values from dataframe Y, using a condition
【Posted】: 2021-01-29 06:00:05
【Question】:

I have a relatively simple problem in R that I can't seem to figure out. I have two data frames with exactly the same dimensions:

#dataframe 1
df1 = data.frame("contrast1" = c('2.3', '6.5', '0.6', '-0.8', '2.3', '2.4', '-7.1'), 
                 "contrast2" = c('1.0','0.9','0.8','2.3','4.3','8.7','0.4'),
                 "contrast3" = c('-0.2','-0.1','-1.2','-2.3','-0.3','-0.4','-0.1'))
row.names(df1) = c('gene1','gene2','gene3','gene4','gene5','gene6','gene7')

#dataframe 2
df2 = data.frame("contrast1" = c('1', '1', '0', '0', '1', '1', '1'), 
                 "contrast2" = c('1','0','0','1','1','1','0'),
                 "contrast3" = c('0','0','1','1','0','0','0'))
row.names(df2) = c('gene1','gene2','gene3','gene4','gene5','gene6','gene7')

The data frames look like this:

>df1
      contrast1 contrast2 contrast3
gene1       2.3       1.0      -0.2
gene2       6.5       0.9      -0.1
gene3       0.6       0.8      -1.2
gene4      -0.8       2.3      -2.3
gene5       2.3       4.3      -0.3
gene6       2.4       8.7      -0.4
gene7      -7.1       0.4      -0.1

>df2
      contrast1 contrast2 contrast3
gene1         1         1         0
gene2         1         0         0
gene3         0         0         1
gene4         0         1         1
gene5         1         1         0
gene6         1         1         0
gene7         1         0         0

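Now I would like to replace certain values in df1 based on the values in df2. Specifically, wherever a value in df2 is zero, I want the corresponding field in df1 to be replaced with "NA".

So the resulting data frame should look like this: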

>df_output
      contrast1 contrast2 contrast3
gene1       2.3       1.0        NA
gene2       6.5        NA        NA
gene3        NA        NA      -1.2
gene4        NA       2.3      -2.3
gene5       2.3       4.3        NA
gene6       2.4       8.7        NA
gene7      -7.1        NA        NA

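There are quite a few similar questions, but none of them seems to cover what I am after. I have tried several things without success. The code below, using dplyr, is the closest I got, but it throws an error message and also does not yet account for the fact that I actually want to replace values with "NA" rather than with zeros.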

library(dplyr)    
df2 %>%
      left_join(df1, by = rownames(df1)) %>%
      mutate(Count = ifelse(is.zero(Count.x), Count.y, Count.x)) %>%
      select(-c(Count.x, Count.y))

【Question comments】:

Tags: r dataframe dplyr conditional-statements

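【Solution 1】:

In base R, use type.convert to change the type of the 'df1' columns from character to numeric, then use plain arithmetic to turn the values into NA: NA^(df2 == 0) is NA wherever 'df2' is 0 and 1 everywhere else, so multiplying 'df1' by it blanks exactly the cells you want.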

df1 <- type.convert(df1, as.is = TRUE)
(NA^(df2 == 0)) * df1

-output

#        contrast1 contrast2 contrast3
#gene1       2.3       1.0        NA
#gene2       6.5        NA        NA
#gene3        NA        NA      -1.2
#gene4        NA       2.3      -2.3
#gene5       2.3       4.3        NA
#gene6       2.4       8.7        NA
#gene7      -7.1        NA        NA
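
In case the arithmetic trick above looks cryptic, here is a minimal sketch of what it does, next to a more explicit form that indexes with a logical matrix. The data is a shortened, already-numeric version of the question's data frames, and the names `mask`, `res_arith` and `res_index` are made up for illustration.

```r
# Shortened toy data modelled on the question (stored as numbers for brevity)
df1 <- data.frame(contrast1 = c(2.3, 6.5, 0.6),
                  contrast2 = c(1.0, 0.9, 0.8),
                  row.names = c("gene1", "gene2", "gene3"))
df2 <- data.frame(contrast1 = c(1, 1, 0),
                  contrast2 = c(1, 0, 0),
                  row.names = c("gene1", "gene2", "gene3"))

# df2 == 0 is a logical matrix; NA^TRUE is NA, while NA^FALSE is 1
# (anything raised to the power 0 is 1, even NA)
mask <- NA^(df2 == 0)

# Multiplying by 1 keeps a value, multiplying by NA turns it into NA
res_arith <- mask * df1

# A more explicit alternative: assign NA through the logical matrix
res_index <- df1
res_index[df2 == 0] <- NA
```

Both results should have NA in the same cells. The indexing form does not need numeric columns, so it should also work if 'df1' is still character.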

With the tidyverse, we can use map2 to walk over the columns of both data frames in parallel

library(purrr)
map2_dfc(df1, df2, ~ replace(.x, .y == 0, NA))
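
One thing to be aware of: map2_dfc returns a tibble, and tibbles do not carry row names, so the gene names are lost. If you need to keep them, a base R equivalent of the same column-by-column idea could look like the sketch below. The toy data is a shortened version of the question's data, and `res` is just an illustrative name.

```r
# Shortened toy data modelled on the question
df1 <- data.frame(contrast1 = c(2.3, 6.5, 0.6),
                  contrast2 = c(1.0, 0.9, 0.8),
                  row.names = c("gene1", "gene2", "gene3"))
df2 <- data.frame(contrast1 = c(1, 1, 0),
                  contrast2 = c(1, 0, 0),
                  row.names = c("gene1", "gene2", "gene3"))

# Map() pairs up the columns of df1 and df2, like purrr::map2()
# Assigning into res[] keeps the data.frame structure and its row names
res <- df1
res[] <- Map(function(x, y) replace(x, y == 0, NA), df1, df2)
```

Assigning with `res[] <-` instead of `res <-` is what keeps the result a data frame with the original row names, rather than a plain list.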

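【Comments】:

Whoa! I was really overcomplicating this. Thanks a lot for this solution!

【Solution 2】:

You can set the values of a given column to NA wherever 'df2' has a 0 in that column, and then repeat that for every column in a loop. Not the cleanest approach, but it works: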

for (i in seq_along(df1)) {
  # in column i, blank the rows where df2 has a 0
  df1[which(df2[, i] == 0), i] <- NA
}

df1
      contrast1 contrast2 contrast3
gene1       2.3       1.0      <NA>
gene2       6.5      <NA>      <NA>
gene3      <NA>      <NA>      -1.2
gene4      <NA>       2.3      -2.3
gene5       2.3       4.3      <NA>
gene6       2.4       8.7      <NA>
gene7      -7.1      <NA>      <NA>
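
The `<NA>` in the printout above (rather than a plain `NA`) suggests the columns of 'df1' are still character, as in the question. If you want numeric columns in the result, one option is to convert before looping. This is only a sketch on a shortened version of the question's data; `stringsAsFactors = FALSE` is added on the assumption that the code might also be run on R versions older than 4.0.

```r
# Shortened toy data, stored as strings like in the question
df1 <- data.frame(contrast1 = c("2.3", "6.5", "0.6"),
                  contrast2 = c("1.0", "0.9", "0.8"),
                  row.names = c("gene1", "gene2", "gene3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(contrast1 = c("1", "1", "0"),
                  contrast2 = c("1", "0", "0"),
                  row.names = c("gene1", "gene2", "gene3"),
                  stringsAsFactors = FALSE)

# Turn every character column of df1 into a numeric column
df1[] <- lapply(df1, as.numeric)

# Same loop as above: in column i, blank the rows where df2 has a 0
for (i in seq_along(df1)) {
  df1[df2[[i]] == 0, i] <- NA
}
```

After the conversion, missing cells print as `NA` and the columns can be used directly in calculations.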

【Comments】: