【Question title】: Combining gsub() and using variable names as columns in R [duplicate]
【Posted】: 2020-08-18 22:13:12
【Question】:

I hope someone can help me :)

I have a data frame with about 1000 columns. Among them are columns named like this: X1, X2, X3, X4, X5, X6, etc. and Y1, Y2, Y3, Y4, Y5, Y6, etc.

df <- data.frame("X1" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
                "X2" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"), 
                "X3" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
                "X4" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
                "X5" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"), 
                "X6" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
                "Y1" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
                "Y2" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"), 
                "Y3" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
                "Y4" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
                "Y5" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"), 
                "Y6" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"))

In some of these columns, I want to replace "Yes" with 1, "No" with 0, and anything else with NA.

I tried this:

names = c("X","Y")

for (name in names){
  try(
    for (j in 1:6){
      j <- toString(j)
      colname <- paste(name , j, sep="")
      df$colname <- gsub("Yes", as.integer(1), df$colname)
      df$colname <- gsub("No", as.integer(0), df$colname)
    })}

However, this doesn't work and throws the following error:

Error in `$<-.data.frame`(`*tmp*`, "colname", value = character(0)) : replacement has 0 rows, data has 13
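  • My first question is: why isn't the column name being referenced correctly?

  • My second question is: how can I replace anything in these columns that is not 0 or 1 with NA?

This is probably something very simple that I'm overlooking, but I can't figure out how to do it. Any help would be greatly appreciated.

Many thanks in advance, Rich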

【Comments】:

  • For future reference, df$colname is wrong here. You need to use df[, colname] instead, and when creating df you also don't need quotes around the variable names.

Tags: r loops gsub
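A minimal sketch of the question's loop with that fix applied, run on a small hypothetical stand-in for df (three columns instead of twelve). df$colname looks for a column literally named "colname", which does not exist, so it returns NULL; gsub() on NULL gives character(0), which is where "replacement has 0 rows" comes from. df[[colname]] (or df[, colname]) uses the string stored in the variable instead. Here exact comparisons with == replace the two gsub() calls, and anything that is neither "Yes" nor "No" becomes NA, which also covers the second question.

```r
# Small hypothetical stand-in for the question's data frame
df <- data.frame(X1 = c("Yes", "No", "NA"),
                 X2 = c("NA", "Yes", "Maybe"),
                 Y1 = c("No", "No", "Yes"),
                 stringsAsFactors = FALSE)

for (name in c("X", "Y")) {
  for (j in 1:6) {
    colname <- paste0(name, j)
    # Skip names that do not exist in this small example
    if (!colname %in% names(df)) next
    # [[ ]] looks up the column whose name is stored in colname
    x <- as.character(df[[colname]])
    # Exact matches: "Yes" -> 1, "No" -> 0, anything else (including "NA") -> NA
    df[[colname]] <- ifelse(x == "Yes", 1L, ifelse(x == "No", 0L, NA_integer_))
  }
}
df
```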


【Solution 1】:

I wouldn't use a loop or gsub here. You can use this instead:

df[] <- lapply(df, function(x) car::recode(x, "'Yes'=1; 'No'=0; 'NA'=NA"))

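This iterates over every column in the data frame and recodes the values as required. It is also easier to extend if more values turn up in the future.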
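If you would rather avoid the car dependency, a named lookup vector in base R does the same job; this is a swapped-in technique, not what the answer uses. A minimal sketch, assuming character (not factor) input; the names lookup and recode_col are hypothetical. Indexing a named vector by a name it does not contain returns NA, so "NA", unexpected answers and real NA values all end up as NA, and extending the mapping only means adding another name/value pair.

```r
# Hypothetical lookup table: add more pairs here if new answers appear
lookup <- c(Yes = 1L, No = 0L)

recode_col <- function(x) {
  # Values that are not names of lookup (e.g. "NA", "Maybe", NA) give NA
  unname(lookup[as.character(x)])
}

df <- data.frame(X1 = c("Yes", "No", "NA"),
                 Y1 = c("Maybe", "Yes", NA),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, recode_col)
df
```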

If you only want certain columns, you can modify it like this:

df[col_list] <- lapply(df[col_list], function(x) car::recode(x, "'Yes'=1; 'No'=0; 'NA'=NA"))

where col_list is a vector of the names of the variables you want to change. You can get them with grep: col_list <- grep('^[XY]', names(df), value = TRUE)
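Because the selection decides which of roughly 1000 columns get recoded, it is worth checking the pattern on the names first. A quick sketch on a hypothetical vector of column names: in '^X|Y' the ^ anchor applies only to the X alternative, so the pattern means "starts with X, or contains Y anywhere", while a character class anchors both letters and [0-9]+$ restricts the match to names like X1 or Y6.

```r
# Hypothetical column names, including some that should NOT be recoded
nms <- c("X1", "X2", "Y1", "Y6", "ID", "AGE_Y", "X_note")

# "starts with X" OR "contains Y": also picks up AGE_Y and X_note
loose <- grep('^X|Y', nms, value = TRUE)
# X or Y followed only by digits
strict <- grep('^[XY][0-9]+$', nms, value = TRUE)

loose
strict
```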

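【Comments】:

  • This is great, thanks. How would I specify which columns to replace? For example, I only need to go through certain columns. Since there are so many columns, I chose to loop as above. Is there a better way?
  • Yes, I've just added a bit on that.
  • No worries. If this has solved your problem, please make sure to mark it as the accepted answer.
  • Why not just use factor: df[] <- lapply(df, function(x) as.integer(factor(x, c('No', 'Yes'))) - 1)
  • Yes, that's probably better. I always forget that factors have this property.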
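A small sketch of why the factor trick in the comment works, on a hypothetical vector: only "No" and "Yes" are declared as levels, so every other value (including the string "NA") becomes NA, and the underlying integer codes are 1 for "No" and 2 for "Yes". Subtracting 1L rather than 1 keeps the result integer.

```r
x <- c("Yes", "No", "NA", "Maybe", NA)
# Values that are not one of the declared levels become NA
f <- factor(x, levels = c("No", "Yes"))
# Level codes: 1 = "No", 2 = "Yes"; shifting by one gives 0/1
codes <- as.integer(f) - 1L
codes
```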

【Solution 2】:

Since your data only contains the values 'Yes', 'No' and 'NA', you can also replace them directly.

#Column numbers to replace
cols <- grep('^[XY]\\d+', names(df))
#Replace "NA" with real NA
df[cols][df[cols] == 'NA'] <- NA
#Replace "Yes" with 1
df[cols][df[cols] == 'Yes'] <- 1
#Replace "No" with 0
df[cols][df[cols] == 'No'] <- 0
#Convert the character columns to integer
df <- type.convert(df, as.is = TRUE)
df
#   X1 X2 X3 X4 X5 X6 Y1 Y2 Y3 Y4 Y5 Y6
#1   1  1  1  1  1  1  1  1  1  1  1  1
#2   0 NA NA  0 NA NA  0 NA NA  0 NA NA
#3   1 NA NA  1 NA NA  1 NA NA  1 NA NA
#4  NA NA NA NA NA NA NA NA NA NA NA NA
#5  NA NA  1 NA NA  1 NA NA  1 NA NA  1
#6  NA  1  0 NA  1  0 NA  1  0 NA  1  0
#7   1 NA  1  1 NA  1  1 NA  1  1 NA  1
#8   0 NA NA  0 NA NA  0 NA NA  0 NA NA
#9   1 NA  1  1 NA  1  1 NA  1  1 NA  1
#10 NA NA NA NA NA NA NA NA NA NA NA NA
#11 NA  1 NA NA  1 NA NA  1 NA NA  1 NA
#12 NA NA NA NA NA NA NA NA NA NA NA NA
#13 NA NA  1 NA NA  1 NA NA  1 NA NA  1

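If you are using a version of R in which data.frame() converts strings to factors by default (stringsAsFactors = TRUE), first convert the columns to character: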

df[] <- lapply(df, as.character)
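A sketch of the full sequence on a hypothetical one-column example built with stringsAsFactors = TRUE to mimic that older default. Without the conversion, assigning 1 or 0 into a factor column produces NA with an "invalid factor level" warning, because those values are not existing levels; after lapply(df, as.character) the columns are plain character vectors, so the replacements work and type.convert() turns them into integer 0/1/NA columns.

```r
# Build the data the old default way, with strings stored as factors
df <- data.frame(X1 = c("Yes", "No", "NA"), stringsAsFactors = TRUE)

# Convert every column back to character before the direct replacements
df[] <- lapply(df, as.character)

df[df == "NA"] <- NA
df[df == "Yes"] <- 1
df[df == "No"] <- 0
# "1"/"0"/NA character columns become integer columns
df <- type.convert(df, as.is = TRUE)
df
```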
