【Question title】: How to add NA based on a condition for a whole data frame
【Posted】: 2022-11-02 21:56:44
【Question】:

I just want to know how to find empty values across a whole data frame and replace them with NA.

Sample data

structure(list(id = structure(8.44425875736171e-318, class = "integer64"), 
    project_id = 11L, experiment_id = 85L, 
    gene = "", si = -0.381, pi = "", 
    on1 = "CC", 
    on2 = "GG", 
    on3 = "aa", 
    created_at = structure(1618862091.85075, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x000001ba09da3590>)

I have a solution that checks one specific column, but I don't know how to apply it to the whole data frame:

data$gene <- ifelse((is.na(data$gene) == TRUE),'NA',data$gene)

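【Comments】:

  • Your code to create the data.frame doesn't work. It is also quite vague what you are trying to do. In your example, it looks like you want to replace NA values with the string 'NA'?
  • Your gene value is "". The following is close to what you tried: data$gene <- ifelse(data$gene == "", 'NA', data$gene)

Tags: r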


【Answer 1】:

You can use lapply with gsub to replace every empty cell with NA, like this:

df <- structure(list(id = structure(8.44425875736171e-318, class = "integer64"), 
                     project_id = 11L, experiment_id = 85L, 
                     gene = "", si = -0.381, pi = "", 
                     on1 = "CC", 
                     on2 = "GG", 
                     on3 = "aa", 
                     created_at = structure(1618862091.85075, class = c("POSIXct", 
                                                                        "POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("data.table", 
                                                                                                                                      "data.frame"))

df
#>              id project_id experiment_id gene     si pi on1 on2 on3
#> 1 8.444259e-318         11            85      -0.381     CC  GG  aa
#>            created_at
#> 1 2021-04-19 19:54:51
df[] <- lapply(df, function(x) gsub("^$", NA, x))
df
#>                      id project_id experiment_id gene     si   pi on1 on2 on3
#> 1 8.44425875736171e-318         11            85 <NA> -0.381 <NA>  CC  GG  aa
#>            created_at
#> 1 2021-04-19 19:54:51

Created on 2022-11-02 with reprex v2.0.2
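
One thing to be aware of: because gsub returns character vectors, the approach above converts every column to character (note how id is printed in full after the replacement). Below is a minimal sketch of a variant that only touches character columns, so numeric and date columns keep their types. The small data frame is made up for illustration and is not the asker's data.

```r
# Hypothetical example data (not the asker's data)
df <- data.frame(
  id   = 1:3,
  gene = c("", "BRCA1", ""),
  si   = c(-0.381, 0.5, 1.2),
  pi   = c("x", "", "y"),
  stringsAsFactors = FALSE
)

# Identify the character columns only
is_chr <- vapply(df, is.character, logical(1))

# Replace empty strings with NA in those columns; other columns are untouched
df[is_chr] <- lapply(df[is_chr], function(x) {
  x[!is.na(x) & x == ""] <- NA_character_
  x
})

str(df)
```

The `!is.na(x)` guard is there so that values which are already NA do not end up in the logical index.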

【Comments】:

    【Answer 2】:

    You can also use dplyr's mutate together with across:

    library(dplyr)
    library(tidyr)
    
    df <- structure(list(id = structure(8.44425875736171e-318, class = "integer64"), 
                         project_id = 11L, experiment_id = 85L, 
                         gene = "", si = -0.381, pi = "", 
                         on1 = "CC", 
                         on2 = "GG", 
                         on3 = "aa", 
                         created_at = structure(1618862091.85075, class = c("POSIXct", 
                                                                            "POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("data.table", 
                                                                                                                                          "data.frame"))
    
    df %>% 
      mutate(dplyr::across(where(is.character), ~ gsub("^$", NA, .x)))
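
As an alternative to gsub, dplyr also provides na_if(), which turns a given value into NA. Here is a small sketch, assuming dplyr (1.0.0 or later, for across) is installed; the data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical example data (not the asker's data)
df <- data.frame(
  gene = c("", "BRCA1"),
  si   = c(-0.381, 0.5),
  pi   = c("x", ""),
  stringsAsFactors = FALSE
)

# na_if(x, "") turns exact empty strings into NA;
# where(is.character) restricts this to character columns
out <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

out
```

Unlike a regular expression, na_if compares for exact equality, so it only matches cells that are exactly "".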
    
    
    

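    Note that I also tried replace_na, but it only works when the values are actually NA:

    df %>% 
      mutate(dplyr::across(where(is.character), ~ replace_na(.x, "NA")))
    
    • "" is not treated as missing
    • NA is treated as NA

    Keep this in mind when doing your analysis.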
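
To make the distinction concrete, here is a small base R sketch (the vector x is made up) comparing an empty string, a real missing value, and the literal string "NA":

```r
x <- c("", NA, "NA", "CC")

is.na(x)   # only the real NA is missing: FALSE TRUE FALSE FALSE
x == ""    # only the empty string matches: TRUE NA FALSE FALSE

# Count empty strings and missing values separately
n_empty   <- sum(x == "", na.rm = TRUE)
n_missing <- sum(is.na(x))
```

Checking both counts before and after a replacement is a quick way to confirm which kind of "empty" a column actually contains.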

    【Comments】:
