【Question title】: Add NA for a value based on a condition, with tidyverse only, R [duplicate]
【Posted】: 2020-06-04 09:32:58
【Question】:

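I have an age variable that contains some very odd values, such as 1000 or 6666. Obviously these values are not suitable for any analysis. I want to keep the plausible ages but replace the odd numbers with NA. For example, I would keep 0, 1, 2, 3, 4, ... 100, but anything > 100 should become NA. The solution should use the tidyverse only. I have looked at several functions, such as na_if, but could not get them to do what I want.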

Here is a sample of my data. Look at row 64 (the value 6222) and you will see what I mean.

age_dput <- structure(list(Age = c(63, 19, 23, 28, 40, 31, 60, 26, 35, 44, 
    30, 47, 26, 45, 21, 38, 40, 28, 26, 40, 60, 33, 72, 40, 32, 32, 
    43, 24, 25, 39, 50, 22, 37, 53, 51, 42, 52, 29, 19, 42, 58, 61, 
    29, 26, 45, 29, 20, 26, 28, 43, 2, 42, 40, 33, 43, 53, 55, 27, 
    36, 41, 30, 54, 55, 6222, 21, 26, 38, 23, 48, 29, 44, 42, 35, 
    27, 28, 20, 59, 80, 35, 36, 24, 29, 34, 31, 25, 37, 30, 31, 48, 
    28, 30, 65, 45, 27, 39, 29, 34, 29, 76, 40)), row.names = c(NA, 
    -100L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
        row = c(2910L, 35958L), col = c("how_unwell", "how_unwell"
        ), expected = c("a double", "a double"), actual = c("How Unwell", 
        "How Unwell"), file = c("'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'", 
        "'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'"
        )), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
    )))

【Question comments】:

    Tags: r tidyverse na


    【Solution 1】:

    You can use replace or if_else:

    library(dplyr)
    age_dput %>%
      mutate(clean_age_replace = replace(Age, Age > 100, NA_real_), 
             clean_age_if_else = if_else(Age > 100, NA_real_, Age))
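
The question asks to keep ages from 0 to 100, so a two-sided check may be worth considering as well. The following is a minimal sketch, not part of the original answer; `toy` and `Age_clean` are hypothetical names used only for illustration, and the 0 to 100 bounds are taken from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for age_dput
toy <- tibble(Age = c(63, 2, 6222, 100, NA, 40))

# between() is inclusive on both ends; anything outside [0, 100] becomes NA.
# A missing Age gives a missing condition, which if_else() also maps to NA.
cleaned <- toy %>%
  mutate(Age_clean = if_else(between(Age, 0, 100), Age, NA_real_))

print(cleaned)
```

Here only 6222 and the already-missing entry end up as NA; the boundary value 100 is kept.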
    

    【Comments】:

    • Thanks, Ronak. It worked.
    【Solution 2】:

    Using na_if():

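    library(dplyr)
    # Caveat: na_if() needs its second argument to have length 1 (or the same
    # length as Age), so this works here because exactly one value (6222) exceeds 100.
    age_dput %>%
      mutate(Age = na_if(Age, Age[Age > 100]))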
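
na_if() is designed to turn values equal to a given sentinel into NA, so it fits less naturally when several different out-of-range values occur (such as the 1000 and 6666 mentioned in the question). The sketch below contrasts the two approaches on made-up data; `toy`, `sentinel_only`, and `cleaned` are hypothetical names, not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data with two different implausible ages
toy <- tibble(Age = c(63, 1000, 6666, 40))

# na_if() with a single sentinel: only values equal to 6666 become NA
sentinel_only <- toy %>%
  mutate(Age = na_if(Age, 6666))

# Condition-based replacement: every value above 100 becomes NA
cleaned <- toy %>%
  mutate(Age = replace(Age, Age > 100, NA_real_))

print(sentinel_only)
print(cleaned)
```

With a threshold rather than a known sentinel, the condition-based form handles zero, one, or many offending values without changes.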
    

    【Comments】:
