【Question title】:Create dummy for missing values in numeric variable in R
【Posted】:2015-09-11 08:56:05
【Question description】:

I have the following data:

  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked
1           1        0      3   male  22     1     0  7.2500        S
2           2        1      1 female  38     1     0 71.2833        C
3           3        1      3 female  26     0     0  7.9250        S
4           4        1      1 female  35     1     0 53.1000        S
5           5        0      3   male  35     0     0  8.0500        S
6           6        0      3   male  NA     0     0  8.4583        Q

Now, when I use dummy or dummy.data.frame, I can successfully convert the factors (here Sex and Embarked) into dummies like this:

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare Embarked EmbarkedC EmbarkedQ EmbarkedS
1           1        0      3         0       1  22     1     0  7.2500        0         0         0         1
2           2        1      1         1       0  38     1     0 71.2833        0         1         0         0
3           3        1      3         1       0  26     0     0  7.9250        0         0         0         1
4           4        1      1         1       0  35     1     0 53.1000        0         0         0         1
5           5        0      3         0       1  35     0     0  8.0500        0         0         0         1
6           6        0      3         0       1  NA     0     0  8.4583        0         0         1         0

However, if I apply it to the Age column, it creates more than 100 dummies: one for each unique age value and one for NA. I want the output to look like this:

Age   Age.NA
22    0 
38    0
......
35    0
0     1

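For factors, it automatically treats missing values as a separate level and creates a variable for them. I want to achieve the same thing for a numeric variable without altering the values already present in the column. Please help.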

【Question discussion】:

    Tags: r dataframe missing-data dummy-variable


    【Solution 1】:

    You can use:

    df$Age.NA <- ifelse(is.na(df$Age), 1, 0)
    

    Then:

    library(dummies)
    dummy.data.frame(df)
    

    Output:

      PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare EmbarkedC EmbarkedQ EmbarkedS Age.NA
    1           1        0      3         0       1  22     1     0  7.2500         0         0         1      0
    2           2        1      1         1       0  38     1     0 71.2833         1         0         0      0
    3           3        1      3         1       0  26     0     0  7.9250         0         0         1      0
    4           4        1      1         1       0  35     1     0 53.1000         0         0         1      0
    5           5        0      3         0       1  35     0     0  8.0500         0         0         1      0
    6           6        0      3         0       1  NA     0     0  8.4583         0         1         0      1
    

    Data:

    df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L, 
    0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L, 
    1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"), 
        Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L, 
        1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25, 
        71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L, 
        1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"), 
        Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId", 
    "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", 
    "Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5", 
    "6"), class = "data.frame")
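The answer above adds the indicator but leaves NA in Age, while the question also asks for the missing ages to become 0. A minimal base-R sketch of both steps, assuming a data frame with an Age column like the one above; the helper name add_na_dummy is hypothetical and is not part of the dummies package. The example data frame keeps only two columns for brevity.

```r
# Hypothetical helper (not from the dummies package): for a numeric column,
# add a 0/1 indicator "<col>.NA" and then replace the NAs in that column with 0.
add_na_dummy <- function(data, col) {
  missing <- is.na(data[[col]])
  # Build the indicator first, while the NAs are still present
  data[[paste0(col, ".NA")]] <- as.integer(missing)
  # Then overwrite the missing values with 0
  data[[col]][missing] <- 0
  data
}

# Small example using the Age values from the question
df <- data.frame(PassengerId = 1:6,
                 Age = c(22L, 38L, 26L, 35L, 35L, NA))
df <- add_na_dummy(df, "Age")
print(df)
```

Because the indicator is computed before the overwrite, the 0 that replaces a missing age can still be told apart from a genuine value via Age.NA.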
    

    【Discussion】:

      【Solution 2】:

      Use an ifelse() statement to check for NA:

      Age.NA <- ifelse(is.na(Age), 1, 0)

      【Discussion】:

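      • Hi, basically I want to create two columns instead of one. I want to replace the NA values in the original Age column with 0, and create a separate column containing 0 or 1 depending on whether the value was missing, which is what dummy does.
      • Do the same thing: Age <- ifelse(is.na(Age), 0, Age)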
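The two ifelse() calls from Solution 2 and the comment have to run in that order: if Age is zero-filled first, is.na() has nothing left to detect and the indicator comes out all zeros. A short sketch on a standalone vector (the variable names age, age_na, age_filled and age_na_wrong are only for illustration; on a data frame use df$Age as in Solution 1, since a bare Age only resolves inside something like with(df, ...)):

```r
age <- c(22, 38, 26, 35, 35, NA)

# Correct order: record missingness first, then replace NA with 0
age_na     <- ifelse(is.na(age), 1, 0)
age_filled <- ifelse(is.na(age), 0, age)

# Wrong order: once the vector is zero-filled there are no NAs to detect
age_na_wrong <- ifelse(is.na(age_filled), 1, 0)

print(age_na)
print(age_na_wrong)
```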