【Question title】: Recode Missing Data from Character Field
【Posted】: 2012-06-12 17:47:29
【Question description】:

Note: the title may be misleading. If you understand my problem and can come up with something more descriptive, please change it.

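I've run into an odd situation where the survey responses are all character strings rather than numbers. R, it seems, really doesn't like this. Suppose I asked the question: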

Q. In what area do you work? 
East
West
Central
North
South
None of the above

But the respondents came only from East, West, and Central.

dat <- rep(c("East", "West", "Central"),100)

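Now, for presentation purposes, it's important that I include North, South, and None of the above, even though nobody selected them. However, getting those categories into the factor is proving to be a challenge.

Let's try it: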

fac1 <- factor(dat, labels=c("East","West","Central","North","South","None of the above"))

Error in factor(dat, labels = c("East", "West", "Central", "North", "South",  : 
  invalid labels; length 6 should be 1 or 3

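Basically, what I'd like to do is include the unanswered categories in this data, so that when I type something like summary(fac1) it shows 0 responses for each of them.

There must be an easier way to do this!

【Question discussion】:

    Tags: r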


    【Solution 1】:

    Not an expert, but does this help?

    fac1 <- factor(dat, levels = 
                   c("East","West","Central","North","South","None of the above"))
    summary(fac1)
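
As a rough check of what this produces, here is a minimal sketch that assumes the same dat vector as in the question. The counts in the comments follow from each of the three values being repeated 100 times.

```r
# Same data as in the question: 300 responses, only three areas represented
dat <- rep(c("East", "West", "Central"), 100)

# Declare all six possible answers as levels, including the unused ones
fac1 <- factor(dat, levels =
               c("East", "West", "Central", "North", "South", "None of the above"))

# summary() on a factor counts every level, so unused levels appear with 0
counts <- summary(fac1)
print(counts)
# East, West, Central: 100 each; North, South, None of the above: 0 each
```

The unused categories are kept because they are declared in levels, not because they occur in the data.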
    

    【Discussion】:

      【Solution 2】:

      You're almost there. You need to use the levels argument:

      fac1 <- factor(dat, levels=c("East","West","Central","North","South","None of the above"))
      str(fac1)
       Factor w/ 6 levels "East","West",..: 1 2 3 1 2 3 1 2 3 1 ...
      

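      The difference between levels and labels is this:

      • levels defines the set of values the factor can take, whether or not each one occurs in your data
      • labels lets you rename those factor levels in one go.

      For example: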

      fac2 <- factor(
        dat, 
        levels=c("East","West","Central","North","South","None of the above"),
        labels=c("E", "W", "C", "N", "S", "Other")
      )
      str(fac2)
      Factor w/ 6 levels "E","W","C","N",..: 1 2 3 1 2 3 1 2 3 1 ...
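
To connect this back to the error in the question, the following is a minimal sketch (the name default_fac is only illustrative). When levels is omitted, factor() derives the levels from the sorted unique values of the data, so here there are only three, which is why a labels vector of length 6 was rejected.

```r
dat <- rep(c("East", "West", "Central"), 100)

# Without `levels`, the levels are the sorted unique values of the data
default_fac <- factor(dat)
print(levels(default_fac))   # "Central" "East" "West"
print(nlevels(default_fac))  # 3, so `labels` would need length 1 or 3

# With all six levels declared, six labels are accepted
fac2 <- factor(
  dat,
  levels = c("East", "West", "Central", "North", "South", "None of the above"),
  labels = c("E", "W", "C", "N", "S", "Other")
)

# table() reports the renamed levels, with 0 for the unused ones
tab <- table(fac2)
print(tab)
```

Note that labels are matched to levels by position, so the two vectors need to be in the same order.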
      

      【Discussion】:
