【Question title】: Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels
【Posted】: 2017-08-20 16:50:17
【Question】:

I fitted a logistic regression:

 EW <- glm(everwrk~age_p + r_maritl, data = NH11, family = "binomial")

In addition, I want to predict everwrk for each level of r_maritl.

r_maritl has the following levels:

levels(NH11$r_maritl)
 "0 Under 14 years" 
 "1 Married - spouse in household" 
 "2 Married - spouse not in household"
 "3 Married - spouse in household unknown" 
 "4 Widowed"                               
 "5 Divorced"                             
 "6 Separated"                             
 "7 Never married"                        
 "8 Living with partner"  
 "9 Unknown marital status"  

So I did:

predEW <- with(NH11,
  expand.grid(r_maritl = c("0 Under 14 years",
                           "1 Married - spouse in household",
                           "2 Married - spouse not in household",
                           "3 Married - spouse in household unknown",
                           "4 Widowed", "5 Divorced", "6 Separated",
                           "7 Never married", "8 Living with partner",
                           "9 Unknown marital status"),
              age_p = mean(age_p, na.rm = TRUE)))

cbind(predEW, predict(EW, type = "response",
                        se.fit = TRUE, interval = "confidence",
                        newdata = predEW))

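The problem is that I get the following response:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 0 Under 14 years, 3 Married - spouse in household unknown

Sample data: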

str(NH11$age_p)
num [1:33014] 47 18 79 51 43 41 21 20 33 56 ...

str(NH11$everwrk)
Factor w/ 2 levels "2 No","1 Yes": NA NA 2 NA NA NA NA NA 2 2 ...

str(NH11$r_maritl)
Factor w/ 10 levels "0 Under 14 years",..: 6 8 5 7 2 2 8 8 8 2 ...

【Comments】:

  • Can you provide some sample data? I can't reproduce your problem with, e.g., mtcars. Also, are there unused factor levels in your dataset NH11?

Tags: r logistic-regression


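【Answer 1】:

tl;dr It looks like your factor has some levels that are not represented in your data, and these get dropped from the factor used in the model. In hindsight this is not terribly surprising, since you cannot predict responses for levels the model never saw. That said, it is mildly surprising that R doesn't do something nice for you, such as generating NA values automatically. You can solve the problem by using levels(droplevels(NH11$r_maritl)), or equivalently EW$xlevels$r_maritl, when constructing the prediction frame.

A reproducible example: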
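To see the mismatch described above directly, here is a minimal sketch on made-up data. The data frame `d`, the variables `y`, `x`, `g`, and the level names are all hypothetical and only serve the illustration; the point is to compare the declared levels of a factor with the levels stored in the fitted model.

```r
# Toy data (hypothetical): a factor with three declared levels, one never observed
set.seed(1)
d <- data.frame(
  y = rbinom(200, size = 1, prob = 0.5),
  x = runif(200),
  g = factor(sample(c("a", "b"), 200, replace = TRUE),
             levels = c("a", "b", "c"))
)

fit <- glm(y ~ g + x, data = d, family = binomial)

levels(d$g)               # all declared levels, including the unused "c"
levels(droplevels(d$g))   # only the levels that actually occur in the data
fit$xlevels$g             # the levels the fitted model knows about

# Declared levels the model has never seen; passing these to predict()
# is what triggers the "new levels" error
new_levels <- setdiff(levels(d$g), fit$xlevels$g)
print(new_levels)
```

Running a check like `setdiff()` before calling `predict()` shows which rows of a prediction frame would cause the error.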

maritl_levels <- c( "0 Under 14 years", "1 Married - spouse in household", 
  "2 Married - spouse not in household", "3 Married - spouse in household unknown", 
  "4 Widowed", "5 Divorced", "6 Separated", "7 Never married", "8 Living with partner", 
 "9 Unknown marital status")
set.seed(101)
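## factor() is explicit here: since R 4.0.0, data.frame() no longer
## converts character vectors to factors by default
NH11 <- data.frame(everwrk=rbinom(1000,size=1,prob=0.5),
                 age_p=runif(1000,20,50),
                 r_maritl = factor(sample(maritl_levels,size=1000,replace=TRUE),
                                   levels=maritl_levels))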

Let's create a missing level:

NH11 <- subset(NH11,as.numeric(NH11$r_maritl) != 3)

Fit the model:

EW <- glm(everwrk~r_maritl+age_p,data=NH11,family=binomial)
predEW <- with(NH11,
  expand.grid(r_maritl=levels(r_maritl),age_p=mean(age_p,na.rm=TRUE)))
predict(EW,newdata=predEW)

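Success! The error is reproduced:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new level 2 Married - spouse not in household

Building the prediction frame from the levels stored in the model avoids the error: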

predEW <- with(NH11,
           expand.grid(r_maritl=EW$xlevels$r_maritl,age_p=mean(age_p,na.rm=TRUE)))
predict(EW,newdata=predEW)
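If you would rather keep every declared level in the output and get NA for the ones the model never saw (the behaviour R does not provide automatically), one possible approach is sketched below. The data and the names `d`, `g`, `x`, `grid`, and `known` are hypothetical; this is an illustration under those assumptions, not part of the original answer's code.

```r
# Toy data (hypothetical): level "b" is declared but never observed
set.seed(101)
lv <- c("a", "b", "c")
d <- data.frame(
  y = rbinom(300, size = 1, prob = 0.5),
  x = runif(300, 20, 50),
  g = factor(sample(c("a", "c"), 300, replace = TRUE), levels = lv)
)
fit <- glm(y ~ g + x, data = d, family = binomial)

# Prediction frame containing every declared level
grid <- expand.grid(g = levels(d$g),
                    x = mean(d$x, na.rm = TRUE),
                    stringsAsFactors = FALSE)

# Predict only for the levels the model knows; leave the rest as NA
known <- grid$g %in% fit$xlevels$g
grid$prob <- NA_real_
grid$prob[known] <- predict(fit, newdata = grid[known, , drop = FALSE],
                            type = "response")
print(grid)
```

The row for the unobserved level stays in the table with an NA probability, which makes the gap visible instead of raising an error.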

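【Comments】:

  • I really, really doubt [a version difference]. This is all core R functionality that has been around for decades and is used by thousands of users. My strong bet is on a subtle typo by the OP.
  • Cool, I learned something today. (I don't think I've run into this particular problem before.)
  • That command doesn't do what you think it does. Use droplevels().
【Answer 2】: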

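Thank you very much for the answer. I was facing the same "new levels" problem and made the following changes to my code:

  1. I had been using data.frame() and replaced it with expand.grid()
  2. Added na.rm=TRUE to the mean() call on the variable
  3. Replaced factor(1:2) with glmoutput$xlevels$variablename

The solution works!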
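The three changes listed above can be sketched roughly as follows. The original code of this answer is not shown, so everything here is an assumption made for illustration: the data frame `dat`, the variables `outcome`, `score`, and `grp`, and the reuse of the name `glmoutput` for the fitted model.

```r
# Hypothetical data: grp has a declared level "3" that never occurs,
# and score contains one missing value on purpose
set.seed(42)
dat <- data.frame(
  outcome = rbinom(150, size = 1, prob = 0.4),
  score   = c(rnorm(149, mean = 10), NA),
  grp     = factor(sample(c("1", "2"), 150, replace = TRUE),
                   levels = c("1", "2", "3"))
)
glmoutput <- glm(outcome ~ grp + score, data = dat, family = binomial)

# Change 1: expand.grid() builds the prediction frame
newdat <- expand.grid(
  grp   = glmoutput$xlevels$grp,          # change 3: levels taken from the model
  score = mean(dat$score, na.rm = TRUE)   # change 2: ignore missing values
)

# Predicted probabilities with standard errors on the response scale
pred <- predict(glmoutput, newdata = newdat, type = "response", se.fit = TRUE)
result <- cbind(newdat, fit = pred$fit, se.fit = pred$se.fit)
print(result)
```

Taking the levels from `glmoutput$xlevels` means the prediction frame can never contain a level the model has not seen.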
