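Error in Logistic Regression for Factors in R: Answers

【Question Title】: Error in Logistic Regression for Factors in R
【Posted】: 2019-05-01 21:25:02
【Question Description】:

I am trying to run a logistic regression with the following code: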

model <- glm(Participation ~ Gender + Race + Ethnicity + Education + Comorbidities + WLProgram + LoseWeight + EverLoseWeight + PastYearLW + Age + BMI, data = LogisticData, family = binomial)

summary(model)

I keep getting this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :  contrasts can be applied only to factors with 2 or more levels

After looking through the forums, I checked which variables are factors:

str(LogisticData)
'data.frame':   994 obs. of  13 variables:
 $ outcome       : Factor w/ 2 levels "No","Yes": 1 1 2 2 1 2 2 1 2 2 ...
 $ Gender        : Factor w/ 3 levels "Male","Female",..: 1 2 2 1 2 1 1 1 1
 $ Race          : Factor w/ 3 levels "White","Black",..: 1 1 1 3 1 1 1 1 1 1
 $ Ethnicity     : Factor w/ 2 levels "Hispanic/Latino",..: 2 2 2 2 2 2 2 2 2
 $ Education     : Factor w/ 2 levels "Below Bachelors",..: 1 1 1 2 1 1 1 2 1
 $ Comorbidities : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 2 2 1 1 ...
 $ WLProgram     : Factor w/ 2 levels "No","Yes": NA 1 2 2 1 1 1 NA 1 1 ...
 $ LoseWeight    : Factor w/ 2 levels "Yes","No": 2 1 1 1 1 1 1 2 1 1 ...
 $ PastYearLW    : Factor w/ 2 levels "Yes","No": NA 2 1 1 1 2 1 NA 1 1 ...
 $ EverLoseWeight: Factor w/ 2 levels "Yes","No": 2 1 1 1 1 1 1 2 1 1 ...
 $ Age           : int  29 35 69 32 21 45 40 62 59 58 ...
 $ Participation : Factor w/ 2 levels "Yes","No": 2 2 1 1 1 1 1 2 1 2 ...
 $ BMI           : num  25.7 33.8 26.4 32.3 27.5 ...

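All of the factors appear to have 2 or more levels.

I also tried omitting the NAs, but I still get this error.

I want all of the variables in the regression, but I can't work out why it won't run.

When I run: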

newdata <- droplevels(na.omit(LogisticData))
> str(newdata)
'data.frame':   840 obs. of  13 variables:
 $ outcome       : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 2 2 2 ...
 $ Gender        : Factor w/ 3 levels "Male","Female",..: 2 2 1 2 1 1 1 2 1 
 $ Race          : Factor w/ 3 levels "White","Black",..: 1 1 3 1 1 1 1 1 3 
 $ Ethnicity     : Factor w/ 2 levels "Hispanic/Latino",..: 2 2 2 2 2 2 2 2 
 $ Education     : Factor w/ 2 levels "Below Bachelors",..: 1 1 2 1 1 1 1 1 
 $ Comorbidities : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 1 1 2 ...
 $ WLProgram     : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 1 1 1 ...
 $ LoseWeight    : Factor w/ 1 level "Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ PastYearLW    : Factor w/ 2 levels "Yes","No": 2 1 1 1 2 1 1 1 1 2 ...
 $ EverLoseWeight: Factor w/ 1 level "Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age           : int  35 69 32 21 45 40 59 58 23 32 ...
 $ Participation : Factor w/ 2 levels "Yes","No": 2 1 1 1 1 1 1 2 2 1 ...
 $ BMI           : num  33.8 26.4 32.3 27.5 45.4 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:154] 1 8 13 14 21 24 25 46 55 58 ...
 .. ..- attr(*, "names")= chr [1:154] "1" "8" "13" "14" ...

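This doesn't make sense to me: in the first str(LogisticData) output, EverLoseWeight clearly has 2 levels, since you can see both "Yes" and "No" as well as the codes 1 and 2. How can I resolve this error?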

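【Question Comments】:

Check whether the levels are the same after newdata <- droplevels(na.omit(LogisticData))
At first glance, Ethnicity looks suspicious. A factor can have two levels defined while only one of them is actually present in the data. Consider x = as.factor(c(1,1,1)); levels(x) = c(1, 2).
@akrun The levels are not the same, but that doesn't make sense to me. Please see the updated post.
There may be unused levels, i.e. levels that are defined but never occur in the data
I've updated the post with some explanation. But I understand the logic now: if the observations that carry one of a variable's levels are all removed, that variable is left with a single level. My mistake. Thanks.

Tags: r dplyr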

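【Solution 1】:

Given your latest update, it seems you have at least two options.

1: Remove the factors that are left with only one level once the NAs have been removed (i.e. LoseWeight and EverLoseWeight).

2: Treat NA as an additional level. Something like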

a = as.factor(c(1,1,NA,2))
b = as.factor(c(1,1,2,1))

# 0 is an unused factor level for a
x = data.frame(a, b)
levels(x$a) = c(levels(x$a), 0)
x$a[is.na(x$a)] = 0

However, this may not resolve any singularity issues related to the single-level factors.
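
For what it's worth, here is a rough sketch of option 1. It is not from the original answer: the data frame `d` and its columns `y`, `a`, `b`, `x` are made up for illustration. In this toy data `b` is "No" only in rows where `a` is NA, so `b` collapses to a single level once incomplete rows are dropped. By default `glm()` drops incomplete rows and unused factor levels itself when it builds the model frame, which is why the error can appear even if you never call `na.omit()`. The sketch reproduces the error, finds the single-level factors, and refits without them.

```r
# Hypothetical toy data: `b` is "No" only where `a` is missing
d <- data.frame(
  y = factor(c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No")),
  a = factor(c("No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", NA, NA)),
  b = factor(c(rep("Yes", 8), "No", "No")),
  x = c(1.2, 3.4, 2.2, 0.7, 5.1, 4.0, 2.9, 1.8, 3.3, 0.5)
)

# Fitting on the full data fails, because rows with NA are dropped first
err <- tryCatch(
  glm(y ~ a + b + x, data = d, family = binomial),
  error = function(e) conditionMessage(e)
)
err

# Mimic what the model frame does: drop incomplete rows and unused levels
cc <- droplevels(na.omit(d))

# Count the levels of every factor column and flag those with fewer than 2
n_levels <- sapply(Filter(is.factor, cc), nlevels)
single <- names(n_levels)[n_levels < 2]
single

# Refit with the remaining predictors only
keep <- setdiff(names(cc), c("y", single))
model <- glm(reformulate(keep, response = "y"), data = cc, family = binomial)
summary(model)
```

With the data from the question, the same check on `droplevels(na.omit(LogisticData))` should flag LoseWeight and EverLoseWeight, matching the second `str()` output above.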

【Comments】:

Yes, I went back and forth between both options, but I wasn't sure how to close the post without an "answer".

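【Solution 2】:

Try running summary on your original data and make sure every level actually has values. I would have posted this as a comment, but I don't have the reputation points :(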
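
A minimal sketch of that check, on made-up data (the data frame `d` and its columns are hypothetical, not the asker's): `summary()` shows the count for each level along with the number of NAs, and cross-tabulating a factor against `complete.cases()` shows whether one of its levels occurs only in rows that will be dropped.

```r
# Hypothetical toy data: `b` is "No" only where `a` is missing
d <- data.frame(
  y = factor(c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No")),
  a = factor(c("No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", NA, NA)),
  b = factor(c(rep("Yes", 8), "No", "No"))
)

# Per-level counts for every factor, plus the number of NAs per column
summary(d)

# TRUE for rows with no missing values in any column
complete <- complete.cases(d)

# Rows of the table are levels of `b`; columns say whether the row survives
tab <- table(b = d$b, complete_row = complete)
tab
```

A level whose count in the `TRUE` column is zero disappears once incomplete rows are removed, and if only one level is left the factor triggers the contrasts error.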

【Comments】: