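Fit decision boundary to logistic regression model in R: answers

【Question title】: Fit decision boundary to logistic regression model in R
【Posted】: 2013-06-09 05:27:09
【Question】:

I am struggling to plot a decision boundary in R using ggplot.

I have 2 variables (exam scores) and a binary classification, i.e. whether the student was admitted. The data looks like this: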

> head(exam.data)
  Exam1Score Exam2Score Admitted
1   34.62366   78.02469        0
2   30.28671   43.89500        0
3   35.84741   72.90220        0
4   60.18260   86.30855        1
5   79.03274   75.34438        1
6   45.08328   56.31637        0

I can plot the data using ggplot:

exam.plot <- ggplot(data=exam.data, aes(x=Exam1Score, y=Exam2Score, col = ifelse(Admitted == 1,'dark green','red'), size=0.5))+
  geom_point()+
  labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
  theme_bw()+
  theme(legend.position="none")

Then I successfully fit a logistic regression model:

exam.lm <- glm(data=exam.data, formula=Admitted ~ Exam1Score + Exam2Score, family="binomial")

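So, after a lot of searching online, I decided to fit the decision boundary manually (I tried for a while to use stat_smooth but could not get it to work). I tried the following: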

# Fit the decision boundary
plot_x <- c(min(exam.data$Exam1Score)-2, max(exam.data$Exam1Score)+2)
plot_y <- (-1 /coef(exam.lm)[3]) * (coef(exam.lm)[2] * plot_x + coef(exam.lm)[1])
# Build x and y as columns; rbind() would stack them as rows and mix up the coordinates
db.data <- data.frame(x = plot_x, y = plot_y)

# Add the decision boundary plot
ggplot()+geom_line(data=db.data, aes(x=x, y=y))
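The plot_y formula comes from setting the linear predictor to zero: b0 + b1*x1 + b2*x2 = 0, so x2 = -(b0 + b1*x1) / b2, which is where the predicted probability equals 0.5. A minimal sketch of a sanity check follows. It uses simulated stand-in data (not the asker's exam data), and the names `b`, `lin` and `p_boundary` are made up for illustration.

```r
# Simulated stand-in data (NOT the asker's exam data)
set.seed(42)
n <- 200
exam.data <- data.frame(Exam1Score = runif(n, 30, 100),
                        Exam2Score = runif(n, 30, 100))
# Noisy admission rule, so the classes overlap and glm() converges cleanly
lin <- 0.1 * (exam.data$Exam1Score + exam.data$Exam2Score - 130)
exam.data$Admitted <- rbinom(n, 1, plogis(lin))

exam.lm <- glm(Admitted ~ Exam1Score + Exam2Score,
               data = exam.data, family = "binomial")
b <- coef(exam.lm)

# Boundary: b0 + b1*x1 + b2*x2 = 0  =>  x2 = -(b0 + b1*x1) / b2
plot_x <- range(exam.data$Exam1Score) + c(-2, 2)
plot_y <- -(b[1] + b[2] * plot_x) / b[3]
db.data <- data.frame(x = plot_x, y = unname(plot_y))

# Predicted admission probability at the two boundary points
p_boundary <- predict(exam.lm,
                      newdata = data.frame(Exam1Score = db.data$x,
                                           Exam2Score = db.data$y),
                      type = "response")
```

Both entries of p_boundary should equal 0.5 up to floating-point error; if they do not, the coordinates in db.data are probably mixed up.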

The decision boundary plots successfully, but I cannot add it to my existing plot:

> exam.plot+geom_line(data=db.data, aes(x=x, y=y))
Error: Aesthetics must either be length one, or the same length as the data
Problems:x, y

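Can anyone point out what I am doing wrong, or whether I can actually do this with + stat_smooth()?

All the code (ex2.R) and files are here: https://github.com/StuHorsman/rscripts/tree/master/R/Coursera

Thanks!

Stuart

Update: I can achieve something similar with base graphics: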

plot(exam.data$Exam1Score, exam.data$Exam2Score, type="n", xlab="Exam 1 Scores", ylab="Exam 2 Scores")      
points(exam.data$Exam1Score[exam.data$Admitted==1], exam.data$Exam2Score[exam.data$Admitted==1], pch=4, col="green")  
points(exam.data$Exam1Score[exam.data$Admitted==0], exam.data$Exam2Score[exam.data$Admitted==0], pch=4, col="red")        
lines(db.data, col="blue")

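【Question comments】:

Try using geom_segment instead of geom_line. If you make your example reproducible (i.e. include data that I can use, not just a sample), I might try to post a solution.
Hi Andrie, I have posted all the code and the data set to GitHub; the link is above. Thanks for the tip, I will look at geom_segment.
I notice you have labelled your script as Coursera. What does your course honour code say about publishing code online? Some other Coursera courses explicitly prohibit this.
The code is my own and is neither related to the Coursera course nor a solution to it; I am doing some exercises in R that the course does not require.

Tags: r ggplot2

【Solution 1】:

The problem is that in exam.plot you use not only the aesthetics x and y but also col and size (the latter unnecessarily). Every layer needs to have all the aesthetics set that are defined in the ggplot() call. (I regularly get bitten by this.)

Thus: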

exam.plot+geom_line(data=db.data, aes(x=x, y=y), col = "black", size = 1)

plots.
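An alternative that is not part of the original answer: ggplot2 layers accept an inherit.aes argument, and setting it to FALSE makes the layer use only its own aes() mapping. A minimal sketch, assuming ggplot2 is installed; the data are simulated stand-ins and the two-point db.data is hypothetical.

```r
library(ggplot2)

# Simulated stand-in data (NOT the asker's exam data)
set.seed(1)
exam.data <- data.frame(Exam1Score = runif(100, 30, 100),
                        Exam2Score = runif(100, 30, 100))
exam.data$Admitted <- as.integer(
  exam.data$Exam1Score + exam.data$Exam2Score + rnorm(100, sd = 10) > 130
)

# Colour is mapped in ggplot(), so every layer inherits it by default
exam.plot <- ggplot(exam.data, aes(x = Exam1Score, y = Exam2Score,
                                   col = factor(Admitted))) +
  geom_point()

# Hypothetical two-point line standing in for the real db.data
db.data <- data.frame(x = c(30, 100), y = c(100, 30))

# inherit.aes = FALSE: this layer ignores the col mapping from ggplot()
p <- exam.plot +
  geom_line(data = db.data, aes(x = x, y = y), inherit.aes = FALSE)
```

Printing p draws the points and the line together. This keeps the original ggplot() call untouched, at the cost of repeating the x and y mapping in the line layer.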

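However, I would suggest changing exam.plot slightly and removing from the ggplot() call every aesthetic that does not apply to all layers (putting those into the layer definitions instead):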

exam.plot <- ggplot(data=exam.data, aes(x = Exam1Score, y=Exam2Score))+
  geom_point(aes (col = Admitted), size = 0.5)+
  scale_color_manual (values =  c('red', 'dark green')) + 
  labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
  theme_bw()+
  coord_equal () +  # assuming that the scores have the same scale.
  theme(legend.position="none")

exam.plot + geom_line(data=db.data, aes(x=x, y=y))

which, with the example data

exam.data <- data.frame (Exam1Score = rnorm (100) + 0:1, 
                         Exam2Score = rnorm (100) + 0:1, 
                         Admitted = factor (rep (0:1, 50)))

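yields:

(plot drawn with the default point size; at 0.5 the points are barely visible in this example)

【Comments】:

Great stuff! Many thanks. I was confused about where to set the aes, but not any more. Off to study it.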

【Solution 2】:

Why not use stat_function?

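# Coefficients of the fitted model exam.lm from the question
Intercept <- coef(exam.lm)[["(Intercept)"]]
Beta1 <- coef(exam.lm)[["Exam1Score"]]
Beta2 <- coef(exam.lm)[["Exam2Score"]]
g=ggplot(exam.data,aes(x=Exam1Score,y=Exam2Score,col=factor(Admitted)))
g=g+geom_point(size=2.2)+scale_color_discrete(name="Admitted")
g=g+stat_function(fun=function(x){(-Intercept-Beta1*x)/Beta2},xlim=c(0,100))
g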

Intercept, Beta1 and Beta2 are the parameters (coefficients) of the logistic regression function.

【Comments】:
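Because the boundary is a straight line, it can also be written in slope/intercept form and drawn with geom_abline. This is a swapped-in technique, not what the answer above uses. A minimal sketch on simulated stand-in data, with the hypothetical names `boundary_intercept` and `boundary_slope`:

```r
library(ggplot2)

# Simulated stand-in data (NOT the asker's exam data)
set.seed(7)
n <- 200
exam.data <- data.frame(Exam1Score = runif(n, 30, 100),
                        Exam2Score = runif(n, 30, 100))
exam.data$Admitted <- rbinom(
  n, 1, plogis(0.1 * (exam.data$Exam1Score + exam.data$Exam2Score - 130))
)

exam.lm <- glm(Admitted ~ Exam1Score + Exam2Score,
               data = exam.data, family = "binomial")
b <- coef(exam.lm)

# Rearranged boundary: Exam2Score = intercept + slope * Exam1Score
boundary_intercept <- -b[["(Intercept)"]] / b[["Exam2Score"]]
boundary_slope     <- -b[["Exam1Score"]] / b[["Exam2Score"]]

g <- ggplot(exam.data, aes(x = Exam1Score, y = Exam2Score,
                           col = factor(Admitted))) +
  geom_point() +
  geom_abline(intercept = boundary_intercept, slope = boundary_slope)
```

Printing g shows the points with the boundary spanning the whole panel, so no xlim or endpoint data frame is needed.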