【Question title】: How to plot a learning curve in R?
【Posted】: 2016-11-29 15:23:15
【Question】:

I want to plot a learning curve in my application.

An example curve looks like this:

[Image: example learning curve]

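A learning curve is a plot of the following two variables:

  • X-axis: number of samples (training set size)
  • Y-axis: error (RSS / J(theta) / cost function)

It helps to show whether the model suffers from high bias or high variance.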
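For reference, assuming the squared-error cost used in linear regression (the question lists RSS, J(theta) and "cost function" interchangeably), the two curves are usually defined as follows. Here h_θ is the model fitted on the first m training examples, and m_cv is the size of a held-out (cross-validation or test) set:

```latex
J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad
J_{\text{cv}}(\theta) = \frac{1}{2m_{\text{cv}}} \sum_{i=1}^{m_{\text{cv}}} \left( h_\theta(x_{\text{cv}}^{(i)}) - y_{\text{cv}}^{(i)} \right)^2

\text{RSS}_{\text{train}} = 2m \, J_{\text{train}}(\theta),
\qquad
\text{RMSE}_{\text{train}} = \sqrt{2 \, J_{\text{train}}(\theta)}
```

θ is refitted for every m, and both errors are recomputed. Roughly: if both curves flatten out close together at a high error, the model has high bias; if J_train stays well below J_cv with a gap that narrows only slowly as m grows, it has high variance.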

Is there a package in R that can help produce this plot?

【Comments】:

  • Hi, please upvote and click the green check mark to accept any answer you found helpful, as a way of saying thanks.

Tags: regression linear-regression


【Answer 1】:

You can make such a plot with the excellent caret package. The "Customizing the tuning process" section of its documentation will be helpful.
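Recent versions of caret also export a helper, `learning_curve_dat()`, that does the subsetting for you: it holds out a test fraction (`test_prop`), calls `train()` on growing proportions of the remaining rows, and returns one row per size and performance type, labelled in a `Data` column (resampling, training and test performance) alongside `Training_Size`. The following is a minimal sketch, assuming your installed caret exports this function; the predictors `wt` and `hp`, the proportions and the 5-fold CV are arbitrary illustrative choices, not something from the question.

```r
# Sketch: caret's learning_curve_dat() on mtcars.
# Assumes a caret version that exports learning_curve_dat();
# the predictor subset and settings are illustrative only.
library(caret)
library(ggplot2)

set.seed(7)
# keep mpg plus two predictors so lm stays well determined at small sizes
dat <- mtcars[, c("mpg", "wt", "hp")]

lc <- learning_curve_dat(dat = dat, outcome = "mpg",
                         proportion = (3:10) / 10,  # fractions of the non-test rows
                         test_prop = 0.25,          # hold out about 25% as a test set
                         verbose = FALSE,
                         # the remaining arguments are passed on to train()
                         method = "lm", metric = "RMSE",
                         trControl = trainControl(method = "cv", number = 5))

# one line per performance type stored in the Data column
p <- ggplot(lc, aes(x = Training_Size, y = RMSE, color = Data)) +
  geom_point() + geom_line() +
  labs(x = "Training set size", y = "RMSE", title = "Linear model learning curve")
print(p)
```

Because the extra arguments go straight to `train()`, the same call works for other `method` values; only the metric column you plot changes.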

In addition, you can check out Joseph Rickert's excellent blog posts on R-Bloggers, titled "Why Big Data? Learning Curves" and "Learning from Learning Curves".

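Update
I just wrote a post on this topic: Plot learning curves with caret package and R. I think my answer there will be more useful to you, so for convenience I have copied it here. It plots learning curves in R, using the popular caret package to train the model, estimate the (cross-validated) RMSE on growing subsets of the training set, and compute the RMSE on a held-out test set.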

# load caret (createDataPartition, trainControl, train, postResample)
library(caret)

# set seed for reproducibility
set.seed(7)

# randomize the row order of mtcars
mtcars <- mtcars[sample(nrow(mtcars)),]

# split mtcars data into training and test sets
mtcarsIndex <- createDataPartition(mtcars$mpg, p = .625, list = FALSE)
mtcarsTrain <- mtcars[mtcarsIndex,]
mtcarsTest <- mtcars[-mtcarsIndex,]

# number of training examples (the largest training set size)
nTrain <- nrow(mtcarsTrain)

# create an empty data frame with one row per training set size
learnCurve <- data.frame(m = integer(nTrain),
                         trainRMSE = numeric(nTrain),
                         testRMSE = numeric(nTrain))

# test data response feature
testY <- mtcarsTest$mpg

# run algorithms using 10-fold cross validation with 3 repeats
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
metric <- "RMSE"

# loop over training set sizes
for (i in 3:nTrain) {
    learnCurve$m[i] <- i

    # train the learning algorithm on the first i training examples;
    # results$RMSE is the cross-validated RMSE on those i rows
    fit.lm <- train(mpg ~ ., data = mtcarsTrain[1:i,], method = "lm", metric = metric,
                    preProc = c("center", "scale"), trControl = ctrl)
    learnCurve$trainRMSE[i] <- fit.lm$results$RMSE

    # use the trained model to predict on the test data (column 1, mpg, dropped)
    prediction <- predict(fit.lm, newdata = mtcarsTest[, -1])
    learnCurve$testRMSE[i] <- postResample(prediction, testY)["RMSE"]
}

# drop the unused rows for sizes 1 and 2
learnCurve <- learnCurve[learnCurve$m > 0,]

pdf("LinearRegressionLearningCurve.pdf", width = 7, height = 7, pointsize = 12)

# plot training set size vs. log(RMSE) for the training set and the test set,
# using a y-range that fits both curves
yRange <- range(log(c(learnCurve$trainRMSE, learnCurve$testRMSE)), finite = TRUE)
plot(learnCurve$m, log(learnCurve$trainRMSE), type = "o", col = "red", ylim = yRange,
     xlab = "Training set size", ylab = "log(RMSE)", main = "Linear Model Learning Curve")
lines(learnCurve$m, log(learnCurve$testRMSE), type = "o", col = "blue")
legend("topright", c("Train error (CV)", "Test error"), lty = c(1, 1), lwd = c(2.5, 2.5),
       col = c("red", "blue"))

dev.off()

The output plot:

[Image: Linear Model Learning Curve]
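The loop in this answer follows a single random ordering of the rows, so the curves can be jagged, and `fit.lm$results$RMSE` is caret's cross-validated estimate on the first i rows rather than the in-sample training error J(theta) described in the question. As a hedged, package-free alternative, the sketch below fits `lm` directly, computes the in-sample training RMSE and the held-out test RMSE for each size, and averages both over several random orderings. The model `mpg ~ wt + hp`, the 24/8 split, and the names `avg_learning_curve`, `mt_train`, `mt_test` and `lc_avg` are illustrative assumptions.

```r
# Base-R sketch (no caret): averaged learning curve for lm.
# train_rmse: in-sample RMSE of lm fitted on the first m shuffled training rows
# test_rmse:  RMSE of that same fit on a fixed held-out test set
# Both are averaged over n_rep random orderings of the training rows.
avg_learning_curve <- function(train, test, formula, sizes, n_rep = 20) {
  response <- all.vars(formula)[1]
  rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
  tr <- matrix(NA_real_, nrow = n_rep, ncol = length(sizes))
  te <- matrix(NA_real_, nrow = n_rep, ncol = length(sizes))
  for (r in seq_len(n_rep)) {
    shuffled <- train[sample(nrow(train)), ]
    for (j in seq_along(sizes)) {
      sub <- shuffled[seq_len(sizes[j]), ]
      fit <- lm(formula, data = sub)
      tr[r, j] <- rmse(sub[[response]], fitted(fit))
      te[r, j] <- rmse(test[[response]], predict(fit, newdata = test))
    }
  }
  data.frame(m = sizes, train_rmse = colMeans(tr), test_rmse = colMeans(te))
}

# Illustrative setup: 24 random training rows, 8 test rows
set.seed(7)
idx <- sample(nrow(mtcars), 24)
mt_train <- mtcars[idx, ]
mt_test <- mtcars[-idx, ]
lc_avg <- avg_learning_curve(mt_train, mt_test, mpg ~ wt + hp, sizes = 5:24)

matplot(lc_avg$m, as.matrix(lc_avg[, c("train_rmse", "test_rmse")]),
        type = "o", pch = 1, lty = 1, col = c("red", "blue"),
        xlab = "Training set size", ylab = "Error (RMSE)",
        main = "lm learning curve (averaged over orderings)")
legend("topright", c("Train error", "Test error"), lty = 1, col = c("red", "blue"))
```

Typically the training curve rises and the test curve falls as m grows; where and how far apart they level off is what the bias/variance reading in the question is based on.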
