How can I use test data to calculate the MSE for a training model in R?

【Question title】: How can I use test data to calculate the MSE for a training model in R?
【Posted】: 2022-01-09 06:23:48
【Question description】:

set.seed(1234)
training.samples=RealEstate$Y.house.price.of.unit.area%>%createDataPartition(p=0.75,list=FALSE)
train.data=RealEstate[training.samples,]
test.data=RealEstate[-training.samples,]

Price.Model1=lm(Y.house.price.of.unit.area~factor(X1.transaction.date)+
                        X2.house.age+
                        X3.distance.to.the.nearest.MRT.station+
                        X4.number.of.convenience.stores+
                        X5.latitude+
                        X6.longitude,
                data=train.data)

Is this correct?

mean((test.data$Y.house.price.of.unit.area-predict(Price.Model1))^2)

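I got this warning, so I am not sure whether I am doing it right:

Warning message:
In test.data$Y.house.price.of.unit.area - predict(Price.Model1) :
  longer object length is not a multiple of shorter object length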

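【Comments】:

Use the newdata argument of predict, like this: predict(Price.Model1, newdata = test.data). Without newdata, predict() returns the fitted values for the training rows, so its length does not match the test data.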
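
The length mismatch behind the warning can be reproduced with a small sketch. It uses the built-in mtcars data as a stand-in for RealEstate; the split, the formula mpg ~ wt + hp, and the variable names are illustrative assumptions, not the asker's actual setup.

```r
# Illustrative stand-in for the asker's data: built-in mtcars, 75/25 split
set.seed(1234)
train_ix   <- sample(nrow(mtcars), size = floor(0.75 * nrow(mtcars)))
train.data <- mtcars[train_ix, ]
test.data  <- mtcars[-train_ix, ]

fit <- lm(mpg ~ wt + hp, data = train.data)

# Without newdata, predict() returns one fitted value per TRAINING row
n_fitted <- length(predict(fit))

# With newdata, predict() returns one prediction per TEST row
test_pred <- predict(fit, newdata = test.data)
n_pred    <- length(test_pred)

# Actual and predicted vectors now have the same length, so this is a valid test MSE
test_mse <- mean((test.data$mpg - test_pred)^2)
```

When the two lengths differ, R recycles the shorter vector and warns only if the longer length is not a multiple of the shorter one, so a mismatched subtraction can also yield a wrong number silently. Passing newdata explicitly avoids both cases.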

Tags: r statistics cross-validation

【Solution 1】:

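The mean squared error is defined as:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where n is the number of observations, y_i is the actual value, and \hat{y}_i is the predicted value for observation i.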

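To compute it in R:

1. Fit the model on the training data.
2. Use the predict() function with the test data to obtain predictions.
3. Compute the MSE from the predicted and actual values of the test data.

Using some fake data...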

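library(dplyr)  # provides %>% and select()

set.seed(1234)  # make the random split reproducible

# Hold out about 20% of the rows as a test set, sampled without replacement
test_ix <- sample(nrow(mtcars), size = floor(nrow(mtcars) * 0.2))
train <- mtcars[-test_ix, ]
X_test <- mtcars[test_ix, ] %>%
  select(!mpg)

y_test <- mtcars[test_ix, "mpg"]

# Fit on the training rows only, then predict on the held-out rows
fit <- lm(mpg ~ ., data = train)
yhat <- predict(fit, newdata = X_test)

# Mean squared error on the test set
mse <- mean((y_test - yhat)^2)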

To get the RMSE, take the square root of the MSE.
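
As a rough sketch, both metrics can be wrapped in small helper functions. The names calc_mse and calc_rmse are hypothetical (they do not come from a package), and the input vectors are made-up values chosen so the arithmetic is easy to check by hand.

```r
# Hypothetical helpers: mean squared error and its square root
calc_mse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))  # guard against silent recycling
  mean((actual - predicted)^2)
}

calc_rmse <- function(actual, predicted) {
  sqrt(calc_mse(actual, predicted))
}

# Made-up values: errors are 1, 0, -2, so squared errors are 1, 0, 4
actual    <- c(3, 5, 7)
predicted <- c(2, 5, 9)

mse_value  <- calc_mse(actual, predicted)   # (1 + 0 + 4) / 3 = 5/3
rmse_value <- calc_rmse(actual, predicted)  # sqrt(5/3)
```

The length check makes a mismatch between actual and predicted values fail loudly instead of producing a recycled, misleading result. The RMSE is on the same scale as the response variable, which often makes it easier to interpret than the MSE.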

【Discussion】: