Getting random forest prediction accuracy for a continuous variable in R: answers

【Question title】: Getting random forest prediction accuracy for a continuous variable in R
【Posted】: 2015-07-11 20:23:46
【Question】:

I am trying to predict a continuous variable (counts) in R using random forest. The predicted variable ranges from min=1 to max=1000.

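I tried to get the prediction accuracy with "confusionMatrix", but, naturally, I get an error saying that the predictions and the observed values have a different number of levels.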

What is the best way to measure prediction accuracy in this situation?

【Comments】:

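Since you want to predict a continuous variable (values from min=1 to max=1000), "prediction accuracy" from "confusionMatrix" does not apply; use an error measure such as the root-mean-square deviation instead (en.wikipedia.org/wiki/Root-mean-square_deviation).
So should I use Rsquared as my prediction accuracy metric?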
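R-squared is one option, but it is usually reported together with an error measure in the outcome's own units. A minimal sketch, assuming you already have a numeric vector of predictions: the helper name `regression_metrics` and the toy numbers are hypothetical, chosen only to show the calculations. R2 here is the "1 minus residual sum of squares over total sum of squares" definition, which can differ from the squared correlation that some packages report as Rsquared.

```r
# Hypothetical helper: common accuracy metrics for a continuous outcome.
# 'actual' and 'predicted' are numeric vectors of the same length.
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(
    RMSE = sqrt(mean(err^2)),    # root mean squared error, in the outcome's units
    MAE  = mean(abs(err)),       # mean absolute error, less sensitive to large misses
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)  # share of variance explained
  )
}

# Toy example with made-up numbers
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 39)
m <- regression_metrics(actual, predicted)
m
```

For these toy values RMSE is sqrt(4.5), about 2.12, MAE is 2, and R2 is 1 - 18/500 = 0.964.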

Tags: r machine-learning random-forest

【Solution 1】:

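randomForest only reports a confusion matrix when the outcome is categorical (a factor), so make sure your outcome is numeric. It then fits a regression forest and reports the mean of squared residuals (and the % of variance explained) instead. For example: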

library(randomForest)
# This is probably what you're seeing
randomForest(as.factor(Sepal.Length) ~ Sepal.Width, data=iris)
# This is what you want to see
randomForest(Sepal.Length ~ Sepal.Width, data=iris)
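The numbers printed for a regression forest are also stored in the fitted object, so they can be extracted directly. A minimal sketch, assuming the randomForest package: `rf$mse` holds the out-of-bag mean squared error after each tree and `rf$rsq` the matching fraction of variance explained; the seed value is an arbitrary choice.

```r
library(randomForest)

set.seed(42)
# Numeric response, so this fits a regression forest (no confusion matrix)
rf <- randomForest(Sepal.Length ~ Sepal.Width, data = iris)

rf$type                  # "regression"
# Out-of-bag MSE after the last tree (printed as "Mean of squared residuals")
oob_mse <- tail(rf$mse, 1)
sqrt(oob_mse)            # out-of-bag RMSE, in the units of Sepal.Length
# Fraction of variance explained (printed as "% Var explained" after multiplying by 100)
tail(rf$rsq, 1)
```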

【Comments】:

【Solution 2】:

@mishakob

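Roughly speaking, the root mean squared error (RMSE) is the typical size of the deviation between the actual and fitted values: the square root of the average squared deviation, expressed in the same units as the outcome. It can be computed as follows. Note that iris.rg$predicted holds the out-of-bag predictions, so this is an out-of-bag estimate rather than the error on the training data.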
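Written out, with $y_i$ the observed values, $\hat{y}_i$ the out-of-bag fitted values and $n$ the number of rows, the quantity computed by the code that follows is:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
$$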

library(randomForest)
set.seed(1237)
iris.rg <- randomForest(Sepal.Length ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)

sqrt(sum((iris.rg$predicted - iris$Sepal.Length)^2) / nrow(iris))
[1] 0.3706187
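The out-of-bag RMSE above is usually a reasonable estimate of out-of-sample error. If you prefer an explicit holdout set, a minimal sketch could look like the following; the 30% test fraction and the seed are arbitrary assumptions, and `predict()` on new data is used in place of the out-of-bag predictions.

```r
library(randomForest)

set.seed(1237)
# Hold out 30% of the rows as a test set (the split proportion is an arbitrary choice)
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit on the training rows only
fit  <- randomForest(Sepal.Length ~ ., data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error on the held-out rows
test_rmse <- sqrt(mean((pred - test$Sepal.Length)^2))
test_rmse
```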

【Comments】: