【Title】: How to test the accuracy of a random forest model using a table
【Posted】: 2013-12-03 20:53:12
【Question】:

I'm new to randomForest models and need help. I built a random forest with 500 trees from my train data.frame and produced a set of response predictions for a particular variable. I need to compare the predictions with the original observations in a table. How do I do that? I tried table(test, predictions), but I find it hard to understand what the table is telling me.

【Question comments】:

  • Why not provide a sample of your data and the table? This is not rhetorical.
  • The data are answers from a survey, with factor levels: strongly agree, agree, no answer, disagree, strongly disagree. I was asked to isolate the responses to one survey question and compare the actually observed responses for that question with the responses predicted by a random forest with ntree=500. I built the random forest this way: > rf1p1

Tags: r random-forest


【Solution 1】:

Since you didn't provide your data, I tried to replicate it; here is the process I followed:

Replicate the data:

Level <- c("strongly disagree", "disagree", "no answer", "agree", "strongly agree")

Question5 <- c("strongly agree", "agree", "no answer", "disagree", "strongly disagree", "disagree", "no answer", "agree", "strongly disagree")
Question5 <- factor(Question5, levels=Level, ordered=TRUE)
train <- data.frame(a=c(2,3,5,1,2,1,4,1,4), b=c(4,1,3,2,5,3,4,1,2), Question5)

Question5 <- c("strongly disagree", "no answer", "agree", "strongly disagree", "disagree", "strongly agree", "no answer", "disagree", "strongly agree")
Question5 <- factor(Question5, levels=Level, ordered=TRUE)
test <- data.frame(a=c(4,3,5,2,1,3,4,2,5), b=c(5,2,3,1,4,3,2,4,1), Question5)

Apply the random forest:

> library(randomForest)
> rf1 <- randomForest(Question5~., data=train, ntree=500)
> p1 <- predict(rf1, test, type='response')
> table(p1, test$Question5)
p1                  strongly disagree disagree no answer agree strongly agree
  strongly disagree                 0        0         2     0              1
  disagree                          0        1         0     0              0
  no answer                         1        0         0     1              0
  agree                             1        0         0     0              1
  strongly agree                    0        1         0     0              0

When you run this process on your own data, you should get a table similar to the one above. Summing the diagonal elements of this table gives the total number of correct predictions (1 out of 9 in the example above).
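The accuracy itself can be read straight off that table: the diagonal over the total. A minimal sketch, rebuilding the confusion matrix above as a plain matrix (the exact counts you get from randomForest will differ, since they depend on the random seed):

```r
# Confusion matrix from the answer above, rebuilt as a plain matrix
# (rows = predicted class, columns = actual class).
lv <- c("strongly disagree", "disagree", "no answer", "agree", "strongly agree")
tab <- matrix(c(0, 0, 2, 0, 1,
                0, 1, 0, 0, 0,
                1, 0, 0, 1, 0,
                1, 0, 0, 0, 1,
                0, 1, 0, 0, 0),
              nrow = 5, byrow = TRUE,
              dimnames = list(predicted = lv, actual = lv))

accuracy <- sum(diag(tab)) / sum(tab)  # correct predictions / all predictions
accuracy  # 1/9, about 0.11
```

With the fitted model, `tab <- table(p1, test$Question5)` has the same shape, so `sum(diag(tab)) / sum(tab)` works there too.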

【Discussion】:

    【Solution 2】:

    What I use to evaluate models is:

    # Evaluate a binary classifier. Assumes a 2x2 confusion matrix,
    # with the positive class as the second factor level.
    evaluate <- function(actuals, predictions){
      cf.matrix <- table(actuals, predictions)
      cf.precision <- cf.matrix[2, 2] / sum(cf.matrix[, 2])          # TP / predicted positive
      cf.prop_miss <- cf.matrix[2, 1] / sum(cf.matrix[2, ])          # FN / actual positive
      cf.accuracy <- (cf.matrix[1, 1] + cf.matrix[2, 2]) / sum(cf.matrix)
      cf.TruePositiveRate <- cf.matrix[2, 2] / sum(cf.matrix[2, ])   # TP / actual positive
      cf.FalsePositiveRate <- cf.matrix[1, 2] / sum(cf.matrix[1, ])  # FP / actual negative
      cf.prevalence <- sum(cf.matrix[2, ]) / sum(cf.matrix)

      output <- list(cf.matrix, cf.precision, cf.prop_miss, cf.accuracy,
                     cf.TruePositiveRate, cf.FalsePositiveRate, cf.prevalence)
      names(output) <- c('confusion matrix', 'precision', 'percent missed',
                         'accuracy', 'True Positive', 'False Positive', 'prevalence')
      return(output)
    }
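    A quick usage sketch with made-up two-class data (the function indexes a 2x2 table, so it fits binary outcomes rather than the five-level survey factor; the "no"/"yes" vectors below are invented for illustration):

    ```r
    # Hypothetical two-class data, invented for illustration.
    actuals     <- factor(c("no", "no", "no", "yes", "yes", "no", "yes", "yes"),
                          levels = c("no", "yes"))
    predictions <- factor(c("no", "yes", "no", "yes", "no", "no", "yes", "yes"),
                          levels = c("no", "yes"))

    cf <- table(actuals, predictions)
    # cf[1, 1] and cf[2, 2] count the correct "no" and "yes" calls,
    # matching the cf.accuracy formula in evaluate() above:
    accuracy <- (cf[1, 1] + cf[2, 2]) / sum(cf)
    accuracy  # 6 of 8 correct, 0.75
    ```

    Calling `evaluate(actuals, predictions)` on these vectors returns the full list of metrics at once.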
    

    【Discussion】:
