【Question Title】: Error: `data` and `reference` should be factors with the same levels (random forest)
【Posted】: 2021-09-20 00:48:37
【Question】:

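This is the code I wrote for an assignment. I can't seem to get a confusion matrix for the predictions. Please help me troubleshoot the code or suggest any changes that are needed.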

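library(caret)         # createDataPartition(), confusionMatrix()
library(randomForest)  # tuneRF(), randomForest()

set.seed(1234)
test_index1 <- createDataPartition(water_potability3$Potability, p = 0.1, list = FALSE)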

water_potability_train <- water_potability3[test_index1,-c(4,6:9)]

water_potability3_test<- water_potability3[!1:nrow(water_potability3)%in%test_index1,-c(4,6:9)]

trf <- tuneRF(x = water_potability_train[, 1:4], y = water_potability_train$Potability)

(mintree <- trf[which.min(trf[, 2]), 1])

rf_model <- randomForest(x = water_potability_train[, -5], y = water_potability_train$Potability, mtry = mintree, importance = TRUE)

(rf_model,main="") (rf_model,main="")

preds_rf<- predict(rf_model,water_potability3_test[,-5])

table(preds_rf,water_potability3_test$Potability)

confusionMatrix(preds_rf, water_potability3_test$Potability)

Every time I build a confusion matrix, I get the error "Error: `data` and `reference` should be factors with the same levels".

【Comments】:

    Tags: r


    【Solution 1】:

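    Since you haven't shared a dataset that would let me reproduce the error, I'll make a guess and offer the solution I would use myself. If this doesn't work for you, please provide some data and explain what the Potability column contains :-)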

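    When you randomly split data into training and test partitions, you are not guaranteed observations from every class in both partitions. For example, with 10 classes, the smaller test partition might contain only 8 of them. When your model then predicts one of the two other classes that were available in the training partition, the two factors end up with different levels.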
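
A quick way to check this guess is to compare the two vectors before building the confusion matrix. The base-R sketch below uses hypothetical toy vectors (`targets` and `preds` stand in for `water_potability3_test$Potability` and `preds_rf`). It also checks that both are factors at all, on the assumption that `Potability` may be stored as a number; that is not confirmed by the question.

```r
# Hypothetical stand-ins for water_potability3_test$Potability and preds_rf
targets <- c(0, 1, 0, 1)                   # stored as numeric, not as a factor
preds   <- factor(c("0", "1", "1", "1"))   # predictions returned as a factor

# Check 1: are both vectors factors?
c(is.factor(targets), is.factor(preds))

# Check 2: do they share the same levels? (levels() is NULL for non-factors)
identical(levels(targets), levels(preds))

# Give both vectors the same explicit set of levels
all_levels <- sort(union(as.character(targets), as.character(preds)))
targets <- factor(as.character(targets), levels = all_levels)
preds   <- factor(as.character(preds), levels = all_levels)

identical(levels(targets), levels(preds))
table(preds, targets)
```

If the target column turns out to be numeric, converting it to a factor before modeling is probably the first thing to try, since `randomForest()` fits a regression rather than a classifier when `y` is not a factor.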

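    So I use the `cat_col` argument of `partition()` from groupdata2 to make sure every class is represented in both partitions (where possible). I then use `confusion_matrix()` from cvms, because it allows the two factors to have different levels.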

    library(groupdata2)
    library(cvms)
    library(randomForest)
    set.seed(1234) 
    
    # Create list with two partitions
    # where the ratio of classes in Potability are similar
    parts <- partition(water_potability3[, -c(4,6:9)], 
                       p = 0.1, cat_col = "Potability")
    
    # Extract the two partitions
    water_potability3_test <- parts[[1]]
    water_potability_train <- parts[[2]]
    
    # The modeling (haven't changed anything here)
    trf <- tuneRF(x = water_potability_train[, 1:4],
                  y = water_potability_train$Potability) 
    
    (mintree <- trf[which.min(trf[, 2]), 1]) 
    
    rf_model <- randomForest(
        x = water_potability_train[, -5],
        y = water_potability_train$Potability,
        mtry = mintree,
        importance = TRUE
    )
    
    preds_rf <- predict(rf_model, water_potability3_test[, -5])
    
    # Create confusion matrix
    conf_mat <- cvms::confusion_matrix(
        targets = water_potability3_test$Potability,
        predictions = preds_rf
    )
    
    # The basic confusion matrix table
    conf_mat$Table
    
    # Or as a plot
    plot_confusion_matrix(conf_mat)
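
If groupdata2 is not available, the same idea (sampling within each class so that every class lands in both partitions) can be roughly sketched in base R. This is only an illustration, not how `partition()` is implemented; the data frame `df` and its `ph` column are hypothetical stand-ins for `water_potability3`.

```r
set.seed(1234)

# Hypothetical data frame standing in for water_potability3
df <- data.frame(
    ph = runif(50),
    Potability = factor(rep(c("0", "1"), times = c(35, 15)))
)

# Sample about 10% of the row indices within each class (at least one row per class)
rows_by_class <- split(seq_len(nrow(df)), df$Potability)
test_rows <- unlist(lapply(rows_by_class, function(rows) {
    n_test <- max(1, round(0.1 * length(rows)))
    rows[sample.int(length(rows), n_test)]
}))

test_part  <- df[test_rows, ]
train_part <- df[-test_rows, ]

# Every class should now appear in both partitions
table(test_part$Potability)
table(train_part$Potability)
```

A class with a single observation would end up in the test partition only, so very rare classes still need separate handling.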
    
    

    You can also look at `cvms::evaluate()`, which provides additional evaluation metrics.

    Learn more

    Read more about the train/test partitioning features of groupdata2 here: https://cran.rstudio.com//web/packages/groupdata2/vignettes/cross-validation_with_groupdata2.html

    More on the confusion matrix features of cvms here: https://cran.r-project.org/web/packages/cvms/vignettes/Creating_a_confusion_matrix.html

    【Comments】:
