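【Posted】: 2021-10-10 08:39:44
【Question】:
I am calling ranger to model a multiclass classification problem on a large mixed-type data frame (some of the categorical variables have more than 53 levels). Training and testing run without any issues. However, computing and interpreting the confusion matrix / contingency table is giving me trouble.
I use the iris data to illustrate the difficulty, treating Species as the categorical response: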
library(ranger)
library(caret)
# Data
idx = sample(nrow(iris),100)
data = iris
# Split data sets
Train_Set = data[idx,]
Test_Set = data[-idx,]
# Train
Species.ranger <- ranger(Species ~ ., data = Train_Set, importance = "impurity", save.memory = TRUE, probability = TRUE)
# Test
probabilitiesSpecies <- predict(Species.ranger, data = Test_Set,type='response', verbose = TRUE)
or
probabilitiesSpecies <- as.data.frame(predict(Species.ranger, data = Test_Set,type='response', verbose = TRUE)$predictions)
I then run into the following problems:
table(Test_Set$Species, probabilitiesSpecies$predictions)
Error in table(Test_Set$Species, probabilitiesSpecies$predictions) :
all arguments must have the same length
or
caret::confusionMatrix(Test_Set$Species, probabilitiesSpecies$predictions)
or
caret::confusionMatrix(table(Test_Set$Species, max.col(probabilitiesSpecies)-1))
gives
Error: `data` and `reference` should be factors with the same levels.
However, the binary classification shown below works:
idx = sample(nrow(iris),100)
data = iris
data$Species = factor(ifelse(data$Species=="virginica",1,0))
Train_Set = data[idx,]
Test_Set = data[-idx,]
# Train
Species.ranger <- ranger(Species ~ ., data = Train_Set, importance = "impurity", save.memory = TRUE, probability = TRUE)
# Test
probabilitiesSpecies <- as.data.frame(predict(Species.ranger, data = Test_Set,type='response', verbose = TRUE)$predictions)
caret::confusionMatrix(table(max.col(probabilitiesSpecies)-1, Test_Set$Species))
How can I solve this for multiclass classification so that I get a confusion matrix? I have also posted this as a separate thread (Error while computing confusion matrix for multiclassification using ranger).
【Discussion】:
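One possible approach, offered as a sketch rather than a definitive answer: with probability=TRUE, predict() returns a matrix of class probabilities with one row per test case and one column per class. This is why table() complains about differing lengths. Likewise, max.col(...)-1 yields the integers 0, 1, 2, which only happen to match the factor levels "0"/"1" in the binary example. The sketch below picks the most probable class per row and maps it back to the original labels. It assumes that the columns of the prediction matrix are named after the class levels. The set.seed() call and the names prob, pred_class and cm are introduced here purely for illustration.

```r
library(ranger)
library(caret)

set.seed(42)  # illustrative only, so the split is reproducible
idx <- sample(nrow(iris), 100)
Train_Set <- iris[idx, ]
Test_Set  <- iris[-idx, ]

# probability = TRUE gives one probability column per class
Species.ranger <- ranger(Species ~ ., data = Train_Set, probability = TRUE)
prob <- predict(Species.ranger, data = Test_Set)$predictions

# Assumption: colnames(prob) are the class labels.
# Take the most probable column per row and convert it to a factor
# with the same levels as the reference.
pred_class <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
                     levels = levels(Test_Set$Species))

# Plain contingency table
print(table(Predicted = pred_class, Actual = Test_Set$Species))

# caret confusion matrix; both arguments are factors with identical levels
cm <- caret::confusionMatrix(data = pred_class, reference = Test_Set$Species)
print(cm$table)
```

If only hard class labels are needed, fitting without probability=TRUE should make predict(...)$predictions a factor that can be passed to confusionMatrix() directly. The probability forest is kept here because the question uses it.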
Tags: r machine-learning classification confusion-matrix r-ranger