【Question title】: caretList failing to create models
【Posted】: 2020-06-02 21:28:40
【Question】:

I am trying to build an ensemble with caretList. The code I am using is below:

library(tidyverse)
library(caret)
library(doParallel)
library(nnet)
library(e1071)
library(caretEnsemble)

#load data.
#using data sets created in assignment1.R
set.seed(1234)
assignment_data1<-train.data
# create training and test data from the 58104 examples
training.samples <- assignment_data1$Cover_Type %>% 
  createDataPartition(p = 0.7, list = FALSE)
train.data_ensemble  <- assignment_data1[training.samples, ]
test.data_ensemble <- assignment_data1[-training.samples, ]

#set up parallel env 


cl<-makePSOCKcluster(detectCores()-3)
registerDoParallel(cl)


set.seed(1234)
my_control <- trainControl(method = "cv", # for “cross-validation”
                                        number = 10, # number of k-folds
                                        savePredictions = "final",
                                        classProbs = TRUE,
                                        index=createResample(train.data_ensemble$Cover_Type , 25),
                                        allowParallel = TRUE)

model_list <- c("ranger", "rpart","svmLinear","nnet")


set.seed(1234)
# Fit the model on the training set without preProcess
list_of_models<- caretList(
  Cover_Type ~., data = train.data_ensemble,
  methodList =model_list,
  trControl = my_control,
  tuneLength = 20,
  continue_on_fail = TRUE
)

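The error I get is:

Error in caretList(Cover_Type ~ ., data = train.data_ensemble, methodList = model_list, : caret:train failed for all models. Please inspect your data.

When I fit the models individually with train(), I have no problems and do get results. The dataset is the forest cover type prediction data from Kaggle (https://www.kaggle.com/c/forest-cover-type-prediction).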

【Comments】:

  • That worked. A follow-up question: is there a way to use caretStack() to build an ensemble for a multiclass problem?

Tags: r r-caret


【Answer 1】:

Looking at the Kaggle site and the data, I used train.csv. It is a multiclass problem:

library(caret)
library(rpart)
library(e1071)
library(caretEnsemble)

set.seed(1234)
assignment_data1<-read.csv("train.csv")
assignment_data1$Cover_Type = factor(assignment_data1$Cover_Type)

idx <- createDataPartition(assignment_data1$Cover_Type,
p = 0.1, list = FALSE)
train.data_ensemble  <- assignment_data1[idx, ]

Because my laptop has limited memory, I only took a 10% subset for the steps that follow, so these are the label counts:

table(train.data_ensemble$Cover_Type)
  1   2   3   4   5   6   7 
216 216 216 216 216 216 216 

We set up trainControl:

my_control <- trainControl(method = "cv", 
                      number = 3,
                      classProbs=TRUE, 
                      savePredictions = "final",
                      index=createResample(train.data_ensemble$Cover_Type ,3))

Running a single model, say nnet, on its own throws an error:

train(Cover_Type ~., data = train.data_ensemble,method="nnet",trControl = my_control,tuneLength = 2)
Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to  X1, X2, X3, X4, X5, X6, X7 . Please use factor levels that can be used as valid R variable names  (see ?make.names for help).
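The caretList call in the question only reports that every model failed, so fitting each method separately is what surfaces the message above. A minimal sketch of that check, assuming a hypothetical helper named `collect_errors` (it is not part of caret or caretEnsemble) that runs a fitting function once per method and keeps the error text:

```r
# Hypothetical helper (not a caret or caretEnsemble function):
# calls fit_fun(method) for each method and returns the error message,
# or NA_character_ when the fit succeeds.
collect_errors <- function(methods, fit_fun) {
  vapply(methods, function(m) {
    tryCatch({
      fit_fun(m)
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Intended use with the objects defined above (requires caret):
# collect_errors(c("nnet", "rpart", "ranger"), function(m)
#   train(Cover_Type ~ ., data = train.data_ensemble, method = m,
#         trControl = my_control, tuneLength = 2))
```

The result is a named character vector, one entry per method, which makes it easy to see whether all methods fail for the same reason.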

We fix this by prefixing the class labels so that they become valid R variable names:

train.data_ensemble$Cover_Type = paste0("type",as.character(train.data_ensemble$Cover_Type))
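The line above turns Cover_Type into a character vector. As a possible variant, the levels can be checked with make.names() and relabelled in place so that the column stays a factor. This is a sketch on a toy vector named `cover_type` (an assumption for illustration, not the Kaggle data):

```r
# Toy outcome with the same seven numeric class labels as Cover_Type.
cover_type <- factor(c(1, 2, 3, 4, 5, 6, 7, 1, 2))

# A level is a valid R variable name if make.names() leaves it unchanged.
is_valid <- levels(cover_type) == make.names(levels(cover_type))

# Relabel the levels in place; the result is still a factor.
levels(cover_type) <- paste0("type", levels(cover_type))
```

Here `is_valid` is FALSE for every level before relabelling, which matches the condition that the error message complains about.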

And then run caretList:

model_list <- c("nnet", "rpart","ranger")

set.seed(1234)
# Fit the model on the training set without preProcess
list_of_models<- caretList(
  Cover_Type ~., data = train.data_ensemble,
  methodList =model_list,
  trControl = my_control,
  tuneLength = 2,
  continue_on_fail = TRUE
)

names(list_of_models)
[1] "nnet"   "rpart"  "ranger"

lapply(list_of_models,"[[","results")
$nnet
  size decay  Accuracy      Kappa AccuracySD    KappaSD
1    1   0.0 0.1350390 0.01183745 0.01558538 0.02050306
2    1   0.1 0.1660759 0.04730726 0.01211138 0.01601860
3    3   0.0 0.1857729 0.05877921 0.01687908 0.01257810
4    3   0.1 0.2509231 0.13049948 0.03601895 0.03905056

$rpart
         cp  Accuracy     Kappa AccuracySD    KappaSD
1 0.1226852 0.2986852 0.1906243 0.05857385 0.06756310
2 0.1666667 0.2162676 0.1010794 0.08420706 0.08754039

$ranger
  mtry min.node.size  splitrule  Accuracy     Kappa  AccuracySD     KappaSD
1    2             1       gini 0.6736713 0.6198463 0.017877146 0.021061761
2    2             1 extratrees 0.6357918 0.5758087 0.020871998 0.024462156
3   55             1       gini 0.7098266 0.6613173 0.007074515 0.008099901
4   55             1 extratrees 0.7496037 0.7075914 0.009073924 0.010413872
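To compare the models at a glance, the best resampled accuracy per model can be pulled out of the same `results` tables. A minimal sketch, assuming a hypothetical helper named `best_accuracy`; the small list below only mimics the structure of `list_of_models`, using Accuracy values copied from the output above:

```r
# Hypothetical helper: highest resampled Accuracy for each fitted model.
best_accuracy <- function(models) {
  vapply(models, function(m) max(m$results$Accuracy), numeric(1))
}

# Stand-in for list_of_models, holding only the Accuracy columns shown above.
example_models <- list(
  nnet   = list(results = data.frame(Accuracy = c(0.1350390, 0.1660759, 0.1857729, 0.2509231))),
  rpart  = list(results = data.frame(Accuracy = c(0.2986852, 0.2162676))),
  ranger = list(results = data.frame(Accuracy = c(0.6736713, 0.6357918, 0.7098266, 0.7496037)))
)

best <- best_accuracy(example_models)
```

With a real fit, `best_accuracy(list_of_models)` should work the same way, since each element of the list is a train object with a `results` data frame.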

