【Question title】: Ran a caret model and it stopped, reporting missing values in resampled performance measures
【Posted】: 2020-02-17 01:56:27
【Question】:

[dataset] As a newbie, I tried the Titanic problem. I was about to train on the dataset, and this is where I got stuck:

[data_prepro_maf_train]

all_model<-modelLookup()
classification_model<-all_model%>%filter(forClass==TRUE,!duplicated(model))
class_model<-classification_model$model
set.seed(123)
number<-3
repeats<-2
control<-trainControl(method="repeatedcv",number=number,repeats=repeats,classProbs = TRUE,savePredictions = "final",index=createResample(data_prepro_maf_train$Embarked,repeats*number),summaryFunction = multiClassSummary,allowParallel = TRUE)
x<-data_prepro_maf_train[,c(1,3,5,6,7,8)]
y<-data_prepro_maf_train[,12]
levels(y)<-make.names(levels(factor(data_prepro_maf_train[,12])))
y<-make.names(data_prepro_maf_train[,12],unique=TRUE,allow_=TRUE)
#Train the models
model_list1<-caretList(x,y,data=data_prepro_maf_train,trControl = control,metric="Accuracy",methodList = class_model[1])
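
A note on the two make.names() lines above: with unique=TRUE, make.names() appends suffixes until every element is distinct, so the outcome ends up with as many "classes" as rows instead of the original few. That may be related to the missing resampled performance values, although I cannot confirm it without the data. A minimal base R sketch, using made-up values in place of column 12:

```r
# Hypothetical outcome values standing in for column 12 of the training data
raw <- c(0, 1, 1, 0, 1)

# unique = TRUE appends suffixes so that no two elements are equal
y_unique <- make.names(raw, unique = TRUE)
print(y_unique)

# Cleaning the level names instead keeps the two classes intact
y_factor <- factor(raw)
levels(y_factor) <- make.names(levels(y_factor))
print(table(y_factor))
```

Only the level names need to be valid R names for classProbs = TRUE, so renaming the levels of a factor is enough.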

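I made sure to select only columns without missing values (avoiding, for example, "Cabin"), and I have already removed the missing values from the columns I need.

Packages used: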

library(caret)
library(caretEnsemble)
library(tidyverse)
library(magrittr)
library(doParallel)

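【Comments】:

  • Can you provide a data sample with dput(head(df,n))?
  • Hi NelsonGon. A link to the dataset is attached.
  • @Jabby Can you provide the object all_model? It is missing from your question, so I cannot proceed. Which libraries have you loaded? Please show those as well.
  • all_model
  • You are using all the classification models available in the caret package, so training will take time. You can see this post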

Tags: r r-caret


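【Solution 1】:

I tried to solve the problem through research, hence the break. The possible solutions to my problem are:

1) One-hot encoding: basically a preprocessing method that converts the training data into simple factors/numeric columns

2) The way the arguments are passed: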

x<-data_prepro_maf_train[,c(1,3,5,6,7,8)]
y<-data_prepro_maf_train[,12]
model_list1<-caretList(x,y,data=data_prepro_maf_train,trControl = control,metric="Accuracy",methodList = class_model[1])

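I changed this to the y~X1+X2+X3 (formula) approach, and at least caretList now runs some models (see the discussion of the formula vs. non-formula interface in train).

Here are the changes I made: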
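
As far as I understand, the formula interface builds its design matrix the same way base R's model.matrix() does, so factor predictors are expanded into indicator (dummy) columns before the model sees them, whereas the x/y interface hands the columns over unchanged. A small sketch on invented toy data (the column names mirror the Titanic data, the values do not):

```r
# Made-up toy data; column names mirror the Titanic data but values are invented
toy <- data.frame(
  Sex    = factor(c("male", "female", "female", "male")),
  Pclass = c(3, 1, 2, 3)
)

# Formula-style expansion: the factor becomes a 0/1 indicator column
# (the first level, "female", is the reference and gets no column of its own)
design <- model.matrix(~ Sex + Pclass, data = toy)
print(colnames(design))

# Dropping the intercept gives one indicator column per level,
# which resembles full one-hot encoding
one_hot <- model.matrix(~ Sex - 1, data = toy)
print(one_hot)
```

This is why the same data can behave differently depending on which of the two interfaces is used.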

#One-hot encode the data_prepro_maf_train data
dummy_model1<-dummyVars(title~.,data=data_prepro_maf_train[c(1,2,3,5,6,7,8,10)])

data_train_mat1<-predict(dummy_model1,newdata=data_prepro_maf_train)

data_prepro_maf_train2<-data.frame(data_train_mat1)

#Add back the columns "title" and "Embarked", which hold factors that are vital for the model
data_prepro_maf_train2<-cbind(data_prepro_maf_train$Embarked,data_prepro_maf_train$title,data_prepro_maf_train2)

colnames(data_prepro_maf_train2)[1]<-"Embarked"
colnames(data_prepro_maf_train2)[2]<-"title"
#Drop unused factor levels from the outcome in the new training data. Run this before model_list2 if the following error shows up:
#"Error: One or more factor levels in the outcome has no data: 'Q'"

#droplevels() returns a new factor, so the result has to be assigned back to the column; assigning it to levels() does not drop anything
data_prepro_maf_train2$Embarked<-droplevels(data_prepro_maf_train2$Embarked)

set.seed(123)
number<-3
repeats<-2
control<-trainControl(method="repeatedcv",number=number,repeats=repeats,classProbs = TRUE,savePredictions = "all",index=createResample(data_prepro_maf_train$Embarked,repeats*number),summaryFunction = multiClassSummary,allowParallel = TRUE)
#Since class_model has over 100 models, select a few to test for the previous error. (I stumbled upon preProcess=c("center","scale"), which was said to help in my situation. I am not sure how it works and would appreciate it if someone could explain it.)
model_list2<-caretList(Embarked~title+Pclass+Age+Sex.male+Sex.female+SibSp+Parch,data=data_prepro_maf_train2,preProcess = c("center", "scale"),trControl = control,metric="Accuracy",methodList = class_model[c(37,52,55,68,102,145,167,189)])
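
On the preProcess = c("center", "scale") question in the comment above: centering subtracts each predictor's mean and scaling divides by its standard deviation, so every numeric predictor ends up with mean 0 and standard deviation 1. That tends to matter for models that are sensitive to the units of the predictors, such as distance-based or regularized ones. As far as I understand, caret estimates these statistics from the training rows of each resample. A rough base R illustration with invented values:

```r
# Invented ages and fares, only to illustrate the transformation
toy <- data.frame(Age = c(22, 38, 26, 35), Fare = c(7.25, 71.28, 7.92, 53.10))

# scale() centers (subtracts the column mean) and scales (divides by the column SD)
scaled <- scale(toy, center = TRUE, scale = TRUE)

print(round(colMeans(scaled), 10))  # column means after centering
print(apply(scaled, 2, sd))         # column SDs after scaling

# The same result computed by hand for one column
age_manual <- (toy$Age - mean(toy$Age)) / sd(toy$Age)
print(all.equal(as.numeric(scaled[, "Age"]), age_manual))
```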

I'm not sure whether this is the end of my problem... but at least the models now run instead of stopping without producing any results.
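
Regarding the "has no data: 'Q'" error mentioned in the code above: a factor keeps a level in levels() even after every row with that level is gone, and the message suggests that caret rejects such empty levels. A minimal base R sketch with invented values, showing that droplevels() returns a new factor rather than changing the old one:

```r
# Invented ports of embarkation; "Q" is declared as a level but has no rows
embarked <- factor(c("S", "C", "S", "C"), levels = c("C", "Q", "S"))
levels_before <- levels(embarked)
print(table(embarked))

# droplevels() returns a new factor without the empty level;
# the original object is left unchanged unless the result is assigned back
embarked_clean <- droplevels(embarked)
print(levels(embarked))
print(levels(embarked_clean))
```

In the data frame this corresponds to assigning the result of droplevels() back to the Embarked column before training.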

【Comments】:
