【Question Title】: R: Feature Selection with Cross Validation using Caret on Logistic Regression
【Posted】: 2017-07-07 23:29:08
【Question】:

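I am currently learning how to implement logistic regression in R.

I have taken a dataset and split it into training and test sets, and I want to implement forward selection, backward selection, and best subset selection with cross-validation to choose the best features. I am using caret to run cross-validation on the training set and then test the predictions on the test set.

I have seen the rfe control in caret, and I have also looked at the documentation on the caret website and at the links in the question "How to use wrapper feature selection with algorithms in R?". It is not clear to me how to change the type of feature selection, since it seems to default to backward selection. Can anyone help me with my workflow? Below is a reproducible example: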

library("caret")

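# Create an example dataset from the German Credit dataset shipped with caret
data(GermanCredit)
mydf <- GermanCredit

# Create train and test sets with an 80/20 split
set.seed(123)  # fix the seed so the partition is reproducible
trainIndex <- createDataPartition(mydf$Class, p = .8,
                                  list = FALSE,
                                  times = 1)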

train <- mydf[ trainIndex,]
test  <- mydf[-trainIndex,]


ctrl <- trainControl(method = "repeatedcv", 
                 number = 10, 
                 savePredictions = TRUE)

mod_fit <- train(Class~., data=train, 
             method="glm", 
             family="binomial",
             trControl = ctrl, 
             tuneLength = 5)


# Check out Variable Importance
varImp(mod_fit)
summary(mod_fit)

# Test the new model on new and unseen Data for reproducibility
pred = predict(mod_fit, newdata=test)
accuracy <- table(pred, test$Class)
sum(diag(accuracy))/sum(accuracy)

【Comments】:

Tags: r r-caret cross-validation feature-extraction


【Solution 1】:

You can do this directly in the train() call that builds mod_fit. For backward stepwise selection, the code below is sufficient:

trControl <- trainControl(method = "cv",
                          number = 5,
                          savePredictions = TRUE,
                          classProbs = TRUE,
                          summaryFunction = twoClassSummary)

caret_model <- train(Class ~ .,
                     data = train,
                     method = "glmStepAIC",   # Fits the model stepwise by AIC (uses MASS::stepAIC).
                     family = "binomial",
                     direction = "backward",  # Direction of the stepwise search
                     metric = "ROC",          # Match the metric to twoClassSummary
                     trControl = trControl)
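To change the type of selection, the same call can be pointed in another direction. The following is a minimal sketch, assuming that caret's glmStepAIC method forwards direction to MASS::stepAIC, which accepts "forward", "backward", and "both". The five-predictor subset `small` and the names `cv_ctrl` and `forward_fit` are illustrative choices made here to keep the search fast; they are not part of the original answer.

```r
library(caret)      # train(), trainControl(), twoClassSummary()
data(GermanCredit)  # example data shipped with caret

set.seed(42)

# Assumption: a small subset of numeric predictors keeps the stepwise search quick
small <- GermanCredit[, c("Class", "Duration", "Amount", "Age",
                          "InstallmentRatePercentage",
                          "NumberExistingCredits")]

cv_ctrl <- trainControl(method = "cv",
                        number = 5,
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)

forward_fit <- train(Class ~ .,
                     data = small,
                     method = "glmStepAIC",
                     family = "binomial",
                     direction = "forward",  # "both" is the other stepAIC option
                     metric = "ROC",
                     trace = 0,              # silence stepAIC's step-by-step log
                     trControl = cv_ctrl)

# Terms that survived the selection in the final model
names(coef(forward_fit$finalModel))
```

Note that the stepwise search is driven by AIC inside each resample; cross-validation here estimates the performance of the whole select-and-fit procedure rather than choosing the variables itself.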

Note the following in trControl:

method = "cv", # No need for "repeatedcv" here; the number defined afterward sets the k in k-fold.
classProbs = TRUE,
summaryFunction = twoClassSummary # Returns ROC, sensitivity, and specificity of the chosen model.
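To finish the workflow from the question, the selected model can be checked on the held-out rows. This is a rough sketch under assumptions: it re-creates a small train/test split so that it runs on its own, and the names `small`, `train_df`, `test_df`, `fit`, and `cm` are hypothetical, not taken from the thread.

```r
library(caret)
data(GermanCredit)

set.seed(123)

# Assumption: a reduced set of predictors so the example runs quickly
small <- GermanCredit[, c("Class", "Duration", "Amount", "Age",
                          "InstallmentRatePercentage")]

idx      <- createDataPartition(small$Class, p = 0.8, list = FALSE)
train_df <- small[idx, ]
test_df  <- small[-idx, ]

ctrl <- trainControl(method = "cv",
                     number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

fit <- train(Class ~ .,
             data = train_df,
             method = "glmStepAIC",
             family = "binomial",
             direction = "backward",
             metric = "ROC",
             trace = 0,
             trControl = ctrl)

# Cross-validated ROC, sensitivity, and specificity from twoClassSummary
fit$results

# Predict classes on unseen data and summarise against the true labels
pred <- predict(fit, newdata = test_df)
cm   <- confusionMatrix(pred, test_df$Class)
cm$overall["Accuracy"]
```

Comparing the cross-validated figures in fit$results with the test-set accuracy gives a quick sense of whether the selected model generalises.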

【Comments】:
