【Question Title】: Error with cross-validation and lasso regularization for logistic regression
【Posted】: 2020-09-27 18:30:32
【Question】:

I want to build a 5-fold CV logistic regression model with lasso regularization, but I get the following error message: Something is wrong; all the Accuracy metric values are missing:

I started with lasso-regularized logistic regression by setting alpha = 1. That works. I extended from this example.

# Load required package and data set
library(glmnet)
data("mtcars")

# Prepare data set 
x   <- model.matrix(~.-1, data= mtcars[,-1])
mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
y   <- factor(mpg, labels = c('notEfficient', 'efficient'))

#find the lambda that minimizes the CV error (lambda.min)
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)

#logistic regression with lasso regularization
logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
                         lambda = mod_cv$lambda.min)

I read that cv.glmnet already does 10-fold CV by default, but I want to use 5-fold CV. When I add n_folds to cv.glmnet I can no longer find the minimum lambda, and when I modify trControl I cannot fit the model either.

#find the lambda that minimizes the CV error, using 5-fold cv
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, n_folds=5)


#Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, :
#  unused argument (n_folds = 5)

#logistic regression with 5-fold cv
# define training control
train_control <- trainControl(method = "cv", number = 5)

# train the model with 5-fold cv
model <- train(x, y, trControl = train_control, method = "glm", family = "binomial", alpha = 1)

#Something is wrong; all the Accuracy metric values are missing:
#     Accuracy       Kappa
# Min.   : NA   Min.   : NA
# 1st Qu.: NA   1st Qu.: NA
# Median : NA   Median : NA
# Mean   :NaN   Mean   :NaN
# 3rd Qu.: NA   3rd Qu.: NA
# Max.   : NA   Max.   : NA
# NA's   :1     NA's   :1

Why do I get this error when I add 5-fold CV?

【Discussion】:

    Tags: r logistic-regression cross-validation glmnet lasso-regression


    【Solution 1】:

    There are two problems in your code: 1) the n_folds argument of cv.glmnet is actually called nfolds, and 2) the train function does not accept an alpha argument. If you fix these, your code works:

    # Load data set
    data("mtcars")
    library(glmnet)
    library(caret)
    
    # Prepare data set 
    x   <- model.matrix(~.-1, data= mtcars[,-1])
    mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
    y   <- factor(mpg, labels = c('notEfficient', 'efficient'))
    
    #find the lambda that minimizes the CV error (lambda.min)
    mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)
    
    #logistic regression with lasso regularization
    logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
                             lambda = mod_cv$lambda.min)
    
    
    
    #find the lambda that minimizes the CV error, using 5-fold cv
    mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, nfolds=5)
    
    
    #logistic regression with 5-fold cv
    # define training control
    train_control <- trainControl(method = "cv", number = 5)
    
    # train the model with 5-fold cv
    model <- train(x, y, trControl = train_control, method = "glm", family="binomial")
    model$results
    #>  parameter  Accuracy     Kappa AccuracySD   KappaSD
    #>1      none 0.8742857 0.7362213 0.07450517 0.1644257
    
    
    

    【Discussion】:

    • If train can't take alpha, does that mean I can't combine lasso regularization (which I use to pare down the hundreds of variables in my real data set) with CV? So I have to choose either lasso or CV?
    • You can, like this: model <- train(x, y, trControl = train_control, method = "glmnet", family="binomial", tuneGrid = expand.grid(alpha = 1, lambda = 1))
    • Awesome! Thanks a lot! I just started learning logistic regression and this helps me a lot! :D
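
    Building on the comment above, here is a sketch of how lambda could be tuned over a grid instead of being fixed at 1; the grid values below are illustrative, not from the original post:

        library(glmnet)
        library(caret)

        data("mtcars")
        x   <- model.matrix(~.-1, data = mtcars[,-1])
        mpg <- ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1)
        y   <- factor(mpg, labels = c('notEfficient', 'efficient'))

        train_control <- trainControl(method = "cv", number = 5)

        # keep alpha = 1 (lasso) and let 5-fold CV pick lambda from a grid
        lasso_cv <- train(x, y, trControl = train_control, method = "glmnet",
                          family = "binomial",
                          tuneGrid = expand.grid(alpha  = 1,
                                                 lambda = 10^seq(-3, 0, length.out = 20)))

        lasso_cv$bestTune                                        # lambda chosen by CV
        coef(lasso_cv$finalModel, s = lasso_cv$bestTune$lambda)  # sparse coefficients

    This keeps the lasso variable selection (coefficients shrunk exactly to zero) while still choosing lambda by 5-fold cross-validation, which is what the question was after.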