【Question title】: Extracting beta values from a trained caret model
【Posted】: 2021-08-11 12:49:05
【Question】:

I am trying to extract the beta values from a model fitted with train() from the caret package.

cv_model_pls <- train(
  POD1HemoglobinCut ~ ., 
  data = train, 
  method = "pls",
  family = "binomial",
  trControl = trainControl(method = "cv", number = 10),
  preProcess = c("zv", "center", "scale"),
  tuneLength = 6
)

The output is:

> cv_model_pls
Partial Least Squares 

9932 samples
   7 predictor
   2 classes: '[0,10)', '[10,Inf)' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 8939, 8939, 8939, 8938, 8940, 8939, ... 
Resampling results across tuning parameters:

  ncomp  Accuracy   Kappa    
  1      0.8569258  0.1994938
  2      0.8698149  0.3215483
  3      0.8707213  0.3303433
  4      0.8710237  0.3335666
  5      0.8710238  0.3341072
  6      0.8708224  0.3330295

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 5.

Running summary() to try to get the beta values gives an error:

> summary(cv_model_pls)
Data:   X dimension: 9932 7 
    Y dimension: 9932 2
Fit method: oscorespls
Number of components considered: 5
TRAINING: % variance explained
Error in dimnames(tbl) <- list(c("X", yvarnames), paste(1:object$ncomp,  : 
  length of 'dimnames' [1] not equal to array extent
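
  1. How do I extract the beta values of the optimized model (or of the other candidate models)?
  2. How do I select the model by maximizing sensitivity (rather than the default, accuracy)?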

【Comments】:

    Tags: r machine-learning logistic-regression r-caret


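    【Solution 1】:

    By beta values, I assume you mean the coefficients. The summary function dispatches to pls:::summary.mvr from the pls package, which only reports the variance explained. You can run ?pls:::summary.mvr to see what it does. It does not work on the output of plsda, which is what caret fits for a classification outcome.

    Using an example dataset, we fit the model with caret: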

    library(caret)
    set.seed(111)
    df = MASS::Pima.tr
    
    cv_model_pls <- train(type ~ ., data = df, method = "pls",
      family = "binomial",
      trControl = trainControl(method = "cv", number = 5),
      preProcess = c("center", "scale"),
      tuneLength = 6
    )
    

    Result:

    Partial Least Squares 
    
    200 samples
      7 predictor
      2 classes: 'No', 'Yes' 
    
    Pre-processing: centered (7), scaled (7) 
    Resampling: Cross-Validated (5 fold) 
    Summary of sample sizes: 159, 161, 159, 161, 160 
    Resampling results across tuning parameters:
    
      ncomp  Accuracy   Kappa    
      1      0.7301063  0.3746033
      2      0.7504909  0.4255505
      3      0.7453627  0.4140426
      4      0.7553690  0.4412532
      5      0.7502408  0.4275158
      6      0.7502408  0.4275158
    
    Accuracy was used to select the optimal model using the largest value.
    The final value used for the model was ncomp = 4.
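Because summary() fails on this object, the explained-variance figures it would have printed can be read another way. The following is a minimal sketch, assuming the pls package is installed and that its explvar() helper accepts the final model that caret stores; it refits the example so that it is self-contained, and the name `fit` is only illustrative.

```r
library(caret)  # the "pls" method also needs the pls package installed

set.seed(111)
df <- MASS::Pima.tr

# Same example fit as above, refitted here so the sketch stands alone
fit <- train(type ~ ., data = df, method = "pls",
             trControl = trainControl(method = "cv", number = 5),
             preProcess = c("center", "scale"),
             tuneLength = 6)

# Percentage of X variance explained by each retained component
xvar <- pls::explvar(fit$finalModel)
xvar
cumsum(xvar)  # cumulative percentage
```

This covers only the predictor (X) side of the variance table; the response side that summary() tries to tabulate is what triggers the dimnames error for a two-column class indicator.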
    

    You can find the coefficients under the final fitted model:

    cv_model_pls$finalModel$coefficients
    

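    This is a three-dimensional array (predictors x response columns x number of components), so take the slice for the selected number of components. In this example: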

    cv_model_pls$finalModel$coefficients[,,cv_model_pls$bestTune$ncomp]
                    No          Yes
    npreg -0.060740474  0.060740474
    glu   -0.173639051  0.173639051
    bp     0.006635470 -0.006635470
    skin  -0.002510842  0.002510842
    bmi   -0.065740864  0.065740864
    ped   -0.086110972  0.086110972
    age   -0.076374824  0.076374824
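The slice above has no intercept row. As a hedged sketch, the coef() method that pls provides for its fitted models takes an `intercept = TRUE` argument, and the result can be applied by hand to the centered and scaled predictors. Assumptions: the pls package is installed, `fit` and `b` are illustrative names, and `family = "binomial"` is omitted on the assumption that the PLS fit does not use it.

```r
library(caret)  # the "pls" method also needs the pls package installed

set.seed(111)
df <- MASS::Pima.tr

fit <- train(type ~ ., data = df, method = "pls",
             trControl = trainControl(method = "cv", number = 5),
             preProcess = c("center", "scale"),
             tuneLength = 6)

best <- fit$bestTune$ncomp

# Coefficient matrix with an "(Intercept)" row; one column per class
b <- drop(coef(fit$finalModel, ncomp = best, intercept = TRUE))
b

# Apply the equation by hand. The coefficients refer to the predictors
# after caret's centering and scaling, so transform the data first.
xs <- as.matrix(predict(fit$preProcess, df[, rownames(b)[-1]]))
scores <- cbind(1, xs) %*% b
head(scores)
```

Note that these are linear coefficients from a PLS regression on class indicator columns, expressed on the standardized predictor scale. They are not log-odds from a logistic regression, so an equation built from the "Yes" column yields a linear score rather than a logit.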
    

    For sensitivity, use summaryFunction = twoClassSummary in trainControl (together with classProbs = TRUE) and set metric to "Sens":

    model <- train(type~.,data=df,method="pls",
        family="binomial",
        trControl = trainControl(method = "cv", 
        summaryFunction = twoClassSummary,
        classProbs = TRUE,
        number = 5),
        metric = "Sens",
        preProcess = c("center", "scale"),
        tuneLength = 6
         )
    
    Partial Least Squares 
    
    200 samples
      7 predictor
      2 classes: 'No', 'Yes' 
    
    Pre-processing: centered (7), scaled (7) 
    Resampling: Cross-Validated (5 fold) 
    Summary of sample sizes: 159, 161, 161, 159, 160 
    Resampling results across tuning parameters:
    
      ncomp  ROC        Sens       Spec     
      1      0.8227357  0.8635328  0.5571429
      2      0.8286638  0.8555556  0.5428571
      3      0.8250728  0.8709402  0.5571429
      4      0.8247738  0.8555556  0.5571429
      5      0.8264237  0.8555556  0.5428571
      6      0.8258946  0.8632479  0.5428571
    
    Sens was used to select the optimal model using the largest value.
    The final value used for the model was ncomp = 3.
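Two details matter when applying this to the outcome in the question. twoClassSummary treats the first factor level as the event when it computes Sens, and classProbs = TRUE requires class levels that are valid R names, which '[0,10)' and '[10,Inf)' are not. The base-R sketch below shows one way to prepare such a factor; the hemoglobin values and the labels "below10" and "atleast10" are hypothetical.

```r
# Hypothetical hemoglobin values, binned the way the question's outcome appears to be
hgb <- c(8.2, 11.5, 9.9, 12.3, 10.0)
y <- cut(hgb, breaks = c(0, 10, Inf), right = FALSE)

levels(y)                            # "[0,10)" "[10,Inf)"
make.names(levels(y)) == levels(y)   # FALSE FALSE: not valid R names

# Rename the levels to syntactically valid names (hypothetical labels)
levels(y) <- c("below10", "atleast10")

# Put the class whose sensitivity should be maximized first
y <- relevel(y, ref = "below10")
levels(y)
```

In the Pima example above, 'No' is the first level, so the reported Sens is the proportion of 'No' cases classified correctly.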
    

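    【Comments】:

    • This is a great answer. Thank you very much. I would like to build an equation from my model in the form shown in Box 1 of this manuscript: bmj.com/content/bmj/351/bmj.h3868.full.pdf. Would the coefficients be the ones listed under the "Yes" column?
    • That should be right.
    • No intercept?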