Extracting beta values from a trained caret model

[Question title]: Extracting beta values from trained caret model
[Posted]: 2021-08-11 12:49:05
[Question]:

I am trying to extract the beta values from a model fitted with train() from the caret package:

cv_model_pls <- train(
  POD1HemoglobinCut ~ ., 
  data = train, 
  method = "pls",
  family = "binomial",
  trControl = trainControl(method = "cv", number = 10),
  preProcess = c("zv", "center", "scale"),
  tuneLength = 6
)

The output is:

> cv_model_pls
Partial Least Squares 

9932 samples
   7 predictor
   2 classes: '[0,10)', '[10,Inf)' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 8939, 8939, 8939, 8938, 8940, 8939, ... 
Resampling results across tuning parameters:

  ncomp  Accuracy   Kappa    
  1      0.8569258  0.1994938
  2      0.8698149  0.3215483
  3      0.8707213  0.3303433
  4      0.8710237  0.3335666
  5      0.8710238  0.3341072
  6      0.8708224  0.3330295

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 5.

Running summary() to try to get the beta values gives me:

> summary(cv_model_pls)
Data:   X dimension: 9932 7 
    Y dimension: 9932 2
Fit method: oscorespls
Number of components considered: 5
TRAINING: % variance explained
Error in dimnames(tbl) <- list(c("X", yvarnames), paste(1:object$ncomp,  : 
  length of 'dimnames' [1] not equal to array extent

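How do I extract the beta values of the optimal model (or of the other candidate models)?
How can I select the model by maximizing sensitivity instead of the default, accuracy?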

[Question discussion]:

Tags: r machine-learning logistic-regression r-caret

[Solution 1]:

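By "beta values" I guess you mean the coefficients. summary() dispatches to pls:::summary.mvr from the pls package, which only reports the explained variance; run ?pls:::summary.mvr to see what it does. It does not work with the output of plsda, which is why you get the dimnames error.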

Using an example dataset, we fit the model with caret:

library(caret)

set.seed(111)
df <- MASS::Pima.tr

cv_model_pls <- train(type ~ ., data = df, method = "pls",
                      family = "binomial",
                      trControl = trainControl(method = "cv", number = 5),
                      preProcess = c("center", "scale"),
                      tuneLength = 6)

Result:

Partial Least Squares 

200 samples
  7 predictor
  2 classes: 'No', 'Yes' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 159, 161, 159, 161, 160 
Resampling results across tuning parameters:

  ncomp  Accuracy   Kappa    
  1      0.7301063  0.3746033
  2      0.7504909  0.4255505
  3      0.7453627  0.4140426
  4      0.7553690  0.4412532
  5      0.7502408  0.4275158
  6      0.7502408  0.4275158

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 4.

You can find the coefficients in the final fitted model:

cv_model_pls$finalModel$coefficients

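This is a 3-D array (predictors x classes x number of components), so to get the coefficients for the best number of components, in this example do: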

cv_model_pls$finalModel$coefficients[,,cv_model_pls$bestTune$ncomp]
                No          Yes
npreg -0.060740474  0.060740474
glu   -0.173639051  0.173639051
bp     0.006635470 -0.006635470
skin  -0.002510842  0.002510842
bmi   -0.065740864  0.065740864
ped   -0.086110972  0.086110972
age   -0.076374824  0.076374824

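For sensitivity, use summaryFunction = twoClassSummary in trainControl (together with classProbs = TRUE, which twoClassSummary needs for its ROC column) and set metric = "Sens":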

model <- train(type ~ ., data = df, method = "pls",
               family = "binomial",
               trControl = trainControl(method = "cv",
                                        summaryFunction = twoClassSummary,
                                        classProbs = TRUE,
                                        number = 5),
               metric = "Sens",
               preProcess = c("center", "scale"),
               tuneLength = 6)

Partial Least Squares 

200 samples
  7 predictor
  2 classes: 'No', 'Yes' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 159, 161, 161, 159, 160 
Resampling results across tuning parameters:

  ncomp  ROC        Sens       Spec     
  1      0.8227357  0.8635328  0.5571429
  2      0.8286638  0.8555556  0.5428571
  3      0.8250728  0.8709402  0.5571429
  4      0.8247738  0.8555556  0.5571429
  5      0.8264237  0.8555556  0.5428571
  6      0.8258946  0.8632479  0.5428571

Sens was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 3.
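One detail worth checking: twoClassSummary computes Sens for the first level of the outcome factor (lev[1]), so here metric = "Sens" maximizes detection of "No", and with the question's data it would target '[0,10)'. The base-R sketch below mirrors the (data, lev, model) interface that caret passes to a summaryFunction, to show which class counts as "positive". sens_spec_summary is a hypothetical name and the toy data are made up for illustration.

```r
# Hypothetical summary function with the (data, lev, model) signature that
# trainControl(summaryFunction = ...) expects. data$obs holds the observed
# classes and data$pred the predicted classes (both factors).
sens_spec_summary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  # Sensitivity: share of observed lev[1] cases predicted as lev[1],
  # i.e. the first factor level is treated as the "positive" class.
  pos <- data$obs == lev[1]
  sens <- mean(data$pred[pos] == lev[1])
  # Specificity: share of observed lev[2] cases predicted as lev[2].
  neg <- data$obs == lev[2]
  spec <- mean(data$pred[neg] == lev[2])
  c(Sens = sens, Spec = spec)
}

# Made-up toy predictions, for illustration only.
lv <- c("No", "Yes")
toy <- data.frame(
  obs  = factor(c("No", "No", "No", "No", "Yes", "Yes"), levels = lv),
  pred = factor(c("No", "No", "No", "Yes", "Yes", "No"), levels = lv)
)
sens_spec_summary(toy, lev = levels(toy$obs))    # Sens 0.75, Spec 0.50

# Same predictions with "Yes" releveled to first: Sens now refers to "Yes".
toy2 <- data.frame(
  obs  = relevel(toy$obs, ref = "Yes"),
  pred = relevel(toy$pred, ref = "Yes")
)
sens_spec_summary(toy2, lev = levels(toy2$obs))  # Sens 0.50, Spec 0.75
```

So if the class you care about is not the first level, relevel() the outcome before calling train(). A function like this could also be passed as trainControl(summaryFunction = sens_spec_summary) with metric = "Sens"; since it only uses predicted classes it should not need classProbs = TRUE, but treat that as an untested suggestion.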

[Discussion]:

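This is a great response, thank you so much. I am hoping to create an equation from my model in the form shown in Box 1 of this manuscript: bmj.com/content/bmj/351/bmj.h3868.full.pdf. Would those be your coefficients listed under the "Yes" column?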
It should be, yes.
No intercept?
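For an equation in the original units, note that a model trained with preProcess = c("center", "scale") has coefficients for the standardized predictors z_j = (x_j - m_j) / s_j. Rewriting the score is plain algebra: b0 + sum_j b_j (x_j - m_j) / s_j = (b0 - sum_j b_j m_j / s_j) + sum_j (b_j / s_j) x_j. The sketch below checks that algebra with lm() on the built-in mtcars data as a stand-in model; unscale_coefs is a hypothetical helper, not part of caret. For the caret fit, the means and standard deviations should be in cv_model_pls$preProcess$mean and cv_model_pls$preProcess$std, which is an assumption worth verifying on your own object.

```r
# Hypothetical helper: convert an intercept and slopes fitted on standardized
# predictors, (x - center) / scale, into coefficients for the raw predictors.
unscale_coefs <- function(intercept, beta, center, scale) {
  raw_beta <- beta / scale                                  # b_j / s_j
  raw_intercept <- intercept - sum(beta * center / scale)   # b0 - sum b_j m_j / s_j
  c("(Intercept)" = unname(raw_intercept), raw_beta)
}

# Stand-in example: fit the same linear model on scaled and on raw predictors.
X <- mtcars[, c("wt", "hp", "disp")]
y <- mtcars$mpg
mu <- colMeans(X)
sdv <- apply(X, 2, sd)
Xs <- as.data.frame(scale(X, center = mu, scale = sdv))

fit_scaled <- lm(y ~ ., data = data.frame(y = y, Xs))
fit_raw <- lm(y ~ ., data = data.frame(y = y, X))

b <- coef(fit_scaled)
vars <- names(b)[-1]
recovered <- unscale_coefs(b[1], b[-1], mu[vars], sdv[vars])
print(recovered)       # should match the raw-scale fit below
print(coef(fit_raw))
```

The same conversion applies column by column to the intercept and slopes that pls's coef(..., intercept = TRUE) returns for the caret model, since that score is also linear in the preprocessed predictors.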