What is the difference between summary() and print() in caret (R)? Answers

【Question title】: What is the difference between summary() and print() in caret (R)?
【Posted】: 2020-06-02 21:50:07
【Question】:

In the context of modeling with the caret package in R, what is the difference between the summary() and print() functions? For this model with 4 components, which figure is the variance explained: 28.52% or 21.4%?

> summary(model)
Data:   X dimension: 261 130 
    Y dimension: 261 1
Fit method: oscorespls
Number of components considered: 4
TRAINING: % variance explained
          1 comps  2 comps  3 comps  4 comps
X         90.1526    92.91    94.86    96.10
.outcome   0.8772    17.17    23.99    28.52

versus

> print(model)
Partial Least Squares 

261 samples
130 predictors

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 50 times) 
Summary of sample sizes: 209, 209, 209, 208, 209, 209, ... 
Resampling results across tuning parameters:

  ncomp  RMSE      Rsquared    MAE     
  1      5.408986  0.03144022  4.129525
  2      5.124799  0.14263362  3.839493
  3      4.976591  0.19114791  3.809596
  4      4.935419  0.21415260  3.799365
  5      5.054086  0.19887704  3.886382

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 4.

【Comments on the question】:

Look at the source code. For example, if you are running flexible discriminant analysis, you can compare caret:::print.bagFDA and caret:::summary.bagFDA to see how the two differ.
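The idea behind that comment is that print() and summary() are S3 generics, so each class can register a different method for each of them. Here is a minimal base-R sketch of how to see this. The `toy_fit` class and its fields are hypothetical and only serve as an illustration; the `lm` methods come from the stats package.

```r
# Hypothetical toy class: print() and summary() dispatch to different methods.
fit <- structure(list(cv_rsq = 0.21, train_rsq = 0.29), class = "toy_fit")

print.toy_fit <- function(x, ...) {
  cat("Resampled R-squared:", x$cv_rsq, "\n")
  invisible(x)
}

summary.toy_fit <- function(object, ...) {
  cat("Training R-squared:", object$train_rsq, "\n")
  invisible(object)
}

print(fit)    # calls print.toy_fit
summary(fit)  # calls summary.toy_fit

# Retrieve the methods registered for a real class to inspect their source.
print_method   <- getS3method("print", "lm")
summary_method <- getS3method("summary", "lm")
methods_differ <- !identical(print_method, summary_method)
```

With caret loaded, the same kind of lookup should work for a trained model, for example `getS3method("summary", "train")`, assuming the object's class is `train`.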

Tags: r r-caret

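【Solution 1】:

There are two parts to this. The first is the type of model you fitted/trained: because you used partial least squares regression, summary(model) returns information about the best model chosen by caret.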

library(caret)
library(pls)

# 5-fold cross-validation over the number of PLS components
model <- train(mpg ~ ., data = mtcars,
               trControl = trainControl(method = "cv", number = 5),
               method = "pls")

print(model)

Partial Least Squares 

32 samples
10 predictors

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 25, 27, 26, 24, 26 
Resampling results across tuning parameters:

  ncomp  RMSE      Rsquared   MAE     
  1      3.086051  0.8252487  2.571524
  2      3.129871  0.8122175  2.650973
  3      3.014511  0.8582197  2.519962

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 3.

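When you call print(model), you are looking at the results of training the model and selecting the best tuning parameter. With pls, the parameter being tuned is the number of components. This output comes from caret and will look similar for other methods. Above, models with 1, 2, and 3 components were tested, and the model with 3 components was chosen because it has the smallest RMSE. The final model is stored under model$finalModel, and you can take a look at it: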

class(model$finalModel)
[1] "mvr"

pls:::summary.mvr(model$finalModel)
Data:   X dimension: 32 10 
    Y dimension: 32 1
Fit method: oscorespls
Number of components considered: 3
TRAINING: % variance explained
          1 comps  2 comps  3 comps
X           92.73    99.98    99.99
.outcome    74.54    74.84    83.22

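From the above, you can see that the summary function being called comes from the pls package and is specific to this type of model. Calling summary(model), as below, gives you the same output: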

summary(model)
Data:   X dimension: 32 10 
    Y dimension: 32 1
Fit method: oscorespls
Number of components considered: 3
TRAINING: % variance explained
          1 comps  2 comps  3 comps
X           92.73    99.98    99.99
.outcome    74.54    74.84    83.22

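Partial least squares regression is similar to principal component analysis, except that the decomposition (or dimensionality reduction) is done on transpose(X) * Y, and the components are called latent variables. So, in summary, what you see is the proportion of variance in X (all of your predictors) and in .outcome (your dependent variable) that is explained by the latent variables.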
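To make the "% variance explained" rows more concrete, here is a rough base-R sketch of the first latent variable for a single response. It assumes centered but unscaled predictors and is not the pls package's own implementation; the variable names are chosen for illustration only.

```r
# Center the predictors and the response (no scaling, an assumption here).
X <- scale(as.matrix(mtcars[, -1]), center = TRUE, scale = FALSE)
y <- mtcars$mpg - mean(mtcars$mpg)

# The first weight vector is proportional to t(X) %*% y.
w <- crossprod(X, y)
w <- w / sqrt(sum(w^2))

# Scores of the first latent variable, then loadings for X and y.
scores <- X %*% w
x_load <- crossprod(X, scores) / sum(scores^2)
y_load <- sum(y * scores) / sum(scores^2)

# Share of the total sum of squares reproduced by this one component.
x_explained <- sum((scores %*% t(x_load))^2) / sum(X^2)
y_explained <- sum((scores * y_load)^2) / sum(y^2)

round(100 * c(X = x_explained, .outcome = y_explained), 2)
```

Under these assumptions the two percentages should be comparable to the "1 comps" column of the summary output, since both are computed on the full training data with no resampling involved.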

【Comments】:

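Thanks for the explanation. So basically summary() gives me the pls model, without the resampling that is invoked through the caret package?
Yes, more or less. In your example, it is the model with 4 components fitted on the full training data.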
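The distinction made in these comments can be sketched in base R. The example uses lm() as a stand-in for PLS (a deliberate simplification) and hand-rolled folds rather than caret's resampling; the formula, seed, and fold assignment are assumptions for illustration. The resampled figure is the squared correlation between held-out predictions and observations, averaged over folds, which is how caret reports Rsquared by default.

```r
set.seed(1)
k <- 5
# Randomly assign each row to one of k folds.
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Training R-squared: fit and evaluate on the same full data set.
full_fit  <- lm(mpg ~ wt + hp, data = mtcars)
train_rsq <- summary(full_fit)$r.squared

# Resampled R-squared: fit on k-1 folds, evaluate on the held-out fold.
cv_rsq <- sapply(seq_len(k), function(i) {
  held_out <- folds == i
  fit  <- lm(mpg ~ wt + hp, data = mtcars[!held_out, ])
  pred <- predict(fit, newdata = mtcars[held_out, ])
  cor(pred, mtcars$mpg[held_out])^2
})

c(training = train_rsq, resampled = mean(cv_rsq))
```

The two numbers answer different questions: the first describes fit on the data used for training (like 28.52% in summary()), the second estimates performance on unseen data (like 0.214 in print()).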