【Question title】: PCA preprocess parameter in caret's train function
【Posted】: 2019-09-06 22:28:03
【Question】:

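I am running knn regression on my data and would like to:

a) cross-validate with repeatedcv to find the optimal k

b) use PCA at the 90% threshold to reduce dimensionality when building the knn model.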

library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(20, 100, 10), matrix(rnorm(400, 10, 5), ncol = 20)) %>% 
  data.frame()
colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:15, ] #training set
tt = data[16:20,] #test set

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          #trying to find the optimal k from 1:10
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

My questions:

(1) I noticed that someone suggested changing the PCA parameter in trainControl:

ctrl <- trainControl(preProcOptions = list(thresh = 0.8))
mod <- train(Class ~ ., data = Sonar, method = "pls",
              trControl = ctrl)

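If I change the parameter in trainControl, does that mean PCA is still carried out during KNN? This is a similar concern to this question.

(2) I found another example that fits my case. I want to change the threshold to 90%, but I don't know where to change it in caret's train function, especially since I still need the scale option.

I apologize for the lengthy description and the scattered references. Thank you in advance!

(Thanks to Camille for the suggestion that got the code working!)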

【Comments on the question】:

  • I don't have much experience with caret, but it looks like preProcess should be an argument to train, not a function. Change preProcess(c('scale','pca')) to preProcess = c('scale','pca')

Tags: r machine-learning pca cross-validation r-caret


【Answer 1】:

To answer your questions:

I noticed that someone suggested changing the PCA parameter in trainControl:

mod <- train(Class ~ ., data = Sonar, method = "pls",trControl = ctrl)

If I change the parameter in trainControl, does that mean PCA is still carried out during KNN?

Yes, if you do this:

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3,preProcOptions = list(thresh = 0.9))

k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

You can check this under preProcess:

k$preProcess
Created from 15 samples and 20 variables

Pre-processing:
  - centered (20)
  - ignored (0)
  - principal component signal extraction (20)
  - scaled (20)

PCA needed 9 components to capture 90 percent of the variance
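As a rough illustration of what the 90% threshold means, the following base-R sketch keeps the smallest number of principal components whose cumulative share of variance reaches 0.9. It uses prcomp directly rather than caret, the data is simulated, and the names x, pca, cum_var, n_comp and scores are hypothetical, so it is not guaranteed to reproduce the component count reported above.

```r
set.seed(0)

# Simulated predictors with the same shape as tr[, -1]: 15 rows, 20 columns
x <- matrix(rnorm(300, 10, 5), ncol = 20)

# Center and scale each column, then run PCA
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the leading components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components whose cumulative variance reaches 90%
n_comp <- which(cum_var >= 0.9)[1]

# Scores on the retained components; these would be the inputs to knn
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

print(n_comp)
print(dim(scores))
```

Raising the threshold keeps more components, and lowering it keeps fewer.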

To answer (2), you can run preProcess separately:

mdl = preProcess(tr[,-1],method=c("scale","pca"),thresh=0.9)
mdl
Created from 15 samples and 20 variables

Pre-processing:
  - centered (20)
  - ignored (0)
  - principal component signal extraction (20)
  - scaled (20)

PCA needed 9 components to capture 90 percent of the variance

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)

k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          trControl  = train.control,
          metric     = "RMSE",
          data       = predict(mdl,tr))
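When preProcess is applied separately like this, new data has to go through the same transformation before prediction. The sketch below rebuilds the question's data so that it runs on its own, and assumes caret is installed. The names k2 and pred are hypothetical. With the first approach (preProcess passed to train), predict(k, newdata = tt) should apply the stored preprocessing by itself.

```r
library(caret)

set.seed(0)
# Rebuild the question's data: one response column and 20 predictors
data <- data.frame(cbind(rnorm(20, 100, 10), matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
tr <- data[1:15, ]  # training set
tt <- data[16:20, ] # test set

# Estimate scaling and PCA on the training predictors only
mdl <- preProcess(tr[, -1], method = c("scale", "pca"), thresh = 0.9)

train.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3)

# Fit knn on the transformed training data (True plus PC columns)
k2 <- train(True ~ .,
            method    = "knn",
            tuneGrid  = expand.grid(k = 1:10),
            trControl = train.control,
            metric    = "RMSE",
            data      = predict(mdl, tr))

# Transform the test set with the same preProcess object, then predict
pred <- predict(k2, newdata = predict(mdl, tt))
print(pred)
```

One trade-off with this variant is that the PCA is estimated once on the whole training set instead of being re-estimated inside each resample, which can make the cross-validated RMSE look somewhat optimistic.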

【Comments】:
