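【Posted】: 2019-09-12 16:24:45
【Question】:
I am running knn regression on my data and would like to:
a) cross-validate with repeatedcv to find the optimal k;
b) use PCA with a 90% threshold for dimensionality reduction when building the knn model.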
library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(15, 100, 10), matrix(rnorm(300, 10, 5), ncol = 20)) %>%
  data.frame()
colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:10, ] #training set
tt = data[11:15,] #test set
train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
          method = "knn",
          tuneGrid = expand.grid(k = 1:10),
          trControl = train.control,
          preProcess = c('scale','pca'),
          metric = "RMSE",
          data = tr)
My question is: I believe the PCA threshold currently defaults to 95% (not sure). How can I change it to 80%?
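One way to check the default without fitting a model is to call caret's `preProcess()` directly and inspect the returned object. The sketch below is only an illustration: the data frame `x` is simulated and is not the data from the question, and the names `pp_default` and `pp_80` are made up for this example. It relies on the `thresh` argument of `preProcess()`, which caret documents as defaulting to 0.95, and on the `thresh` and `numComp` elements of the returned object.

```r
library(caret)

set.seed(0)
# Simulated predictors only (hypothetical data): 50 rows, 20 columns
x <- data.frame(matrix(rnorm(1000, 10, 5), ncol = 20))
colnames(x) <- paste0("Day", 1:20)

# PCA with the threshold left at its default
pp_default <- preProcess(x, method = c("scale", "pca"))

# PCA keeping enough components to explain 80% of the variance
pp_80 <- preProcess(x, method = c("scale", "pca"), thresh = 0.8)

pp_default$thresh   # cumulative-variance cutoff stored in the object
pp_default$numComp  # number of components kept at the default cutoff
pp_80$numComp       # number of components kept at the 80% cutoff
```

A lower cutoff can never keep more components than a higher one, so comparing the two `numComp` values is a quick sanity check that the setting took effect.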
【Comments】:
- Maybe you could use method="adaptive_cv", where you can set alpha in adaptive=list(min=5, alpha=0.20, method="gls", complete=TRUE)?
- Jay, thanks for your suggestion! It seems you are suggesting that I change the cross-validation parameters, but is it possible to change the options for pca?
- I saw this link, stats.stackexchange.com/a/46256/244949, and this solution seems to work. For example, tc = trainControl(method = "cv", preProcOptions = list(thresh=0.8)), and then in the knn model, train(y, method="knn", trControl = tc, preProcess= c("scale", "center", "pca")). Do you think this will mix up the parameters, since there are three preProcess options?
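The approach from the last comment can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a verified answer for the asker's data: the data frame `dat` is simulated with more rows than in the question so that every k in the grid is smaller than each training fold, and the names `dat`, `tc` and `fit` are chosen for this example. On the worry about mixing up parameters: `thresh` is documented as the cumulative-variance cutoff for the PCA step, so centering and scaling should not be affected by it.

```r
library(caret)

set.seed(0)
# Simulated data (hypothetical): 60 rows, one response and 20 predictors
dat <- data.frame(True = rnorm(60, 100, 10),
                  matrix(rnorm(1200, 10, 5), ncol = 20))
colnames(dat) <- c("True", paste0("Day", 1:20))

# Repeated 5-fold CV; the PCA cutoff is passed through preProcOptions
tc <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                   preProcOptions = list(thresh = 0.8))

fit <- train(True ~ ., data = dat,
             method = "knn",
             tuneGrid = expand.grid(k = 1:10),
             trControl = tc,
             preProcess = c("center", "scale", "pca"),
             metric = "RMSE")

fit$preProcess$thresh   # cutoff used for the final model's PCA step
fit$preProcess$numComp  # number of components kept for the final model
fit$bestTune            # k selected by cross-validated RMSE
```

Inspecting `fit$preProcess` after training is a simple way to confirm that the 80% cutoff, rather than the default, was used.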
Tags: r machine-learning pca knn feature-selection