[Posted]: 2021-05-21 09:06:41
[Question]:
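This is after some preprocessing (including OneHotEncoding).
Problem: when I run rfe with sizes = c(15), it produces results for both 15 and 63 variables. Because the 63-variable result has slightly higher accuracy, it is the one selected by default.
I want the top 15 variables instead of all 63, because the difference in results is small but the computational cost would be lower.
After reading the post below, I realized I could use optVariables[1:15]: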
retrieve selected variables from caret recursive feature elimination (rfe) results
Question: if I use RFE_single_size$optVariables[1:15], does it pick the top 15 variables out of the 63-variable set that was returned, or the variables from the 15-variable result?
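library(caret)   # rfe(), rfeControl(), rfFuncs
library(dplyr)   # pull()

# 10-fold cross-validation with random-forest helper functions
control <- rfeControl(functions = rfFuncs, method = "cv", verbose = FALSE)

system.time(
  RFE_single_size <- rfe(x = train_both_sample,  # selected_vars[, 1:44]
                         y = pull(Y_train),
                         sizes = c(15),
                         rfeControl = control
  )
)

RFE_single_size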
RFE result:
Recursive feature selection

Outer resampling method: Cross-Validated (10 fold)

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
        15   0.9646 0.9293   0.007279 0.01451
        63   0.9702 0.9404   0.006592 0.01315        *

The top 5 variables (out of 63):
   duration, age, campaign, euribor3m, nr.employed
I want to change the selection from 63 to 15 variables, and to make sure that the 15 I pick are the top 15 from the 15-variable result that was returned.
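One way to check what optVariables actually holds is to compare its length with optsize on a small run. The sketch below is only an illustration on made-up data: toy_x, toy_y, toy_control, toy_rfe, top3 and toy_rfe_3 are hypothetical names, a 3-variable subset stands in for the 15-variable one, and calling the control object's selectVar function on the stored importance values is my assumption about how a specific subset size can be requested, not a confirmed recipe.

```r
library(caret)  # rfe(), rfeControl(), rfFuncs (rfFuncs also needs randomForest installed)

set.seed(42)

# Hypothetical toy data standing in for train_both_sample and Y_train
toy_x <- as.data.frame(matrix(rnorm(150 * 6), ncol = 6))
toy_y <- factor(ifelse(toy_x$V1 + toy_x$V2 + rnorm(150, sd = 0.5) > 0, "yes", "no"))

toy_control <- rfeControl(functions = rfFuncs, method = "cv", number = 5, verbose = FALSE)
toy_rfe <- rfe(x = toy_x, y = toy_y, sizes = c(3), rfeControl = toy_control)

# optVariables holds the variables of the subset size that rfe selected,
# so its length equals optsize (63 in the question's run)
toy_rfe$optsize
length(toy_rfe$optVariables)

# Assumption: ask the control's own selectVar function for the 3-variable
# subset, using the importance values stored from resampling
top3 <- toy_rfe$control$functions$selectVar(toy_rfe$variables, 3)
top3

# update() refits the final model for a chosen subset size
toy_rfe_3 <- update(toy_rfe, x = toy_x, y = toy_y, size = 3)
```

If length(optVariables) matches the selected size, then optVariables[1:15] is a slice of that selected set, which may or may not match the variables behind the 15-variable row of the results table. On the real data, the same calls would take RFE_single_size, train_both_sample, pull(Y_train) and size 15.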
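About the data: it is taken from the open-source "Bank Marketing Response" classification problem.
Update: added a GitHub link to the code (Rmd) and the data CSV file: https://github.com/johnsnow09/RFE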
str(train_both_sample)
'data.frame': 2884 obs. of 63 variables:
$ age : num 31 45 33 47 30 43 23 42 43 37 ...
$ job.admin. : num 0 0 0 0 0 0 0 0 0 1 ...
$ job.blue.collar : num 1 0 0 1 0 1 0 0 1 0 ...
$ job.entrepreneur : num 0 0 0 0 0 0 1 0 0 0 ...
$ job.housemaid : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.management : num 0 0 0 0 0 0 0 1 0 0 ...
$ job.retired : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.self.employed : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.services : num 0 1 0 0 0 0 0 0 0 0 ...
$ job.student : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.technician : num 0 0 1 0 1 0 0 0 0 0 ...
$ job.unemployed : num 0 0 0 0 0 0 0 0 0 0 ...
$ job.unknown : num 0 0 0 0 0 0 0 0 0 0 ...
$ marital.divorced : num 0 0 0 0 0 0 0 0 0 0 ...
$ marital.married : num 1 1 1 1 0 1 1 1 1 1 ...
$ marital.single : num 0 0 0 0 1 0 0 0 0 0 ...
$ marital.unknown : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.basic.4y : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.basic.6y : num 0 0 0 1 0 0 0 0 0 0 ...
$ education.basic.9y : num 1 0 0 0 0 0 0 0 1 0 ...
$ education.high.school : num 0 1 0 0 0 1 0 0 0 1 ...
$ education.illiterate : num 0 0 0 0 0 0 0 0 0 0 ...
$ education.professional.course: num 0 0 1 0 1 0 0 0 0 0 ...
$ education.university.degree : num 0 0 0 0 0 0 1 1 0 0 ...
$ education.unknown : num 0 0 0 0 0 0 0 0 0 0 ...
$ default.no : num 1 1 1 0 1 1 1 1 1 0 ...
$ default.unknown : num 0 0 0 1 0 0 0 0 0 1 ...
$ default.yes : num 0 0 0 0 0 0 0 0 0 0 ...
$ housing.no : num 0 0 1 0 1 0 0 0 1 0 ...
$ housing.unknown : num 0 0 0 0 0 0 0 0 0 0 ...
$ housing.yes : num 1 1 0 1 0 1 1 1 0 1 ...
$ loan.no : num 1 1 1 1 1 1 1 0 1 1 ...
$ loan.unknown : num 0 0 0 0 0 0 0 0 0 0 ...
$ loan.yes : num 0 0 0 0 0 0 0 1 0 0 ...
$ contact.cellular : num 0 0 1 1 1 1 1 0 0 1 ...
$ contact.telephone : num 1 1 0 0 0 0 0 1 1 0 ...
$ month.Mar : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.Apr : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.May : num 1 0 0 1 0 0 0 1 0 0 ...
$ month.Jun : num 0 0 0 0 0 0 0 0 1 0 ...
$ month.Jul : num 0 1 0 0 1 0 0 0 0 1 ...
$ month.Aug : num 0 0 1 0 0 0 1 0 0 0 ...
$ month.Sep : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.Oct : num 0 0 0 0 0 0 0 0 0 0 ...
$ month.Nov : num 0 0 0 0 0 1 0 0 0 0 ...
$ month.Dec : num 0 0 0 0 0 0 0 0 0 0 ...
$ day_of_week.fri : num 0 0 1 0 0 0 0 1 0 0 ...
$ day_of_week.mon : num 0 0 0 1 0 1 0 0 0 1 ...
$ day_of_week.thu : num 0 0 0 0 0 0 1 0 1 0 ...
$ day_of_week.tue : num 1 1 0 0 1 0 0 0 0 0 ...
$ day_of_week.wed : num 0 0 0 0 0 0 0 0 0 0 ...
$ duration : num 97 68 335 208 136 107 87 123 246 204 ...
$ campaign : num 2 4 3 4 2 2 1 1 2 3 ...
$ pdays : num 999 999 999 999 999 999 999 999 999 999 ...
$ previous : num 0 0 0 1 0 0 0 0 0 0 ...
$ poutcome.failure : num 0 0 0 1 0 0 0 0 0 0 ...
$ poutcome.nonexistent : num 1 1 1 0 1 1 1 1 1 1 ...
$ poutcome.success : num 0 0 0 0 0 0 0 0 0 0 ...
$ emp.var.rate : num 1.1 1.4 1.4 -1.8 1.4 -0.1 1.4 1.1 1.4 1.4 ...
$ cons.price.idx : num 94 93.9 93.4 92.9 93.9 ...
$ cons.conf.idx : num -36.4 -42.7 -36.1 -46.2 -42.7 -42 -36.1 -36.4 -41.8 -42.7 ...
$ euribor3m : num 4.86 4.96 4.97 1.3 4.96 ...
[Discussion]:
Tags: r classification r-caret rfe