[Posted]: 2021-06-30 17:23:33
[Question]:
I have recently been following some tutorials on how to use GraphLearner in mlr3, but I am still confused by the tuning results of a GraphLearner with branches. I set up a simple example; here is my code:
library(mlr3)
library(mlr3learners)    # provides classif.kknn
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)
library(future)

# Create a graph with branches
graph_branch <-
  po("branch", c("nop", "pca", "scale"), id = "preprocess_branch") %>>%
  gunion(list(
    po("nop"),
    po("pca", id = "pca1"),
    po("scale") %>>% po("pca", id = "pca2")
  )) %>>%
  po("unbranch", id = "preprocess_unbranch") %>>%
  po("branch", c("classif.kknn", "classif.featureless"), id = "lrn_branch") %>>%
  gunion(list(
    lrn("classif.kknn", predict_type = "prob"),
    lrn("classif.featureless", predict_type = "prob")
  )) %>>%
  po("unbranch", id = "lrn_unbranch")

# Convert the graph to a learner
graph_branch_lrn <- as_learner(graph_branch)
graph_branch_lrn$graph$plot()
# Set the tuning grid
tune_grid <- ParamSet$new(list(
  ParamFct$new("preprocess_branch.selection", levels = c("nop", "pca", "scale")),
  ParamInt$new("pca1.rank.", lower = 1, upper = 10),
  ParamInt$new("pca2.rank.", lower = 1, upper = 10),
  ParamFct$new("lrn_branch.selection", levels = c("classif.kknn", "classif.featureless")),
  ParamInt$new("classif.kknn.k", lower = 1, upper = 10)
))

# Set up the tuning instance
instance_rs <- TuningInstanceSingleCrit$new(
  task = task_train,
  learner = graph_branch_lrn,
  resampling = rsmp("cv", folds = 5),
  measure = msr("classif.auc"),
  search_space = tune_grid,
  terminator = trm("evals", n_evals = 20)
)

# Random-search tuning, parallelised over 5 workers
tuner_rs <- tnr("random_search")
plan(multisession, workers = 5)
set.seed(100)
tuner_rs$optimize(instance_rs)
plan(sequential)
The best tuning result is:
# Check the result
instance_rs$result_learner_param_vals
$preprocess_branch.selection
[1] "nop"
$scale.robust
[1] FALSE
$lrn_branch.selection
[1] "classif.kknn"
$classif.featureless.method
[1] "mode"
$pca1.rank.
[1] 9
$pca2.rank.
[1] 9
$classif.kknn.k
[1] 9
I would like to know why tuned values for "pca1.rank." and "pca2.rank." appear in the result when the "nop" branch was selected. My understanding was that tuning a GraphLearner with branches picks the best result per branch selection, so if the "nop" branch is chosen, the parameters belonging to the other branches should not be considered and should not appear in the output. Am I misunderstanding how GraphLearner tuning works, or is there something wrong with my code?
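To show what I expected, here is a sketch of how I would express that constraint using paradox's `add_dep()` (this is my guess at the intended usage; I have not verified that it changes the reported result):

```r
# Sketch: declare that each branch-specific parameter only applies when its
# branch is active. CondEqual is from the paradox package.
tune_grid$add_dep("pca1.rank.", "preprocess_branch.selection", CondEqual$new("pca"))
tune_grid$add_dep("pca2.rank.", "preprocess_branch.selection", CondEqual$new("scale"))
tune_grid$add_dep("classif.kknn.k", "lrn_branch.selection", CondEqual$new("classif.kknn"))
```

With these dependencies I would expect the sampler to only draw (and report) the PCA ranks and k when the matching branch is selected.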
[Discussion]:
Tags: r machine-learning mlr3