【Question title】: Machine learning using R and the randomForestSRC package
【Posted】: 2017-06-12 09:20:36
【Question description】:

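I am trying to use "surv.randomForestSRC" as a learner for machine learning in R. My code and results are shown below. "newHCC" is a survival data set of HCC patients containing several numeric parameters.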

> newHCC$status = (newHCC$status == 1)
> surv.task = makeSurvTask(data = newHCC, target = c("time", "status"))
> surv.task
Supervised task: newHCC
Type: surv
Target: time,status
Events: 61
Observations: 127
Features:
numerics  factors  ordered
      30        0        0
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE

> lrn = makeLearner("surv.randomForestSRC")
> rdesc = makeResampleDesc(method = "RepCV", folds=10, reps=10)
> r = resample(learner = lrn, task = surv.task, resampling = rdesc)
[Resample] repeated cross-validation iter 1: cindex.test.mean=0.485
[Resample] repeated cross-validation iter 2: cindex.test.mean=0.556
[Resample] repeated cross-validation iter 3: cindex.test.mean=0.825
[Resample] repeated cross-validation iter 4: cindex.test.mean=0.81
...
[Resample] repeated cross-validation iter 100: cindex.test.mean=0.683
[Resample] Aggr. Result: cindex.test.mean=0.688

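I have a few questions.

  1. How can I see the parameters that were used, such as ntree and mtry?
  2. Is there a good way to tune them?
  3. How can I see the predicted individual risk, like what is shown in "predicted" when using the randomForestSRC package directly?

Thank you very much.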

【Comments】:

Tags: machine-learning random-forest survival-analysis


【Solution 1】:
  For questions 1 and 2, you can try the following:

    # Search space for the hyperparameters to tune
    surv_param <- makeParamSet(
      makeIntegerParam("ntree", lower = 50, upper = 100),
      makeIntegerParam("mtry", lower = 1, upper = 6),
      makeIntegerParam("nodesize", lower = 10, upper = 50),
      makeIntegerParam("nsplit", lower = 3, upper = 50)
    )
    # Random search over the space, 10 iterations
    rancontrol <- makeTuneControlRandom(maxit = 10L)
    surv_tune <- tuneParams(learner = lrn, resampling = rdesc, task = surv.task,
                            par.set = surv_param, control = rancontrol)
    # Refit on the full task with the best parameters found
    surv.tree <- setHyperPars(lrn, par.vals = surv_tune$x)
    surv <- mlr::train(surv.tree, surv.task)
    getLearnerModel(surv)  # print the underlying fitted rfsrc model
    model <- predict(surv, surv.task)
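For question 1 specifically, a minimal sketch of how the parameters might be inspected is given here. Since newHCC is not available, it assumes the `veteran` data set shipped with randomForestSRC as a stand-in, and the `ntree = 100L` setting is chosen only for illustration.

```r
library(mlr)

# Stand-in data (assumption): the veteran lung cancer set from randomForestSRC
data(veteran, package = "randomForestSRC")
veteran$status <- veteran$status == 1
task <- makeSurvTask(data = veteran, target = c("time", "status"))

# Set one hyperparameter explicitly so there is something to look up
lrn <- makeLearner("surv.randomForestSRC", ntree = 100L)

getParamSet(lrn)    # every parameter the learner exposes, with types and defaults
getHyperPars(lrn)   # only the values currently set on the learner

# After training, the fitted rfsrc object records the values actually used
mod <- mlr::train(lrn, task)
rf <- getLearnerModel(mod)
rf$ntree
rf$mtry
rf$nodesize
```

Parameters that were never set on the learner do not appear in `getHyperPars()`, so the fitted object is the place to look for the defaults that randomForestSRC filled in.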

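  For question 3: as of this writing, you cannot predict individual risk through mlr's surv.randomForestSRC learner. It supports only the predict type "response".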
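If per-patient output like the `predicted` element mentioned in the question is needed, one possible workaround is to fit the forest with randomForestSRC directly instead of through mlr. The sketch below is only an illustration of that route: it again assumes the package's `veteran` data in place of newHCC, and `ntree = 100` is an arbitrary choice.

```r
library(randomForestSRC)

# Stand-in data (assumption): veteran has the same time/status layout as newHCC
data(veteran, package = "randomForestSRC")

set.seed(1)
fit <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 100)

# Predict on the same rows here; pass held-out rows as newdata in practice
pred <- predict(fit, newdata = veteran)

head(pred$predicted)   # one ensemble mortality value per patient
dim(pred$survival)     # patients x unique event times (pred$time.interest)
```

Higher values in `pred$predicted` indicate higher estimated risk, and each row of `pred$survival` is that patient's estimated survival curve evaluated at `pred$time.interest`.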

【Comments】: