【Question title】: Imputation using MICE in mlr
【Posted】: 2020-06-16 07:07:46
【Question】:

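I am trying to write my own imputation method in mlr, using makeImputeMethod, to perform multiple imputation by chained equations with the mice package in R. My imputeMice() method runs to completion, but then fails with the following error: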

Error in `[.data.frame`(data, ind) : undefined columns selected

I don't know why this happens or where it comes from. Here is the code I wrote:

library(survival)
#> Warning: package 'survival' was built under R version 3.6.3
library(mlr)
#> Warning: package 'mlr' was built under R version 3.6.3
#> Loading required package: ParamHelpers
#> Warning: package 'ParamHelpers' was built under R version 3.6.3
#> 'mlr' is in maintenance mode since July 2019. Future development
#> efforts will go into its successor 'mlr3' (<https://mlr3.mlr-org.com>).
library(lattice)
#> Warning: package 'lattice' was built under R version 3.6.3
library(mice)
#> Warning: package 'mice' was built under R version 3.6.3
#> 
#> Attaching package: 'mice'
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

data(pbc)
task_id = "PBC"
pbc[pbc$status == 2, "status"] = 1
pbc.task <- makeSurvTask(id = task_id, data = pbc, target = c("time", "status"))
outer = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Outer resampling: 2-fold stratified CV

imputeMice = function() {
  makeImputeMethod(
    learn = function(data, target, col) {
      return(list(values = data))
    },
    impute = function(data, target, col, values) {
      data = as.data.frame(data)
      excl = names(data)[ sapply(data, is.factor) ]
      predmat = mice::quickpred(data, minpuc=0, mincor=0, exclude=excl)
      imp_data = mice::mice(data, pred=predmat, seed = 23109, printFlag=FALSE)
      x = mice::complete(imp_data)
      print("Imputation completed")
      return(x)
    }
  )
}

lrn = makeFilterWrapper(
  makeLearner(cl="surv.coxph", id = "cox.filt", predict.type="response"), 
  fw.method="univariate.model.score",
  fw.perc=0.1,
  cache=TRUE
)
lrn = makeImputeWrapper(lrn, classes = list(numeric = imputeMice(), integer = imputeMice(), factor = imputeMice()))

res = resample(learner = lrn, task = pbc.task, resampling = outer, models = TRUE,
               measures = list(cindex), show.info = TRUE, extract = getFilteredFeatures)
#> Resampling: cross-validation
#> Measures:             cindex
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> [1] "Imputation completed"
#> Error in `[.data.frame`(data, ind): undefined columns selected

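Created on 2020-06-16 by the reprex package (v0.3.0)

Clearly, imputeMice() is being called once for every column of the pbc data.frame. With mice, though, we should only need to call the function once, because it imputes every column itself. Is that possible in mlr?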

【Discussion】:

    Tags: r mlr


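    【Solution 1】:

    The mistake was mine: I should have called mice in the learn function, not in the impute function. I find the names of these functions confusing. My new code is below, and it works. But it still calls mice once for every column, when I really only need to call it once. Is that possible?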

    library(survival)
    #> Warning: package 'survival' was built under R version 3.6.3
    library(mlr)
    #> Warning: package 'mlr' was built under R version 3.6.3
    #> Loading required package: ParamHelpers
    #> Warning: package 'ParamHelpers' was built under R version 3.6.3
    #> 'mlr' is in maintenance mode since July 2019. Future development
    #> efforts will go into its successor 'mlr3' (<https://mlr3.mlr-org.com>).
    library(lattice)
    #> Warning: package 'lattice' was built under R version 3.6.3
    library(mice)
    #> Warning: package 'mice' was built under R version 3.6.3
    #> 
    #> Attaching package: 'mice'
    #> The following objects are masked from 'package:base':
    #> 
    #>     cbind, rbind
    
    data(pbc)
    task_id = "PBC"
    pbc[pbc$status == 2, "status"] = 1
    pbc.task <- makeSurvTask(id = task_id, data = pbc, target = c("time", "status"))
    outer = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Outer resampling: 2-fold stratified CV
    
    imputeMice = function() {
      makeImputeMethod(
        learn = function(data, target, col) {
          data = as.data.frame(data)
          excl = names(data)[ sapply(data, is.factor) ]
          predmat = mice::quickpred(data, minpuc=0, mincor=0, exclude=excl)
          imp_data = mice::mice(data, pred=predmat, seed = 23109, printFlag=FALSE)
          x = mice::complete(imp_data)
          return(list(values = x[[col]]))
        },
        impute = function(data, target, col, values) {
          data[[col]] = values
          return(data[[col]])
        }
      )
    }
    
    lrn = makeFilterWrapper(
      makeLearner(cl="surv.coxph", id = "cox.filt", predict.type="response"), 
      fw.method="univariate.model.score",
      fw.perc=0.1,
      cache=TRUE
    )
    lrn = makeImputeWrapper(lrn, classes = list(numeric = imputeMice(), integer = imputeMice(), factor = imputeMice()))
    
    res = resample(learner = lrn, task = pbc.task, resampling = outer, models = TRUE,
                   measures = list(cindex), show.info = TRUE, extract = getFilteredFeatures)
    #> Resampling: cross-validation
    #> Measures:             cindex
    #> [Resample] iter 1:    0.7069869
    #> [Resample] iter 2:    0.7138798
    #> 
    #> Aggregated Result: cindex.test.mean=0.7104333
    #> 
    

    Created on 2020-06-19 by the reprex package (v0.3.0)
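
On the remaining question of calling mice only once: one option that may fit is mlr's makePreprocWrapper, whose train and predict functions receive the whole data.frame rather than one column at a time. The following is a minimal sketch under assumptions, not a tested replacement for the code above. imputeWithMice is a hypothetical helper name, m = 1 is chosen only to obtain a single completed data set, and new data is imputed on its own at predict time, which is a simplification (mice may struggle on very small prediction sets). This approach also avoids storing the imputed training column and writing it back in impute, which in the per-column version appears to rely on impute seeing the same rows as learn.

```r
library(mlr)
library(mice)

# Hypothetical helper: run mice once over all feature columns of `data`.
# Target columns (if present) are left untouched and are not used as predictors.
imputeWithMice = function(data, target) {
  feats = setdiff(names(data), target)
  imp = mice::mice(data[, feats, drop = FALSE], m = 1, seed = 23109, printFlag = FALSE)
  data[, feats] = mice::complete(imp)
  data
}

base_lrn = makeLearner("surv.coxph", id = "cox", predict.type = "response")

# The preprocessing wrapper calls `train` once per training set and
# `predict` once per prediction set, each time with the full data.frame.
lrn_mice = makePreprocWrapper(
  base_lrn,
  train = function(data, target, args) {
    # No state is carried over to predict time in this sketch.
    list(data = imputeWithMice(data, target), control = list())
  },
  predict = function(data, target, args, control) {
    imputeWithMice(data, target)
  }
)
```

lrn_mice could then be passed to resample() in place of the makeImputeWrapper learner. To keep the feature filter, the filter-wrapped learner from the original code could presumably be given as the first argument instead of base_lrn.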
