mlr 包可以用于根据面板研究的数据进行预测吗？答案

【问题标题】：Can the mlr package be used to make predictions based on data from a panel study?mlr 包可以用于根据面板研究的数据进行预测吗？
【发布时间】：2022-10-10 22:47:37
【问题描述】：

我打算做一个受监督的机器学习项目，我使用纵向研究（小组研究）的数据。目标是使用 2004 年和 2009 年的预测变量来预测 2014 年的结果。我现在已经完成了第一次数据预处理，数据框看起来像以下高度缩写的形式：

data_ml <- structure(
  list(
    ID = c(
      201,
      203,
      602,
      901,
      1202,
      1501,
      1601,
      1602,
      1603,
      201,
      203,
      602,
      901,
      1202,
      1501,
      1601,
      1602,
      1603,
      201,
      203,
      602,
      901,
      1202,
      1501,
      1601,
      1602,
      1603
    ),
    Studyyear = c(
      2004,
      2004,
      2004,
      2004,
      2004,
      2004,
      2004,
      2004,
      2004,
      2009,
      2009,
      2009,
      2009,
      2009,
      2009,
      2009,
      2009,
      2009,
      2014,
      2014,
      2014,
      2014,
      2014,
      2014,
      2014,
      2014,
      2014
    ),
    Gender = c(2, 1, 2, 2, 2, 1, 1, 2, 1,
               2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1),
    Predictor1 = c(6,
                   5, 4, 6, 4, 6, 4, 3, 3, 6, 5, 4, 6, 4, 6, 4, 3, 3, 6, 5, 4, 6,
                   4, 6, 4, 3, 3),
    Predictor2 = c(2, 2, 1, 1, 2, 2, 1, 2, 2, 2,
                   2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2),
    Predictor3 = c(0,
                   6, 1, 6, 0, 0, 4, 2, 3, 0, 6, 1, 6, 0, 0, 4, 1, 1, 1, 6, 1, 6,
                   0, 0, 4, 1, 1),
    Outcome1 = c(0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1,
                 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1),
    Outcome2 = c(0,
                 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0,
                 1, 0, 1, 1, 0)
  ),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-27L)
)

到目前为止，我的预测项目不包括时间维度（参见 data_ml: "Studyyear"）。所以我可以创建一个任务，然后使用“mlr”包进行预测，如下所示：

library(mlr)
task <- makeClassifTask(data = data_ml, target = 'Outcome1', positive = '1')
measures = list(acc, auc, tpr, tnr, f1)
resampling_MC <- makeResampleDesc(method = 'Subsample', iters = 500) 
learner_logreg <- makeLearner('classif.logreg', predict.type = 'prob')
benchmark_MC <- benchmark(learners = learner_logreg, tasks = task, resamplings = resampling_MC, measures = measures)

是否仍然可以使用具有上述数据框的“mlr”包并包含时间维度？

【问题讨论】：

标签： r supervised-learning mlr longitudinal

【解决方案1】：

是的，您可以使用 mlr3forecasting package 执行此操作，请参阅示例 here。该软件包尚未在 CRAN 上发布，并且仍处于试验阶段，因此您必须从 Github（软件包网站上的说明）安装它，并且可能无法按预期工作。

【讨论】：