【Question title】: Code for an H2O ensemble implementation for regression in R
【Posted】: 2018-07-07 12:53:18
【Question description】:

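I have searched various sites, and even the H2O ensemble documentation, but all I find are ensemble examples for binary classification problems. None of them shows how to implement general stacking or an H2O ensemble for a simple regression problem in R.

Could anyone share working code showing how to implement an H2O ensemble or stacking for a regression problem in R?

A simple ensemble, just for regression in R.

I just want to know how to implement an ensemble/stacking for regression with different weights.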

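【Question discussion】:

  • The implementation for regression is the same; you just use a dataset with a numeric response instead of a factor response. See Lauren's example below.

Tags: r h2o ensemble-learning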


【Solution 1】:

Here is an example of building a stacked ensemble for a regression problem (predicting age) in R:

library('h2o')
h2o.init()

files3 = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
col_types <- c("Numeric","Numeric","Numeric","Enum","Enum","Numeric","Numeric","Numeric","Numeric")
dat <- h2o.importFile(files3,destination_frame = "prostate.hex",col.types = col_types)
ss <- h2o.splitFrame(dat, ratios = 0.8, seed = 1)
train <- ss[[1]]
test <- ss[[2]]

x <- c("CAPSULE","GLEASON","RACE","DPROS","DCAPS","PSA","VOL")
y <- "AGE"
nfolds <- 5


# Train & Cross-validate a GBM
my_gbm <- h2o.gbm(x = x, 
                  y = y, 
                  training_frame = train, 
                  distribution = "gaussian",
                  max_depth = 3,
                  learn_rate = 0.2,
                  nfolds = nfolds, 
                  fold_assignment = "Modulo",
                  keep_cross_validation_predictions = TRUE,
                  seed = 1)

# Train & Cross-validate a RF
my_rf <- h2o.randomForest(x = x,
                          y = y, 
                          training_frame = train, 
                          ntrees = 30, 
                          nfolds = nfolds, 
                          fold_assignment = "Modulo",
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)


# Train & Cross-validate an extremely randomized trees (XRT) model
my_xrf <- h2o.randomForest(x = x,
                           y = y, 
                           training_frame = train, 
                           ntrees = 50,
                           histogram_type = "Random",
                           nfolds = nfolds, 
                           fold_assignment = "Modulo",
                           keep_cross_validation_predictions = TRUE,
                           seed = 1)

# Train a stacked ensemble using the models above
stack <- h2o.stackedEnsemble(x = x, 
                             y = y, 
                             training_frame = train,
                             validation_frame = test,  # optional: reports ensemble metrics on held-out data
                             model_id = "my_ensemble_gaussian", 
                             base_models = list(my_gbm@model_id, my_rf@model_id, my_xrf@model_id))

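# Predict on the test set and check the ensemble's error
pred <- h2o.predict(stack, newdata = test)
perf <- h2o.performance(stack, newdata = test)
h2o.rmse(perf)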
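The stacked ensemble above learns how to weight the base models through its metalearner (by default a GLM). If you instead want fixed, hand-picked weights, as the question asks, a minimal sketch in base R is a weighted average of each model's regression predictions. The helper `weighted_blend`, the toy prediction values, and the weights 0.5/0.3/0.2 are hypothetical, for illustration only:

```r
# Hypothetical helper: combine regression predictions from several base models
# with fixed, user-chosen weights (a weighted average, not a learned metalearner).
weighted_blend <- function(pred_matrix, weights) {
  # pred_matrix: one column per base model, one row per observation
  stopifnot(ncol(pred_matrix) == length(weights),
            all(weights >= 0), sum(weights) > 0)
  w <- weights / sum(weights)      # normalise so the weights sum to 1
  as.vector(pred_matrix %*% w)     # row-wise weighted average
}

# Toy predictions from three hypothetical models for four observations
preds <- cbind(gbm = c(60, 65, 70, 55),
               rf  = c(62, 63, 71, 57),
               xrf = c(61, 66, 69, 56))

blend <- weighted_blend(preds, weights = c(0.5, 0.3, 0.2))
print(blend)
```

With the models from the example, each column could instead come from something like `as.data.frame(h2o.predict(my_gbm, newdata = test))$predict`. Choosing the weights on cross-validated predictions rather than on `test` avoids leaking test information into the blend.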

【Discussion】:

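【Solution 2】:

The stacked ensemble example in my book (Practical Machine Learning with H2O) is a regression (on the building energy dataset). :-)

But if you think you have exhausted all of H2O's documentation, try searching the source code on GitHub. Here is their unit test for stacked-ensemble regression:

https://github.com/h2oai/h2o-3/blob/master/h2o-r/tests/testdir_algos/stackedensemble/runit_stackedensemble_gaussian.R

【Discussion】:

• Thanks @Darren Cook. I guess it is hidden in there, unlike classification, which is out in the open.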