【发布时间】:2017-11-01 01:43:49
【问题描述】:
我正在尝试使用 h2o 为数据集的不同部分运行 2 种算法(random forest 和 gbm)的优化网格。我的代码看起来像
for (...)
{
read data
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
# shutdown h2o
h2o.shutdown(prompt = FALSE)
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
h2o.shutdown(prompt = FALSE)
}
问题是如果我一次性运行for loop,我会得到错误
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, :
Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused
P.S.:我用的是线路
# shutdown h2o
h2o.shutdown(prompt = FALSE)
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
这样我“重置”h2o,这样我就不会耗尽内存
我也读过R H2O - Memory management,但我不清楚它是如何工作的。
更新
在关注 Matteusz 评论后,我在 for loop 之外和 for loop 内部使用 init,我使用 h2o.removeAll()。所以现在我的代码看起来像这样
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
for(...)
{
read data
gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
h2o.removeAll()
rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
h2o.removeAll() }
它似乎有效,但现在我在grid optimization 中为random forest 收到此错误(?)
任何想法这可能是什么?
【问题讨论】:
标签: r memory memory-management machine-learning h2o