How to predict a H2O GBM model for nth tree? Answers

【Question title】: How to predict a H2O GBM model for nth tree?
【Posted】: 2021-05-26 02:14:01
【Question description】:

pros_gbm = H2OGradientBoostingEstimator(nfolds=0,
                                        seed=1234,
                                        keep_cross_validation_predictions=False,
                                        ntrees=1000,
                                        max_depth=3,
                                        learn_rate=0.01,
                                        distribution='multinomial')
pros_gbm.train(x=predictors, y=target, training_frame=hf_train, validation_frame=hf_test)

pros_gbm.predict(hf_test)

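Currently I predict on my test data as shown above, but how can I get predictions on the test data from the nth tree (out of 1000 trees) of this model? Is there an option in predict for this, or is there some other way?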

【Question discussion】:

Tags: python h2o interaction multilabel-classification gbm

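【Solution 1】:

You can get the predicted probabilities (the cumulative probability after each tree) with staged_predict_proba(), and the leaf node assignments with predict_leaf_node_assignment(). Here is an example: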

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

# Start (or connect to) a local H2O cluster:
h2o.init()

# Import the prostate dataset into H2O:
prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")

# Set the predictors and response; set the factors:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
predictors = ["ID","AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
response = "CAPSULE"

# Build and train the model:
pros_gbm = H2OGradientBoostingEstimator(nfolds=5,
                                        seed=1111,
                                        keep_cross_validation_predictions = True)
pros_gbm.train(x=predictors, y=response, training_frame=prostate)

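# Leaf node that each tree assigns to the first row:
print(pros_gbm.predict_leaf_node_assignment(prostate[:1, :]))
# Cumulative predicted probability after each tree, for the first row:
print(pros_gbm.staged_predict_proba(prostate[:1, :]))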
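
To read off the prediction at one particular tree, you then only need the matching columns of the staged output. The sketch below is a plain-Python illustration, not part of the H2O API. It assumes the staged columns are named T<tree>.C<class> with 1-based indices (for example T2.C1), which is worth checking against the column names of your own output; the helpers staged_columns_for_tree and pick_tree are hypothetical names, and the numbers in example_row are made up.

```python
def staged_columns_for_tree(n, n_classes=1):
    """Build the column names assumed for tree n (1-based) in staged output.

    Assumption: columns follow the pattern "T<tree>.C<class>", with one
    column per tree for a binomial model and one per class otherwise.
    """
    if n < 1:
        raise ValueError("n is 1-based and must be >= 1")
    if n_classes < 1:
        raise ValueError("n_classes must be >= 1")
    return ["T{}.C{}".format(n, c) for c in range(1, n_classes + 1)]


def pick_tree(staged_row, n, n_classes=1):
    """Select the entries for tree n from one row of staged output (a dict)."""
    cols = staged_columns_for_tree(n, n_classes)
    missing = [c for c in cols if c not in staged_row]
    if missing:
        raise KeyError("columns not found in staged output: {}".format(missing))
    return {c: staged_row[c] for c in cols}


# Made-up values standing in for one row of staged probabilities.
example_row = {"T1.C1": 0.40, "T2.C1": 0.43, "T3.C1": 0.47}
print(pick_tree(example_row, 2))
```

With a real model, the same column names could be used to slice the staged frame, for example staged = pros_gbm.staged_predict_proba(hf_test) followed by staged[staged_columns_for_tree(500, n_classes=3)], assuming the model has three classes and the naming pattern above holds.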

If you want the details of each tree (leaf and split information), you can also look at the Tree class.

【Discussion】: