LightGBM: continue training a model (answers)

【Question title】: LightGBM: continue training a model
【Posted】: 2017-08-12 21:14:14
【Question】:

I am training a model with cross-validation like this:

classifier = lgb.Booster(
    params=params, 
    train_set=lgb_train_set,
)

result = lgb.cv(
    init_model=classifier,
    params=params, 
    train_set=lgb_train_set,
    num_boost_round=1000,
    early_stopping_rounds=20,
    verbose_eval=50,
    shuffle=True
)

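I want to continue training the model by running the second command multiple times (possibly with a new training set or with different parameters), so that each run keeps improving the model.

However, when I try this, it is clear that the model starts from scratch every time.

Is there a different way to do what I want?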

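【Comments】:

Tags: lightgbm

【Solution 1】:

This can be solved with the init_model option of lightgbm.train, which accepts one of two objects:

1. the filename of a LightGBM model, or
2. a lightgbm Booster object

Code illustration: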

import numpy as np 
import lightgbm as lgb

data = np.random.rand(1000, 10) # 1000 entities, each contains 10 features
label = np.random.randint(2, size=1000) # binary target
train_data = lgb.Dataset(data, label=label, free_raw_data=False)
params = {}

#Initialize with 10 iterations
gbm_init = lgb.train(params, train_data, num_boost_round = 10)
print("Initial iter# %d" %gbm_init.current_iteration())

# Example of option #1 (pass a file):
gbm_init.save_model('model.txt')
gbm = lgb.train(params, train_data, num_boost_round = 10,
                init_model='model.txt')
print("Option 1 current iter# %d" %gbm.current_iteration())


# Example of option #2 (pass a lightgbm Booster object):
gbm_2 = lgb.train(params, train_data, num_boost_round = 10,
                init_model = gbm_init)
print("Option 2 current iter# %d" %gbm_2.current_iteration())

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html

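【Comments】:

【Solution 2】:

To carry on training, you must call lgb.train again and make sure to pass init_model='model.txt'. To confirm you have done this correctly, the evaluation output printed during training should pick up where lgb.cv left off. Then save the model's best iteration like this: bst.save_model('model.txt', num_iteration=bst.best_iteration).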

【Comments】:

【Solution 3】:

init_model does not work on its own. We must also set the keep_training_booster parameter for the train method:

lgb_params = {
  'keep_training_booster': True,
  'objective': 'regression',
  'verbosity': 100,
}
lgb.train(lgb_params, init_model= ...)

Or as a function argument:

lgb.train(lgb_params, keep_training_booster=True, init_model= ...)

【Comments】:

From the docs: keep_training_booster (bool, optional (default=False)) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into _InnerPredictor before returning. When your model is very large and cause the memory error, you can try to set this param to True to avoid the model conversion performed during the internal call of model_to_string. You can still use _InnerPredictor as init_model for future continue training. So it does not look like it needs to be set to True.

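【Solution 4】:

It seems lightgbm does not allow passing a model instance as init_model, since it only takes a filename:

init_model (string or None, optional (default=None)) – Filename of LightGBM model or Booster instance used for continue training.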

【Comments】: