【Question title】: The predictions from StackingRegressor (Sklearn) are not reproducible
【Posted】: 2021-10-20 09:40:36
【Question】:

I am training a regression model with StackingRegressor, and I found that its predictions are not consistent even when I use the same random_state.

Here is my code:

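import lightgbm
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor, StackingRegressor

random_seed = 42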
mdl_lgbm = lightgbm.LGBMRegressor(colsample_bytree=0.6,
                                  learning_rate=0.05,
                                  max_depth=6,
                                  min_child_samples=227,
                                  min_child_weight=10,
                                  n_estimators=1800,
                                  num_leaves=45,
                                  reg_alpha=0,
                                  reg_lambda=1,
                                  subsample=0.6,
                                  n_jobs=-1,
                                  random_state=random_seed)



mdl_xgb = xgb.XGBRegressor(subsample=0.5,
                           n_estimators=900,
                           min_child_weight=8,
                           max_depth=6,
                           learning_rate=0.03,
                           colsample_bytree=0.8,
                           n_jobs=-1,
                           reg_alpha=2,
                           reg_lambda=50,
                           objective='reg:squarederror',
                           random_state=random_seed)


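mdl_rf = RandomForestRegressor(bootstrap=True,
                               max_depth=110,
                               # The original used max_features='auto', which was
                               # removed in scikit-learn 1.3; for regressors it
                               # meant "use all features", i.e. 1.0.
                               max_features=1.0,
                               min_samples_leaf=5,
                               min_samples_split=5,
                               n_estimators=1430,
                               n_jobs=-1,
                               random_state=random_seed)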


# Base models
base_mdl_names = {
    'XGB': mdl_xgb,
    'LGBM': mdl_lgbm,
    'RF': mdl_rf,
}

        
        
final_estimator = xgb.XGBRegressor(subsample=0.3,
                                   n_estimators=1200,
                                   min_child_weight=2,
                                   max_depth=5,
                                   learning_rate=0.06,
                                   colsample_bytree=0.8,
                                   n_jobs=-1,
                                   reg_alpha=1,
                                   reg_lambda=0.1,
                                   objective='reg:squarederror',
                                   random_state=random_seed)

# StackingRegressor expects a list of (name, estimator) tuples
base_estimators = list(base_mdl_names.items())

stacked_mdl = StackingRegressor(estimators=base_estimators,
                                final_estimator=final_estimator,
                                cv=5,
                                passthrough=True)

stacked_mdl.fit(X_train, y_train)

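Note that I do not change X_train. When I predict with the trained model, the results are not reproducible: if I retrain the model, the predictions differ even though every input is the same. Any clue why this happens?

【Discussion】:

    Tags: python machine-learning scikit-learn regression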


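    【Solution 1】:

    StackingRegressor uses cv (cross-validation). You therefore also need to set the random_state of the cross-validation splitter, so that every run gets exactly the same cross-validation splits.

    You should do something like this: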

    from sklearn.model_selection import KFold
    kfold = KFold(n_splits=5, random_state=random_seed, shuffle=True)
    stacked_mdl = StackingRegressor(estimators=base_estimators,
                                    final_estimator=final_estimator,
                                    cv=kfold,
                                    passthrough=True)
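
A quick way to check whether this helps is to fit the stacked model twice on the same data and compare the predictions. The sketch below is only an illustration under stated assumptions: it swaps the question's LightGBM and XGBoost models for small, fast scikit-learn estimators (a 20-tree RandomForestRegressor and Ridge), uses synthetic data from make_regression, and the helper name fit_and_predict is hypothetical. It keeps the seeded KFold and passthrough=True from the answer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

SEED = 42  # plays the same role as random_seed in the question


def fit_and_predict(X, y):
    """Build a fresh stacked model with a seeded splitter and return its predictions."""
    # Stand-in base models (assumption): small and single-threaded so the check is fast.
    estimators = [
        ("rf", RandomForestRegressor(n_estimators=20, random_state=SEED)),
        ("ridge", Ridge(alpha=1.0)),
    ]
    # Seeded, shuffled splitter, as suggested in the answer.
    kfold = KFold(n_splits=5, shuffle=True, random_state=SEED)
    model = StackingRegressor(
        estimators=estimators,
        final_estimator=Ridge(alpha=1.0),
        cv=kfold,
        passthrough=True,
    )
    model.fit(X, y)
    return model.predict(X)


# Synthetic data (assumption): stands in for X_train / y_train.
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=SEED)

first = fit_and_predict(X, y)
second = fit_and_predict(X, y)

# Largest element-wise gap between the two independent training runs.
max_diff = float(np.max(np.abs(first - second)))
print("max absolute difference between the two runs:", max_diff)
```

If the same check on the real models still shows a gap with a seeded splitter, the variation may come from somewhere else, for example the multithreaded base learners (n_jobs=-1). That is a guess to investigate, not something the answer states.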
    

    【Comments】:

    • Setting random_state while shuffle=False raises an error, so set shuffle=True as well.
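
The comment above can be checked directly on a toy array. This is a minimal sketch, assuming a recent scikit-learn release (older releases may only warn instead of raising); the helper fold_test_indices and the 10-row array are made up for illustration. It also shows that two splitters built with the same seed produce the same folds, and that a plain unshuffled KFold simply cuts the rows into contiguous blocks.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy data: 10 samples, 2 features


def fold_test_indices(splitter):
    """Return the test indices of every fold as plain lists."""
    return [test.tolist() for _, test in splitter.split(X)]


# Two independently constructed splitters with the same seed.
splits_a = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=42))
splits_b = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=42))
same_splits = splits_a == splits_b
print("seeded shuffled splits identical:", same_splits)

# Without shuffling, the folds are contiguous blocks of rows.
plain_splits = fold_test_indices(KFold(n_splits=5))
print("unshuffled folds:", plain_splits)

# random_state only makes sense together with shuffle=True.
try:
    KFold(n_splits=5, shuffle=False, random_state=42)
    rejected = False  # older releases may accept this with a warning
except ValueError as err:
    rejected = True
    print("KFold rejected the combination:", err)
```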