【Question title】: Hyperparameter tuning for StackingRegressor in sklearn
【Posted】: 2021-11-15 01:00:11
【Question】:

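I want to tune a sklearn.ensemble.StackingRegressor with a simple RandomizedSearchCV. Because the estimators have to be defined when StackingRegressor() is instantiated, I can't work out how to define the parameter space for those estimators in the search's param_distributions.

I tried the following, but got an error: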

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.svm import LinearSVR
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.ensemble import StackingRegressor
X, y = load_diabetes(return_X_y=True)

rfr = RandomForestRegressor()
gbr = GradientBoostingRegressor()

estimators = [rfr, gbr]
sreg = StackingRegressor(estimators=estimators)
params = {'rfr__max_depth': [3, 5, 10, 100],
          'gbr__max_depth': [3, 5, 10, 100]}

grid = RandomizedSearchCV(estimator=sreg, 
                          param_distributions=params,
                          cv=3)
grid.fit(X,y)

I got the error AttributeError: 'RandomForestRegressor' object has no attribute 'estimators_'

Is it possible to tune the parameters of the individual estimators inside a StackingRegressor?

【Comments】:

    Tags: python machine-learning scikit-learn regression ensemble-learning


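    【Solution 1】:

    Your code should work if you define estimators as a list of (name, estimator instance) tuples, as shown below. The names are what the 'rfr__max_depth' and 'gbr__max_depth' keys in the parameter space refer to. The AttributeError most likely arises because StackingRegressor tries to unpack each list entry into a (name, estimator) pair, and iterating over an unfitted RandomForestRegressor accesses its estimators_ attribute, which does not exist yet.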

    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.ensemble import StackingRegressor
    
    X, y = load_diabetes(return_X_y=True)
    
    rfr = RandomForestRegressor()
    gbr = GradientBoostingRegressor()
    
    estimators = [('rfr', rfr), ('gbr', gbr)]
    
    sreg = StackingRegressor(estimators=estimators)
    
    params = {
        'rfr__max_depth': [3, 5],
        'gbr__max_depth': [3, 5]
    }
    
    grid = RandomizedSearchCV(
        estimator=sreg,
        param_distributions=params,
        n_iter=2,
        cv=3,
        verbose=1,
        random_state=100
    )
    
    grid.fit(X, y)
    
    res = pd.DataFrame(grid.cv_results_)
    print(res)
    #    mean_fit_time  std_fit_time  ...  std_test_score  rank_test_score
    # 0       1.121728      0.024188  ...        0.024546                2
    # 1       1.096936      0.034377  ...        0.013047                1
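
    As a follow-up to the answer above, here is a minimal sketch of how to check which parameter names a search will accept, and how the same name__parameter convention can reach the final estimator too. The explicit Ridge() final estimator, the randint ranges, the alpha values, and the small n_estimators are illustrative assumptions, not part of the original answer.

```python
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

# Named (name, estimator) tuples; the names become prefixes of the tunable keys.
# n_estimators=20 is an assumption chosen only to keep the example fast.
sreg = StackingRegressor(
    estimators=[
        ("rfr", RandomForestRegressor(n_estimators=20, random_state=0)),
        ("gbr", GradientBoostingRegressor(n_estimators=20, random_state=0)),
    ],
    # Passing the final estimator explicitly (an assumption for this sketch)
    # exposes its parameters under the "final_estimator__" prefix.
    final_estimator=Ridge(),
)

# get_params() lists every key that a search over sreg will accept.
valid_names = sorted(sreg.get_params().keys())
print([n for n in valid_names if n.endswith("max_depth")])

# Illustrative search space: distributions for the base estimators,
# a plain list for the final estimator.
params = {
    "rfr__max_depth": randint(2, 8),
    "gbr__max_depth": randint(2, 8),
    "final_estimator__alpha": [0.1, 1.0, 10.0],
}

# Catch misspelled keys before starting a potentially long search.
assert set(params) <= set(valid_names)

search = RandomizedSearchCV(sreg, params, n_iter=3, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```

    Checking the keys against get_params() first is cheap, and it surfaces a wrong prefix (for example a typo in 'rfr') before any model is fitted.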
    

    【Comments】:
