【Question Title】: How to tune hyperparameters over a hyperparameter space using Bayesian Optimization (in Python)?
【Posted】: 2020-10-26 00:43:52
【Question Description】:

I am trying to use Bayesian optimization to tune the hyperparameters of a random forest regressor over a hyperparameter space with the code below, but I get the following error message:

TypeError: __init__() got an unexpected keyword argument 'min_samples'

I get this error when running the following code:

# Import packages
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from hyperopt import hp, tpe, fmin, Trials, STATUS_OK

# Create datasets
reg_prob = datasets.make_friedman1(n_samples=100, n_features=10, noise=1.0, random_state=None)
x_train = reg_prob[0][0:50]
y_train = reg_prob[1][0:50]
x_test = reg_prob[0][50:100]
y_test = reg_prob[1][50:100]


#Create Hyperparameter space
space= {'n_estimators':hp.choice('n_estimators', range(2, 150, 1)),
        'min_samples':hp.choice('min_samples', range(2, 100, 1)),
        'max_features':hp.choice('max_features', range(2, 100, 1)),
        'max_samples':hp.choice('max_samples', range(2, 100, 1)),
       }


#Define Objective Function
def objective(space):
    
    rf = RandomForestRegressor(**space)

    
    # fit Training model
    rf.fit(x_train, y_train)
    
    # Making predictions and find RMSE
    y_pred = rf.predict(x_test)
    mse = mean_squared_error(y_test,y_pred)
    rmse = np.sqrt(mse)
    
    
    # Return RMSE
    return rmse


#Surrogate Fn
trials = Trials()
best = fmin(objective,
    space=space,
    algo=tpe.suggest,
    max_evals=100,
    trials=trials)

print(best)
print(trials.results)

I also tried listing the hyperparameters as separate arguments of the objective function with the code below, but got the following error:

TypeError: objective() missing 3 required positional arguments: 'min_samples', 'max_features', and 'max_samples'

#Define Objective Function
def objective(n_estimators,min_samples,max_features,max_samples):
    
    rf = RandomForestRegressor(n_estimators, min_samples, max_features, max_samples)

    
    # fit Training model
    rf.fit(x_train, y_train)
    
    # Making predictions and find RMSE
    y_pred = rf.predict(x_test)
    mse = mean_squared_error(y_test,y_pred)
    rmse = np.sqrt(mse)
    
    
    # Return RMSE
    return rmse

Can you suggest what I can do to fix my code?

I was able to tune a single hyperparameter using the following code:

# Import packages
import numpy as np
import time
from sklearn.metrics import mean_squared_error
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor

from hyperopt import hp, tpe, fmin, Trials, STATUS_OK
from collections import OrderedDict

reg_prob = datasets.make_friedman1(n_samples=100, n_features=10, noise=1.0, random_state=None)
x_train = reg_prob[0][0:50]
y_train = reg_prob[1][0:50]
x_test = reg_prob[0][50:100]
y_test = reg_prob[1][50:100]

space= hp.choice('num_leaves', range(2, 100, 1))


def objective(num_leaves):
    
    rf = RandomForestRegressor(num_leaves)  # note: the first positional argument is n_estimators, not a "num_leaves" parameter
    

    rf.fit(x_train, y_train)
    

    y_pred = rf.predict(x_test)
    mse = mean_squared_error(y_test,y_pred)
    rmse = np.sqrt(mse)
    

    
    # Return RMSE
    return rmse

trials = Trials()
best = fmin(objective,
    space=space,
    algo=tpe.suggest,
    max_evals=100,
    trials=trials)

print(best)
print(trials.results)

【Question Discussion】:

    Tags: python scikit-learn hyperparameters hyperopt


    【Solution 1】:

    The problem is that RandomForestRegressor has no parameter named min_samples. See here. You probably meant min_samples_leaf.

    Just keep the upper bound of min_samples_leaf within the number of samples in your dataset.

    Otherwise there is nothing else wrong with your code. Note that in the version below the ranges of max_features and max_samples are also capped to fit the data (10 features, 50 training samples), and max_evals is reduced to 2 so it runs quickly.
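You can confirm which constructor parameters actually exist with get_params(); this shows that min_samples is not a valid name, while min_samples_leaf and min_samples_split are:

```python
from sklearn.ensemble import RandomForestRegressor

# get_params() returns a dict of all valid constructor parameters.
params = RandomForestRegressor().get_params()
print('min_samples' in params)        # False
print('min_samples_leaf' in params)   # True
print('min_samples_split' in params)  # True
```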

    # Import packages
    import numpy as np
    from sklearn.metrics import mean_squared_error
    from sklearn import datasets
    from sklearn.ensemble import RandomForestRegressor
    from hyperopt import hp, tpe, fmin, Trials, STATUS_OK
    
    # Create datasets
    reg_prob = datasets.make_friedman1(n_samples=100, n_features=10, noise=1.0, random_state=None)
    x_train = reg_prob[0][0:50]
    y_train = reg_prob[1][0:50]
    x_test = reg_prob[0][50:100]
    y_test = reg_prob[1][50:100]
    
    
    #Create Hyperparameter space
    space= {'n_estimators':hp.choice('n_estimators', range(2, 150, 1)),
            'min_samples_leaf':hp.choice('min_samples', range(2, 50, 1)),
            'max_features':hp.choice('max_features', range(2, 10, 1)),
            'max_samples':hp.choice('max_samples', range(2, 50, 1)),
           }
    
    
    #Define Objective Function
    def objective(space):
        
        rf = RandomForestRegressor(**space)
    
        
        # fit Training model
        rf.fit(x_train, y_train)
        
        # Making predictions and find RMSE
        y_pred = rf.predict(x_test)
        mse = mean_squared_error(y_test,y_pred)
        rmse = np.sqrt(mse)
        
        
        # Return RMSE
        return rmse
    
    
    #Surrogate Fn
    trials = Trials()
    best = fmin(objective,
        space=space,
        algo=tpe.suggest,
        max_evals=2,
        trials=trials)
    
    print(best)
    print(trials.results)
    

    【Discussion】:
