Hyperparameter Tuning (Keras) a Neural Network Regression: Answers

【Question title】: Hyperparameter Tuning (Keras) a Neural Network Regression
【Posted】: 2022-01-23 17:16:14
【Question description】:

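We have developed an artificial neural network (ANN) in Python, and we would like to tune its hyperparameters with GridSearchCV to find the best possible values. The goal of our ANN is to predict temperature from other relevant features. So far, this is the evaluation of the network's performance: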

Metric                                 Value
Coefficient of Determination (R2)      0.9808840288506496
Root Mean Square Error (RMSE)          0.7527763482280911
Mean Squared Error (MSE)               0.5666722304516204
Mean Absolute Percent Error (MAPE)     0.09142692180578049
Mean Absolute Error (MAE)              0.588041786518511
Mean Bias Error (MBE)                  -0.07293321963266877

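So far, we have not figured out how to use GridSearchCV correctly, so we are looking for help that moves us toward a solution that meets our goal. We have a function that might work, but we have not been able to apply it correctly to our code.

This is the hyperparameter tuning function (GridSearchCV):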

def hyperparameterTuning():
    # Listing all the parameters to try
    Parameter_Trials = {'batch_size': [10, 20, 30],
                    'epochs': [10, 20],
                    'Optimizer_trial': ['adam', 'rmsprop']
                    }

    # Creating the regression ANN model
    RegModel = KerasRegressor(make_regression_ann, verbose=0)

    # Creating the Grid search space
    grid_search = GridSearchCV(estimator=RegModel,
                           param_grid=Parameter_Trials,
                           scoring=None,
                           cv=5)

    # Running Grid Search for different parameters
    grid_search.fit(X, y, verbose=1)

    print('### Printing Best parameters ###')
    print(grid_search.best_params_)

Our main function:

if __name__ == '__main__':

    print('--------------')

    dataframe = pd.read_csv("/.../file.csv")
    
    # Splitting data into training and testing data
    X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit = splitData(dataframe=dataframe)
    
    # Making the Regression Artificial Neural Network (ANN)
    ann = ANN(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, PredictorScalerFit=PredictorScalerFit, TargetVarScalerFit=TargetVarScalerFit)

    # Evaluation of the performance of the Artificial Neural Network (ANN)
    eval = evaluation(y_test_orig=ann['temp'], y_test_pred=ann['Predicted_temp'])

Our function for splitting the data into training and testing data:

def splitData(dataframe):

    X = dataframe[Predictors].values
    y = dataframe[TargetVariable].values

    ### Standardization of data ###
    PredictorScaler = StandardScaler()
    TargetVarScaler = StandardScaler()

    # Storing the fit object for later reference
    PredictorScalerFit = PredictorScaler.fit(X)
    TargetVarScalerFit = TargetVarScaler.fit(y)

    # Generating the standardized values of X and y
    X = PredictorScalerFit.transform(X)
    y = TargetVarScalerFit.transform(y)

    # Split the data into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    return X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit
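
One thing to note about the function above: it fits both scalers on the full data set before splitting, so the test rows contribute to the scaling statistics. A minimal sketch of the alternative order (split first, then fit the scalers on the training portion only) might look like the following. The synthetic arrays and the names predictor_scaler and target_scaler are assumptions for illustration and are not part of the original code; the reshape only matters if the target is a 1-D array.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for dataframe[Predictors] and dataframe[TargetVariable]
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 7))
y = rng.normal(loc=20.0, scale=5.0, size=100)

# Split first, so the test rows never influence the scaling statistics
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# StandardScaler expects 2-D input, so a 1-D target is reshaped to a column
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

# Fit on the training portion only, then apply the same transform to both parts
predictor_scaler = StandardScaler().fit(X_train)
target_scaler = StandardScaler().fit(y_train)

X_train_s = predictor_scaler.transform(X_train)
X_test_s = predictor_scaler.transform(X_test)
y_train_s = target_scaler.transform(y_train)
y_test_s = target_scaler.transform(y_test)
```

The fitted target_scaler can still be used with inverse_transform to bring predictions back to the original temperature scale, as the ANN function does.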

Our function that fits the model and uses the artificial neural network (ANN):

def ANN(X_train, y_train, X_test, y_test, TargetVarScalerFit, PredictorScalerFit):

    model = make_regression_ann()

    # Fitting the ANN to the Training set
    model.fit(X_train, y_train, batch_size=5, epochs=100, verbose=1)

    # Generating Predictions on testing data
    Predictions = model.predict(X_test)

    # Scaling the predicted temp data back to original temp scale
    Predictions = TargetVarScalerFit.inverse_transform(Predictions)

    # Scaling the y_test temp data back to original temp scale
    y_test_orig = TargetVarScalerFit.inverse_transform(y_test)

    # Scaling the test data back to original scale
    Test_Data = PredictorScalerFit.inverse_transform(X_test)

    TestingData = pd.DataFrame(data=Test_Data, columns=Predictors)
    TestingData['temp'] = y_test_orig
    TestingData['Predicted_temp'] = Predictions
    TestingData.head()

    # Computing the absolute percent error
    APE = 100 * (abs(TestingData['temp'] - TestingData['Predicted_temp']) / TestingData['temp'])
    TestingData['APE'] = APE

    # ...
    TestingData = TestingData.round(2)

    TestingData.to_csv("TestingData.csv")

    return TestingData

Our function for building the artificial neural network model:

def make_regression_ann():
    # create ANN model
    model = Sequential()

    # Defining the Input layer and FIRST hidden layer, both are same!
    model.add(Dense(units=8, input_dim=7, kernel_initializer='normal', activation='sigmoid'))

    # Defining the Second layer of the model
    # after the first layer we don't have to specify input_dim, as Keras infers it automatically
    model.add(Dense(units=6, kernel_initializer='normal', activation='sigmoid'))

    # The output neuron is a single fully connected node
    # Since we will be predicting a single number
    model.add(Dense(1, kernel_initializer='normal'))

    # Compiling the model
    model.compile(loss='mean_squared_error', optimizer='adam')

    return model

Our function for evaluating the performance of the artificial neural network:

def evaluation(y_test_orig, y_test_pred):

    # Computing the Mean Absolute Percent Error
    MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)

    # Computing R2 Score
    r2 = r2_score(y_test_orig, y_test_pred)

    # Computing Mean Square Error (MSE)
    MSE = mean_squared_error(y_test_orig, y_test_pred)

    # Computing Root Mean Square Error (RMSE)
    RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)

    # Computing Mean Absolute Error (MAE)
    MAE = mean_absolute_error(y_test_orig, y_test_pred)

    # Computing Mean Bias Error (MBE)
    MBE = np.mean(y_test_pred - y_test_orig)  # here we calculate MBE

    print('--------------')

    print('The Coefficient of Determination (R2) of ANN model is:', r2)
    print("The Root Mean Squared Error (RMSE) of ANN model is:", RMSE)
    print("The Mean Squared Error (MSE) of ANN model is:", MSE)
    print('The Mean Absolute Percent Error (MAPE) of ANN model is:', MAPE)
    print("The Mean Absolute Error (MAE) of ANN model is:", MAE)
    print("The Mean Bias Error (MBE) of ANN model is:", MBE)

    print('--------------')

    eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]
    columns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)',
               'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']

    dataframe = pd.DataFrame([eval_list], columns=columns)

    return dataframe
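
For reference, the six metrics reported by the function above can also be written out by hand. The sketch below is a plain-Python version under the same definitions: the error is taken as predicted minus observed, and MAPE is a fraction rather than a percentage, which is also what scikit-learn's mean_absolute_percentage_error returns. The helper name regression_metrics and the toy values are invented for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MSE, MAPE, MAE and MBE for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus observed
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n  # positive means over-prediction on average
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n  # a fraction
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return {"R2": r2, "RMSE": math.sqrt(mse), "MSE": mse,
            "MAPE": mape, "MAE": mae, "MBE": mbe}

# Toy values, invented for illustration only
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(name, value)
```

Taking the square root of the MSE directly, as done here, is also a possible fallback if your scikit-learn version no longer accepts the squared=False argument of mean_squared_error.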

【Comments】:

What error do you see?

Tags: python tensorflow keras scikit-learn hyperparameters

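【Solution 1】:

Your code should work if you update the make_regression_ann function so that it takes every hyperparameter you want to optimize as an input. The fit parameters (such as batch_size and epochs) are the exception: KerasRegressor passes them on to fit, so they do not need to be function arguments.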

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
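# Note: this wrapper module was removed from recent TensorFlow releases; the SciKeras package provides a replacement
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor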
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

def make_regression_ann(initializer='uniform', activation='relu', optimizer='adam', loss='mse'):

    model = Sequential()
    model.add(Dense(units=8, input_dim=7, kernel_initializer=initializer, activation=activation))
    model.add(Dense(units=6, kernel_initializer=initializer, activation=activation))
    model.add(Dense(1, kernel_initializer=initializer))
    model.compile(loss=loss, optimizer=optimizer)

    return model

param_grid = {
    'initializer': ['normal', 'uniform'],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'rmsprop'],
    'loss': ['mse', 'mae'],
    'batch_size': [32, 64],
    'epochs': [5, 10],
}

grid_search = GridSearchCV(
    estimator=KerasRegressor(make_regression_ann, verbose=0),
    param_grid=param_grid,
    scoring='neg_mean_absolute_percentage_error',
    cv=3,
)

X, y = make_regression(n_features=7, n_samples=100, random_state=42)

grid_search.fit(X, y, verbose=1)

grid_search.best_params_
# {'activation': 'sigmoid',
#  'batch_size': 32,
#  'epochs': 10,
#  'initializer': 'normal',
#  'loss': 'mae',
#  'optimizer': 'adam'}
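
To get a feel for how much work this grid implies, the candidates can be enumerated by hand. The sketch below only counts combinations with itertools.product; it does not train anything, and it is not meant to mirror how GridSearchCV is implemented internally. The variable names candidates, n_candidates and n_fits are chosen here for illustration.

```python
from itertools import product

# Same grid as in the answer above
param_grid = {
    'initializer': ['normal', 'uniform'],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'rmsprop'],
    'loss': ['mse', 'mae'],
    'batch_size': [32, 64],
    'epochs': [5, 10],
}
cv = 3

# Every combination of one value per hyperparameter is one candidate
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_candidates = len(candidates)  # 2 ** 6 = 64 combinations
n_fits = n_candidates * cv      # one fit per candidate and fold
print(n_candidates, n_fits)
```

With cv=3 that is 192 model fits, plus one final refit on the full data when refit=True (the default), so a modest grid and small epochs values keep the search affordable.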

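【Comments】:

Adding the parameters in def make_regression_ann(initializer='uniform', activation='relu', optimizer='adam', loss='mse'): made it work, but why uniform and relu rather than normal and sigmoid? Those seemed like the obvious choice.
Also, would you mind explaining the loss in grid_search.best_params_? We could not find any reading on what 'loss': 'mae' actually means. Thank you very much for your time! :-)
It is just an example; you can use different values for the default arguments if you prefer. The loss is the objective function that is minimized when training the model. 'mae' is short for mean absolute error. You can find a list of all Keras losses in the documentation.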
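What we do not fully understand is the comparison in loss. Why was mae chosen over mse? Is it because mae is closer to 0 than mse? For example, we understand that the other values are output because choosing those parameters for the neural network leads to better predictions, but how should 'loss': 'mae' be understood in this case?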

【Solution 2】:

The way I recently used GridSearchCV successfully was:

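from sklearn import svm
from sklearn.model_selection import GridSearchCV

tuned_parameters2 = {'C': [1, 10, 100, 10000], 'max_iter': [5000, 10000, 50000]}
model2 = GridSearchCV(svm.LinearSVC(), tuned_parameters2)
model2.fit(features, y_train)  # features, y_train: training inputs and labels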

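So, put the hyperparameters in a separate dictionary, then hand your model to GridSearchCV, as in GridSearchCV(make_regression_ann, the_hyperparam_dict), and fit it with the data. Note that GridSearchCV expects a scikit-learn estimator as its first argument, so the Keras build function has to be wrapped (for example in KerasRegressor) rather than passed directly.

In your case, this approach requires more refactoring. It is up to you to decide whether feeding the ANN to GridSearchCV is the better option.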
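
For completeness, here is a self-contained version of the pattern from this answer that can be run as is. The synthetic data from make_classification stands in for the answer's features and y_train, and the grid is shortened to keep the run quick; both are assumptions made for illustration. LinearSVC is a classifier, so this only demonstrates the GridSearchCV workflow, not the temperature regression itself.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for the answer's `features` and `y_train`
features, y_train = make_classification(n_samples=200, n_features=7, random_state=42)

# Hyperparameters live in their own dictionary, one list of values per name
tuned_parameters2 = {'C': [1, 10, 100], 'max_iter': [5000, 10000]}

# The estimator and the grid are handed to GridSearchCV, which is then fitted
model2 = GridSearchCV(svm.LinearSVC(), tuned_parameters2, cv=3)
model2.fit(features, y_train)

print(model2.best_params_)
print(model2.best_score_)
```

With no scoring argument, GridSearchCV falls back to the estimator's own score method, which for LinearSVC is classification accuracy.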

【Comments】: