【Posted】: 2021-06-13 03:29:08
【Question】:
I ran the following random forest grid search with balanced_accuracy as the scoring metric:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
# define the parameter grid
param_grid = [
    {'criterion': ['gini', 'entropy'],       # try different purity metrics in building the trees
     'max_depth': [2, 5, 8, 10, 15, 20],     # vary the max_depth of the trees in the ensemble
     'n_estimators': [10, 50, 100, 200],     # vary the number of trees in the ensemble
     'max_samples': [0.4, 0.7, 0.9]}         # vary how many samples each tree is built with
]
# setup the Random Forest model with all arguments as default
model = RandomForestClassifier()
# pass the model and the param_grid to the grid search, and use 5 folds with 'balanced_accuracy' as the scoring measure
grid_search = GridSearchCV(model, param_grid, cv = 5, scoring = 'balanced_accuracy')
# fit the grid search to the training set
grid_search.fit(X_smote, y_smote)
# return best model
rf_best = grid_search.best_estimator_
# return the hyperparameter values of the best model
print(grid_search.best_params_)
# use the best model to make predictions on the test set
y_pred = rf_best.predict(X_test)
# compute the test set metrics of the best model ('Listed' is the positive class)
print("accuracy: ", accuracy_score(y_test,y_pred))
print("f1: ", f1_score(y_test, y_pred, pos_label='Listed'))
print("precision: ", precision_score(y_test, y_pred, pos_label='Listed'))
print("recall: ", recall_score(y_test, y_pred, pos_label='Listed'))
This produces the following scores:
{'criterion': 'gini', 'max_depth': 20, 'max_samples': 0.7, 'n_estimators': 100}
accuracy: 0.6547231270358306
f1: 0.7612612612612613
precision: 0.9260273972602739
recall: 0.6462715105162524
I would like to use the average_precision scoring parameter instead, because it suits my use case better, so I updated the code to the following:
from sklearn.metrics import average_precision_score
# define the parameter grid
param_grid = [
    {'criterion': ['gini', 'entropy'],       # try different purity metrics in building the trees
     'max_depth': [2, 5, 8, 10, 15, 20],     # vary the max_depth of the trees in the ensemble
     'n_estimators': [10, 50, 100, 200],     # vary the number of trees in the ensemble
     'max_samples': [0.4, 0.7, 0.9]}         # vary how many samples each tree is built with
]
# setup the Random Forest model with all arguments as default
model = RandomForestClassifier()
# pass the model and the param_grid to the grid search, and use 5 folds with 'average_precision' as the scoring measure
grid_search = GridSearchCV(model, param_grid, cv = 5, scoring = 'average_precision')
# fit the grid search to the training set
grid_search.fit(X_smote, y_smote)
# return best model
rf_best = grid_search.best_estimator_
# return the hyperparameter values of the best model
print(grid_search.best_params_)
# use the best model to make predictions on the test set
y_pred = rf_best.predict(X_test)
# compute the test set metrics of the best model ('Listed' is the positive class)
print("accuracy: ", accuracy_score(y_test,y_pred))
print("f1: ", f1_score(y_test, y_pred, pos_label='Listed'))
print("precision: ", precision_score(y_test, y_pred, pos_label='Listed'))
print("recall: ", recall_score(y_test, y_pred, pos_label='Listed'))
However, I get the following error:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\_ranking.py in average_precision_score(y_true, y_score, average, pos_label, sample_weight)
211 if len(present_labels) == 2 and pos_label not in present_labels:
212 raise ValueError("pos_label=%r is invalid. Set it to a label in "
--> 213 "y_true." % pos_label)
214 average_precision = partial(_binary_uninterpolated_average_precision,
215 pos_label=pos_label)
ValueError: pos_label=1 is invalid. Set it to a label in y_true.
Why can't I use average_precision in my code the same way I used balanced_accuracy? Is there something I should be doing differently?
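A likely explanation, sketched under assumptions: the 'average_precision' scoring string calls average_precision_score with its default pos_label=1, while the labels here are strings such as 'Listed', so 1 never appears in y_true. The sketch below reproduces the error on synthetic data and shows one possible workaround, encoding the target as 0/1 before the search. The negative label 'Unlisted', the synthetic data, and the reduced grid are assumptions made for illustration, not details from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (assumption): string labels with
# 'Listed' as the positive class and a hypothetical 'Unlisted' negative class.
X, y_int = make_classification(n_samples=200, n_features=8, random_state=0)
y_str = np.where(y_int == 1, "Listed", "Unlisted")

# Reproduce the error: the default pos_label=1 is not among the string labels.
scores = np.random.RandomState(0).rand(len(y_str))
try:
    average_precision_score(y_str, scores)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Workaround: encode the positive class as 1 so the default pos_label matches.
y_bin = (y_str == "Listed").astype(int)

# Deliberately small grid (assumption) to keep the sketch fast.
param_grid = {"max_depth": [2, 5], "n_estimators": [10, 50]}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="average_precision",
)
grid_search.fit(X, y_bin)
print(grid_search.best_params_, grid_search.best_score_)
```

With this encoding the fitted model predicts 0/1, so the test labels would need the same encoding and the later f1_score, precision_score, and recall_score calls would use pos_label=1. balanced_accuracy did not hit this problem because it has no notion of a positive class. A custom scorer built with make_scorer and pos_label='Listed' is another option, though the keyword that requests probability outputs differs between scikit-learn versions.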
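【Comments】:
- Is there a typo in the second part of the code? You should use average_precision_score rather than precision_score.
- @StupidWolf, I tried that first, but received the following error message: "ValueError: 'average_precision_score' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options."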
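The error quoted in the last comment comes from mixing up two kinds of names: scoring strings such as 'average_precision' and metric functions such as average_precision_score. A minimal sketch of the difference, assuming a scikit-learn version that provides sklearn.metrics.get_scorer:

```python
from sklearn.metrics import get_scorer

# The scoring string 'average_precision' resolves to a scorer object.
scorer = get_scorer("average_precision")

# The metric function's name is not a registered scoring string.
try:
    get_scorer("average_precision_score")
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

print(type(scorer).__name__)
print(lookup_error)
```

So scoring='average_precision' in the question is already the right string for GridSearchCV, and the pos_label error is a separate issue.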
Tags: python scikit-learn classification