【Posted】: 2019-08-14 12:57:53
【Question】:
I am trying to use GridSearchCV to get the best estimator. If I pass the parameters like this:
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=42)
parameters = {'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10],
              'min_samples_leaf': [2, 3, 4, 5, 6, 7, 8, 9, 10],
              'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9, 10]}
scorer = make_scorer(f1_score)
grid_obj = GridSearchCV(clf, parameters, scoring=scorer)
grid_fit = grid_obj.fit(X_train, y_train)
best_clf = grid_fit.best_estimator_
# Redundant with the default refit=True; kept as in the original post
best_clf.fit(X_train, y_train)
best_train_predictions = best_clf.predict(X_train)
best_test_predictions = best_clf.predict(X_test)
# f1_score expects (y_true, y_pred)
print('The training F1 Score is', f1_score(y_train, best_train_predictions))
print('The testing F1 Score is', f1_score(y_test, best_test_predictions))
The result is
The training F1 Score is 0.784810126582
The testing F1 Score is 0.72
For the same data, the results are different when I only change [2,3,4,5,6,7,8,9,10] to [2,4,6,8,10]:
clf = DecisionTreeClassifier(random_state=42)
parameters = {'max_depth': [2, 4, 6, 8, 10],
              'min_samples_leaf': [2, 4, 6, 8, 10],
              'min_samples_split': [2, 4, 6, 8, 10]}
scorer = make_scorer(f1_score)
grid_obj = GridSearchCV(clf, parameters, scoring=scorer)
grid_fit = grid_obj.fit(X_train, y_train)
best_clf = grid_fit.best_estimator_
# Redundant with the default refit=True; kept as in the original post
best_clf.fit(X_train, y_train)
best_train_predictions = best_clf.predict(X_train)
best_test_predictions = best_clf.predict(X_test)
# f1_score expects (y_true, y_pred)
print('The training F1 Score is', f1_score(y_train, best_train_predictions))
print('The testing F1 Score is', f1_score(y_test, best_test_predictions))
The result is
The training F1 Score is 0.814814814815
The testing F1 Score is 0.8
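I am confused about how GridSearchCV works.
【Comments】:
-
Randomness can enter in several places. For example, your train/test split may produce different results.
-
Are you confused about why the scores differ between the two cases, or rather about why the first one is lower than the second?
-
About why the scores differ.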
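One way to look into why the scores differ is to print what each search actually selected. GridSearchCV ranks candidates by their mean cross-validated score on folds of the training data, not by the train or test F1 printed in the question, so two grids can settle on different trees whose test scores are ordered either way. The sketch below is only an illustration: the dataset is synthetic (make_classification stands in for the asker's data, which is not shown), run_search is a hypothetical helper, and cv=5 is set explicitly because the default number of folds depends on the scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the original dataset is not available).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


def run_search(values):
    """Hypothetical helper: grid-search a decision tree over `values`."""
    grid = {'max_depth': values,
            'min_samples_leaf': values,
            'min_samples_split': values}
    search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                          grid, scoring='f1', cv=5)
    return search.fit(X_train, y_train)


dense = run_search([2, 3, 4, 5, 6, 7, 8, 9, 10])
sparse = run_search([2, 4, 6, 8, 10])

for name, search in [('dense', dense), ('sparse', sparse)]:
    # best_score_ is the mean CV score that drove the selection;
    # the test F1 is computed afterwards and plays no part in it.
    test_f1 = f1_score(y_test, search.predict(X_test))
    print(name, search.best_params_,
          'cv f1 =', round(search.best_score_, 3),
          'test f1 =', round(test_f1, 3))
```

Because the sparse grid is a subset of the dense one and the folds are identical, the dense search's best_score_ cannot be lower than the sparse one's. Nothing similar holds for the test score, which is why the smaller grid can end up looking better on X_test.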
Tags: python machine-learning scikit-learn gridsearchcv