【Posted】: 2019-12-09 22:29:52
【Question】:
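I need to develop a model with no (or close to no) false negatives. To that end, I plotted the precision-recall curve and determined that the threshold should be set to 0.11.
My question is: how do I define the threshold when training the model? Defining it later, at evaluation time, makes no sense, because it won't carry over to new data.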
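import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# X (features) and y (binary labels) are assumed to be defined already.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
rfc_model = RandomForestClassifier(random_state=101)
rfc_model.fit(X_train, y_train)
rfc_preds = rfc_model.predict(X_test)

# The predicted probabilities do not depend on the threshold, so compute them once.
predicted_proba = rfc_model.predict_proba(X_test)

# Recall and precision on the test set for each threshold from 0.00 to 1.00.
recall_precision_vals = []
for val in np.linspace(0, 1, 101):
    predicted = (predicted_proba[:, 1] >= val).astype('int')
    recall_sc = recall_score(y_test, predicted)
    precis_sc = precision_score(y_test, predicted)
    recall_precision_vals.append({
        'Threshold': val,
        'Recall val': recall_sc,
        'Precis val': precis_sc
    })
recall_prec_df = pd.DataFrame(recall_precision_vals)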
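For reference, the threshold sweep above can probably be written more compactly with scikit-learn's `precision_recall_curve`, which evaluates every distinct predicted probability as a candidate threshold instead of a fixed grid. This is only a sketch: the data here is a synthetic stand-in generated with `make_classification` (an assumption, since the original `X` and `y` are not shown), and `pr_df` is a name chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: the real X and y are not shown).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

rfc_model = RandomForestClassifier(random_state=101)
rfc_model.fit(X_train, y_train)

# Probability of the positive class for each test sample.
proba_pos = rfc_model.predict_proba(X_test)[:, 1]

# One call returns precision and recall for every candidate threshold.
precision, recall, thresholds = precision_recall_curve(y_test, proba_pos)

# precision and recall carry one extra trailing point (precision 1, recall 0)
# that has no matching threshold, so it is dropped to align the columns.
pr_df = pd.DataFrame({
    'Threshold': thresholds,
    'Recall val': recall[:-1],
    'Precis val': precision[:-1],
})
print(pr_df.head())
```

The rows come back sorted by increasing threshold, so recall starts at 1.0 and can only fall as the threshold rises.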
Any ideas?
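One possible way to frame this, offered as a sketch and not a definitive answer: `RandomForestClassifier` has no threshold parameter to set during `fit`, so the threshold is usually treated as a separate tuning step. To keep the choice from being fitted to the test set, it can be selected on a held-out validation split and then applied unchanged to the test data. Everything specific below is an assumption for illustration: the synthetic data from `make_classification`, the three-way split sizes, the `target_recall` of 0.99, and the variable names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: the real X and y are not shown).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.8, 0.2], random_state=101)

# Three-way split: train (fit the model), validation (pick the threshold),
# test (final, untouched check).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=101, stratify=y_trainval)

rfc_model = RandomForestClassifier(random_state=101)
rfc_model.fit(X_train, y_train)

# Precision/recall for every candidate threshold, measured on validation data only.
val_proba = rfc_model.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, val_proba)

# Hypothetical requirement: keep at least 99% recall on the validation set.
target_recall = 0.99

# Among thresholds that meet the recall target, take the highest one,
# since a higher threshold generally gives better precision.
meets_target = recall[:-1] >= target_recall
chosen_threshold = thresholds[meets_target].max()

# Apply the frozen threshold to the test set without re-tuning.
test_pred = (rfc_model.predict_proba(X_test)[:, 1] >= chosen_threshold).astype(int)
print('Chosen threshold:', chosen_threshold)
print('Test recall:', recall_score(y_test, test_pred))
print('Test precision:', precision_score(y_test, test_pred))
```

The recall measured on the test set will not necessarily match the validation target exactly; the gap between the two gives a rough idea of how well the chosen threshold carries over to new data. Recent scikit-learn releases (1.5 and later) also ship `FixedThresholdClassifier` and `TunedThresholdClassifierCV` in `sklearn.model_selection`, which wrap this kind of step inside an estimator, though they did not exist when this question was posted.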
【Comments】:
Tags: python machine-learning scikit-learn random-forest