【Posted】: 2021-09-24 01:29:37
【Question】:
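from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# x holds the features and y the labels (both defined earlier, not shown)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=2)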
# generate a no skill prediction (majority class)
ns_probs = [0 for _ in range(len(y_test))]
# fit a model
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(x_train, y_train)
# predict probabilities
lr_probs = model.predict_proba(x_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('KNN: ROC AUC=%.3f' % (lr_auc))
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)  # <-- the error occurs here
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
...
I am trying to use a ROC curve with the KNN algorithm.
ValueError: y_true takes value in {'True', 'False'} and pos_label is not specified:
either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly
However, as you can see above, an error occurred.
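The message itself names two ways out: make the labels numeric, or pass pos_label explicitly. Below is a minimal sketch of the second option. The arrays y_true and scores are made-up toy data, not values from the question, and treating the string 'True' as the positive class is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical toy data: string labels like the ones named in the error message
y_true = np.array(['True', 'False', 'True', 'False', 'True', 'False'])
# Made-up scores for the positive class (stand-in for predict_proba(...)[:, 1])
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.1])

# Naming the positive class tells roc_curve how to read the string labels
fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label='True')
```

In the question's code this would presumably mean adding pos_label='True' to both roc_curve calls. It also assumes that column 1 of predict_proba corresponds to 'True', which can be checked against model.classes_.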
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
encoder.fit(data.Malware)
data['TrueorFalse'] = encoder.transform(data['TrueorFalse'])
data.value_counts(data['TrueorFalse'].values, sort=False)
data.head()
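To fix this, I assumed the problem was that the "True" and "False" labels I wrote are strings. I therefore applied the code above to convert True and False to 0 and 1, but the error still occurs. I use True and False as the labels in the TrueorFalse column. Am I missing something?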
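One possible explanation, offered as a guess: the encoding was applied to data after x and y had already been extracted, or y comes from a different object, so y_test still holds strings. The sketch below encodes the label column first and only then builds x and y. The small DataFrame and its feature column are hypothetical stand-ins, and stratify=y is added only so that this tiny example keeps both classes in each split.

```python
import pandas as pd
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the question's DataFrame
data = pd.DataFrame({
    'feature': [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    'TrueorFalse': ['False', 'False', 'False', 'False',
                    'True', 'True', 'True', 'True'],
})

# Fit and transform on the same column; classes are sorted, so 'False' -> 0, 'True' -> 1
encoder = LabelEncoder()
data['TrueorFalse'] = encoder.fit_transform(data['TrueorFalse'])

# Build x and y only after the labels have been encoded
x = data[['feature']]
y = data['TrueorFalse']

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=2, stratify=y)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)
probs = model.predict_proba(x_test)[:, 1]  # probability of class 1

# y_test is now in {0, 1}, so pos_label does not need to be passed
fpr, tpr, _ = roc_curve(y_test, probs)
```

Printing y_test.unique() right before the roc_curve call is a quick way to confirm whether the labels that reach the metric are really numeric.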
【Discussion】:
Tags: machine-learning scikit-learn computer-vision sklearn-pandas