Why does the KS curve start at (0,0)? Answer

【Question title】: Why does the KS curve start at (0,0)?
【Posted】: 2020-04-25 14:11:04
【Question】:

On a KS curve, the vertical axis shows TPR, FPR and (TPR - FPR), and the horizontal axis is the threshold.

TPR = TP / (TP + FN).

When threshold = 0, every sample is predicted as 1, so TP = the number of positive samples and FN = 0.

Therefore TPR = 1.

But every KS curve I have found online starts at (0,0). Shouldn't it start at (0,1)? I'm confused! Thanks for any answers!
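The TPR arithmetic above can be checked with a few lines of Python. The labels and scores are made-up toy values, and the sketch assumes the common rule that a sample is predicted as 1 when its score is greater than or equal to the threshold; the function name `true_positive_rate` is hypothetical.

```python
def true_positive_rate(y_true, scores, threshold):
    """TPR = TP / (TP + FN), predicting 1 whenever score >= threshold (assumed rule)."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)

# Hypothetical toy data: true labels and predicted probabilities of class 1
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]

# At threshold 0 every sample is predicted as 1, so FN = 0
print(true_positive_rate(y_true, scores, 0.0))  # 1.0
```

Raising the threshold to 0.5 moves the positive sample with score 0.35 into FN, so the TPR drops to 2/3.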


Tags: machine-learning statistics data-mining data-analysis

【Answer 1】:

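TP: number of positive predictions whose actual label is 1
FP: number of positive predictions whose actual label is 0
TN: number of negative predictions whose actual label is 0
FN: number of negative predictions whose actual label is 1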

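When threshold = 0, the model predicts only positives, so FN = TN = 0. Then FPR = FP/(FP+TN) = 1 and TPR = TP/(TP+FN) = 1, so in (FPR, TPR) coordinates this point is (1,1). You made a mistake there.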

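When threshold = 1 (more precisely, when the threshold is above every predicted score), the model predicts only negatives, so TP = FP = 0. Then FPR = FP/(FP+TN) = 0 and TPR = TP/(TP+FN) = 0, so this point is (0,0).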
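Both end points can be reproduced by counting the four cells defined above. This is a minimal sketch on made-up toy data, again assuming a sample is predicted positive when its score is greater than or equal to the threshold; `roc_point` is a hypothetical helper, not a library function.

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR), predicting 1 whenever score >= threshold (assumed rule)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return fp / (fp + tn), tp / (tp + fn)

# Hypothetical toy data: true labels and predicted probabilities of class 1
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]

# threshold = 0: everything is predicted positive, so FN = TN = 0
print(roc_point(y_true, scores, 0.0))  # (1.0, 1.0)
# threshold above every score: everything is predicted negative, so TP = FP = 0
print(roc_point(y_true, scores, max(scores) + 1))  # (0.0, 0.0)
```

With the >= rule, a threshold equal to the highest score (0.9 here) still predicts that sample as positive and gives (0.0, 1/3) rather than (0,0); that is why the (0,0) point needs a threshold strictly above every score.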

```python
# roc curve and auc
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot
import pandas as pd

# generate a balanced 2-class dataset
# (weights are class proportions; if they sum to more than 1, extra samples may be returned)
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.5, 0.5], random_state=1)
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
# fit a model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(trainX, trainy)
# predict probabilities
probs = model.predict_proba(testX)
# keep probabilities for the positive outcome only
probs = probs[:, 1]
# calculate AUC
auc = roc_auc_score(testy, probs)
print('AUC: %.3f' % auc)
# calculate roc curve (thresholds are returned in decreasing order)
fpr, tpr, thresholds = roc_curve(testy, probs)
# plot no skill
pyplot.plot([0, 1], [0, 1], linestyle='--')
# plot the roc curve for the model
pyplot.plot(fpr, tpr, marker='.')
# show the plot
pyplot.show()
# see calculations (print so the table also appears outside a notebook)
print(pd.DataFrame({'fpr': fpr, 'tpr': tpr, 'thresholds': thresholds}))
```

Output:

```
     fpr        tpr         thresholds
0   0.000000    0.000000    2.000000
1   0.054264    0.561983    1.000000
2   0.217054    0.884298    0.666667
3   0.406977    0.975207    0.333333
4   1.000000    1.000000    0.000000
```
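The code above only plots the ROC curve. The KS curve from the question can be read off the same three arrays: TPR, FPR and TPR - FPR plotted against the threshold, with the KS statistic being the largest value of TPR - FPR. The pure-Python sketch below uses the rounded numbers from the printed table. Note that the first row uses the extra threshold 2.0 (max score + 1) that `roc_curve` prepends; recent scikit-learn releases use `np.inf` there instead, as far as I know. That row has TPR = FPR = 0, and the last row (threshold 0) has TPR = FPR = 1.

```python
# Values copied from the printed table (fpr, tpr, thresholds returned by roc_curve)
fpr = [0.000000, 0.054264, 0.217054, 0.406977, 1.000000]
tpr = [0.000000, 0.561983, 0.884298, 0.975207, 1.000000]
thresholds = [2.000000, 1.000000, 0.666667, 0.333333, 0.000000]

# Third line of the KS plot: the gap TPR - FPR at each threshold
ks_curve = [t - f for t, f in zip(tpr, fpr)]

# KS statistic: the largest gap between the TPR and FPR lines
best = max(range(len(ks_curve)), key=lambda i: ks_curve[i])
print(f"KS = {ks_curve[best]:.3f} at threshold {thresholds[best]:.3f}")
# KS = 0.667 at threshold 0.667
```

Because `roc_curve` returns thresholds in decreasing order, drawing the rows in the order they come out starts every line at TPR = FPR = 0, and the TPR - FPR line is 0 at both ends.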
