How to read this ROC curve and set custom thresholds? Answers

【Question title】: How to read this ROC curve and set custom thresholds?
【Posted】: 2019-03-10 20:46:27
【Question】:

I am using this code:

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

y_true = [1,0,0]
y_predict = [.6,.1,.1]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict , pos_label=1)

print(fpr)
print(tpr)
print(thresholds)

# Plot the ROC curve
plt.plot(fpr,tpr)
plt.show()


y_true = [1,0,0]
y_predict = [.6,.1,.6]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict , pos_label=1)

print(fpr)
print(tpr)
print(thresholds)

# Plot the ROC curve
plt.plot(fpr,tpr)
plt.show()

It plots the following ROC curves:

scikit-learn chooses the thresholds itself, but I would like to set custom thresholds.

For example, for the values:

y_true = [1,0,0]
y_predict = [.6,.1,.6]

the following thresholds are returned:

[1.6 0.6 0.1]

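Why does the value 1.6 not appear on the ROC curve? Is the threshold 1.6 redundant in this case, since probabilities range from 0 to 1? Is it possible to set custom thresholds of .3, .5, .7 to check how well the classifier performs in this case?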

Update:

From https://sachinkalsi.github.io/blog/category/ml/2018/08/20/top-8-performance-metrics-one-should-know.html#receiver-operating-characteristic-curve-roc I used the same x and prediction values:

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

y_true = [1,1,1,0]
y_predict = [.94,.87,.83,.80]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict , pos_label=1)

print('false positive rate:', fpr)
print('true positive rate:', tpr)
print('thresholds:', thresholds)

# Plot the ROC curve
plt.plot(fpr,tpr)
plt.show()

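which produces this plot:

The plot differs from the one shown in the blog, and the thresholds are different as well:

Also, the thresholds returned by scikit's metrics.roc_curve implementation are: thresholds: [0.94 0.83 0.8 ]. Shouldn't scikit return a similar ROC curve, given that it uses the same points? Should I implement the ROC curve myself instead of relying on the scikit implementation, since the results differ?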

【Question comments】:

Tags: python machine-learning data-science roc

【Solution 1】:

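The thresholds themselves do not appear on the ROC curve; the curve only shows the resulting (FPR, TPR) pairs. The scikit-learn documentation says: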

thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1
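To make that first entry concrete, here is a minimal sketch in plain Python. The helper `rates_at` is hypothetical (it is not part of scikit-learn) and assumes 0/1 labels with both classes present. A threshold above every score predicts nothing as positive, so it only contributes the curve's starting point (0, 0):

```python
def rates_at(y_true, y_score, threshold):
    """Return (fpr, tpr) when every score >= threshold is predicted positive."""
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return fp / neg, tp / pos

y_true = [1, 0, 0]
y_predict = [0.6, 0.1, 0.6]

# A threshold of max(y_predict) + 1 lies above every score,
# so no instance is predicted positive.
start = rates_at(y_true, y_predict, max(y_predict) + 1)
print(start)  # (0.0, 0.0)
```

This is where the 1.6 in the question comes from: max(y_predict) + 1 = 0.6 + 1. As far as I know, recent scikit-learn releases use np.inf for this first entry instead, so the exact value depends on the installed version.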

The thresholds are taken from the scores themselves: if y_predict contained 0.3, 0.5, 0.7, then the metrics.roc_curve function would try those thresholds.
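To my knowledge roc_curve has no argument for passing your own threshold list, so one way to check specific cut-offs such as .3, .5, .7 is to apply them by hand. A minimal sketch, assuming 0/1 labels with both classes present (`confusion_at` is a hypothetical helper, not a scikit-learn function):

```python
def confusion_at(y_true, y_score, threshold):
    """Count (tp, fp, tn, fn) when scores >= threshold are labelled positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

y_true = [1, 0, 0]
y_predict = [0.6, 0.1, 0.6]

results = {}
for threshold in (0.3, 0.5, 0.7):
    tp, fp, tn, fn = confusion_at(y_true, y_predict, threshold)
    fpr = fp / (fp + tn)  # false positive rate
    tpr = tp / (tp + fn)  # true positive rate
    results[threshold] = (fpr, tpr)
    print(f"threshold={threshold}: fpr={fpr:.2f}, tpr={tpr:.2f}")
```

With these three data points, .3 and .5 give the same result because no score lies between them, which is why a ROC routine only needs to try thresholds at the observed scores.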

These steps are generally followed when computing a ROC curve:

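1. Sort y_predict in descending order.

2. For each probability score in y_predict (call it τ_i), treat a data point as positive if y_predict >= τ_i.

P.S.: If we have N data points, we will have N thresholds (provided the combinations of y_true and y_predict are unique).

3. For each y_predict value (τ_i), compute the TPR and FPR.

4. Plot the ROC curve from the N (number of data points) (TPR, FPR) pairs.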
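The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions (0/1 labels, both classes present), not scikit-learn's actual implementation; `roc_points` is a hypothetical name, and the starting threshold max(y_score) + 1 follows the documentation quote above:

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr, threshold) triples, starting from the (0, 0) point."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    # Step 1: candidate thresholds are the distinct scores, in descending order.
    thresholds = sorted(set(y_score), reverse=True)
    # Starting point: a threshold above every score predicts no positives.
    points = [(0.0, 0.0, max(y_score) + 1)]
    for tau in thresholds:
        # Step 2: a data point is predicted positive when its score >= tau.
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= tau and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= tau and t == 0)
        # Step 3: TPR and FPR at this threshold.
        points.append((fp / neg, tp / pos, tau))
    return points

# Step 4 would plot fpr against tpr; here the pairs are just printed.
pts = roc_points([1, 1, 1, 0], [0.94, 0.87, 0.83, 0.80])
for fpr, tpr, tau in pts:
    print(f"threshold={tau:.2f} fpr={fpr:.2f} tpr={tpr:.2f}")
```

This sketch keeps every distinct score as a threshold. scikit-learn's roc_curve may return fewer, because its drop_intermediate option (on by default) omits points that do not change the shape of the curve, which would explain why 0.87 is missing from the thresholds in the question.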

For details, you can refer to this blog

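【Comments】:

The plot you drew is correct. The one I drew is a typical ROC curve; next to that plot I added a P.S: The example (TPR,FPR) pairs have not been plotted in the above graph. In any case, the thresholds returned by the metrics.roc_curve function are correct, so you can rely on them. I don't think there is any need to implement your own custom roc_curve function.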