Answers: Pythonic way to compute sensitivity and specificity

[Question title]: Pythonic way to compute sensitivity and specificity
[Posted]: 2017-01-31 00:06:34
[Question]:

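I want to compute the sensitivity and specificity of 2 numpy arrays (test, truth); in the code below the test array is called mask. Both arrays have the same shape and store only two values: 0 (test/truth false) and a "true" value (stored as 255 in the code below, as in a binary image mask). Therefore, I have to count the false_positive, true_positive, false_negative and true_negative values. I did it like this: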

true_positive = 0
false_positive = 0
false_negative = 0
true_negative = 0

# Visit every pixel and tally its (mask, truth) combination
for y in range(mask.shape[0]):
    for x in range(mask.shape[1]):
        if (mask[y,x] == 255 and truth[y,x] == 255):
            true_positive = true_positive + 1
        elif (mask[y,x] == 255 and truth[y,x] == 0):
            false_positive = false_positive + 1
        elif (mask[y,x] == 0 and truth[y,x] == 255):
            false_negative = false_negative + 1
        elif (mask[y,x] == 0 and truth[y,x] == 0):
            true_negative = true_negative + 1

sensitivity = true_positive / (true_positive + false_negative)
specificity = true_negative / (false_positive + true_negative)

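I think there must be a simpler (more readable) way, since this is Python and not C++... First I tried something like true_positive = np.sum(mask == 255 and truth == 255), but I got this error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()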
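For context, the error arises because Python's and calls bool() on each operand, and a NumPy array with more than one element has no single truth value. The element-wise operator & does what was intended; the parentheses are required because & binds more tightly than ==. A minimal sketch, using small hypothetical 0/255 masks:

```python
import numpy as np

# Hypothetical 2x3 binary masks using 0 / 255, as in the question
mask = np.array([[255, 0, 255],
                 [0, 0, 255]], dtype=np.uint8)
truth = np.array([[255, 255, 0],
                  [0, 0, 255]], dtype=np.uint8)

# np.sum(mask == 255 and truth == 255) raises ValueError: `and` needs a single bool.
# Use element-wise logical AND instead; each comparison must be parenthesized.
true_positive = np.sum((mask == 255) & (truth == 255))
false_positive = np.sum((mask == 255) & (truth == 0))
false_negative = np.sum((mask == 0) & (truth == 255))
true_negative = np.sum((mask == 0) & (truth == 0))
```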

Is there a more Pythonic way to compute sensitivity and specificity?

Thanks!

[Question comments]:

You might be interested in scikit-learn's sklearn.metrics; it offers a large number of metrics to choose from: scikit-learn.org/stable/modules/…

Tags: python numpy

[Answer 1]:

Focusing on compactness, here's one approach using NumPy's ufunc-vectorized operations, broadcasting and array slicing -

# Each element is coded as 2*(mask==255) + (truth==255), so C = [TN, FN, FP, TP]
C = (((mask==255)*2 + (truth==255)).reshape(-1,1) == range(4)).sum(0)
sensitivity, specificity = C[3]/C[1::2].sum(), C[0]/C[::2].sum()
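To see why the slicing picks out the right counts, here is a small sketch on hypothetical 2x3 masks. Each element gets the code 2*prediction + truth, so C holds [TN, FN, FP, TP]; C[1::2] is then [FN, TP] and C[::2] is [TN, FP]. The arrays are made up for illustration, and np.arange(4) stands in for range(4) only to make the array-to-array comparison explicit.

```python
import numpy as np

# Hypothetical example masks (0 = negative, 255 = positive)
mask = np.array([[255, 0, 255], [0, 0, 255]])
truth = np.array([[255, 255, 0], [0, 0, 255]])

# Encode each element as 2*prediction + truth:
#   0 -> true negative, 1 -> false negative, 2 -> false positive, 3 -> true positive
codes = (mask == 255) * 2 + (truth == 255)

# Compare the flattened codes (an N x 1 column) against [0, 1, 2, 3] (a row);
# broadcasting yields an N x 4 boolean table, and summing each column counts one code
C = (codes.reshape(-1, 1) == np.arange(4)).sum(0)

sensitivity = C[3] / C[1::2].sum()  # tp / (fn + tp)
specificity = C[0] / C[::2].sum()   # tn / (tn + fp)
```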

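Or, slightly more NumPythonic, we could compute C with np.bincount (minlength=4 guarantees four counts even when a category, e.g. true positives, never occurs) -

C = np.bincount(((mask==255)*2 + (truth==255)).ravel(), minlength=4)

To make sure we get floating-point ratios on Python 2, we need to use from __future__ import division at the start (on Python 3, / already performs true division).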
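Packaged as a reusable function, the bincount version might look like the sketch below. The function name sensitivity_specificity and its positive parameter are hypothetical, not part of the answer; the sketch assumes Python 3 division and does not guard against a zero denominator.

```python
import numpy as np

def sensitivity_specificity(mask, truth, positive=255):
    """Return (sensitivity, specificity) for two same-shaped binary arrays.

    `positive` is the value marking a positive element (255 in the question).
    """
    # Codes: 0 = TN, 1 = FN, 2 = FP, 3 = TP
    codes = (mask == positive) * 2 + (truth == positive)
    # minlength=4 keeps four counts even if some category never occurs
    tn, fn, fp, tp = np.bincount(codes.ravel(), minlength=4)
    return tp / (tp + fn), tn / (tn + fp)

# Usage with hypothetical 2x2 masks
mask = np.array([[255, 0], [0, 0]])
truth = np.array([[255, 0], [255, 0]])
sens, spec = sensitivity_specificity(mask, truth)
```

Here the prediction contains no false positives at all, so without minlength the third count would still exist only because a true positive (code 3) is present; with minlength=4 the unpacking never fails.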

[Comments]:

I completed my own answer for the sake of completeness, but this really is the better answer...
Awesome! Thanks! That looks much better :)

[Answer 2]:

Testing for the same shape:

import numpy as np

a = np.random.rand(4,4)
b = np.random.rand(4,4)
print(a.shape == b.shape)  # prints True

Computing the four counts:

# assuming mask and truth have been scaled to contain only 0 or 1 (e.g. divided by 255)
true_positive = np.sum(mask * truth)  # product is 1 only where both are 1

true_negative = len(mask.flat) - np.count_nonzero(mask + truth)  # sum is 0 only where both are 0

false_positive = np.count_nonzero(mask - truth == 1)  # mask 1, truth 0

false_negative = np.count_nonzero(truth - mask == 1)  # truth 1, mask 0
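A quick check of these formulas on hypothetical 0/255 masks, scaled to 0/1 with integer division as the answer assumes. Note that the subtraction trick needs a numeric dtype: NumPy refuses the - operator on boolean arrays (it raises a TypeError), so boolean masks should be converted to integers first.

```python
import numpy as np

# Hypothetical 0/255 masks, scaled to 0/1 integers
mask = np.array([[255, 0, 255], [0, 0, 255]]) // 255
truth = np.array([[255, 255, 0], [0, 0, 255]]) // 255

true_positive = np.sum(mask * truth)                        # both 1
true_negative = mask.size - np.count_nonzero(mask + truth)  # both 0
false_positive = np.count_nonzero(mask - truth == 1)        # predicted 1, truth 0
false_negative = np.count_nonzero(truth - mask == 1)        # truth 1, predicted 0
```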

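[Comments]:

[Answer 3]:

The four boolean arrays (TP, FP, FN, TN) can be computed and stacked like this (mask and truth must be integer arrays, since & and | are bitwise operators):

categories = np.dstack((mask & truth > 0, mask > truth, mask < truth, mask | truth == 0))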

Then the counts:

tp,fp,fn,tn = categories.sum((0,1))

And finally the results:

sensitivity, specificity = tp/(tp+fn), tn/(tn+fp)
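Putting the three steps together as a runnable sketch, on hypothetical uint8 masks. It relies on Python's operator precedence: & and | bind more tightly than comparisons, so mask & truth > 0 is evaluated as (mask & truth) > 0.

```python
import numpy as np

# Hypothetical 0/255 integer masks
mask = np.array([[255, 0, 255], [0, 0, 255]], dtype=np.uint8)
truth = np.array([[255, 255, 0], [0, 0, 255]], dtype=np.uint8)

categories = np.dstack((mask & truth > 0,    # TP: both positive
                        mask > truth,        # FP: predicted positive, truth negative
                        mask < truth,        # FN: predicted negative, truth positive
                        mask | truth == 0))  # TN: both negative

# categories has shape (rows, cols, 4); summing over both spatial axes gives 4 counts
tp, fp, fn, tn = categories.sum((0, 1))
sensitivity, specificity = tp / (tp + fn), tn / (tn + fp)
```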

[Comments]:

[Answer 4]:

My idea is to use collections.Counter from the standard library.

import collections

# building pair list (can be shortened to a one-liner list comprehension, if you want)
pair_list = []
for y in range(mask.shape[0]):
    for x in range(mask.shape[1]):
        pair_list.append((mask[y, x], truth[y, x]))

# getting Counter object; indexing returns 0 for a pair that never occurs
# (counter.get() would return None instead)
counter = collections.Counter(pair_list)
true_positive = counter[(255, 255)]
false_positive = counter[(255, 0)]
false_negative = counter[(0, 255)]
true_negative = counter[(0, 0)]
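The pair list can be built without explicit index loops by zipping the flattened arrays. A minimal sketch on hypothetical masks; .tolist() is used so the keys are plain Python int tuples.

```python
import collections
import numpy as np

# Hypothetical 0/255 masks
mask = np.array([[255, 0, 255], [0, 0, 255]])
truth = np.array([[255, 255, 0], [0, 0, 255]])

# Pair up corresponding elements of the flattened arrays
counter = collections.Counter(zip(mask.ravel().tolist(), truth.ravel().tolist()))

# Indexing a Counter returns 0 for a missing key
true_positive = counter[(255, 255)]
false_positive = counter[(255, 0)]
false_negative = counter[(0, 255)]
true_negative = counter[(0, 0)]
```

This still iterates in Python, so it stays much slower than the vectorized NumPy answers on large images.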

[Comments]:

This looks prettier, but it is probably an order of magnitude or more slower than using numpy.