How can I compute the F1 measure for each class in multiclass classification? Answers

【Question title】: How can I compute F1 measure for each class, in Multiclass Classification?
【Posted】: 2016-04-07 07:32:45
【Question】:

I am using SciKit as a library for classification algorithms such as NB and SVM.

Here is a very good binary classification implementation for "SPAM and HAM" emails:

    confusion += confusion_matrix(test_y, predictions)
    score = f1_score(test_y, predictions, pos_label=SPAM)
    # Note: in my case (3 classes) I do not need to set pos_label

If I have three classes such as {SPAM, HAM, NORMAL} instead of two, how can I adapt this code to get the F1 score for each class, as well as the average over all classes?

【Question comments】:

Tags: machine-learning nltk computer-science text-classification

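【Solution 1】:

The problem here is that the F1 measure does not really make sense for multiclass problems. It is the harmonic mean of precision and recall.

Precision is the probability that a (randomly selected) instance classified as positive is actually positive.

Recall is the probability that a (randomly selected) positive instance is classified as positive.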
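Written out as formulas, these definitions look as follows. The symbols $TP$, $FP$, and $FN$ (counts of true positives, false positives, and false negatives) are notation introduced here for illustration; they do not appear in the answer itself.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
\]
```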

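These definitions are inherently binary. I would usually report the F1 measure for each class separately. This also lets you decide which kinds of errors you can tolerate. In my own experience, I would actually report precision and recall themselves. In your example, classifying a HAM email as SPAM would be very harmful, so for SPAM precision matters more than recall.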
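As a minimal sketch of reporting precision, recall, and F1 per class, each class can be treated in turn as the positive class (one-vs-rest). The helper name `per_class_prf` and the toy labels below are made up for illustration and are not part of scikit-learn; returning 0.0 when a denominator is zero is likewise just one possible convention.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted, but the instance is not p
            fn[g] += 1      # a true g instance was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]   # instances classified as label
        actual = tp[label] + fn[label]      # instances that really are label
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy data, only to exercise the function.
gold = ["SPAM", "HAM", "NORMAL", "SPAM", "HAM", "NORMAL"]
pred = ["SPAM", "HAM", "SPAM", "SPAM", "NORMAL", "NORMAL"]

scores = per_class_prf(gold, pred)
# Unweighted (macro) average of the per-class F1 values.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
print(f"macro F1={macro_f1:.4f}")
```

Keeping precision and recall next to F1 makes it easy to see, for example, whether SPAM precision is acceptable even when its recall is lower.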

For a broader overview that also covers a range of other measures, you can also have a look at http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf

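【Discussion】:

Yes, but how can I implement recall and precision for each class, given the example above?

【Solution 2】:

Use the classification report from sklearn to compute the F-score for multiple classes.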

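    from sklearn.metrics import classification_report as cr

    gold = []
    pred = []
    # given a test set with annotated gold labels
    for testinstance, goldlabel in testdata:
        gold.append(goldlabel)
        # clf is your fitted classifier object with a predict method.
        # Assumes testinstance is a single 1D feature vector: predict expects
        # a batch and returns an array, hence the [...] wrapper and the [0].
        predictedlabel = clf.predict([testinstance])[0]
        pred.append(predictedlabel)
    # prints precision, recall, F1 and support for every class
    print(cr(gold, pred, digits=4))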
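If you need the numbers themselves rather than a printed report, a sketch along the following lines may help. It assumes scikit-learn's `f1_score` with its `average` argument: `average=None` returns one score per class and `average="macro"` returns their unweighted mean. The toy `gold` and `pred` lists are hypothetical and stand in for the lists built above.

```python
from sklearn.metrics import f1_score

# Hypothetical toy labels standing in for the gold/pred lists built above.
labels = ["SPAM", "HAM", "NORMAL"]
gold = ["SPAM", "HAM", "NORMAL", "SPAM", "HAM", "NORMAL"]
pred = ["SPAM", "HAM", "SPAM", "SPAM", "NORMAL", "NORMAL"]

# average=None: one F1 value per class, in the order given by `labels`.
per_class_f1 = f1_score(gold, pred, labels=labels, average=None)

# average="macro": unweighted mean of the per-class F1 values.
macro_f1 = f1_score(gold, pred, labels=labels, average="macro")

# average="weighted": per-class F1 weighted by each class's support.
weighted_f1 = f1_score(gold, pred, labels=labels, average="weighted")

for label, score in zip(labels, per_class_f1):
    print(f"{label}: F1={score:.4f}")
print(f"macro F1={macro_f1:.4f}, weighted F1={weighted_f1:.4f}")
```

Passing `labels` explicitly fixes the order of the per-class array, so each score can be matched to its class name.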

【Discussion】: