【Question Title】: Sklearn.metrics.classification_report Confusion Matrix Problem?
【Posted】: 2020-07-12 16:25:48
【Question】:

First of all, thank you for reading my question - I hope this is the right place for it.

I am coding the sensitivity, specificity, and precision calculations for a confusion matrix from scratch. I have the following confusion matrix for 4 classes:

                    True Class
                   1   2   3   4

           1   [[  0   1   3   0]
Predicted  2    [  0 181  23   0]
Class      3    [  0  17  53  14]
           4    [  0   3  22  77]]

When I use Sklearn.metrics.classification_report, this is what I get:

     precision    recall  f1-score   support

1         0.00      0.00      0.00         4
2         0.89      0.89      0.89       204
3         0.52      0.63      0.57        84
4         0.85      0.75      0.80       102

However, for precision and recall I get the following (i.e. the precision and recall values are flipped):

class  precision  recall
  1      0.0       nan
  2      0.887     0.896
  3      0.631     0.524
  4      0.755     0.846

For each class, I calculate the following true positives, false positives, true negatives, and false negatives:

class Tp  Fp  Tn   Fn
1     0   4   390  0
2     181 23  169  21
3     53  31  262  48
4     77  25  278  14
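These counts can be read straight off the matrix; as a quick sketch for class 3 (index 2), treating rows as predicted classes as labelled above:

```python
import numpy as np

# Confusion matrix as shown above (rows treated as predicted class)
cm = np.array([[  0,   1,   3,   0],
               [  0, 181,  23,   0],
               [  0,  17,  53,  14],
               [  0,   3,  22,  77]])

i = 2  # class 3 lives at index 2
tp = cm[i, i]                 # 53: the diagonal element
fp = cm[i, :].sum() - tp      # 31: the rest of the row
fn = cm[:, i].sum() - tp      # 48: the rest of the column
tn = cm.sum() - tp - fp - fn  # 262: everything else
```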

The formulas I am using (https://en.wikipedia.org/wiki/Confusion_matrix) are:

sensitivity/recall = true_positives / (true_positives + false_negatives)

precision = true_positives/(true_positives+false_positives)
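As a worked check, plugging the class 2 counts from my table into these formulas reproduces my values:

```python
# Class 2 counts taken from the table above
tp, fp, fn = 181, 23, 21

recall = tp / (tp + fn)     # 181 / 202, roughly 0.896
precision = tp / (tp + fp)  # 181 / 204, roughly 0.887
```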

Where am I going wrong? Surely it isn't a classification problem in sklearn - am I misreading something?

Edit: here is my function for calculating the precision and recall values, given a confusion matrix from sklearn.metrics.confusion_matrix and a list of class numbers, e.g. [1, 2, 3] for classes 1-3.

def calc_precision_recall(conf_matrix, class_labels):

    # for each class 
    for i in range(len(class_labels)):

        # true positives: the diagonal element
        true_positives = conf_matrix[i, i]

        # false positives: the rest of row i
        false_positives = conf_matrix[i, :].sum() - true_positives

        # false negatives: the rest of column i
        false_negatives = conf_matrix[:, i].sum() - true_positives

        # and finally true negatives: everything else
        true_negatives = conf_matrix.sum() - false_positives - false_negatives - true_positives

        # print calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives/(true_positives+false_positives), "\n"
        )

    return

【Question Discussion】:

    Tags: python machine-learning scikit-learn confusion-matrix


    【Solution 1】:

    OK, where is your code? It's impossible to say for sure when nobody can see it. I'll take a guess here... maybe your data is imbalanced. Do you have more/fewer records in some feature columns? sklearn.utils.resample can resample arrays or sparse matrices in a consistent way.

    This should work well for you. Test it and see.

    # Begin by importing all necessary libraries
    import pandas as pd
    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn import datasets
    
    
    # import some data to play with
    iris = datasets.load_iris()
    X = iris.data[:, 0:3]  # we only take the first three features.
    y = iris.target
    
    
    # Now that we have the features and labels we want, we can split the data into training and testing sets using sklearn's handy function train_test_split():
    
    # Test size specifies how much of the data you want to set aside for the testing set. 
    # Random_state parameter is just a random seed we can use.
    # You can use it if you'd like to reproduce these specific results.
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)
    
    # You may want to print the results to be sure your data is being parsed as you expect:
    
    print(X_train)  
    print(y_train)
    
    # Now we can instantiate the models. Let's try using two classifiers, a Support Vector Classifier and a K-Nearest Neighbors Classifier:
    
    SVC_model = SVC()
    # KNN model requires you to specify n_neighbors,
    # the number of points the classifier will look at to determine what class a new point belongs to
    KNN_model = KNeighborsClassifier(n_neighbors=5)
    
    # Now let's fit the classifiers:
    
    SVC_model.fit(X_train, y_train)
    KNN_model.fit(X_train, y_train)
    
    # The call has trained the model, so now we can predict and store the prediction in a variable:
    
    SVC_prediction = SVC_model.predict(X_test)
    KNN_prediction = KNN_model.predict(X_test)
    
    # We should now evaluate how the classifier performed. There are multiple methods of evaluating a classifier's performance, and you can read more about the different methods below.
    
    # In Scikit-Learn you just pass in the ground truth labels from your test set along with the predictions:
    
    # Accuracy score is the simplest way to evaluate
    print(accuracy_score(y_test, SVC_prediction))
    print(accuracy_score(y_test, KNN_prediction))
    
    
    
    # But the confusion matrix and classification report give more detail about performance.
    # Note the argument order: ground truth first, predictions second.
    print(confusion_matrix(y_test, SVC_prediction))
    print(classification_report(y_test, KNN_prediction))
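
One detail worth double-checking in the snippet above: sklearn's metric functions expect the ground truth as the first argument. A minimal sketch (toy labels, not from the post) showing that swapping the arguments to classification_report exchanges the precision and recall columns:

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Toy labels, purely for illustration
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# The signature is classification_report(y_true, y_pred): ground truth first.
# Passing the arguments in the wrong order transposes the implied confusion
# matrix, which swaps the precision and recall columns of the report.
print(classification_report(y_true, y_pred))
print(classification_report(y_pred, y_true))

# The swap is exact: precision one way equals recall the other way
assert precision_score(y_true, y_pred) == recall_score(y_pred, y_true)
```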
    

    Result:

                  precision    recall  f1-score   support
    
               0       1.00      1.00      1.00         7
               1       0.91      0.91      0.91        11
               2       0.92      0.92      0.92        12
    
        accuracy                           0.93        30
       macro avg       0.94      0.94      0.94        30
    weighted avg       0.93      0.93      0.93        30
    

    See the resources below.

    https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets

    https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html

    Oh, and the X and y variables both contain 150 records.

    X.shape
    y.shape
    

    Result:

    X.shape
    Out[107]: (150, 3)
    
    y.shape
    Out[108]: (150,)
    

    【Discussion】:

    • Hi, thank you very much for your kind answer! This does work for me, but the reported precision and recall values still don't match the ones I calculate. Your example isn't the best one here, because the precision and recall values come out identical. However, I reduced the number of nearest neighbours to 2 so that the classes have different precision and recall values, and then used the function I wrote and included above to calculate precision and recall - they remain flipped.
    【Solution 2】:

    I compared the output of each command with hand-computed values and they all agree. I suspect you are mixing up the TP, FN, FP, TN values (or some of them). Looking at a diagram may help:

    (Image source: https://www.stardat.net/post/confusion-matrix)
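One concrete thing worth testing (a sketch of a hypothesis, based on the documented convention that sklearn.metrics.confusion_matrix puts true labels on the rows and predicted labels on the columns): if the per-class loop treats the rows as predictions when they are actually true classes, precision and recall come out swapped. Running the question's arithmetic on the matrix and on its transpose shows exactly that:

```python
import numpy as np

# The matrix from the question, as returned by sklearn.metrics.confusion_matrix
cm = np.array([[  0,   1,   3,   0],
               [  0, 181,  23,   0],
               [  0,  17,  53,  14],
               [  0,   3,  22,  77]])

def precision_recall(conf_matrix):
    """Per-class (precision, recall), treating row i as 'predicted class i'."""
    results = []
    for i in range(conf_matrix.shape[0]):
        tp = conf_matrix[i, i]
        pred_i = conf_matrix[i, :].sum()  # tp + fp under this assumption
        true_i = conf_matrix[:, i].sum()  # tp + fn under this assumption
        prec = tp / pred_i if pred_i else float("nan")
        rec = tp / true_i if true_i else float("nan")
        results.append((round(prec, 3), round(rec, 3)))
    return results

print(precision_recall(cm))    # reproduces the "flipped" values from the post
print(precision_recall(cm.T))  # matches classification_report
```

On cm.T the numbers line up with the classification_report output (sklearn prints 0.00 rather than nan where a denominator is zero), which suggests the rows of the matrix in the question are actually true classes, not predictions.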

    【Discussion】:
