【Question title】: sklearn plot confusion matrix with labels
【Posted】: 2013-10-14 13:17:20
【Question】:

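I want to plot a confusion matrix to visualize the classifier's performance, but it shows only the numeric indices of the labels, not the labels themselves: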

from sklearn.metrics import confusion_matrix
import pylab as pl
from numpy import array
y_test=['business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business']

pred=array(['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health'])

cm = confusion_matrix(y_test, pred)
pl.matshow(cm)
pl.title('Confusion matrix of the classifier')
pl.colorbar()
pl.show()

How can I add the labels (health, business, etc.) to the confusion matrix?

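【Comments】:

    Tags: python matplotlib scikit-learn


    【Answer 1】:

    As hinted in this question, you have to "open up" the lower-level artist API by storing the figure and axes objects returned by the matplotlib functions you call (the fig, ax, and cax variables below). You can then replace the default x- and y-axis ticks using set_xticklabels/set_yticklabels: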

    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt
    
    labels = ['business', 'health']
    cm = confusion_matrix(y_test, pred, labels=labels)
    print(cm)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(cm)
    plt.title('Confusion matrix of the classifier')
    fig.colorbar(cax)
    # pin one tick per class, then name the ticks
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.show()
    

    Note that I pass the labels list to the confusion_matrix function to make sure the matrix is ordered correctly and matches the ticks.
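
The effect of the labels argument can be checked without plotting anything. The following is a minimal sketch using the data from the question; the variable names cm_default and cm_reordered are illustrative only.

```python
from sklearn.metrics import confusion_matrix

# Data from the question: every true label is 'business'.
y_test = ['business'] * 20
pred = ['health', 'business', 'business', 'business', 'business',
        'business', 'health', 'health', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'health', 'health', 'business', 'health']

# Without `labels`, the classes are sorted: ['business', 'health'].
cm_default = confusion_matrix(y_test, pred)

# With `labels`, rows (true) and columns (predicted) follow the given order.
cm_reordered = confusion_matrix(y_test, pred, labels=['health', 'business'])

print(cm_default)
print(cm_reordered)
```

Whichever order is chosen, the same list should be reused for the tick labels so that the matrix and the axes stay in step.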

    This produces the following figure:

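    【Comments】:

    • If you have more than a few classes, Matplotlib decides to label the axes incorrectly; you have to force it to label every cell. from matplotlib.ticker import MultipleLocator; ax.xaxis.set_major_locator(MultipleLocator(1)); ax.yaxis.set_major_locator(MultipleLocator(1))
    • As a newcomer, could you tell me whether the size of the 3 boxes indicates the accuracy?
    • How can the numbers be shown on top? Color alone may not convey much in every case.
    • Hi @metakermit, could you tell me how to show the numbers in the colored plot?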
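
Two of the comments above ask how to show the counts inside the colored cells. One possible approach, sketched here under the assumption that a plain matshow plot is used, is to write each value with ax.text. The 2x2 matrix is hard-coded from the question's data, and the color threshold is an arbitrary illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Confusion matrix for the question's data (rows: true, columns: predicted).
labels = ['business', 'health']
cm = np.array([[14, 6],
               [0, 0]])

fig, ax = plt.subplots()
cax = ax.matshow(cm)
fig.colorbar(cax)

# One tick per cell, so every class gets its own label.
ax.set_xticks(np.arange(len(labels)))
ax.set_yticks(np.arange(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)

# matshow draws row i at y=i and column j at x=j.
# The default colormap is light for large values, so use dark text there.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        color = 'black' if cm[i, j] > cm.max() / 2 else 'white'
        ax.text(j, i, str(cm[i, j]), ha='center', va='center', color=color)

ax.set_xlabel('Predicted')
ax.set_ylabel('True')
```

Call plt.show() afterwards to display the figure.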
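    【Answer 2】:

    You may be interested in https://github.com/pandas-ml/pandas-ml/

    It provides a Python pandas implementation of a confusion matrix.

    Some features:

    • plot the confusion matrix
    • plot the normalized confusion matrix
    • class statistics
    • overall statistics

    Here is an example: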

    In [1]: from pandas_ml import ConfusionMatrix
    In [2]: import matplotlib.pyplot as plt
    
    In [3]: y_test = ['business', 'business', 'business', 'business', 'business',
            'business', 'business', 'business', 'business', 'business',
            'business', 'business', 'business', 'business', 'business',
            'business', 'business', 'business', 'business', 'business']
    
    In [4]: y_pred = ['health', 'business', 'business', 'business', 'business',
           'business', 'health', 'health', 'business', 'business', 'business',
           'business', 'business', 'business', 'business', 'business',
           'health', 'health', 'business', 'health']
    
    In [5]: cm = ConfusionMatrix(y_test, y_pred)
    
    In [6]: cm
    Out[6]:
    Predicted  business  health  __all__
    Actual
    business         14       6       20
    health            0       0        0
    __all__          14       6       20
    
    In [7]: cm.plot()
    Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x1093cf9b0>
    
    In [8]: plt.show()
    

    In [9]: cm.print_stats()
    Confusion Matrix:
    
    Predicted  business  health  __all__
    Actual
    business         14       6       20
    health            0       0        0
    __all__          14       6       20
    
    
    Overall Statistics:
    
    Accuracy: 0.7
    95% CI: (0.45721081772371086, 0.88106840959427235)
    No Information Rate: ToDo
    P-Value [Acc > NIR]: 0.608009812201
    Kappa: 0.0
    Mcnemar's Test P-Value: ToDo
    
    
    Class Statistics:
    
    Classes                                 business health
    Population                                    20     20
    P: Condition positive                         20      0
    N: Condition negative                          0     20
    Test outcome positive                         14      6
    Test outcome negative                          6     14
    TP: True Positive                             14      0
    TN: True Negative                              0     14
    FP: False Positive                             0      6
    FN: False Negative                             6      0
    TPR: (Sensitivity, hit rate, recall)         0.7    NaN
    TNR=SPC: (Specificity)                       NaN    0.7
    PPV: Pos Pred Value (Precision)                1      0
    NPV: Neg Pred Value                            0      1
    FPR: False-out                               NaN    0.3
    FDR: False Discovery Rate                      0      1
    FNR: Miss Rate                               0.3    NaN
    ACC: Accuracy                                0.7    0.7
    F1 score                               0.8235294      0
    MCC: Matthews correlation coefficient        NaN    NaN
    Informedness                                 NaN    NaN
    Markedness                                     0      0
    Prevalence                                     1      0
    LR+: Positive likelihood ratio               NaN    NaN
    LR-: Negative likelihood ratio               NaN    NaN
    DOR: Diagnostic odds ratio                   NaN    NaN
    FOR: False omission rate                       1      0
    

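    【Comments】:

    • What? How did you get this to work? With the latest pandas_ml it gives me a blank confusion matrix (all 0s), and the labels are True/False instead of business and health.
    • Same here, it is blank.
    • I get AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' with scikit-learn version 0.23.1 and pandas-ml version 0.6.1. I have tried other versions too, with no luck.
    • How can the x-axis labels be set diagonally?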
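
Given the compatibility problems reported in these comments, a labelled table similar to Out[6] above can probably be obtained with plain pandas. This is a sketch of a swapped-in technique (pandas.crosstab), not part of pandas_ml; the name table is illustrative.

```python
import pandas as pd

y_test = ['business'] * 20
y_pred = ['health', 'business', 'business', 'business', 'business',
          'business', 'health', 'health', 'business', 'business', 'business',
          'business', 'business', 'business', 'business', 'business',
          'health', 'health', 'business', 'health']

# Cross-tabulate true against predicted labels; margins=True adds 'All' totals.
table = pd.crosstab(pd.Series(y_test, name='Actual'),
                    pd.Series(y_pred, name='Predicted'),
                    margins=True)
print(table)
```

Unlike the pandas_ml output, crosstab only lists classes that actually occur, so there is no 'health' row here because no sample has that true label.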
    【Answer 3】:

    Update:

    scikit-learn 0.22 added a new function that plots the confusion matrix directly.

    See the documentation: sklearn.metrics.plot_confusion_matrix


    Old answer:

    I think seaborn.heatmap is worth mentioning here.

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    ax = plt.subplot()
    # annot=True annotates the cells; fmt='g' disables scientific notation
    sns.heatmap(cm, annot=True, fmt='g', ax=ax)
    
    # labels, title and ticks
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    ax.set_title('Confusion Matrix')
    # confusion_matrix sorts the classes and the heatmap draws row 0 at the top,
    # so both axes take the labels in the same order
    ax.xaxis.set_ticklabels(['business', 'health'])
    ax.yaxis.set_ticklabels(['business', 'health'])
    

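    【Comments】:

    • Suggestion: pass fmt='g' to the heatmap call to keep the numbers from switching to scientific notation.
    • Suggestion: pass cmap='Greens' to the heatmap call for an intuitive color meaning.
    • How do you make sure the labels are not mixed up?
    • @RevolucionforMonica When you get the confusion_matrix, the X-axis tick labels are 1, 0 and the Y-axis tick labels are 0, 1 (in order of increasing axis value). If the classifier is clf, you can get the class order from clf.classes_, which in this case should match ["health", "business"] (assuming business is the positive class).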
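
One way to avoid mixed-up labels, sketched here as an assumption rather than the answer author's method, is to define the label order once and pass the same list to both confusion_matrix and the heatmap's xticklabels/yticklabels arguments. The small y_true/y_pred lists are made up for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Hypothetical data for illustration only.
y_true = ['health', 'business', 'business', 'health', 'business']
y_pred = ['health', 'business', 'health', 'health', 'business']

# A single source of truth for the class order.
labels = ['business', 'health']
cm = confusion_matrix(y_true, y_pred, labels=labels)

fig, ax = plt.subplots()
sns.heatmap(cm, annot=True, fmt='d', cmap='Greens',
            xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
```

With a fitted scikit-learn classifier, clf.classes_ could serve as that single list.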
    【Answer 4】:

    I found a function that plots the confusion matrix generated by sklearn.

    import numpy as np
    
    
    def plot_confusion_matrix(cm,
                              target_names,
                              title='Confusion matrix',
                              cmap=None,
                              normalize=True):
        """
        given a sklearn confusion matrix (cm), make a nice plot
    
        Arguments
        ---------
        cm:           confusion matrix from sklearn.metrics.confusion_matrix
    
        target_names: given classification classes such as [0, 1, 2]
                      the class names, for example: ['high', 'medium', 'low']
    
        title:        the text to display at the top of the matrix
    
        cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                      see http://matplotlib.org/examples/color/colormaps_reference.html
                      plt.get_cmap('jet') or plt.cm.Blues
    
        normalize:    If False, plot the raw numbers
                      If True, plot the proportions
    
        Usage
        -----
        plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                                  # sklearn.metrics.confusion_matrix
                              normalize    = True,                # show proportions
                              target_names = y_labels_vals,       # list of names of the classes
                              title        = best_estimator_name) # title of graph
    
        Citation
        ---------
        http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    
        """
        import matplotlib.pyplot as plt
        import numpy as np
        import itertools
    
        accuracy = np.trace(cm) / np.sum(cm).astype('float')
        misclass = 1 - accuracy
    
        if cmap is None:
            cmap = plt.get_cmap('Blues')
    
        plt.figure(figsize=(8, 6))
        plt.imshow(cm, interpolation='nearest', cmap=cmap)
        plt.title(title)
        plt.colorbar()
    
        if target_names is not None:
            tick_marks = np.arange(len(target_names))
            plt.xticks(tick_marks, target_names, rotation=45)
            plt.yticks(tick_marks, target_names)
    
        if normalize:
            cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    
    
        thresh = cm.max() / 1.5 if normalize else cm.max() / 2
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            if normalize:
                plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                         horizontalalignment="center",
                         color="white" if cm[i, j] > thresh else "black")
            else:
                plt.text(j, i, "{:,}".format(cm[i, j]),
                         horizontalalignment="center",
                         color="white" if cm[i, j] > thresh else "black")
    
    
        plt.tight_layout()
        plt.ylabel('True label')
        plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
        plt.show()
    

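    It looks like this:

    【Comments】:

    • How would this work with more than 3 classes?
    • @Kurt, when you have more than 3 classes, you simply get a larger confusion matrix (i.e., a 10 x 10 matrix if you have 10 classes). Note: the target_names variable passes the class labels to the function and must match the number of classes in the confusion matrix.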
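
The normalize=True branch of the function above divides each row by its sum, which yields NaN for any class that has no true samples (as with 'health' in the question's data). A possible guard is sketched below with a made-up 3-class matrix; normalize_rows is a hypothetical helper, not part of the answer.

```python
import numpy as np


def normalize_rows(cm):
    """Divide each row by its sum, leaving all-zero rows as zeros."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    out = np.zeros_like(cm)
    # Only divide where the row total is non-zero.
    np.divide(cm, row_sums, out=out, where=row_sums != 0)
    return out


# Hypothetical 3-class matrix; the third class has no true samples.
cm3 = np.array([[8, 1, 1],
                [2, 6, 2],
                [0, 0, 0]])
print(normalize_rows(cm3))
```

The same helper works unchanged for a 10 x 10 matrix, since it only relies on row sums.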
    【Answer 5】:
    from sklearn import model_selection
    test_size = 0.33
    seed = 7
    X_train, X_test, y_train, y_test = model_selection.train_test_split(feature_vectors, y, test_size=test_size, random_state=seed)
    
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
    
    model = LogisticRegression()
    model.fit(X_train, y_train)
    result = model.score(X_test, y_test)
    print("Accuracy: %.3f%%" % (result*100.0))
    y_pred = model.predict(X_test)
    print("F1 Score: ", f1_score(y_test, y_pred, average="macro"))
    print("Precision Score: ", precision_score(y_test, y_pred, average="macro"))
    print("Recall Score: ", recall_score(y_test, y_pred, average="macro")) 
    
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.metrics import confusion_matrix
    
    def cm_analysis(y_true, y_pred, labels, ymap=None, figsize=(10,10)):
        """
        Generate matrix plot of confusion matrix with pretty annotations.
        The plot is shown on screen; it is not saved to disk.
        args: 
          y_true:    true label of the data, with shape (nsamples,)
          y_pred:    prediction of the data, with shape (nsamples,)
          labels:    string array, name the order of class labels in the confusion matrix.
                     use `clf.classes_` if using scikit-learn models.
                     with shape (nclass,).
          ymap:      dict: any -> string, length == nclass.
                     if not None, map the labels & ys to more understandable strings.
                     Caution: original y_true, y_pred and labels must align.
          figsize:   the size of the figure plotted.
        """
        if ymap is not None:
            y_pred = [ymap[yi] for yi in y_pred]
            y_true = [ymap[yi] for yi in y_true]
            labels = [ymap[yi] for yi in labels]
        cm = confusion_matrix(y_true, y_pred, labels=labels)
        cm_sum = np.sum(cm, axis=1, keepdims=True)
        cm_perc = cm / cm_sum.astype(float) * 100
        annot = np.empty_like(cm).astype(str)
        nrows, ncols = cm.shape
        for i in range(nrows):
            for j in range(ncols):
                c = cm[i, j]
                p = cm_perc[i, j]
                if i == j:
                    s = cm_sum[i, 0]  # scalar row total
                    annot[i, j] = '%.1f%%\n%d/%d' % (p, c, s)
                elif c == 0:
                    annot[i, j] = ''
                else:
                    annot[i, j] = '%.1f%%\n%d' % (p, c)
        cm = pd.DataFrame(cm, index=labels, columns=labels)
        cm.index.name = 'Actual'
        cm.columns.name = 'Predicted'
        fig, ax = plt.subplots(figsize=figsize)
        sns.heatmap(cm, annot=annot, fmt='', ax=ax)
        #plt.savefig(filename)
        plt.show()
    
    cm_analysis(y_test, y_pred, model.classes_, ymap=None, figsize=(10,10))
    

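    This uses https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7

    Note that if you use rocket_r, the colors are reversed, which looks more natural and better, as shown below:

    【Comments】:

    • Thanks, but what is the rocket_r option you mentioned?
    • In the sns.heatmap() function, pass the argument cmap='rocket_r' to get the color scale reversed.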
    【Answer 6】:
        from sklearn.metrics import confusion_matrix
        import pandas as pd
        import seaborn as sns
        import matplotlib.pyplot as plt
        # model is assumed to be a Keras-style classifier trained on one-hot targets
        model.fit(train_x, train_y, validation_split=0.1, epochs=50, batch_size=4)
        y_pred = model.predict(test_x, batch_size=15)
        # argmax turns one-hot rows and class probabilities into class indices
        cm = confusion_matrix(test_y.argmax(axis=1), y_pred.argmax(axis=1))
        index = ['neutral', 'happy', 'sad']
        columns = ['neutral', 'happy', 'sad']
        cm_df = pd.DataFrame(cm, index=index, columns=columns)
        plt.figure(figsize=(10, 6))
        sns.heatmap(cm_df, annot=True)
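
The answer above comes without explanation; its key step appears to be converting one-hot targets and predicted probabilities to class indices with argmax before calling confusion_matrix. The sketch below isolates that step with made-up arrays (test_y and y_prob are hypothetical stand-ins for the model's data and output).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

class_names = ['neutral', 'happy', 'sad']

# Hypothetical one-hot true labels and predicted probabilities (4 samples).
test_y = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.5, 0.2],
                   [0.2, 0.6, 0.2]])

# The position of the largest value in each row is the class index.
true_idx = test_y.argmax(axis=1)
pred_idx = y_prob.argmax(axis=1)

# Fix the index order so it lines up with class_names.
cm = confusion_matrix(true_idx, pred_idx, labels=[0, 1, 2])
cm_df = pd.DataFrame(cm, index=class_names, columns=class_names)
print(cm_df)
```

The labelled DataFrame can then be passed to sns.heatmap as in the answer, which takes the tick labels from its index and columns.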
    

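    【Comments】:

      【Answer 7】:

      To add to @akilat90's update about sklearn.metrics.plot_confusion_matrix:

      You can use the ConfusionMatrixDisplay class in sklearn.metrics directly, without passing a classifier to plot_confusion_matrix. It also has a display_labels argument, which lets you specify the labels shown in the plot as needed.

      The constructor of ConfusionMatrixDisplay does not offer a way to customize the plot much further, but you can access the matplotlib axes object through the ax_ attribute after calling its plot() method. I added a second example to show this.

      I found it annoying to have to rerun the classifier on a large amount of data just to produce the plot with plot_confusion_matrix. I generate other plots from the predicted data, so I don't want to waste time re-predicting every time. This is an easy solution to that problem as well.

      Example: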

      from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
      
      cm = confusion_matrix(y_true, y_preds, normalize='all')
      cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
      cmd.plot()
      

      Example using ax_:

      cm = confusion_matrix(y_true, y_preds, normalize='all')
      cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
      cmd.plot()
      cmd.ax_.set(xlabel='Predicted', ylabel='True')
      
      

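      【Comments】:

      • This is great, thank you! Question: can you customize the 'True label' and 'Predicted label' values for the axis labels?
      • I hadn't realized this before, but you can access the matplotlib axes object through cmd.ax_, which gives a lot of control over the plot. To customize the axis labels, use something like: cmd.ax_.set(xlabel='foo', ylabel='bar'). I'll update my answer.
      • Many thanks! But it looks like cmd.ax_.set disables display_labels=['business','health']
      • I also got AttributeError: 'ConfusionMatrixDisplay' object has no attribute 'ax_'
      • Ah, you're right! Thanks for pointing these out. In the excitement of finding a solution, I made some mistakes in the update. Please see the latest version; it should work now.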
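
The AttributeError in the comments is what one would expect when ax_ is read before plot() has been called, since the attribute is only created by plot(). A minimal sketch of the working order follows, with a hard-coded matrix; the figure size, colormap and rotation are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay

# Confusion matrix for the question's data.
cm = np.array([[14, 6],
               [0, 0]])

fig, ax = plt.subplots(figsize=(5, 4))
cmd = ConfusionMatrixDisplay(cm, display_labels=['business', 'health'])

# ax_ and figure_ exist only after plot() has run.
cmd.plot(ax=ax, cmap='Blues', xticks_rotation=45)

# Changing the axis titles leaves the class tick labels untouched.
cmd.ax_.set(xlabel='Predicted', ylabel='True')
```

The xticks_rotation argument also covers the earlier question about drawing the x-axis labels diagonally.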
      【Answer 8】:

      Given model, validx, and validy. With great help from the other answers, this is what fit my needs.

      sklearn.metrics.plot_confusion_matrix

      import matplotlib.pyplot as plt
      import sklearn.metrics
      fig, ax = plt.subplots(figsize=(26,26))
      sklearn.metrics.plot_confusion_matrix(model, validx, validy, ax=ax, cmap=plt.cm.Blues)
      ax.set(xlabel='Predicted', ylabel='Actual', title='Confusion Matrix Actual vs Predicted')
      

      【Comments】:

        【Answer 9】:

        There is a very simple way to do this using ConfusionMatrixDisplay. It supports display_labels, which can be used to set the labels shown in the plot.

        import numpy as np
        from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
        np.random.seed(0)
        y_true = np.random.randint(0,3, 100)
        y_pred = np.random.randint(0,3, 100)
        
        labels = ['cat', 'dog', 'rat']
        
        cm = confusion_matrix(y_true, y_pred)
        ConfusionMatrixDisplay(cm, display_labels=labels).plot()
        #plt.savefig("Confusion_Matrix.png")
        

        Output:

        Reference: ConfusionMatrixDisplay
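
Since display_labels are applied by position, the matrix needs one row and column per name. If a class might be missing from the data, passing labels to confusion_matrix keeps the shape fixed. A sketch with made-up data follows; class_ids and class_names are illustrative names.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

class_ids = [0, 1, 2]
class_names = ['cat', 'dog', 'rat']

# Hypothetical data in which class 2 ('rat') never occurs.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# labels=class_ids forces a 3x3 matrix even though only two classes appear.
cm = confusion_matrix(y_true, y_pred, labels=class_ids)
disp = ConfusionMatrixDisplay(cm, display_labels=class_names)
disp.plot()
```

Without the labels argument the matrix here would be 2x2 and would no longer line up with the three names.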

        【Comments】:
