How to get accuracy for all the predicted class labels: answers

[Question title]: How to get accuracy for all the predicted class labels
[Posted]: 2018-10-18 05:09:31
[Question description]:

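How can I find the overall accuracy of the output I get from running a decision tree algorithm? I can get the top five class labels for the active user's input, but the only accuracy I can get is on the X_train and Y_train datasets, using accuracy_score(). Suppose I get the five best recommendations: I want the accuracy of each of these class labels, and from them the overall accuracy of the output. Please suggest some ideas.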

My Python script is below. Here, event holds the different class labels:

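from itertools import chain
from sklearn.tree import DecisionTreeClassifier

DTC = DecisionTreeClassifier()
DTC.fit(X_train_one_hot, y_train)
print("output from DTC:")
# One row of class probabilities per test sample; columns follow DTC.classes_
res = DTC.predict_proba(X_test_one_hot)
# Flattening only yields valid class indices if X_test_one_hot has a single row
new = list(chain.from_iterable(res))
# Here I got the index value of top five probabilities
index = sorted(range(len(new)), key=lambda i: new[i], reverse=True)[:5]
for i in index:
    # event must list the class labels in the same order as DTC.classes_
    print(event[i])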
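For reference, a minimal sketch of taking the top five labels for every test row without flattening, assuming a fitted scikit-learn classifier. Flattening the predict_proba output with chain.from_iterable only gives valid class indices when the test set has exactly one row; sorting along axis=1 works for any number of rows. The helper top_k_labels and the digits dataset are illustrative stand-ins, not part of the original script.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier


def top_k_labels(clf, X, k=5):
    """Hypothetical helper: the k most probable class labels per row of X, best first."""
    proba = clf.predict_proba(X)  # shape (n_samples, n_classes)
    # Stable sort on -proba: descending order, ties keep the lower column index first
    top_idx = np.argsort(-proba, axis=1, kind="stable")[:, :k]
    # predict_proba columns follow clf.classes_, so map column indices back to labels
    return clf.classes_[top_idx]


X, y = load_digits(return_X_y=True)  # stand-in data, 1797 samples, 10 classes
clf = DecisionTreeClassifier(random_state=0).fit(X[:1500], y[:1500])
top5 = top_k_labels(clf, X[1500:], k=5)
print(top5.shape)  # (297, 5)
```

With a fully grown tree (the default), leaves are usually pure, so predict_proba rows tend to be one-hot and every class after the first ties at 0; ranks 2 to 5 are then effectively arbitrary. Limiting max_depth or raising min_samples_leaf gives more graded probabilities to rank.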

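Here is the sample code I tried for getting the accuracy of the predicted class labels. Here index holds the indices of the top five class-label probabilities, and event holds the different class labels:
for i in index:
    DTC.fit(X_train_one_hot, y_train)
    y_pred = event[i]
    # Does not work as written: accuracy_score(y_true, y_pred) takes two
    # equal-length label arrays, but this call passes a single tuple,
    # and y_pred is a single label
    AC = accuracy_score((event, y_pred) * 100)
    print(AC)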

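[Question discussion]:

Please share what you have tried so far (code and results); otherwise your question is too broad.
I used predict_proba() to get the probabilities of the class labels, then retrieved the top 5 classes associated with those probabilities. Now I want the accuracy of each of these classes. I have edited my post with the sample code.
What is the shape of y_train?
The shape of y_train is (21890,).
Construct a y_hat(class) like this: y_hat(class) = 1 if the desired class is among the top 5 classes, otherwise 0. Then compute the accuracy score of y_train against y_hat.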
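A minimal sketch of that suggestion, using scikit-learn's digits dataset as a stand-in for the asker's data and scoring on a held-out test split rather than on the training labels (an assumption; scoring on y_train would be optimistic). y_hat is 1 when the true class appears among the five recommended classes; comparing it with an all-ones target gives the top-5 hit rate, and grouping it by true class gives a per-class figure. max_depth=5 is an arbitrary choice so the probabilities are not all one-hot.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)
# Five most probable labels per test row (columns of proba follow clf.classes_)
top5 = clf.classes_[np.argsort(-proba, axis=1, kind="stable")[:, :5]]

# y_hat = 1 if the true class is among the 5 recommended classes, else 0
y_hat = (top5 == y_test[:, None]).any(axis=1).astype(int)

# Accuracy of y_hat against "every recommendation should contain the true class"
top5_acc = accuracy_score(np.ones_like(y_hat), y_hat)
print("overall top-5 accuracy:", round(top5_acc, 3))

# Per-class view: share of samples of each true class whose label was recommended
for label in clf.classes_:
    mask = y_test == label
    if mask.any():
        print(label, round(y_hat[mask].mean(), 3))
```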

Tags: python machine-learning scikit-learn

[Solution 1]:

Since you have a multi-class classification problem, you can compute the accuracy of the classifier with the confusion_matrix function from scikit-learn.

To get the overall accuracy, sum the values on the diagonal of the confusion matrix and divide by the total number of samples.
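The same rule written out, with C the confusion matrix and N the number of samples, followed by a small worked case on a hypothetical 2x2 matrix:

```latex
\text{accuracy} = \frac{\sum_{i} C_{ii}}{\sum_{i}\sum_{j} C_{ij}} = \frac{\operatorname{tr}(C)}{N},
\qquad
C = \begin{pmatrix} 5 & 1 \\ 2 & 2 \end{pmatrix}
\;\Rightarrow\;
\text{accuracy} = \frac{5 + 2}{5 + 1 + 2 + 2} = \frac{7}{10} = 0.7
```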

Consider the following simple multi-class classification example using the IRIS dataset:

import numpy as np

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01)
y_pred = classifier.fit(X_train, y_train).predict(X_test)

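Now, to compute the overall accuracy, use the confusion matrix:

# confusion_matrix expects (y_true, y_pred): rows are true classes, columns predicted
conf_mat = confusion_matrix(y_test, y_pred)
acc = np.sum(conf_mat.diagonal()) / np.sum(conf_mat)
print('Overall accuracy: {} %'.format(acc*100))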
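The same matrix also gives a per-class figure, which is closer to what the question asks for. A sketch that repeats the iris/SVC setup so it runs on its own; here "per-class accuracy" is taken to mean the diagonal entry divided by its row sum (that class's recall), which is one common reading and an assumption rather than something the answer states.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
y_pred = svm.SVC(kernel="linear", C=0.01).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
conf_mat = confusion_matrix(y_test, y_pred)

# Overall accuracy: correct predictions (diagonal) over all samples
overall = np.trace(conf_mat) / conf_mat.sum()
# Per-class recall: correct predictions of a class over true samples of that class
per_class = conf_mat.diagonal() / conf_mat.sum(axis=1)

for name, acc in zip(iris.target_names, per_class):
    print(f"{name}: {acc:.2%}")
print(f"overall: {overall:.2%}")
```

The overall figure is a sample-weighted average of the per-class values, so a large class with good recall can hide a small class the model never gets right.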

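[Discussion]:

y_pred = classifier.fit(X_train, y_train).predict(X_test) computes the class labels for X_test, but predict() gives me only one class label per sample, and I want several; that's why I use predict_proba(). I have updated my post with the sample code I used to compute the accuracy of the recommendations I get. Could you please take a look?
Why would you want multiple labels for a given observation? predict_proba() gives you the probability of assigning each class label to a sample; in the end, one class label is chosen for each sample based on those probabilities.
Yes, predict_proba() gives the probabilities. From them I take the five highest probabilities and the indices of the associated labels, and I use those indices to look up the label values. I need to provide the top five recommendations, which is why I need multiple labels per observation.
To compute accuracy, you need one label per observation. If you want to use several, you can look at top-1 or top-5 accuracy scores. See this post for details.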
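A sketch of top-k accuracy computed directly from predict_proba, assuming scikit-learn's digits dataset as stand-in data (10 classes, so top-5 is meaningful). The helper top_k_accuracy is hypothetical; it uses a stable sort so the first-ranked class is the same one argmax picks. With k=1 it reduces to ordinary accuracy, because for DecisionTreeClassifier the single label returned by predict() is the highest-probability class. That equivalence should not be assumed for every estimator (for example, SVC's Platt-scaled probabilities can disagree with its predict()).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def top_k_accuracy(y_true, proba, classes, k):
    """Hypothetical helper: share of samples whose true label is among the k most probable classes."""
    top_k = classes[np.argsort(-proba, axis=1, kind="stable")[:, :k]]
    return np.mean((top_k == np.asarray(y_true)[:, None]).any(axis=1))


X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

top1 = top_k_accuracy(y_test, proba, clf.classes_, k=1)
top5 = top_k_accuracy(y_test, proba, clf.classes_, k=5)
print(f"top-1: {top1:.3f}  top-5: {top5:.3f}")
```

Recent scikit-learn releases (0.24 and later) also provide sklearn.metrics.top_k_accuracy_score, which may be preferable when available; its handling of tied probabilities may differ from this sketch.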