[Title]: Sklearn SVM - how to get a list of the wrong predictions?
[Posted]: 2019-02-12 00:38:19
[Question]:

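I'm not an expert user. I know I can get the confusion matrix, but I'd like a list of the rows that were misclassified, so that I can study them after classification.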

On Stack Overflow I found the question "Can I get a list of wrong predictions in SVM score function in scikit-learn", but I'm not sure I understood everything in it.

Here is some sample code.

# importing necessary libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# loading the iris dataset
iris = datasets.load_iris()

# X -> features, y -> label
X = iris.data
y = iris.target

# dividing X, y into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

# training a linear SVM classifier
from sklearn.svm import SVC
svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)

# model accuracy for X_test  
accuracy = svm_model_linear.score(X_test, y_test)

# creating a confusion matrix
cm = confusion_matrix(y_test, svm_predictions)

To loop over the rows and find the wrong ones, the suggested solution is:

predictions = clf.predict(inputs)
for input, prediction, label in zip(inputs, predictions, labels):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label) 

I don't understand what "inputs"/"input" are. If I adapt this code to mine, like this:

for input, prediction, label in zip (X_test, svm_predictions, y_test):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label)

I get:

[6.  2.7 5.1 1.6] has been classified as  2 and should be  1

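Is row 6 the misclassified row? What are the numbers after 6.? I'm asking because I'm using the same code on a larger dataset, so I want to make sure I'm doing the right thing. Unfortunately I can't post that other dataset, but the problem is that I get something like this: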

  (0, 253)  0.5339655767137572
  (0, 601)  0.27665553856928027
  (0, 1107) 0.7989633757962163 has been classified as  7 and should be  3
  (0, 885)  0.3034934766501018
  (0, 1295) 0.6432561790864061
  (0, 1871) 0.7029318585026516 has been classified as  7 and should be  6
  (0, 1020) 1.0 has been classified as  3 and should be  8

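When I count the lines of this last output, I get twice the size of the test set... so I'm not sure that what I'm analysing is really the list of wrong predictions...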

    Tags: python machine-learning scikit-learn svm


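    [Answer 1]:

    Is row 6 the misclassified row? What are the numbers after 6.?

    No - [6. 2.7 5.1 1.6] is the actual sample (i.e. its features). To get the index of the misclassified row, we should modify the for loop slightly: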

    for idx, input, prediction, label in zip(enumerate(X_test), X_test, svm_predictions, y_test):
        if prediction != label:
            print("No.", idx[0], 'input,',input, ', has been classified as', prediction, 'and should be', label) 
    

    The result is now

    No. 37 input, [ 6.   2.7  5.1  1.6] , has been classified as 2 and should be 1
    

    This means that X_test[37], which is [ 6. 2.7 5.1 1.6], was predicted as 2 by our SVM, while its true label is 1.

    Let's confirm this reading:

    X_test[37]
    # array([ 6. ,  2.7,  5.1,  1.6])
    
    svm_predictions[37]
    # 2
    
    y_test[37]
    # 1
    

    This result agrees with your confusion matrix cm, which indeed shows only one misclassified sample in X_test:

    cm
    # result:
    array([[13,  0,  0],
           [ 0, 15,  1],
           [ 0,  0,  9]], dtype=int64)
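
As a quick cross-check of that reading, the off-diagonal entries of a confusion matrix count the misclassified samples, so their sum should match the number of mismatches a loop finds. The sketch below rebuilds the question's setup so that it runs on its own; the names errors_from_cm and errors_from_loop are introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same setup as in the question
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm_predictions = SVC(kernel="linear", C=1).fit(X_train, y_train).predict(X_test)
cm = confusion_matrix(y_test, svm_predictions)

# Diagonal entries are correct predictions; everything else is an error
errors_from_cm = int(cm.sum() - np.trace(cm))

# Count mismatches directly, one test sample at a time
errors_from_loop = sum(1 for p, t in zip(svm_predictions, y_test) if p != t)

print("errors according to cm:  ", errors_from_cm)
print("errors according to loop:", errors_from_loop)
```

If the two numbers ever disagree, the loop is probably iterating over something other than one item per test sample.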
    

    A more elegant for loop, since enumerate already includes the sample itself, could be:

    for idx, prediction, label in zip(enumerate(X_test), svm_predictions, y_test):
        if prediction != label:
            print("Sample", idx, ', has been classified as', prediction, 'and should be', label) 
    

    which gives

    Sample (37, array([ 6. ,  2.7,  5.1,  1.6])) , has been classified as 2 and should be 1
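
A possible variation, offered only as a sketch: wrapping the zip in enumerate yields the position and the unpacked values directly, and collecting the hits in a list (called misclassified here, a name not used in the original answer) keeps them available for later study. The setup repeats the question's code so the block runs on its own.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same setup as in the question
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm_predictions = SVC(kernel="linear", C=1).fit(X_train, y_train).predict(X_test)

# enumerate(zip(...)) yields (position, (sample, prediction, label))
misclassified = [
    (idx, sample, prediction, label)
    for idx, (sample, prediction, label)
    in enumerate(zip(X_test, svm_predictions, y_test))
    if prediction != label
]

for idx, sample, prediction, label in misclassified:
    print(f"X_test[{idx}] = {sample} was classified as {prediction} "
          f"but should be {label}")
```

Note that idx is a position within X_test, not a row number in the original iris data, because train_test_split shuffles the rows before splitting.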
    


      [Answer 2]:

      If you just want a list of the misclassified instances, you can do the following:

      import numpy as np

      # Predictions for the test set (called svm_predictions in the question's code)
      predictions = svm_model_linear.predict(X_test)
      # Boolean mask that is True wherever the prediction differs from the true label
      mask = np.logical_not(np.equal(y_test, predictions))
      # Use the mask to select the misclassified elements
      print(f"Misclassified elements: {X_test[mask]}")
      print(f"Model prediction for each of those elements: {predictions[mask]}")
      print(f"Actual value for each of those elements: {np.asarray(y_test)[mask]}")
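
The same kind of mask may also explain the line count on the larger dataset. Lines of the form `(0, 253)  0.53...` are how a SciPy sparse row is printed, one line per nonzero feature, so assuming that X_test there is a sparse matrix (for example TF-IDF features), a single misclassified sample can take up several printed lines. The sketch below uses a small made-up sparse matrix and made-up labels, not the asker's data, and reports row indices instead of printing the rows themselves.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for a sparse test set: 4 samples, 6 features
X_test = csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0, 0.8, 0.0],
    [0.3, 0.0, 0.0, 0.6, 0.0, 0.7],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.9],
]))
y_test = np.array([3, 6, 8, 1])       # made-up true labels
predictions = np.array([7, 6, 3, 1])  # made-up model output

# Positions in X_test where the prediction differs from the true label
wrong_idx = np.flatnonzero(predictions != y_test)
print("number of misclassified samples:", len(wrong_idx))

for i in wrong_idx:
    row = X_test[int(i)]  # one sample, still stored as a sparse row
    print(f"row {i}: {row.nnz} nonzero features, "
          f"predicted {predictions[i]}, actual {y_test[i]}")
```

Counting len(wrong_idx), rather than counting printed lines, gives the number of misclassified samples regardless of how many nonzero features each one has.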
      
