How can I get the probability of each predicted value in Python for a multiclass problem? Answers

【Question title】: How can I get the probability of each predicted value in Python for a multiclass problem?
【Posted】: 2021-07-31 03:33:00
【Question description】:

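I am currently working on a machine learning model for disease prediction. The model uses a random forest classifier, and I am now trying to get the probability of the predicted value, but my code raises an error. In this program I specifically want the probability of each prediction. For example, I enter symptoms to predict a disease, and the predicted disease is "Allergy". I then want the program to show the probability of the predicted disease "Allergy" as a percentage, but it fails with errors such as "Classification metrics can't handle a mix of multiclass and unknown targets". I thought I needed a confusion matrix to show the probability, but that produces the same multiclass error. To be clear, I only want to display the probability of each predicted value as a percentage, for example "the probability of Allergy is 90%". How can I do this, and how can I fix my problem?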

Here is the relevant code:

p=pickle_model.predict([[22,8,50,9,20,47,50,38,0,0,0]])

actual=np.array((22,8,50,9,20,47,50,38,0,0,0))

pred=pickle_model.predict_proba([[p,0,0,0,0,0,0,0,0,0,0]])

In the following code block:

from sklearn.metrics import confusion_matrix
import sklearn.metrics as mt
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
print(accuracy_score(actual, p, normalize=True, sample_weight=None))

I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-69-e8980bf68410> in <module>
      3 from sklearn.metrics import accuracy_score
      4 from sklearn.metrics import precision_score
----> 5 print(accuracy_score(actual, p, normalize=True, sample_weight=None))
      6 #precision, recall, fscore, support =
      7 #score(y_test, p)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    200 
    201     # Compute accuracy for each possible representation
--> 202     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    203     check_consistent_length(y_true, y_pred, sample_weight)
    204     if y_type.startswith('multilabel'):

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     81     y_pred : array or indicator matrix
     82     """
---> 83     check_consistent_length(y_true, y_pred)
     84     type_true = type_of_target(y_true)
     85     type_pred = type_of_target(y_pred)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    260     uniques = np.unique(lengths)
    261     if len(uniques) > 1:
--> 262         raise ValueError("Found input variables with inconsistent numbers of"
    263                          " samples: %r" % [int(l) for l in lengths])
    264 

ValueError: Found input variables with inconsistent numbers of samples: [11, 1]

In addition, another code block gives me this error:

ValueError                                Traceback (most recent call last)
<ipython-input-65-774dbd6b46f7> in <module>
      8 
      9 # specificity
---> 10 tn, fp, fn, tp = mt.confusion_matrix(actual, predict).ravel()
     11 specificity = tn / (tn+fp)
     12 print(specificity)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    294 
    295     """
--> 296     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    297     if y_type not in ("binary", "multiclass"):
    298         raise ValueError("%s is not supported" % y_type)

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     90 
     91     if len(y_type) > 1:
---> 92         raise ValueError("Classification metrics can't handle a mix of {0} "
     93                          "and {1} targets".format(type_true, type_pred))
     94  
ValueError: Classification metrics can't handle a mix of multiclass and unknown targets

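【Discussion】:

Tags: python machine-learning scikit-learn

【Solution 1】:

The function you are looking for is predict_proba().

The input to this function must have exactly the same format as the input your model was trained on. In other words, it takes the same input as predict().

If this part of your code works for you (you did not report an error for it):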

p=pickle_model.predict([[22,8,50,9,20,47,50,38,0,0,0]])

then this should work as well:

p=pickle_model.predict_proba([[22,8,50,9,20,47,50,38,0,0,0]])

Note: pay attention to the "[]" brackets; you may need to remove one pair.
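
As a minimal sketch of the suggestion above, the following trains a small RandomForestClassifier on synthetic data and prints the probability of the predicted class as a percentage. The dataset, the label names ("Allergy", "Cold", "Flu"), and the variable names are assumptions made for illustration; they stand in for the asker's pickled model and symptom data. predict_proba returns one row per sample and one column per class, in the order given by model.classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the symptom data: 11 features, 3 classes (assumed).
X, y_idx = make_classification(
    n_samples=300, n_features=11, n_informative=6,
    n_classes=3, random_state=0,
)
label_names = np.array(["Allergy", "Cold", "Flu"])  # hypothetical labels
y = label_names[y_idx]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# A single sample must be passed as a 2D array of shape (1, n_features).
sample = X[:1]
predicted = model.predict(sample)[0]
proba = model.predict_proba(sample)[0]  # shape (n_classes,)

# The columns of predict_proba follow the order of model.classes_.
for name, p in zip(model.classes_, proba):
    print(f"{name}: {p * 100:.1f}%")

# Look up the probability that belongs to the predicted class.
predicted_pct = proba[list(model.classes_).index(predicted)] * 100
print(f"Predicted: {predicted} ({predicted_pct:.1f}%)")
```

Note that predict_proba is called on the same feature row as predict, not on the predicted label.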

【Comments】:

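I tried what you suggested, but it gives me this error: ValueError: Found input variables with inconsistent numbers of samples: [11, 1]. I tried it like this: print(accuracy_score(actual, p, normalize=True, sample_weight=None)). Maybe I should use something other than "actual"?
The problem is your format. Please print actual and p. They must have the same shape, and the input must match the format the model was trained on. Reshaping may be a good option. There is a clear example here.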
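
To illustrate the shape requirement from the comment above, here is a minimal sketch on synthetic data (the dataset and all variable names are assumptions, not the asker's). accuracy_score compares true labels with predicted labels, so both arrays need one entry per sample. Passing the 11 feature values of a single sample as actual is what leads to the [11, 1] length mismatch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic multiclass data with 11 features and 3 classes (assumed).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y,
)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# True and predicted labels: one entry per test sample in each array.
y_pred = model.predict(X_test)
print(y_test.shape, y_pred.shape)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # (n_classes, n_classes) matrix
print(f"accuracy: {acc:.3f}")
print(cm)

# One row of features has shape (n_features,); reshape it to
# (1, n_features) before calling predict or predict_proba.
one_sample = X_test[0].reshape(1, -1)
print(model.predict(one_sample))
```

In a multiclass problem the confusion matrix has n_classes × n_classes entries, so unpacking confusion_matrix(...).ravel() into tn, fp, fn, tp only works in the binary case.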