【Question title】: How can I get the probability of each predicted value in Python for a multiclass problem?
【Posted】: 2021-07-31 03:33:00
【Question】:

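I am currently working on a machine learning model for disease prediction. The model uses a random forest classifier, and I am now trying to get the probability of the predicted value, but my code raises an error. Specifically, I want the probability of each individual prediction. For example, I enter symptoms to predict a disease and the predicted disease is "Allergy". I then want the program to show the probability of the predicted disease "Allergy" as a percentage, but instead it raises errors such as "Classification metrics can't handle a mix of multiclass and unknown targets". I thought I needed a confusion matrix to show the probability, but that gives the same multiclass error. To be clear, I only want to display the probability of each predicted value as a percentage, for example "the probability of Allergy is 90%". How can I do this, and how do I fix the errors?

Here is the relevant code: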

p=pickle_model.predict([[22,8,50,9,20,47,50,38,0,0,0]])
actual=np.array((22,8,50,9,20,47,50,38,0,0,0))
pred=pickle_model.predict_proba([[p,0,0,0,0,0,0,0,0,0,0]])

In the following code block:

from sklearn.metrics import confusion_matrix
import sklearn.metrics as mt
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
print(accuracy_score(actual, p, normalize=True, sample_weight=None))

I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-69-e8980bf68410> in <module>
      3 from sklearn.metrics import accuracy_score
      4 from sklearn.metrics import precision_score
----> 5 print(accuracy_score(actual, p, normalize=True, sample_weight=None))
      6 #precision, recall, fscore, support =
      7 #score(y_test, p)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    200 
    201     # Compute accuracy for each possible representation
--> 202     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    203     check_consistent_length(y_true, y_pred, sample_weight)
    204     if y_type.startswith('multilabel'):

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     81     y_pred : array or indicator matrix
     82     """
---> 83     check_consistent_length(y_true, y_pred)
     84     type_true = type_of_target(y_true)
     85     type_pred = type_of_target(y_pred)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    260     uniques = np.unique(lengths)
    261     if len(uniques) > 1:
--> 262         raise ValueError("Found input variables with inconsistent numbers of"
    263                          " samples: %r" % [int(l) for l in lengths])
    264 

ValueError: Found input variables with inconsistent numbers of samples: [11, 1]

In addition, this code block gives me another error:

ValueError                                Traceback (most recent call last)
<ipython-input-65-774dbd6b46f7> in <module>
      8 
      9 # specificity
---> 10 tn, fp, fn, tp = mt.confusion_matrix(actual, predict).ravel()
     11 specificity = tn / (tn+fp)
     12 print(specificity)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    294 
    295     """
--> 296     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    297     if y_type not in ("binary", "multiclass"):
    298         raise ValueError("%s is not supported" % y_type)

~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     90 
     91     if len(y_type) > 1:
---> 92         raise ValueError("Classification metrics can't handle a mix of {0} "
     93                          "and {1} targets".format(type_true, type_pred))
     94  
ValueError: Classification metrics can't handle a mix of multiclass and unknown targets

【Comments】:

    Tags: python machine-learning scikit-learn


    【Solution 1】:

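    The function you are looking for is predict_proba().

    Its input must have exactly the same format as the input the model was trained on. In other words, it takes the same input as predict().

    If this part of your code works for you (you did not report an error for it):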

    p=pickle_model.predict([[22,8,50,9,20,47,50,38,0,0,0]])
    

    then this should work as well:

    p=pickle_model.predict_proba([[22,8,50,9,20,47,50,38,0,0,0]])
    

    Note: pay attention to the brackets "[]"; you may need to remove a pair of them.
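
As a minimal sketch of the advice above: the example below trains a small RandomForestClassifier on made-up data and prints the probability of the predicted class as a percentage. The training data, the three disease labels, and the variable names are assumptions for illustration only; they stand in for the asker's pickle_model and real dataset. The calls predict(), predict_proba(), and the classes_ attribute are standard scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 11 symptom scores per patient and three made-up labels.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 51, size=(90, 11))
y_train = np.repeat(["Allergy", "Flu", "Migraine"], 30)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# A single sample is a 2-D array of shape (1, n_features), the same input as for predict().
sample = [[22, 8, 50, 9, 20, 47, 50, 38, 0, 0, 0]]

predicted = model.predict(sample)[0]
proba = model.predict_proba(sample)[0]  # one value per class, ordered like model.classes_

# Probability of the predicted class, as a percentage.
predicted_idx = list(model.classes_).index(predicted)
print(f"{predicted}: {proba[predicted_idx] * 100:.1f}%")

# Probabilities of all classes.
for label, prob in zip(model.classes_, proba):
    print(f"{label}: {prob * 100:.1f}%")
```

The columns of predict_proba() follow the order of model.classes_, so the predicted label has to be looked up there rather than assumed to be in a fixed position.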

    【Discussion】:

    • I tried what you suggested, but it gave me this error: ValueError: Found input variables with inconsistent numbers of samples: [11, 1]. I tried it like this: print(accuracy_score(actual, p, normalize=True, sample_weight=None)). Maybe I should use something other than "actual"?
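    • The problem is your format. Please print actual and p. They must have the same shape, and the input must match the format the model was trained on. Reshaping may be a good option. There is a clear example here.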
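
The shape mismatch discussed above can be sketched as follows. In the question, actual holds the 11 input features of one sample while p holds one predicted label, which is where [11, 1] comes from. Metrics such as accuracy_score and confusion_matrix compare true labels with predicted labels, one per sample. The example below assumes a held-out test set with known labels; the toy data and names (X, y, y_test, y_pred) are hypothetical. The per-class specificity is a one-vs-rest calculation added here as an illustration, since unpacking tn, fp, fn, tp from ravel() only fits a binary 2x2 matrix.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 11 features per sample, three made-up disease labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 51, size=(150, 11))
y = np.repeat(["Allergy", "Flu", "Migraine"], 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# True and predicted labels: one label per test sample, so the shapes match.
y_pred = model.predict(X_test)
print(y_test.shape, y_pred.shape)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
print(acc)
print(cm)

# One-vs-rest specificity per class from the multiclass confusion matrix.
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - tp - fp - fn
specificity = tn / (tn + fp)
for label, spec in zip(model.classes_, specificity):
    print(f"{label}: specificity {spec:.2f}")
```

Because the labels here are random, the scores themselves are meaningless; the point is only that both arguments to each metric are label arrays of the same length.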