operands could not be broadcast together with shapes (15,3) (15,) GradientBoostingClassifier: answers

【Question title】: operands could not be broadcast together with shapes (15,3) (15,) GradientBoostingClassifier
【Posted】: 2021-07-19 16:21:00
【Question description】:

I am trying to use the loss_ function provided by sklearn's GradientBoostingClassifier, but it raises an error that makes no sense to me:

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-228-587a565619f1> in <module>
      2 for i, pred in enumerate(clf.staged_predict(testX)):
      3     print(testY, pred)
----> 4     test_score[i] = clf.loss_(testY, pred)

~/anaconda3/lib/python3.8/site-packages/sklearn/ensemble/_gb_losses.py in __call__(self, y, raw_predictions, sample_weight)
    712 
    713         if sample_weight is None:
--> 714             return np.sum(-1 * (Y * raw_predictions).sum(axis=1) +
    715                           logsumexp(raw_predictions, axis=1))
    716         else:

ValueError: operands could not be broadcast together with shapes (15,3) (15,)

This is odd, because the variables testY and pred clearly both have shape (15,). Can someone explain this to me?

Code to reproduce the error:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np

dt = load_iris(as_frame=True)
X, Y = np.array(dt.data), np.array(dt.target)

trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.1)

clf = GradientBoostingClassifier(n_estimators=200).fit(trainX, trainY)

test_score = np.empty(len(clf.estimators_))
for i, pred in enumerate(clf.staged_predict(testX)):
    print(testY.shape) # (15,)
    print(pred.shape) # (15,)
    test_score[i] = clf.loss_(testY, pred) # Error here

【Comments】:

Could this be an actual bug?

Tags: python numpy scikit-learn shapes

【Solution 1】:

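You should use staged_predict_proba instead of staged_predict. The loss (and the gradient) is computed from per-class predictions of shape (n_samples, n_classes), here (15, 3), whereas staged_predict yields hard class labels of shape (15,).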
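To see where the shape mismatch comes from, here is a minimal NumPy-only sketch. It assumes, judging from the `Y * raw_predictions` expression in the traceback, that the loss one-hot encodes the targets into a (15, 3) matrix before multiplying. The arrays `y_true`, `labels`, and `scores` are hypothetical stand-ins, not data from the question.

```python
import numpy as np

n_samples, n_classes = 15, 3
rng = np.random.default_rng(0)

# Hypothetical stand-ins for testY and for one stage of staged_predict output.
y_true = rng.integers(0, n_classes, size=n_samples)  # shape (15,)
labels = rng.integers(0, n_classes, size=n_samples)  # shape (15,), hard labels

# One-hot encode the targets (assumed to mirror what the loss does internally).
Y = np.eye(n_classes)[y_true]                        # shape (15, 3)

try:
    Y * labels            # trailing axes are 3 and 15, so broadcasting fails
    broadcast_ok = True
    message = ""
except ValueError as exc:
    broadcast_ok = False
    message = str(exc)

# Per-class scores have a trailing axis of length 3, so the product is elementwise.
scores = rng.normal(size=(n_samples, n_classes))     # shape (15, 3)
product = Y * scores                                 # shape (15, 3)
```

So although testY and pred do share the shape (15,), the failing operands are the one-hot matrix and pred, not testY and pred.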

The following should work for you:

for i, pred in enumerate(clf.staged_predict_proba(testX)):
    test_score[i] = clf.loss_(testY, pred)
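One caveat worth hedging: the `__call__` signature in the traceback names its second argument `raw_predictions`, which suggests that raw scores from `staged_decision_function`, rather than probabilities, are the intended input. In addition, as far as I know, the `loss_` attribute is no longer available in recent scikit-learn releases. The sketch below therefore avoids `loss_` and recomputes the expression shown in the traceback by hand. The helper `multinomial_deviance` is a hypothetical name, and `n_estimators=20` and `random_state=0` are arbitrary choices made for a quick, repeatable run.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
trainX, testX, trainY, testY = train_test_split(
    X, y, test_size=0.1, random_state=0
)

clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(trainX, trainY)


def multinomial_deviance(y_true, raw_scores):
    """Hypothetical helper mirroring the expression in the traceback."""
    n_classes = raw_scores.shape[1]
    Y = np.eye(n_classes)[y_true]  # one-hot targets, shape (n_samples, n_classes)
    return np.sum(-(Y * raw_scores).sum(axis=1) + logsumexp(raw_scores, axis=1))


test_score = np.empty(len(clf.estimators_))
for i, raw in enumerate(clf.staged_decision_function(testX)):
    # raw has shape (15, 3): one raw score per sample and class
    test_score[i] = multinomial_deviance(testY, raw)
```

Passing probabilities from `staged_predict_proba` into the same expression also avoids the shape error, but the resulting numbers would differ from the deviance computed on raw scores.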
