ValueError: Classification metrics can't handle a mix of unknown and binary targets?

【Question title】: ValueError: Classification metrics can't handle a mix of unknown and binary targets?
【Posted】: 2020-04-03 18:31:14
【Question description】:

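I'm fairly sure my random forest model is working. When I compare the predictions it makes with the actual classes in the test set, they match very closely. The first part is where I encode the categorical data: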

Y_train[Y_train == 'Blue'] = 0.0
Y_train[Y_train == 'Green'] = 1.0
Y_test[Y_test == 'Blue'] = 0.0
Y_test[Y_test == 'Green'] = 1.0

rf = RandomForestRegressor(n_estimators=50)
rf.fit(X_train, Y_train)
predictions = rf.predict(X_test)

for i in range(len(predictions)):
    predictions[i] = predictions[i].round()

print(predictions)
print(Y_test)

print(confusion_matrix(Y_test, predictions))

When I run this code, predictions and Y_test both print successfully:

[1. 1. 1. 0. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 1.
 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0.
 0. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0.
 0. 1. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1.
 0. 0. 0. 0.]
615    1
821    1
874    1
403    0
956    1
      ..
932    1
449    0
339    0
191    0
361    0
Name: Colour, Length: 100, dtype: object

As you can see, they match perfectly, so the model is working. The problem is the last part: when I try to use the confusion_matrix() function from scikit-learn, I get this error:

Traceback (most recent call last):
  File "G:\Work\Colours.py", line 101, in <module>
    Main()
  File "G:\Work\Colours.py", line 34, in Main
    RandForest(X_train, Y_train, X_test, Y_test)
  File "G:\Work\Colours.py", line 97, in RandForest
    print(confusion_matrix(Y_test, predictions))
  File "C:\Users\Me\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\classification.py", line 253, in confusion_matrix
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\Me\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\classification.py", line 81, in _check_targets
    "and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of unknown and binary targets

What can I do to these two datasets so that confusion_matrix() stops raising this error?

Edit - predictions and Y_test have the same shape, (100,)

【Comments】:

Don't use 0.0 and 1.0. For example: Y_train[Y_train == 'Blue'] = 0, or just Y_train = Y_train == 'Green'
@QuangHoang That didn't solve the problem, thanks :/

Tags: python pandas machine-learning scikit-learn

【Solution 1】:

I managed to fix it by encoding the categorical data like this:

for i in range(len(Y_train)):
    if Y_train.iloc[i] == 'Blue':
        Y_train.iloc[i] = 0.0
    else:
        Y_train.iloc[i] = 1.0

for i in range(len(Y_test)):
    if Y_test.iloc[i] == 'Blue':
        Y_test.iloc[i] = 0.0
    else:
        Y_test.iloc[i] = 1.0

If anyone can tell me why this fixes the problem, that would be very helpful.
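For what it's worth, the two loops above can also be written as a single vectorized mapping. This is only a sketch on made-up labels (the small `Y` Series is hypothetical, not the asker's data). With a dict that covers every label, `Series.map` returns an integer column directly, rather than numbers stored inside an object column.

```python
import pandas as pd

# Hypothetical stand-in for Y_train / Y_test: a Series of colour names.
Y = pd.Series(["Blue", "Green", "Green", "Blue"], name="Colour")

# Map each label to an integer class in one step (no Python-level loop).
encoded = Y.map({"Blue": 0, "Green": 1})

print(encoded.tolist())
print(encoded.dtype)
```

Any label missing from the dict would come back as NaN, so it is worth checking `encoded.isna().any()` on real data.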

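Edit - I found the real cause of my problem. I was using a regression model instead of a classification model. Silly mistake. All of this could have been avoided by using RandomForestClassifier() instead of RandomForestRegressor().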
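A minimal sketch of the classifier version, assuming synthetic data in place of the real features (the `X` array, the rule that generates the colours, and the `random_state` values are all made up for illustration). The classifier accepts the string labels as they are, so neither the manual 0.0/1.0 encoding nor the rounding step is needed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three numeric features and a colour label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, "Green", "Blue")

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A classifier predicts class labels directly, so no rounding is required.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, Y_train)
predictions = rf.predict(X_test)

# Rows are the true classes, columns the predicted classes, in `labels` order.
cm = confusion_matrix(Y_test, predictions, labels=["Blue", "Green"])
print(cm)
```

Passing `labels` explicitly fixes the row and column order of the matrix, which makes the output easier to read.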

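【Comments】:

【Solution 2】:

You have to compare matrices with the same dimensions, so if predictions is a matrix with 1 column and 850 rows (for example), Y_test must also be a matrix with 1 column and 850 rows.

print(confusion_matrix(Y_test[1], predictions))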

【Comments】:

When I do that, I get a KeyError: 1.
Y_test and predictions both have shape (100,)
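Since the shapes already match, the "unknown" in the error message most likely refers to the dtype rather than the shape. The printed Y_test in the question shows `dtype: object`, and scikit-learn's `type_of_target` reports object arrays of non-string values as "unknown". A small sketch of how to check this, using made-up three-element arrays (`y_true_obj` and `preds` are hypothetical names, not the asker's variables):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import type_of_target

# Numbers stored in an object column, as happens after in-place
# replacement of strings in a Series that started out as text.
y_true_obj = pd.Series([1.0, 0.0, 1.0], dtype=object)
preds = np.array([1.0, 0.0, 0.0])

print(type_of_target(y_true_obj))  # object dtype, non-string values
print(type_of_target(preds))       # plain float array holding 0s and 1s

# Casting to a real numeric dtype lets the metric recognise the target type.
y_true_fixed = y_true_obj.astype(int)
cm = confusion_matrix(y_true_fixed, preds.astype(int))
print(cm)
```

On the original data, the equivalent check would be `Y_test.dtype` and, if it is object, something like `Y_test.astype(int)` before calling confusion_matrix().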