【Question title】: scikit multilearn: accuracy_score ValueError: multiclass-multioutput is not supported
【Posted】: 2020-06-05 15:09:01
【Question】:

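I want to predict samples that can carry several labels at once (multi-label classification). I used the scikit-multilearn library, fitted a classifier successfully, and can even predict on the test data. The only thing that fails is computing the classifier's accuracy.

My data (up to 1100 rows):

The dependent variables (the ones I am predicting) are the last 4: N/xN, Sex, Maturity, CType. The rest are independent variables.

By accuracy I mean how close the classifier comes to predicting all of the labels correctly.

The code is as follows: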

import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from skmultilearn.problem_transform import BinaryRelevance

# Prepare data
df = pd.read_csv("Data_Numeric.csv")
# remove crab_id for now
del df['Crab_id']

# independent vars: the rest
# dependent vars: N/xN, Gender, Maturity, CType
# n_samples = 1100
# n_features = 6
# n_labels = 4
X = df.iloc[:, :6].values
y = df.iloc[:, 6:df.shape[1]].astype(np.int64).values

X = sparse.csr_matrix(X)
y = sparse.csr_matrix(y, dtype=np.int64)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# generate model
classifier = BinaryRelevance(SVC())

# train
classifier.fit(X_train, y_train)

# predict
y_pred = classifier.predict(X_test)
y_pred_array = y_pred.toarray()

# my_data = X_test[0:4, :]
# my_data[0] = [64.7, 46, 12, 13, 0, 0]
# my_data_prediction = classifier.predict(my_data).toarray()
# my_data_true = y_test[0:4, :].toarray()

# error here
score = accuracy_score(y_test.toarray(), y_pred.toarray())

The error is:

Traceback (most recent call last):
  File "<input>", line 42, in <module>
  File "/home/f4ww4z/anaconda3/envs/ayah/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 185, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/home/f4ww4z/anaconda3/envs/ayah/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 97, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported

y_test

>>> y_test
<330x4 sparse matrix of type '<class 'numpy.longlong'>'
    with 578 stored elements in Compressed Sparse Row format>

y_test.toarray(), with shape 330x4:

y_pred

>>> y_pred
<330x4 sparse matrix of type '<class 'numpy.longlong'>'
    with 408 stored elements in Compressed Sparse Column format>

y_pred.toarray():

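How do I correctly measure the classifier's accuracy?

【Comments】:

  • You need to (1) define what multiclass accuracy means for you and (2) write the code. For (1), it could be anything from guessing all labels correctly to guessing the top n correctly.
  • What does your data look like? Share a snippet of it. And what does y_true look like?
  • @SergeyBushmanov Flika205 I have added them to the description.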
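
The first comment's point (pick a definition of accuracy, then code it) can be sketched as follows. This is only an illustration, not the asker's or the commenter's code: the function names `exact_match_accuracy` and `per_label_accuracy` and the toy data are hypothetical. It uses plain Python so it has no dependencies; with NumPy arrays the exact-match version is equivalent to `(y_true == y_pred).all(axis=1).mean()`.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples for which every label column is predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Accuracy of each label column, computed independently of the others."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    return [
        sum(1 for t, p in zip(y_true, y_pred) if t[j] == p[j]) / n_samples
        for j in range(n_labels)
    ]


# Hypothetical toy data: 4 samples, 3 label columns (not the asker's data).
y_true = [[0, 1, 2], [1, 0, 0], [0, 0, 1], [1, 1, 2]]
y_pred = [[0, 1, 2], [1, 0, 1], [0, 1, 1], [1, 1, 2]]

print(exact_match_accuracy(y_true, y_pred))  # 0.5
print(per_label_accuracy(y_true, y_pred))    # [1.0, 0.75, 0.75]
```

Assuming the arrays from the question, the same functions could be called as `exact_match_accuracy(y_test.toarray(), y_pred.toarray())`. Exact match is the strictest choice; the per-label view shows which of the four targets the classifier struggles with.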

Tags: python machine-learning scikit-learn multilabel-classification


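【Answer 1】:
from sklearn.model_selection import KFold, cross_val_score, cross_validate

clf = BinaryRelevance(SVC())
k_fold = KFold(n_splits=5, shuffle=True, random_state=42)

# cross_validate returns a dict; the per-fold scores are in scores['test_accuracy']
scores = cross_validate(clf, X_train, y_train, cv=k_fold, scoring=['accuracy'])

# alternative: cross_val_score returns the per-fold scores directly,
# using the estimator's default scorer
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)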

With cross-validation you get 5 accuracy scores, which you can then inspect.
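
One way to inspect the per-fold scores is to reduce them to a mean and a spread. The sketch below is a hedged illustration: `summarize_scores` is a hypothetical helper, and the numbers are placeholder values standing in for `scores['test_accuracy']`, not results from the asker's data. NaN entries, which scikit-learn can report when a fold fails, are skipped.

```python
import math
import statistics


def summarize_scores(fold_scores):
    """Return (mean, sample standard deviation) of the fold scores, skipping NaN."""
    valid = [float(s) for s in fold_scores if not math.isnan(float(s))]
    if not valid:
        raise ValueError("no valid fold scores to summarize")
    mean = statistics.mean(valid)
    # The sample standard deviation needs at least two values.
    std = statistics.stdev(valid) if len(valid) > 1 else 0.0
    return mean, std


# Hypothetical placeholder values, one per fold (not real results).
example_scores = [0.5, 0.75, 0.5, 0.75, 0.5]
mean, std = summarize_scores(example_scores)
print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

If every fold comes back as NaN, the helper raises instead of returning a misleading number, which is a hint that the chosen scorer does not support the target type.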

You can also use MultiOutputClassifier with RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate, KFold
from sklearn.multioutput import MultiOutputClassifier

clf = MultiOutputClassifier(RandomForestClassifier(random_state=42, class_weight="balanced"))
k_fold = KFold(n_splits=5, shuffle=True, random_state=42)

# X_train, y_train: the training split from the question
scores = cross_validate(clf, X_train, y_train, cv=k_fold, scoring=['f1_weighted'])

Maybe this will help you :)

【Comments】:
