Generating a confusion matrix for a Keras model - Sentiment analysis

【Question title】: Generating confusion matrix for keras model - Sentiment analysis
【Posted】: 2020-05-08 17:22:52
【Question description】:

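I am testing a sentiment analysis model that uses an LSTM. I need to add a confusion matrix to the classifier results and, if possible, precision, recall, and F-measure values as well. So far I only have accuracy. The Movie_reviews data has pos and neg labels.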

import tensorflow as tf
from tensorflow import keras
import numpy as np
from keras.layers import LSTM,Embedding,Dense
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences

data = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = data.load_data(num_words=88000)
word_index = data.get_word_index()
word_index = {k:(v+3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
train_data = keras.preprocessing.sequence.pad_sequences(train_data, value=word_index["<PAD>"], padding="post", maxlen=250)
maxlen = 250
X_train_pad = pad_sequences(train_data,maxlen=maxlen)
X_test_pad = pad_sequences(test_data,maxlen=maxlen)
max_features = max([max(x) for x in X_train_pad] + 
               [max(x) for x in X_test_pad]) + 1
max_features
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
          optimizer='adam',
          metrics=['accuracy'])
model.summary()
x_val = train_data[:10000]
x_train = train_data[10000:]
y_val = train_labels[:10000]
y_train = train_labels[10000:]
fitModel = model.fit(x_train, y_train, epochs=1, batch_size=512, validation_data=(x_val,y_val),verbose=1)
from sklearn.metrics import confusion_matrix
y_pred = model.predict(test_data)
confusion_matrix = confusion_matrix(test_data, np.rint(test_labels))

When I use the code above to generate the confusion matrix, I get the following error:

    confusion_matrix = confusion_matrix(test_data, np.rint(test_labels))
  File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 250, in confusion_matrix
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 81, in _check_targets
    "and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of multiclass-multioutput and binary targets

How can we get the confusion matrix correctly?

Tags: python tensorflow keras sentiment-analysis confusion-matrix

【Solution 1】:

You just need:

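# Threshold the sigmoid output at 0.5 to get class labels (0 or 1);
# the test sequences must be padded like the training data (e.g. X_test_pad).
y_pred = (model.predict(X_test_pad).ravel() > 0.5).astype(int)
cm = confusion_matrix(test_labels, y_pred)  # true labels first; new name avoids shadowing the function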
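
Since the question also asks for precision, recall, and F-measure, here is a minimal sketch of how those could be computed alongside the confusion matrix with scikit-learn. The `y_true` and `y_prob` arrays are made-up stand-ins for `test_labels` and the output of `model.predict`; they are not results from the model in the question. Note that the true labels go in as the first argument, whereas the call in the question passed the input sequences `test_data` there, which is presumably what triggered the `ValueError`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical stand-ins: y_true plays the role of test_labels,
# y_prob the (n_samples, 1) sigmoid output of model.predict.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([[0.1], [0.6], [0.8], [0.4], [0.9], [0.2], [0.7], [0.3]])

# Flatten to 1-D and threshold at 0.5 to get hard class labels.
y_pred = (y_prob.ravel() > 0.5).astype(int)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(cm)
print(precision, recall, f1)
```

With real data, `y_true` would be `test_labels` and `y_prob` the model's predictions on test sequences padded the same way as the training data. `sklearn.metrics.classification_report(y_true, y_pred)` prints the same per-class metrics as one table, if that is more convenient.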
