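[Posted]: 2020-01-05 20:52:39
[Question]:
The problem is that the validation accuracy reported in the Keras model.fit history is significantly higher than the validation accuracy I get from the sklearn.metrics functions.
The results I get from model.fit are summarized below: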
Last Validation Accuracy: 0.81
Best Validation Accuracy: 0.84
The (normalized) results from sklearn are quite different:
True Negatives: 0.78
True Positives: 0.77
Validation Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.775
(see confusion matrix below for reference)
Edit: this calculation is incorrect, because the normalized values cannot be used to compute the accuracy: they do not account for the absolute number of samples in each class of the dataset. Thanks to desertnaut for the comment.
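To illustrate the point of the edit above, here is a small worked example in plain Python. The raw counts are hypothetical (they are not the counts from this question); they only show that averaging the normalized diagonal entries gives a different number from the true accuracy when the classes are imbalanced.

```python
# Hypothetical raw confusion-matrix counts for an imbalanced validation set
# (illustrative only, NOT the data from this question).
tn, fp = 90, 10   # 100 negative samples
fn, tp = 15, 12   # 27 positive samples

# Row-normalized diagonal entries, as shown in a normalized confusion matrix.
tn_rate = tn / (tn + fp)
tp_rate = tp / (tp + fn)

# Averaging the normalized entries ignores how many samples each class has.
mean_of_rates = (tn_rate + tp_rate) / 2

# Accuracy must be computed from the raw counts.
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"mean of normalized rates: {mean_of_rates:.4f}")
print(f"accuracy from raw counts: {accuracy:.4f}")
```

With these made-up counts the two numbers differ noticeably, because the majority class dominates the accuracy while both classes weigh equally in the average of the rates.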
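I think this question is somewhat similar to "Sklearn metrics values are very different from Keras values", but I have checked that both methods validate on the same pool of data, so that answer probably does not apply to my case.
Also, the question "Keras binary accuracy metric gives too high accuracy" seems to address how binary cross-entropy affects multi-class problems, but it may not apply in my case, since this is a true binary classification problem.
Here are the commands used:
Model definition: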
inputs = Input((Tx, ))
n_e = 30
embeddings = Embedding(n_x, n_e, input_length=Tx)(inputs)
out = Bidirectional(LSTM(32, recurrent_dropout=0.5, return_sequences=True))(embeddings)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5, return_sequences=True))(out)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5))(out)
out = Dense(3, activation='softmax')(out)
modelo = Model(inputs=inputs, outputs=out)
modelo.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 100) 0
_________________________________________________________________
embedding (Embedding) (None, 100, 30) 86610
_________________________________________________________________
bidirectional (Bidirectional (None, 100, 64) 16128
_________________________________________________________________
bidirectional_1 (Bidirection (None, 100, 32) 10368
_________________________________________________________________
bidirectional_2 (Bidirection (None, 32) 6272
_________________________________________________________________
dense (Dense) (None, 3) 99
=================================================================
Total params: 119,477
Trainable params: 119,477
Non-trainable params: 0
_________________________________________________________________
Model compilation:
mymodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
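For context on this compile line: in Keras versions of this era, the string 'acc' is resolved according to the loss, and with loss='binary_crossentropy' it is generally understood to become binary accuracy, which thresholds every output unit separately. The following is a minimal plain-Python sketch, not Keras code; the one-hot labels, the probabilities, and both helper functions are made up for illustration. It shows how such an element-wise accuracy can differ from an argmax-based accuracy on a three-unit softmax output.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Element-wise accuracy: every output unit counts as one decision."""
    hits = total = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            hits += int((p > threshold) == bool(t))
            total += 1
    return hits / total


def categorical_accuracy(y_true, y_prob):
    """Per-sample accuracy: compare the argmax of labels and predictions."""
    hits = 0
    for true_row, prob_row in zip(y_true, y_prob):
        hits += int(true_row.index(max(true_row)) == prob_row.index(max(prob_row)))
    return hits / len(y_true)


# Hypothetical one-hot labels and softmax outputs with 3 units,
# where unit 0 is never used (illustrative values only).
y_true = [[0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
y_prob = [[0.0, 0.9, 0.1], [0.0, 0.2, 0.8], [0.0, 0.3, 0.7], [0.0, 0.6, 0.4]]

bin_acc = binary_accuracy(y_true, y_prob)
cat_acc = categorical_accuracy(y_true, y_prob)
print(f"element-wise (binary) accuracy: {bin_acc:.4f}")
print(f"argmax (categorical) accuracy:  {cat_acc:.4f}")
```

In this toy case half of the samples are misclassified, yet the element-wise figure is higher, because the always-zero unit is counted as a correct decision for every sample.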
Model fit call:
num_epochs = 30
myhistory = mymodel.fit(X_pad, y, epochs=num_epochs, batch_size=50, validation_data=[X_val_pad, y_val_oh], shuffle=True, callbacks=callbacks_list)
Model fit log:
Train on 505 samples, validate on 127 samples
Epoch 1/30
500/505 [============================>.] - ETA: 0s - loss: 0.6135 - acc: 0.6667
[...]
Epoch 10/30
500/505 [============================>.] - ETA: 0s - loss: 0.1403 - acc: 0.9633
Epoch 00010: val_acc improved from 0.77953 to 0.79528, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 41ms/sample - loss: 0.1393 - acc: 0.9637 - val_loss: 0.5203 - val_acc: 0.7953
Epoch 11/30
500/505 [============================>.] - ETA: 0s - loss: 0.0865 - acc: 0.9840
Epoch 00011: val_acc did not improve from 0.79528
505/505 [==============================] - 21s 41ms/sample - loss: 0.0860 - acc: 0.9842 - val_loss: 0.5257 - val_acc: 0.7953
Epoch 12/30
500/505 [============================>.] - ETA: 0s - loss: 0.0618 - acc: 0.9900
Epoch 00012: val_acc improved from 0.79528 to 0.81102, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0615 - acc: 0.9901 - val_loss: 0.5472 - val_acc: 0.8110
Epoch 13/30
500/505 [============================>.] - ETA: 0s - loss: 0.0415 - acc: 0.9940
Epoch 00013: val_acc improved from 0.81102 to 0.82152, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0413 - acc: 0.9941 - val_loss: 0.5853 - val_acc: 0.8215
Epoch 14/30
500/505 [============================>.] - ETA: 0s - loss: 0.0443 - acc: 0.9933
Epoch 00014: val_acc did not improve from 0.82152
505/505 [==============================] - 21s 42ms/sample - loss: 0.0453 - acc: 0.9921 - val_loss: 0.6043 - val_acc: 0.8136
Epoch 15/30
500/505 [============================>.] - ETA: 0s - loss: 0.0360 - acc: 0.9933
Epoch 00015: val_acc improved from 0.82152 to 0.84777, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0359 - acc: 0.9934 - val_loss: 0.5663 - val_acc: 0.8478
[...]
Epoch 30/30
500/505 [============================>.] - ETA: 0s - loss: 0.0039 - acc: 1.0000
Epoch 00030: val_acc did not improve from 0.84777
505/505 [==============================] - 20s 41ms/sample - loss: 0.0039 - acc: 1.0000 - val_loss: 0.8340 - val_acc: 0.8110
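One way to probe the log above, offered as a sketch rather than a diagnosis: if val_acc were a per-sample accuracy over the 127 validation samples, every logged value should equal k/127 for some integer k. The check below tests that, and also tests the alternative denominator 127 × 3, i.e. one count per output unit of the Dense(3) layer. The helper name is_count_ratio and the tolerance are assumptions made for illustration.

```python
# val_acc values copied from the "improved from ... to ..." lines of the log.
logged_val_acc = [0.77953, 0.79528, 0.81102, 0.82152, 0.84777]

n_val = 127      # "validate on 127 samples"
n_outputs = 3    # Dense(3, activation='softmax')


def is_count_ratio(value, denominator, tol=1e-5):
    """True if value equals k / denominator (within tol) for an integer k."""
    k = round(value * denominator)
    return abs(k / denominator - value) < tol


# Does each logged value fit a per-sample count (k / 127)?
per_sample = [is_count_ratio(v, n_val) for v in logged_val_acc]
# Does each logged value fit a per-output-unit count (k / 381)?
per_element = [is_count_ratio(v, n_val * n_outputs) for v in logged_val_acc]

print("fits k/127:", per_sample)
print("fits k/381:", per_element)
```

The values 0.82152 and 0.84777 do not fit k/127 but do fit k/381, which would be consistent with an accuracy counted per output unit rather than per sample.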
Confusion matrix from sklearn:
from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(y_values, predicted_values)
The predicted and gold values are determined as follows:
preds = mymodel.predict(X_val)
preds_ints = [[el] for el in np.argmax(preds, axis=1)]
values_pred = tokenizer_y.sequences_to_texts(preds_ints)
values_gold = tokenizer_y.sequences_to_texts(y_val)
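Once gold and predicted labels are available as two lists, the raw (non-normalized) confusion counts and the per-sample accuracy can be derived directly from them. The sketch below uses plain Python and made-up label lists that stand in for values_gold and values_pred; the label names "neg" and "pos" are hypothetical. The same quantities correspond to what sklearn.metrics.confusion_matrix and sklearn.metrics.accuracy_score report.

```python
from collections import Counter

# Hypothetical stand-ins for values_gold / values_pred (illustrative only).
values_gold = ["neg", "neg", "neg", "pos", "pos", "neg", "pos", "neg"]
values_pred = ["neg", "pos", "neg", "pos", "neg", "neg", "pos", "neg"]

# Fixed label order so that rows (gold) and columns (predicted) line up.
labels = sorted(set(values_gold) | set(values_pred))

# Count every (gold, predicted) pair, then arrange the counts as a matrix.
pair_counts = Counter(zip(values_gold, values_pred))
conf_counts = [[pair_counts[(g, p)] for p in labels] for g in labels]

# Per-sample accuracy: fraction of positions where gold equals predicted.
accuracy = sum(g == p for g, p in zip(values_gold, values_pred)) / len(values_gold)

print("labels:", labels)
print("raw confusion counts:", conf_counts)
print(f"accuracy: {accuracy:.4f}")
```

Working from raw counts keeps the class sizes visible, so the accuracy follows directly from the diagonal of the matrix divided by the total.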
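Finally, I would like to add that I have printed out the data and all prediction errors, and I believe the sklearn values are more reliable, since they seem to match the predictions I printed from the saved "best" model.
On the other hand, I cannot understand how the metrics can differ so much. Since both are very well-known pieces of software, I conclude that I am the one making a mistake here, but I cannot pinpoint where or how.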
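[Comments]:
-
What is your metric acc in the Keras part? And how are the predicted values calculated?
-
Since we have no access to the number of samples in each class, we cannot really compare the Keras accuracy with the sklearn confusion matrix... I could not find documentation on it, but from memory, the Keras accuracy is the average of the per-batch accuracies. For example, say 1 epoch has 10 batches. On the first batch you get 80% accuracy, the model adjusts its weights, so on the second batch you get 81% accuracy, and so on... The reported accuracy will then be lower than the accuracy computed on all the data at the end of the epoch.
-
You do not actually show the scikit-learn accuracy (TP and TN are not accuracy); besides, it is highly questionable whether such a "manual" inspection can really tell an accuracy of 0.84 from one of 0.78. Please show the actual (i.e. non-normalized) output of your confusion_matrix command; also, use the scikit-learn accuracy_score method and update your post with the results.
-
@PV8 metric: as shown in the compile line, it is 'acc'
-
You are computing the accuracy wrongly (where exactly does this (TP+TN)/2 come from??); see the answer below. If a discrepancy with Keras remains, please do not change the question, as that would invalidate the answer, which does address the problem in your approach. Instead, open a new question.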
Tags: python machine-learning keras scikit-learn classification