When and How Does Keras Calculate Metrics for Each Batch of Samples? Answers

【Question title】: When and how does Keras calculate metrics for each batch of samples?
【Posted】: 2021-02-22 23:13:29
【Question】:

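I'm looking into how Keras custom metrics work, and the value computed by tf.print inside the metric function does not match the value printed by the model.fit progress-bar callback.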

import tensorflow as tf  # tf2.4.1
import numpy as np
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=(1,))
])
def my_metric_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    # Reduce over the last axis only: one loss value per sample, shape (batch_size,)
    loss = tf.reduce_mean(squared_difference, axis=-1)
    # Print the shapes, the per-sample losses, and their mean over the whole batch
    tf.print(y_true.shape, y_pred.shape, loss, tf.reduce_mean(squared_difference))
    return loss
model.compile(optimizer='adam', loss='mean_squared_error', metrics=[my_metric_fn])
x = np.random.rand(4, 1)
y = x ** 2
# 4 samples with batch_size=2 -> 2 steps per epoch
history = model.fit(x=x, y=y, batch_size=2, epochs=2)
print(history.history)

Output (formatted for readability)

Epoch 1/2
TensorShape([2, 1]) TensorShape([2, 1]) [9.79962078e-06 0.0534314588] 0.02672063
1/2 [==============>...............] - ETA: 0s - loss: 0.0267 - my_metric_fn: 0.0267
TensorShape([2, 1]) TensorShape([2, 1]) [0.0397406667 0.179955378] 0.109848022
2/2 [==============================] - 0s 7ms/step - loss: 0.0544 - my_metric_fn: 0.0544

Epoch 2/2
TensorShape([2, 1]) TensorShape([2, 1]) [0.0392204635 0.0521505736] 0.0456855185
1/2 [==============>...............] - ETA: 0s - loss: 0.0457 - my_metric_fn: 0.0457
TensorShape([2, 1]) TensorShape([2, 1]) [0.177408844 2.45939535e-08] 0.088704437
2/2 [==============================] - 0s 5ms/step - loss: 0.0600 - my_metric_fn: 0.0600
{'loss': [0.06828432530164719, 0.06719497591257095], 'my_metric_fn': [0.06828432530164719, 0.06719497591257095]}
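As a sanity check on the printed values: `my_metric_fn` reduces only over the last axis, so it returns one loss per sample (shape `(2,)`), and the per-batch value is the mean of that vector, which is the last number on each `tf.print` line. The plain-Python sketch below reproduces that reduction from the epoch 1 numbers in the output; it does not call TensorFlow, and the helper `batch_mean` is hypothetical.

```python
# Per-sample losses printed by tf.print in epoch 1 (copied from the output above).
epoch1_batches = [
    [9.79962078e-06, 0.0534314588],  # step 1/2
    [0.0397406667, 0.179955378],     # step 2/2
]


def batch_mean(per_sample_losses):
    """Hypothetical helper: mean of one batch's per-sample losses."""
    return sum(per_sample_losses) / len(per_sample_losses)


for step, losses in enumerate(epoch1_batches, start=1):
    print(f"step {step}/2: batch mean = {batch_mean(losses):.8f}")
```

Both results agree with the last value on the corresponding `tf.print` line (0.02672063 and 0.109848022), so the per-batch metric itself is computed as expected; the open question is only what the progress bar displays.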

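Look at the loss printed for each batch in the output above.

Epoch 1/2, step 1/2: tf.print: 0.02672063, model.fit: 0.0267. OK.
Epoch 1/2, step 2/2: tf.print: 0.109848022, but model.fit: 0.0544. Not OK.

How should I understand these matches and mismatches? Where does 0.0544 come from?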

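【Comments】:

Is there something I'm missing?
Thank you very much for your help. I found that this needs more time to investigate before the cause and the answer are clear. The answer probably requires digging into how the model.fit method reports its values in detail. A GitHub issue is discussing this: github.com/tensorflow/tensorflow/issues/47366. Until everything is figured out, I'd like to keep this question open without an accepted answer, so that everyone has a chance to dig into it.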

Tags: python tensorflow machine-learning keras deep-learning

【Answer 1】:

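In Keras, the training loss/metric reported for an epoch is the average of the per-batch losses/metrics over that epoch (more precisely, a sample-weighted average; with equal batch sizes it is simply the mean of the batch values). So in your case: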

EPOCH 1: (0.02672063 + 0.109848022) / 2 = 0.068284326
EPOCH 2: (0.0456855185 + 0.088704437) / 2 = 0.06719497775

which corresponds to:

history.history['loss'] ==> [0.06828432530164719, 0.06719497591257095]
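To see which number the progress bar would be expected to show at each step, here is a plain-Python sketch of a streaming mean in the spirit of Keras's `Mean` metric: it keeps a running total and count and reports total / count after every batch. Weighting each batch by its sample count is an assumption that only matters when batches differ in size; here both batches have 2 samples, so the epoch value reduces to the plain average above. The class `RunningMean` is hypothetical, not a Keras API.

```python
class RunningMean:
    """Hypothetical stand-in for a Keras-style streaming mean (total / count)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_value, batch_size):
        # Assumption: each batch-level value is weighted by its number of samples.
        self.total += batch_value * batch_size
        self.count += batch_size

    def result(self):
        return self.total / self.count


# Per-batch losses printed by tf.print (last number on each line of the question's output).
epochs = {
    1: [0.02672063, 0.109848022],
    2: [0.0456855185, 0.088704437],
}

history = []
for epoch, batch_losses in epochs.items():
    tracker = RunningMean()  # state is reset at the start of every epoch
    for step, value in enumerate(batch_losses, start=1):
        tracker.update(value, batch_size=2)
        print(f"epoch {epoch} step {step}/2: running mean = {tracker.result():.4f}")
    history.append(tracker.result())

print(history)
```

Under this model, step 2/2 of epoch 1 would display 0.0683 (the same value that ends up in `history.history`), not 0.0544; that gap is exactly the TF 2.4.1 discrepancy discussed in the comments.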

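【Comments】:

Yes, I know. But where does 0.0544 come from? Thanks!
@Watanabe.N What's your TF version?
@Watanabe.N Sorry, it's 2.4.1 (as you reported and as I suspected)... With TF 2.3.0 this doesn't happen, as you can see here: colab.research.google.com/drive/…
Ah, indeed. I read your notebook, and it makes sense to me. I confirmed it in my environment with TF 2.3.1 on Colab. In my case with TF 2.4.1, (0.02672063+0.109848022)/2 = 0.068284326, which is not equal to 0.0544. This is strange. I'm going to open an issue.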