Understanding target data for softmax output layer (answers)

【Question title】: Understanding target data for softmax output layer
【Posted】: 2020-02-28 11:00:43
【Question description】:

I found some example code for the MNIST handwritten character classification problem. The code begins as follows:

import tensorflow as tf

# Load in the data
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print("x_train.shape:", x_train.shape)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
r = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)

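Looking at the code, the network's output layer appears to consist of ten nodes. If the network works properly after training, the appropriate one of the ten outputs should have an activation very close to 1, and the rest should have activations very close to zero.

I know the training set contains 60000 example patterns. I therefore assumed that the target output data (y_train) would be a 2D numpy array of shape 60000x10. I decided to double-check, ran print(y_train.shape), and was very surprised to see (60000,)... Normally you would expect the size of a target pattern to equal the number of nodes in the output layer. I thought to myself, "OK, if we only need one target value per example, then softmax is evidently an unusual special case"... My next thought was: how could I have known this from any documentation? So far I have not found anything.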

【Discussion】:

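Probably the y-vectors contain integer labels for the ground truth, and they get one-hot encoded somewhere inside the model.fit call.
That may be true, but my question is about documentation... I am uncomfortable inferring this information just from stumbling across a working example.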

Tags: python tensorflow softmax

【Solution 1】:

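I think you are searching in the wrong direction. This is not because of softmax. The softmax function (it is a function, not a layer) takes n values and produces n values. The (60000,) target shape comes from the sparse_categorical_crossentropy loss.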
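To illustrate the point that softmax maps n values to n values, here is a minimal NumPy sketch. The `softmax` helper below is written for illustration only and is not the Keras implementation; the input scores are made up.

```python
import numpy as np

def softmax(logits):
    """Map n real-valued scores to n probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])  # illustrative values, not from the question
probs = softmax(scores)

print(probs.shape)  # same number of values out as went in
print(probs.sum())  # the outputs sum to 1 (up to floating-point error)
```

Nothing in this function depends on how the targets are stored, which is why the target shape has to be explained by the loss instead.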

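In the official documentation you can check that, for this loss, the target values should be provided as integer labels. You can also see that there is a loss that computes exactly the same thing but takes targets of shape (60000, 10): the CategoricalCrossentropy loss.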
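The relationship between the two losses can be sketched in plain NumPy. The two functions below are simplified re-implementations for illustration (the real Keras losses also handle details such as clipping and reduction over the batch), and the probabilities and labels are made-up values.

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, probs):
    # Targets have shape (batch, classes): one row of 0s and a single 1 per example.
    return -np.sum(one_hot_targets * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(int_targets, probs):
    # Targets have shape (batch,): one integer class index per example.
    rows = np.arange(len(int_targets))
    return -np.log(probs[rows, int_targets])

# Illustrative softmax outputs for a batch of 2 examples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])      # integer labels, shape (2,)
one_hot = np.eye(3)[labels]    # the same labels one-hot encoded, shape (2, 3)

sparse_loss = sparse_categorical_crossentropy(labels, probs)
dense_loss = categorical_crossentropy(one_hot, probs)

print(labels.shape, one_hot.shape)
print(np.allclose(sparse_loss, dense_loss))
```

In both cases the per-example loss is the negative log of the probability assigned to the true class; the sparse variant simply looks that probability up by index instead of multiplying by a one-hot row.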

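You choose which loss to use based on the format of the data you have. Since the MNIST data is labeled with integers rather than one-hot encoded, the tutorial uses the SparseCategoricalCrossentropy loss.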
