Not understanding code used in TensorFlow MNIST guide

【Question title】: Not understanding code used in TensorFlow MNIST guide
【Posted】: 2018-01-14 13:47:02
【Question】:

I'm reading the MNIST TensorFlow guide and trying to get a good understanding of what's going on.

The first set of steps, with my comments added, looks like this:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Download the data set.
# Comprises thousands of images, each with a label.
# Our images are 28x28, so we have 784 pixels in total.
# one_hot means our labels are treated as a vector with a
# length of 10. e.g. for the number 4, it'd be
# [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x isn't a specific value. It's a placeholder, a value that
# we'll input when we ask TensorFlow to run a computation.
# We want to input any number of MNIST images, each flattened
# into a 784-dimensional vector (i.e. an array made up of a
# float32 for each pixel, representing pixel brightness).
# Takes the form of [Image, Pixel].
x = tf.placeholder(tf.float32, [None, 784])

# Variables are modifiable tensors, which live in TensorFlow's
# graph of interacting operations. It can be used and modified
# by the computation. Model parameters are usually set as Variables.

# Weights
# Takes the form of [Pixel, Digit]
W = tf.Variable(tf.zeros([784, 10]))

# Biases
# Takes the form of [Digit]
b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)

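So now I'm trying to break down the last line to figure out what's going on.

They provide this diagram:

Ignoring the softmax step and the added bias, and looking only at the top row: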

(W1,1 * x1) + (W1,2 * x2) + (W1,3 * x3).

Since x is now one-dimensional, I assume it is specific to a single image, so the x values are the individual pixels of that image. That gives us:

(Weight of 1st pixel for 1st digit * value of 1st pixel) + (Weight of 1st pixel for 2nd digit * value of 2nd pixel) + (Weight of 1st pixel for 3rd digit * value of 3rd pixel)

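This doesn't seem right. The first dimension of the weight tensor represents pixels, and the second dimension of the x tensor represents pixels, which would mean we are multiplying values that belong to different pixels... that makes no sense to me.

Am I misunderstanding something?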

Tags: tensorflow machine-learning neural-network mnist softmax

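【Solution 1】:

This model is so simple that it may not be worth discussing in depth, but your conclusion is incorrect. Pixel values are never multiplied by each other. This is a linear model: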

tf.matmul(x, W) + b

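...which naively assumes that an image is a bag of independent pixels. Each pixel is multiplied by 10 different weights, one for each class. In other words, this linear layer assigns one weight to each (pixel, class) pair. That corresponds directly to the shape of W: [784, 10] (I'm ignoring the bias term for simplicity).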
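To make the indexing concrete, here is a minimal NumPy sketch of the same computation. The arrays are random stand-ins rather than the tutorial's trained values, and the variable names `digit` and `manual` are only illustrative. It also assumes the guide's diagram indexes W as [digit, pixel], which would be the transpose of the [Pixel, Digit] layout used in the code; that mismatch may be where the confusion comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the tutorial's tensors (random, not trained).
x = rng.random((1, 784))   # one flattened image: [Image, Pixel]
W = rng.random((784, 10))  # one weight per (pixel, class) pair: [Pixel, Digit]
b = rng.random(10)         # one bias per class: [Digit]

# Same arithmetic as tf.matmul(x, W) + b, done with NumPy.
scores = x @ W + b         # shape (1, 10): one score per digit

# The score for a single digit, written out as an explicit sum.
# Each term pairs pixel p's value with pixel p's weight for this digit,
# so no two different pixels are ever multiplied together.
digit = 0
manual = sum(W[p, digit] * x[0, p] for p in range(784)) + b[digit]

print(scores.shape)
print(np.isclose(scores[0, digit], manual))
```

In the diagram's notation, the top row would then read as (weight of pixel 1 for digit 1 * value of pixel 1) + (weight of pixel 2 for digit 1 * value of pixel 2) + ..., which is the sum computed in `manual`.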

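As a result of this multiplication, the final length-10 vector contains one score per class. Each score takes every pixel into account; more precisely, it is a weighted sum of all the pixel values. The scores then go into the loss function, which compares the output with the ground truth so that on the next iteration the weights can be adjusted in the right direction.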
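As a rough sketch of that last step, the snippet below turns a vector of scores into probabilities with softmax and compares them with a one-hot label using cross-entropy. The choice of cross-entropy as the loss is an assumption here, and the three-class scores and the helper names `softmax` and `cross_entropy` are made up for illustration.

```python
import numpy as np

def softmax(scores):
    # Subtract the row max for numerical stability; the result is unchanged.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # Mean negative log-probability assigned to the true class.
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs), axis=-1)))

scores = np.array([[2.0, 1.0, 0.1]])  # hypothetical scores for 3 classes
labels = np.array([[1.0, 0.0, 0.0]])  # one-hot ground truth: class 0

probs = softmax(scores)               # each row sums to 1
loss = cross_entropy(probs, labels)   # smaller when the true class scores higher

print(probs)
print(loss)
```

Training then amounts to nudging W and b so that this loss goes down, which raises the score of the correct class relative to the others.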

Simple as it is, this is still a fairly reasonable approach.
