Input normalization, and mean subtraction in particular, is indeed an important preprocessing step, and in practice it is often necessary for SGD to converge.
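As a minimal sketch of what dataset-level mean subtraction means, here is the idea in NumPy (the function name and random data are illustrative only, not part of any tutorial):

```python
import numpy as np

def mean_subtract(images):
    """Subtract the per-pixel mean computed over the whole training set."""
    mean = images.mean(axis=0)  # one mean value per pixel position
    return images - mean, mean

# Fake "training set" of 1000 grayscale 28x28 images in [0, 255].
rng = np.random.default_rng(0)
train = rng.uniform(0.0, 255.0, size=(1000, 28, 28))
centered, mean = mean_subtract(train)
# After centering, every pixel position averages to ~0 over the set,
# which is the zero-mean input SGD tends to prefer.
```

At test time the same training-set `mean` is subtracted from new images, so the statistics used for normalization never come from the test data.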
You are mistaken when you say it is hardly used in tutorials: it is nearly ubiquitous. For example, TensorFlow's CIFAR-10 tutorial normalizes the images here.
The one common exception is MNIST, which is unfortunately also the ubiquitous "CNN 101" tutorial, and may leave the lasting impression that image normalization is optional after all.
I ran a small experiment with TensorFlow's deep MNIST tutorial, comparing the results with and without normalization.
The vanilla implementation produces:
step 0, training accuracy 0.1
step 100, training accuracy 0.94
step 200, training accuracy 0.88
step 300, training accuracy 0.9
step 400, training accuracy 0.92
step 500, training accuracy 0.92
step 600, training accuracy 0.92
step 700, training accuracy 0.98
step 800, training accuracy 0.98
step 900, training accuracy 0.9
step 1000, training accuracy 0.94
step 1100, training accuracy 0.98
step 1200, training accuracy 0.96
step 1300, training accuracy 0.94
step 1400, training accuracy 0.98
step 1500, training accuracy 1
step 1600, training accuracy 0.94
step 1700, training accuracy 0.96
step 1800, training accuracy 1
step 1900, training accuracy 0.96
test accuracy 0.974
When I add normalization with
x_image = tf.map_fn(lambda frame: tf.image.per_image_standardization(frame), x_image)
I get
step 0, training accuracy 0.1
step 100, training accuracy 0.86
step 200, training accuracy 0.92
step 300, training accuracy 0.86
step 400, training accuracy 0.94
step 500, training accuracy 0.98
step 600, training accuracy 0.94
step 700, training accuracy 0.96
step 800, training accuracy 1
step 900, training accuracy 0.92
step 1000, training accuracy 0.92
step 1100, training accuracy 0.98
step 1200, training accuracy 0.98
step 1300, training accuracy 0.96
step 1400, training accuracy 0.96
step 1500, training accuracy 1
step 1600, training accuracy 0.98
step 1700, training accuracy 0.96
step 1800, training accuracy 0.94
step 1900, training accuracy 1
test accuracy 0.974
In the end, after 2000 steps I got exactly the same (!) test accuracy in both cases. So image normalization does not hurt performance here, but it does not add much either.
The real question is what makes MNIST so special that it does not benefit from image standardization. It may be the nature of the images (they are mostly constant), or the fact that the networks typically used on this dataset (LeNet and its variants) are shallow enough to cope with non-zero-mean data.
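For reference, the per-image transform applied above is easy to reproduce outside TensorFlow. Per the documentation, `tf.image.per_image_standardization` computes `(x - mean) / max(stddev, 1/sqrt(num_pixels))` for each image, so a mostly-constant MNIST digit mainly receives a shift and a mild rescale. A NumPy sketch of that formula (my own re-implementation, not TF code):

```python
import numpy as np

def standardize(image):
    """NumPy equivalent of tf.image.per_image_standardization."""
    mean = image.mean()
    # The stddev floor of 1/sqrt(N) avoids dividing by ~0
    # on near-constant images.
    adjusted_std = max(image.std(), 1.0 / np.sqrt(image.size))
    return (image - mean) / adjusted_std

img = np.random.default_rng(1).uniform(0.0, 1.0, size=(28, 28))
out = standardize(img)  # zero mean, unit stddev (unless the floor kicks in)
```

On a truly constant image the floor makes the output all zeros instead of NaN, which is exactly the degenerate case a mostly-blank MNIST background approaches.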