"Gradient-Based Learning Applied to Document Recognition"
Background knowledge
1. Gradient-based learning
2. Back propagation: gradients can be computed efficiently by propagation from the output to the input; the error is propagated backwards to update the weights.
Xn is a vector representing the output of the module, Wn is the vector of tunable parameters in the module (a subset of W), and Xn-1 is the module's input vector as well as the previous module's output vector, so that Xn = Fn(Wn, Xn-1).
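The module recursion can be sketched in a few lines of numpy. This is a toy illustration, not the paper's code: it assumes each module is purely linear, Xn = Fn(Wn, Xn-1) = Wn·Xn-1, and propagates the gradient from the output back to the input.

```python
import numpy as np

# Toy sketch of the module view of backprop, assuming linear modules:
# forward:  X_n = W_n @ X_{n-1}
# backward: dE/dX_{n-1} = W_n^T @ dE/dX_n,  dE/dW_n = dE/dX_n outer X_{n-1}
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # two modules

def forward(x0, Ws):
    xs = [x0]
    for W in Ws:
        xs.append(W @ xs[-1])
    return xs  # [X_0, X_1, X_2]

def backward(xs, Ws, grad_out):
    gWs, g = [], grad_out
    for W, x_in in zip(reversed(Ws), reversed(xs[:-1])):
        gWs.append(np.outer(g, x_in))  # dE/dW_n
        g = W.T @ g                    # propagate dE/dX_{n-1}
    return list(reversed(gWs)), g

x0 = rng.normal(size=3)
xs = forward(x0, Ws)
gWs, gx0 = backward(xs, Ws, np.ones(2))  # take E = sum(X_2), so dE/dX_2 = 1
```

One backward sweep yields the gradient with respect to every module's parameters and the input, which is what makes gradient computation efficient.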
3. Convolutional Networks
Convolutional Networks combine three architectural ideas to ensure some degree of shift, scale and distortion invariance: local receptive fields, shared weights (or weight replication), and spatial or temporal sub-sampling.
local receptive fields: Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer; these local units share the same weights across locations.
feature map: Units in a layer are organized in planes within which all the units share the same set of weights. The set of outputs of the units in such a plane is called a feature map.
sub-sampling: The receptive field of each unit is a 2 by 2 area in the previous layer's corresponding feature map. Units are non-overlapping. Sub-sampling performs a local averaging and reduces the spatial resolution of the feature map.
Sub-sampling shrinks the feature maps produced by the convolutional layer: local averaging lowers their resolution and makes the output less sensitive to shifts and distortions.
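The three ideas can be sketched together in numpy. This is a minimal illustration, not LeNet-5 itself: the kernel values are placeholders, but the shapes follow C1/S2 (32×32 input, 5×5 shared kernel, 2×2 non-overlapping sub-sampling).

```python
import numpy as np

def conv2d_valid(img, kernel, bias):
    # Local receptive fields + shared weights: one 5x5 kernel slid over
    # the whole image produces one feature map ("valid" convolution).
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel) + bias
    return out

def subsample2x2(fmap, coeff, bias):
    # LeNet-style sub-sampling: sum each non-overlapping 2x2 block,
    # multiply by a trainable coefficient, and add a trainable bias.
    H, W = fmap.shape
    blocks = fmap[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2)
    return blocks.sum(axis=(1, 3)) * coeff + bias

img = np.random.default_rng(0).normal(size=(32, 32))
fmap = conv2d_valid(img, np.ones((5, 5)) / 25, 0.0)  # 28x28 feature map
sub = subsample2x2(fmap, 0.25, 0.0)                  # 14x14 after sub-sampling
```

With coefficient 0.25, the sub-sampling unit computes exactly the local average of its 2×2 receptive field.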
4. Loss Function
Maximum Likelihood Estimation criterion (MLE)
maximum a posteriori criterion (MAP): posterior ∝ likelihood × prior (Bayes' theorem)
Loss function: minimizing the loss is equivalent to maximizing the likelihood (MLE).
Bayesian approach: maximize the posterior, i.e. likelihood × prior (MAP).
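The relation between the two criteria can be shown on a toy logistic model (everything here is illustrative, not from the paper): taking the negative log of "posterior ∝ likelihood × prior" turns MAP into the MLE loss plus a penalty term, and a Gaussian prior on the weights gives an L2 penalty.

```python
import numpy as np

# Toy data for a logistic model (names and data are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (X @ w_true > 0).astype(float)

def nll(w):
    # MLE criterion: minimize the negative log-likelihood -log p(y | x, w).
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def neg_log_posterior(w, lam=1.0):
    # MAP criterion: posterior ∝ likelihood × prior, so -log posterior
    # is the NLL plus a prior term; a Gaussian prior yields L2 weight decay.
    return nll(w) + lam * np.sum(w ** 2)
```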
LeNet-5 Architecture
Input: a 32×32 pixel image
7 layers in total (not counting the input)
C1: 5×5 unit, 6 feature maps. Convolutional layer, output 28×28 (32 − (5 − 1) = 28)
trainable parameters: (5×5+1)×6=156; connections: (5×5+1)×28×28×6=122304
S2: 2×2 unit, 6 feature maps. Sub-sampling layer, output 14×14 (28/2 = 14)
The four inputs to a unit in S2 are added, then multiplied by a trainable coefficient, and added to a trainable bias.
trainable parameters: (1+1)×6=12; connections: (2×2+1)×14×14×6=5880
C3: 5×5 unit, 16 feature maps. Convolutional layer, output 10×10 (14 − (5 − 1) = 10)
Each unit in each feature map is connected to several 5×5 neighborhoods at identical locations in a subset of S2's feature maps.
Each feature map in C3 is connected to only a subset of S2's feature maps, not all of them.
trainable parameters: (25×3+1)×6 + (25×4+1)×9 + (25×6+1)×1 = 1516; connections: 1516×10×10 = 151600
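The grouped sum can be sanity-checked directly. The connectivity pattern behind it (given in the paper's Table I) is that 6 of C3's maps take 3 of S2's maps as input, 9 take 4, and 1 takes all 6:

```python
# C3 connectivity: 6 maps see 3 of S2's maps, 9 see 4, 1 sees all 6.
group_sizes = [3] * 6 + [4] * 9 + [6] * 1             # inputs per C3 map
params = sum(5 * 5 * g + 1 for g in group_sizes)      # one 5x5 kernel per input, plus a bias
connections = params * 10 * 10                        # each map's output is 10x10
print(params, connections)                            # → 1516 151600
```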
S4: 2×2 unit, 16 feature maps. Sub-sampling layer, output 5×5 (10/2 = 5)
trainable parameters: (1+1)×16=32; connections: (2×2+1)×5×5×16=2000
C5: 5×5 unit, 120 feature maps. Convolutional layer, output 1×1, fully connected to S4
C5 is labeled as a convolutional layer, instead of a fully connected layer, because if LeNet-5's input were made bigger with everything else kept constant, the feature map dimension would be larger than 1×1; hence it is still a convolutional layer.
trainable parameters: (5×5×16+1)×120=48120 (equal to the number of connections, since C5 is fully connected to S4)
F6: fully connected to C5, 84 units
Fully connected layer with 84 units: each unit computes a dot product between its input vector and its weight vector, adds a bias, and passes the result through a sigmoid squashing function (a scaled tanh in the paper).
trainable parameters: (120+1)×84=10164
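Summing the per-layer formulas above gives a quick sanity check of the quoted figures; the C1-through-F6 counts add up to 60,000 trainable parameters:

```python
# Re-deriving the trainable-parameter counts quoted for each layer.
c1 = (5 * 5 + 1) * 6                                      # 156
s2 = (1 + 1) * 6                                          # 12: coefficient + bias per map
c3 = (25 * 3 + 1) * 6 + (25 * 4 + 1) * 9 + (25 * 6 + 1)   # 1516
s4 = (1 + 1) * 16                                         # 32
c5 = (5 * 5 * 16 + 1) * 120                               # 48120
f6 = (120 + 1) * 84                                       # 10164
total = c1 + s2 + c3 + s4 + c5 + f6                       # 60000
```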
output layer: Euclidean Radial Basis Function (RBF) units, one per class
Output layer: one output per class; each output yi is the squared Euclidean distance between the input vector and the class's RBF parameter vector, yi = Σj (xj − wij)².
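The RBF computation is a one-liner in numpy. A minimal sketch (the random ±1 vectors below are stand-ins; in the paper the parameter vectors are fixed ±1 codes drawn from stylized 7×12 character bitmaps, 7×12 = 84):

```python
import numpy as np

# RBF output unit: y_i = sum_j (x_j - w_ij)^2, the squared Euclidean
# distance between F6's 84-dim output and class i's parameter vector.
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=84)        # stand-in for F6's output
W = rng.choice([-1.0, 1.0], size=(10, 84))  # one 84-dim template per class
y = np.sum((x - W) ** 2, axis=1)            # 10 RBF outputs, one per class
pred = int(np.argmin(y))                    # the closest template wins
```

Unlike a softmax output, smaller is better here: the predicted class is the one whose template is nearest to F6's output.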