"Gradient-Based Learning Applied to Document Recognition"
Background knowledge
1. Gradient-based learning
2. Back propagation: gradients can be computed efficiently by propagation from the output to the input; the error is propagated backwards to update the weights.
Xn is a vector representing the output of the module, Wn is the vector of tunable parameters in the module (a subset of W), and Xn-1 is the module's input vector as well as the previous module's output vector, so that Xn = Fn(Wn, Xn-1).
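The module recursion can be sketched in a few lines of numpy. This is a toy illustration, not the paper's code: it assumes each module is purely linear, Xn = Fn(Wn, Xn-1) = Wn·Xn-1, and propagates the gradient from the output back to the input.

```python
import numpy as np

# Toy sketch of the module view of backprop, assuming linear modules:
# forward:  X_n = W_n @ X_{n-1}
# backward: dE/dX_{n-1} = W_n^T @ dE/dX_n,  dE/dW_n = dE/dX_n outer X_{n-1}
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # two modules

def forward(x0, Ws):
    xs = [x0]
    for W in Ws:
        xs.append(W @ xs[-1])
    return xs  # [X_0, X_1, X_2]

def backward(xs, Ws, grad_out):
    gWs, g = [], grad_out
    for W, x_in in zip(reversed(Ws), reversed(xs[:-1])):
        gWs.append(np.outer(g, x_in))  # dE/dW_n
        g = W.T @ g                    # propagate dE/dX_{n-1}
    return list(reversed(gWs)), g

x0 = rng.normal(size=3)
xs = forward(x0, Ws)
gWs, gx0 = backward(xs, Ws, np.ones(2))  # take E = sum(X_2), so dE/dX_2 = 1
```

One backward sweep yields the gradient with respect to every module's parameters and the input, which is what makes gradient computation efficient.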
3. Convolutional Networks
Convolutional Networks combine three architectural ideas to ensure some degree of shift, scale and distortion invariance: local receptive fields, shared weights (or weight replication), and spatial or temporal sub-sampling.
local receptive fields: Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer; these local units share the same weights across locations.
feature map: Units in a layer are organized in planes within which all the units share the same set of weights. The set of outputs of the units in such a plane is called a feature map.
sub-sampling: The receptive field of each unit is a 2 by 2 area in the previous layer's corresponding feature map. Units are non-overlapping. Sub-sampling performs a local averaging and reduces the spatial resolution of the feature map.
Sub-sampling shrinks the feature maps produced by the convolutional layer: local averaging lowers their resolution and makes the output less sensitive to shifts and distortions.
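The three ideas can be sketched together in numpy. This is a minimal illustration, not LeNet-5 itself: the kernel values are placeholders, but the shapes follow C1/S2 (32×32 input, 5×5 shared kernel, 2×2 non-overlapping sub-sampling).

```python
import numpy as np

def conv2d_valid(img, kernel, bias):
    # Local receptive fields + shared weights: one 5x5 kernel slid over
    # the whole image produces one feature map ("valid" convolution).
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel) + bias
    return out

def subsample2x2(fmap, coeff, bias):
    # LeNet-style sub-sampling: sum each non-overlapping 2x2 block,
    # multiply by a trainable coefficient, and add a trainable bias.
    H, W = fmap.shape
    blocks = fmap[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2)
    return blocks.sum(axis=(1, 3)) * coeff + bias

img = np.random.default_rng(0).normal(size=(32, 32))
fmap = conv2d_valid(img, np.ones((5, 5)) / 25, 0.0)  # 28x28 feature map
sub = subsample2x2(fmap, 0.25, 0.0)                  # 14x14 after sub-sampling
```

With coefficient 0.25, the sub-sampling unit computes exactly the local average of its 2×2 receptive field.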
4. Loss Function
Maximum Likelihood Estimation criterion (MLE)
maximum a posteriori criterion (MAP): posterior ∝ likelihood × prior (Bayes' theorem)
Loss function: minimizing the loss is equivalent to maximizing the likelihood (MLE).
Bayesian approach: maximize the posterior, i.e. likelihood × prior (MAP).
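The relation between the two criteria can be shown on a toy logistic model (everything here is illustrative, not from the paper): taking the negative log of "posterior ∝ likelihood × prior" turns MAP into the MLE loss plus a penalty term, and a Gaussian prior on the weights gives an L2 penalty.

```python
import numpy as np

# Toy data for a logistic model (names and data are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (X @ w_true > 0).astype(float)

def nll(w):
    # MLE criterion: minimize the negative log-likelihood -log p(y | x, w).
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def neg_log_posterior(w, lam=1.0):
    # MAP criterion: posterior ∝ likelihood × prior, so -log posterior
    # is the NLL plus a prior term; a Gaussian prior yields L2 weight decay.
    return nll(w) + lam * np.sum(w ** 2)
```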
LeNet-5 Architecture
Input: a 32×32 pixel image
7 layers in total (not counting the input)
C1: 5×5 unit, 6 feature maps. Convolutional layer, output 28×28 (32 − (5 − 1) = 28)
trainable parameters: (5×5+1)×6=156; connections: (5×5+1)×28×28×6=122304
S2: 2×2 unit, 6 feature maps. Sub-sampling layer, output 14×14 (28/2 = 14)
The four inputs to a unit in S2 are added, then multiplied by a trainable coefficient, and added to a trainable bias.
trainable parameters: (1+1)×6=12; connections: (2×2+1)×14×14×6=5880
C3: 5×5 unit, 16 feature maps. Convolutional layer, output 10×10 (14 − (5 − 1) = 10)
Each unit in each feature map is connected to several 5×5 neighborhoods at identical locations in a subset of S2's feature maps.
Each feature map in C3 is connected to only a subset of S2's feature maps, not all of them.
trainable parameters: (25×3+1)×6 + (25×4+1)×9 + (25×6+1)×1 = 1516; connections: 1516×10×10 = 151600
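The grouped sum can be sanity-checked directly. The connectivity pattern behind it (given in the paper's Table I) is that 6 of C3's maps take 3 of S2's maps as input, 9 take 4, and 1 takes all 6:

```python
# C3 connectivity: 6 maps see 3 of S2's maps, 9 see 4, 1 sees all 6.
group_sizes = [3] * 6 + [4] * 9 + [6] * 1             # inputs per C3 map
params = sum(5 * 5 * g + 1 for g in group_sizes)      # one 5x5 kernel per input, plus a bias
connections = params * 10 * 10                        # each map's output is 10x10
print(params, connections)                            # → 1516 151600
```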
S4: 2×2 unit, 16 feature maps. Sub-sampling layer, output 5×5 (10/2 = 5)
trainable parameters: (1+1)×16=32; connections: (2×2+1)×5×5×16=2000
C5: 5×5 unit, 120 feature maps. Convolutional layer, output 1×1, fully connected to S4
C5 is labeled as a convolutional layer, instead of a fully connected layer, because if LeNet-5's input were made bigger with everything else kept constant, the feature map dimension would be larger than 1×1; hence it is still a convolutional layer.
trainable parameters: (5×5×16+1)×120=48120 (equal to the number of connections, since C5 is fully connected to S4)
F6: fully connected to C5, 84 units
Fully connected layer with 84 units: each unit computes a dot product between its input vector and its weight vector, adds a bias, and passes the result through a sigmoid squashing function (a scaled tanh in the paper).
trainable parameters: (120+1)×84=10164
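Summing the per-layer formulas above gives a quick sanity check of the quoted figures; the C1-through-F6 counts add up to 60,000 trainable parameters:

```python
# Re-deriving the trainable-parameter counts quoted for each layer.
c1 = (5 * 5 + 1) * 6                                      # 156
s2 = (1 + 1) * 6                                          # 12: coefficient + bias per map
c3 = (25 * 3 + 1) * 6 + (25 * 4 + 1) * 9 + (25 * 6 + 1)   # 1516
s4 = (1 + 1) * 16                                         # 32
c5 = (5 * 5 * 16 + 1) * 120                               # 48120
f6 = (120 + 1) * 84                                       # 10164
total = c1 + s2 + c3 + s4 + c5 + f6                       # 60000
```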
output layer: Euclidean Radial Basis Function (RBF) units, one per class
Output layer: one output per class; each output yi is the squared Euclidean distance between the input vector and the class's RBF parameter vector, yi = Σj (xj − wij)².
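The RBF computation is a one-liner in numpy. A minimal sketch (the random ±1 vectors below are stand-ins; in the paper the parameter vectors are fixed ±1 codes drawn from stylized 7×12 character bitmaps, 7×12 = 84):

```python
import numpy as np

# RBF output unit: y_i = sum_j (x_j - w_ij)^2, the squared Euclidean
# distance between F6's 84-dim output and class i's parameter vector.
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=84)        # stand-in for F6's output
W = rng.choice([-1.0, 1.0], size=(10, 84))  # one 84-dim template per class
y = np.sum((x - W) ** 2, axis=1)            # 10 RBF outputs, one per class
pred = int(np.argmin(y))                    # the closest template wins
```

Unlike a softmax output, smaller is better here: the predicted class is the one whose template is nearest to F6's output.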