Backward Propagation (BP) in Convolutional Neural Network (CNN) [Python code]

import tensorflow as tf
# X is an m x n matrix: m samples, n features
a0 = X
z1 = tf.matmul(a0, W1) + b1
a1 = tf.sigmoid(z1)
z2 = tf.matmul(a1, W2) + b2
a2 = tf.sigmoid(z2)

Backpropagation

import tensorflow as tf
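# m is the number of samples (int)
# Output-layer error; this form holds for sigmoid outputs with a cross-entropy cost
dz2 = (a2 - Y)
# Back through W2, times the sigmoid derivative a1 * (1 - a1)
dz1 = tf.matmul(dz2, tf.transpose(W2)) * tf.multiply(a1, (1-a1))

dW2 = (1/m) * tf.matmul(tf.transpose(a1), dz2)
# Average over the samples only (axis 0); without an axis, reduce_mean returns a single scalar
db2 = tf.reduce_mean(dz2, axis=0, keepdims=True)

dW1 = (1/m) * tf.matmul(tf.transpose(a0), dz1)
db1 = tf.reduce_mean(dz1, axis=0, keepdims=True)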
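The formulas above assume sigmoid outputs with a cross-entropy cost averaged over the m samples. A minimal NumPy sketch compares them with central finite differences; the random data, the layer sizes and the helpers `forward`, `cost` and `numeric_grad` are hypothetical, and X plays the role of a0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return a1, a2

def cost(X, Y, W1, b1, W2, b2):
    # Cross-entropy summed over the outputs, averaged over the m samples
    _, a2 = forward(X, W1, b1, W2, b2)
    return -np.mean(np.sum(Y * np.log(a2) + (1 - Y) * np.log(1 - a2), axis=1))

def numeric_grad(f, P, eps=1e-6):
    # Central differences; P is perturbed in place and then restored
    grad = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
m, n, h, k = 5, 4, 3, 2
X = rng.standard_normal((m, n))
Y = rng.integers(0, 2, size=(m, k)).astype(float)
W1, b1 = rng.standard_normal((n, h)), rng.standard_normal((1, h))
W2, b2 = rng.standard_normal((h, k)), rng.standard_normal((1, k))

# Analytic gradients: the same formulas as the TensorFlow code above
a1, a2 = forward(X, W1, b1, W2, b2)
dz2 = a2 - Y
dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
dW2 = (1 / m) * a1.T @ dz2
db2 = dz2.mean(axis=0, keepdims=True)
dW1 = (1 / m) * X.T @ dz1
db1 = dz1.mean(axis=0, keepdims=True)

f = lambda: cost(X, Y, W1, b1, W2, b2)
num = {name: numeric_grad(f, P) for name, P in [("dW1", W1), ("db1", b1), ("dW2", W2), ("db2", b2)]}
for name, analytic in [("dW1", dW1), ("db1", db1), ("dW2", dW2), ("db2", db2)]:
    print(name, np.max(np.abs(analytic - num[name])))
```

Differences on the order of 1e-9 or smaller indicate that the analytic formulas match the cost.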

2. Convolutional neural network (LeNet-5 as an example)

Convolution is performed with TensorFlow's built-in function conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None).

# 28 28 1 to 12 12 6
with tf.variable_scope('layer1'):
    W1 = tf.Variable(tf.random_normal([5, 5, 1, 6], stddev=0.01))
    b1 = tf.Variable(tf.random_normal([1, 1, 1, 6], stddev=0.01))
    L1 = tf.nn.conv2d(X_img, filter=W1, strides=[1, 1, 1, 1], padding='VALID') + b1
    L1 = tf.nn.relu(L1)
    L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
# 12 12 6 to 4 4 16
with tf.variable_scope('layer2'):
    W2 = tf.Variable(tf.random_normal([5, 5, 6, 16], stddev=0.01))
    b2 = tf.Variable(tf.random_normal([1, 1, 1, 16], stddev=0.01))
    L2 = tf.nn.conv2d(L1, filter=W2, strides=[1, 1, 1, 1], padding='VALID') + b2
    L2 = tf.nn.relu(L2)
    L2 = tf.nn.max_pool(L2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    dim = L2.get_shape()[1].value * L2.get_shape()[2].value * L2.get_shape()[3].value
    L2_flat = tf.reshape(L2, shape=[-1, dim])
# 4 4 16 to 120
with tf.variable_scope('layer3'):
    W3 = tf.Variable(tf.random_normal([dim, 120], stddev=0.01))
    b3 = tf.Variable(tf.random_normal([1, 120], stddev=0.01))
    L3 = tf.matmul(L2_flat, W3) + b3
    L3 = tf.nn.relu(L3)
# 120 to 84
with tf.variable_scope('layer4'):
    W4 = tf.Variable(tf.random_normal([120, 84], stddev=0.01))
    b4 = tf.Variable(tf.random_normal([1, 84], stddev=0.01))
    L4 = tf.matmul(L3, W4) + b4
    L4 = tf.nn.relu(L4)
# 84 to 10
with tf.variable_scope('Layer5'):
    W5 = tf.Variable(tf.random_normal([84, 10], stddev=0.01))
    b5 = tf.Variable(tf.random_normal([1, 10], stddev=0.01))
    L5 = tf.matmul(L4, W5) + b5
    L5 = tf.nn.softmax(L5, axis=1)
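The shape comments (28 28 1 to 12 12 6, and so on) follow from the 'VALID' padding rule, where each axis shrinks to (size − k) // stride + 1. A small sketch reproduces them and the value that `dim` takes; the helper `valid_out` is hypothetical, not a TensorFlow function:

```python
def valid_out(size, k, stride=1):
    # Output length along one axis for 'VALID' padding: floor((size - k) / stride) + 1
    return (size - k) // stride + 1

sizes = [28]                                   # input 28 x 28 x 1
sizes.append(valid_out(sizes[-1], 5))          # layer1 conv 5x5      -> 24 x 24 x 6
sizes.append(valid_out(sizes[-1], 2, 2))       # layer1 max_pool 2x2  -> 12 x 12 x 6
sizes.append(valid_out(sizes[-1], 5))          # layer2 conv 5x5      -> 8 x 8 x 16
sizes.append(valid_out(sizes[-1], 2, 2))       # layer2 max_pool 2x2  -> 4 x 4 x 16
dim = sizes[-1] * sizes[-1] * 16               # flattened length fed to layer3
print(sizes, dim)                              # [28, 24, 12, 8, 4] 256
```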

IV. The model in this example (two layers: one convolution + pooling layer and one fully connected layer)


The classic MNIST handwritten-digit dataset is used for the analysis.

batchX, batchY = mnist.train.next_batch(batch_size=batchSize)

Parameter initialization

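The model parameters are (1) w1: 20 convolution kernels, each 9 × 9 with depth 1; (2) b1: 20 biases for the convolution + pooling layer; (3) w2: a 2000 × 10 weight matrix for the fully connected layer; (4) b2: 10 biases for the fully connected layer. The 2000 comes from the layer-1 output: a VALID 9 × 9 convolution turns a 28 × 28 image into 20 × 20, 2 × 2 pooling reduces that to 10 × 10, and 10 × 10 × 20 channels = 2000. They are initialized as follows: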

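import numpy as np

w1_ = np.random.standard_normal([9, 9, 1, 20]) * 0.1   # 20 conv kernels, 9 x 9, depth 1
b1_ = np.random.standard_normal([1, 20]) * 0.1         # conv-layer biases
w2_ = np.random.standard_normal([2000, 10]) * 0.1      # fully connected weights
b2_ = np.random.standard_normal([1, 10]) * 0.1         # fully connected biases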

V. Forward propagation of the model

1. Forward propagation of the convolution + pooling layer


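The activation function is sigmoid; ReLU is more common in practice (sigmoid is chosen here only because it keeps the derivation simple).

Pooling uses average pooling; max pooling (max_pool) is more common (average pooling is likewise chosen only to keep the derivation simple).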

First define the TensorFlow placeholders (tensors) X, w1 and b1.
The graph is built as follows:

import tensorflow as tf

with tf.name_scope('layer1'):
    # Placeholders
    X = tf.placeholder(dtype=tf.float32, shape=[None, 784])
    w1 = tf.placeholder(dtype=tf.float32, shape=[9, 9, 1, 20])
    b1 = tf.placeholder(dtype=tf.float32, shape=[1, 20])
    # Forward propagation of the convolution + pooling layer ======================================
    X_img = tf.reshape(X, shape=[-1, 28, 28, 1])
    # Convolution: [N, 28, 28, 1] -> [N, 20, 20, 20]
    z1 = tf.nn.conv2d(X_img, w1, strides=[1, 1, 1, 1], padding='VALID') + b1
    # Sigmoid activation; ReLU is more common (sigmoid is used only to keep the derivation simple)
    s1 = tf.sigmoid(z1)
    # Average pooling: [N, 20, 20, 20] -> [N, 10, 10, 20]; max_pool is more common
    L1 = tf.nn.avg_pool(s1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
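To make explicit what this graph computes, here is a NumPy sketch of the same layer. The helpers `conv2d_valid` and `avg_pool_2x2` and the random stand-in batch are hypothetical; like `tf.nn.conv2d`, `conv2d_valid` computes a cross-correlation, so the kernel is not flipped:

```python
import numpy as np

def conv2d_valid(x, w):
    # Cross-correlation like tf.nn.conv2d with strides 1 and padding='VALID'
    # x: [N, H, W, C_in], w: [kh, kw, C_in, C_out] -> [N, H-kh+1, W-kw+1, C_out]
    n, H, W, _ = x.shape
    kh, kw, _, c_out = w.shape
    Ho, Wo = H - kh + 1, W - kw + 1
    out = np.zeros((n, Ho, Wo, c_out))
    for i in range(kh):
        for j in range(kw):
            # Shifted input window [N, Ho, Wo, C_in] times kernel tap [C_in, C_out]
            out += x[:, i:i + Ho, j:j + Wo, :] @ w[i, j]
    return out

def avg_pool_2x2(x):
    # Non-overlapping 2 x 2 average pooling (stride 2, 'VALID'); H and W must be even
    n, H, W, c = x.shape
    return x.reshape(n, H // 2, 2, W // 2, 2, c).mean(axis=(2, 4))

rng = np.random.default_rng(0)
batchX = rng.random((3, 784))                  # stand-in for 3 flattened MNIST images
w1_ = rng.standard_normal([9, 9, 1, 20]) * 0.1
b1_ = rng.standard_normal([1, 20]) * 0.1

X_img_ = batchX.reshape(-1, 28, 28, 1)
z1_ = conv2d_valid(X_img_, w1_) + b1_          # [3, 20, 20, 20]
s1_ = 1 / (1 + np.exp(-z1_))                   # sigmoid
L1_ = avg_pool_2x2(s1_)                        # [3, 10, 10, 20]
L1_flat_ = L1_.reshape(-1, 2000)               # 10 * 10 * 20 = 2000
print(z1_.shape, L1_.shape, L1_flat_.shape)
```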

Running the graph to obtain the convolution and pooling outputs:

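# Forward propagation
# layer1 ++++++++++++++++++++++++++++++++++++++
m_ = batchX.shape[0]  # number of samples in the batch (used by the backward pass)
feed_dict = {X: batchX, w1: w1_, b1: b1_}
# Evaluate the tensors to get NumPy arrays
z1_, s1_, L1_ = session.run([z1, s1, L1], feed_dict=feed_dict)
# Flatten: [m_, 10, 10, 20] -> [m_, 2000]
L1_flat_ = np.reshape(L1_, newshape=[-1, 2000])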

2. Forward propagation of the fully connected layer

The code is as follows:

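import numpy as np

def G(z, derive=False):
    # Sigmoid. With derive=True the argument must already be the sigmoid
    # output s = G(z), because dG/dz = s * (1 - s)
    if derive:
        return z * (1 - z)
    else:
        return 1 / (1 + np.exp(-z))
# [m_, 10, 10, 20] -> [m_, 2000]
L1_flat_ = np.reshape(L1_, newshape=[-1, 2000])
z2_ = np.dot(L1_flat_, w2_) + b2_
s2_ = G(z2_)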
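Note that `G(z, derive=True)` returns `z * (1 - z)`, which is the sigmoid derivative only when its argument is already the activation s = G(z); the backward pass below accordingly multiplies by `s1_ * (1 - s1_)` using the stored activations. A quick finite-difference check (the grid of z values is arbitrary):

```python
import numpy as np

def G(z, derive=False):
    # Same helper as above. With derive=True the argument must already be the
    # sigmoid output s = G(z), because dG/dz = s * (1 - s).
    if derive:
        return z * (1 - z)
    return 1 / (1 + np.exp(-z))

z = np.linspace(-4, 4, 9)
eps = 1e-6
numeric = (G(z + eps) - G(z - eps)) / (2 * eps)   # finite-difference dG/dz
correct = G(G(z), derive=True)                    # pass s = G(z)
wrong = G(z, derive=True)                         # passing z itself is not the derivative
print(np.max(np.abs(numeric - correct)), np.max(np.abs(numeric - wrong)))
```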

VI. Prerequisites for backpropagation

1. Kronecker product (kron)

An introduction to the Kronecker product:
 https://baike.baidu.com/item/克罗内克积/6282573?fr=aladdin

The Kronecker product of a k × d matrix M with a 2 × 2 matrix of all ones is a 2k × 2d matrix N, in which every element of M is repeated as a 2 × 2 block.
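A small `np.kron` example of this, plus the batched form that the backward pass relies on (the matrices are arbitrary examples):

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])                     # k x d = 2 x 2
N = np.kron(M, np.ones((2, 2)))            # 2k x 2d = 4 x 4
print(N)
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]

# With a batch of 10 x 10 maps, np.kron treats the 2 x 2 ones matrix as 1 x 2 x 2,
# so each map in the batch is upsampled separately: [3, 10, 10] -> [3, 20, 20]
batch = np.arange(300).reshape(3, 10, 10)
up = np.kron(batch, np.ones((2, 2)))
print(up.shape)   # (3, 20, 20)
```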

2. Gradients through a convolution (example: 3 × 3 data convolved with a 2 × 2 kernel, giving 2 × 2 data)


The detailed derivation is tedious to spell out in words, so only the conclusion is stated here.

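Conclusion: convolving one channel of an image with the gradient of the cost with respect to the convolution output yields the gradient of the cost with respect to the kernel at the corresponding depth.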
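For the 3 × 3 / 2 × 2 example, the conclusion can be written out as follows. This is a sketch assuming, as in `tf.nn.conv2d`, that the kernel is applied without flipping; X is the input, W the kernel, Z the output, C the cost, and δ is shorthand for ∂C/∂Z:

```latex
Z_{ij} = \sum_{p=0}^{1}\sum_{q=0}^{1} X_{i+p,\,j+q}\, W_{pq}, \qquad i, j \in \{0, 1\}, \qquad \delta_{ij} = \frac{\partial C}{\partial Z_{ij}}

\frac{\partial C}{\partial W_{pq}}
  = \sum_{i=0}^{1}\sum_{j=0}^{1} \delta_{ij}\,\frac{\partial Z_{ij}}{\partial W_{pq}}
  = \sum_{i=0}^{1}\sum_{j=0}^{1} X_{i+p,\,j+q}\,\delta_{ij}

\frac{\partial C}{\partial W_{00}} = X_{00}\,\delta_{00} + X_{01}\,\delta_{01} + X_{10}\,\delta_{10} + X_{11}\,\delta_{11}
```

So ∂C/∂W is the 2 × 2 "valid" cross-correlation of the 3 × 3 input with the 2 × 2 gradient δ. With several channels, the gradient of the kernel slice for input channel c and output channel k uses input channel c together with δ of output channel k.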

This is genuinely hard to grasp at first; working through the backpropagation code below should make it concrete.
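A NumPy check of this conclusion on the 3 × 3 / 2 × 2 example, assuming a hypothetical squared-error cost C = ½‖Z − T‖² so that ∂C/∂Z = Z − T (the helper `xcorr_valid` and the target `T` are illustrative, not from the article):

```python
import numpy as np

def xcorr_valid(x, w):
    # 2-D 'valid' cross-correlation (what tf.nn.conv2d computes; the kernel is not flipped)
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3))        # one 3 x 3 input channel
W = rng.standard_normal((2, 2))        # 2 x 2 kernel
T = rng.standard_normal((2, 2))        # hypothetical target

def cost(kernel):
    return 0.5 * np.sum((xcorr_valid(X, kernel) - T) ** 2)

dZ = xcorr_valid(X, W) - T             # dC/dZ for this cost
dW = xcorr_valid(X, dZ)                # input correlated with dC/dZ gives dC/dW (2 x 2)

# Finite-difference check, one kernel entry at a time
eps = 1e-6
num_dW = np.zeros_like(W)
for p in range(2):
    for q in range(2):
        E = np.zeros_like(W)
        E[p, q] = eps
        num_dW[p, q] = (cost(W + E) - cost(W - E)) / (2 * eps)
print(np.max(np.abs(dW - num_dW)))
```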

VII. Backpropagation

1. Backpropagation of the fully connected layer

(1) dz2 (gradient of z2)

Code (same as ordinary Hinton-style BP):

dz2_ = s2_ - batchY

(2) dw2 (gradient of w2)

Code (same as ordinary Hinton-style BP):

dw2_ = (1 / m_) * np.dot(L1_flat_.T, dz2_)

(3) db2 (gradient of b2)

Code (same as ordinary Hinton-style BP):

db2_ = np.mean(dz2_, axis=0)
db2_ = np.reshape(db2_, newshape=[1, 10])

(4) dL1_flat (gradient of L1_flat)

Code (same as ordinary Hinton-style BP):

dL1_flat_ = np.dot(dz2_, w2_.T)

2. Backpropagation of the convolution + pooling layer

(1) dL1 (gradient of L1)

Code (easy to follow):

dL1_ = np.reshape(dL1_flat_, newshape=[-1, 10, 10, 20])

(2) ds1 (gradient of s1)

Code (not hard, but mind the details):

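# Average-pooling backward: each element of dL1 is spread evenly over its 2 x 2 window
ds1_ = np.zeros([m_, 20, 20, 20])
for k in range(20):
    # Channel k of dL1: [m_, 10, 10]
    one_dL1_ = np.reshape(dL1_[:, :, :, k], newshape=[-1, 10, 10])
    # Kronecker product with a 2 x 2 ones matrix upsamples to [m_, 20, 20]; divide by the window size 4
    one_ds1_ = np.kron(one_dL1_, np.ones([2, 2])).reshape([-1, 20, 20, 1]) / 4
    ds1_[:, :, :, k] = one_ds1_[:, :, :, 0]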
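The kron loop is the backward pass of 2 × 2 average pooling: each pooled value is the mean of four inputs, so each input receives 1/4 of the upstream gradient of its window. A NumPy sketch on small hypothetical shapes (2 samples, 4 × 4 maps, 3 channels; the helper names are also hypothetical) checks this against finite differences:

```python
import numpy as np

def avg_pool_2x2(x):
    # x: [m, H, W, C] -> [m, H/2, W/2, C], mean of each non-overlapping 2 x 2 window
    m, H, W, C = x.shape
    return x.reshape(m, H // 2, 2, W // 2, 2, C).mean(axis=(2, 4))

def avg_pool_2x2_backward(dL):
    # Same idea as the kron loop: copy each upstream gradient into its 2 x 2 window, divide by 4
    m, h, w, C = dL.shape
    ds = np.zeros((m, 2 * h, 2 * w, C))
    for k in range(C):
        ds[:, :, :, k] = np.kron(dL[:, :, :, k], np.ones((2, 2))) / 4
    return ds

rng = np.random.default_rng(2)
s = rng.standard_normal((2, 4, 4, 3))      # pooling input (plays the role of s1)
dL = rng.standard_normal((2, 2, 2, 3))     # hypothetical upstream gradient dC/dL

# With the linear test cost C = sum(avg_pool(s) * dL), dC/ds is exactly the pooling backward of dL
ds = avg_pool_2x2_backward(dL)

eps = 1e-6
num_ds = np.zeros_like(s)
for idx in np.ndindex(s.shape):
    E = np.zeros_like(s)
    E[idx] = eps
    num_ds[idx] = (np.sum(avg_pool_2x2(s + E) * dL) - np.sum(avg_pool_2x2(s - E) * dL)) / (2 * eps)
print(np.max(np.abs(ds - num_ds)))
```

The same array can also be built without kron as `np.repeat(np.repeat(dL, 2, axis=1), 2, axis=2) / 4`.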

(3) dz1 (gradient of z1)

Code (same as ordinary Hinton-style BP):

dz1_ = ds1_ * s1_ * (1 - s1_)

(4) db1 (gradient of b1)

Code (same as ordinary Hinton-style BP):

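# b1 is shared by all 20 x 20 positions: sum dz1_ over batch and positions, then average over samples
# (a plain mean over all m_ * 20 * 20 entries would be 400 times smaller)
db1_ = np.sum(dz1_, axis=(0, 1, 2)) / m_
db1_ = np.reshape(db1_, newshape=[1, 20])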
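Because b1 is broadcast over every output position, the gradient of a cost averaged over the m samples sums dz1 over the batch and spatial axes and then divides by m; a plain mean over all m × H × W entries is smaller by a factor of H × W (400 here). A NumPy check on small hypothetical shapes, where `A` stands in for the convolution output before the bias and `T` is a made-up target for a squared-error cost:

```python
import numpy as np

rng = np.random.default_rng(3)
m, H, W, C = 4, 5, 5, 3
A = rng.standard_normal((m, H, W, C))   # stands in for the convolution output before the bias
T = rng.standard_normal((m, H, W, C))   # hypothetical target
b = rng.standard_normal((1, C))         # one bias per channel, broadcast to every position like b1

def cost(bias):
    # Squared error summed over positions, averaged over the m samples
    return 0.5 * np.sum((A + bias - T) ** 2) / m

dz = A + b - T                                   # per-sample gradient (1/m applied below), like dz1_
db_sum = np.sum(dz, axis=(0, 1, 2)) / m          # sum over batch and positions, average over samples
db_mean = np.mean(dz.reshape(-1, C), axis=0)     # mean over all m * H * W entries

eps = 1e-6
num_db = np.array([(cost(b + eps * np.eye(C)[k]) - cost(b - eps * np.eye(C)[k])) / (2 * eps)
                   for k in range(C)])
print(db_sum, num_db, db_mean * H * W)
```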

(5) dw1 (gradient of w1)


Code (this is the hard part; think it through carefully together with the prerequisites above):

The graph is built as follows:

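with tf.name_scope('convolve'):
    # One sample's image: [1, 28, 28, 1]
    one_X_img = tf.placeholder(dtype=tf.float32, shape=[1, 28, 28, 1])
    # That sample's dz1 ([20, 20, 20]) used as 20 filters of shape [20, 20, 1]
    convolve = tf.placeholder(dtype=tf.float32, shape=[20, 20, 1, 20])

    # Image correlated with dz1 -> [1, 9, 9, 20] (28 - 20 + 1 = 9), the per-sample kernel gradient
    one_kernal = tf.nn.conv2d(one_X_img, convolve, strides=[1, 1, 1, 1], padding='VALID')
    one_kernal = tf.reshape(one_kernal, shape=[9, 9, 1, 20])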

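For a single input channel, every sample contributes one such convolution, and the results are averaged over the samples. Therefore,

the same small graph is fed one sample at a time instead of building m separate tensors (one per sample); when m is large, building that many tensors would overwhelm even a powerful machine.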

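dw1_ = np.zeros([9, 9, 1, 20])
for m in range(m_):
    # Image of sample m: [1, 28, 28, 1]
    one_X_img_ = np.reshape(batchX[m, :], newshape=[1, 28, 28, 1])
    # dz1 of sample m ([20, 20, 20]) viewed as 20 filters of shape [20, 20, 1]
    convolve_ = np.reshape(dz1_[m, :, :, :], newshape=[20, 20, 1, 20])
    # Feed the same small graph repeatedly instead of building m_ separate tensors;
    # otherwise building the graph would overwhelm even a powerful machine.
    one_kernal_ = session.run(one_kernal, feed_dict={one_X_img: one_X_img_, convolve: convolve_})
    dw1_ += one_kernal_
# Average over the samples
dw1_ = dw1_ / m_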
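The per-sample trick can be checked in NumPy on a scaled-down layer. All sizes here are hypothetical stand-ins (a 6 × 6 input, 3 × 3 kernels with 2 output channels, no activation, and a squared-error cost with a made-up target `T`) for the article's 28 × 28 input, 9 × 9 × 1 × 20 kernels and sigmoid/cross-entropy. Each sample's dz1 is reshaped into filters and correlated with that sample's image, as the `convolve` graph does, and the average is compared with finite differences:

```python
import numpy as np

def conv2d_valid(x, w):
    # Cross-correlation like tf.nn.conv2d, strides 1, padding='VALID'
    # x: [N, H, W, C_in], w: [kh, kw, C_in, C_out] -> [N, H-kh+1, W-kw+1, C_out]
    n, H, W, _ = x.shape
    kh, kw, _, c_out = w.shape
    Ho, Wo = H - kh + 1, W - kw + 1
    out = np.zeros((n, Ho, Wo, c_out))
    for i in range(kh):
        for j in range(kw):
            out += x[:, i:i + Ho, j:j + Wo, :] @ w[i, j]
    return out

rng = np.random.default_rng(4)
m_, H, kh, c_out = 2, 6, 3, 2          # scaled-down stand-ins for the article's sizes
Ho = H - kh + 1                        # output size (20 in the article)
X_img_ = rng.standard_normal((m_, H, H, 1))
w1_ = rng.standard_normal((kh, kh, 1, c_out))
T = rng.standard_normal((m_, Ho, Ho, c_out))   # hypothetical target

def cost(w):
    # Squared error, averaged over the samples
    return 0.5 * np.sum((conv2d_valid(X_img_, w) - T) ** 2) / m_

dz1_ = conv2d_valid(X_img_, w1_) - T   # per-sample dC/dz1 (the 1/m_ is applied below)

# Per sample: view dz1_[n] as c_out filters of shape [Ho, Ho, 1] (like the [20, 20, 1, 20]
# "convolve" placeholder), correlate with that sample's image, then average over samples
dw1_ = np.zeros_like(w1_)
for n in range(m_):
    kernels = dz1_[n].reshape(Ho, Ho, 1, c_out)
    dw1_ += conv2d_valid(X_img_[n:n + 1], kernels).reshape(kh, kh, 1, c_out)
dw1_ /= m_

# Finite-difference check
eps = 1e-6
num_dw1 = np.zeros_like(w1_)
for idx in np.ndindex(w1_.shape):
    E = np.zeros_like(w1_)
    E[idx] = eps
    num_dw1[idx] = (cost(w1_ + E) - cost(w1_ - E)) / (2 * eps)
print(np.max(np.abs(dw1_ - num_dw1)))
```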

VIII. Weight update

# update++++++++++++++++++++++++++++++++++++++++
w1_ = w1_ - learningRate * dw1_
b1_ = b1_ - learningRate * db1_
w2_ = w2_ - learningRate * dw2_
b2_ = b2_ - learningRate * db2_

Results

'''
000000 epoch 1 Cost 0.7285813373869118
000000 epoch 2 Cost 0.3221266726065763
000000 epoch 3 Cost 0.26143733019178567
000000 epoch 4 Cost 0.2178894449905915
000000 epoch 5 Cost 0.18599715965037983
000000 epoch 6 Cost 0.16401437470181424
000000 epoch 7 Cost 0.1460933880728079
000000 epoch 8 Cost 0.13309557309374226
000000 epoch 9 Cost 0.12221747296100305
000000 epoch 10 Cost 0.11367286562411623
accuracy 0.9695
'''