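[Posted]: 2021-02-20 07:11:24
[Question]:
I recently started reading Grokking Deep Learning by Andrew W. Trask and implemented a CNN, which worked well. I then tried to add more hidden convolutional layers and failed: I just can't get the dimensions to line up in the backpropagation through the CNN.
My code is as follows: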
for iteration in range(iterations):
    '''
    images: (1000, 3, 32, 32)
    kernel_rows, kernel_cols, num_colors = 4,4,3
    num_kernels_1, num_kernels_2 = 15, 30
    hidden_size = ((input_rows - 2*kernel_rows) * (input_cols - 2*kernel_cols)) * num_kernels_2
        The size of the matrix that is the output after doing 2 convolutions
        (that's why it's 2*kernel_rows and 2*kernel_cols)
    kernels_1 = (kernel_rows*kernel_cols * num_colors, num_kernels_1)
    kernels_2 = (kernel_rows*kernel_cols * num_kernels_1, num_kernels_2)
    weights_1 = (hidden_size,100)
    weights_2 = (100,30)
    weights_3 = (30,10)
    '''
    sample_size = len(images)

    C_0 = convolution(images, input_rows, input_cols, kernel_rows, kernel_cols)
    C_1 = tanh(C_0 @ kernels_1)
    C_1_flattened = C_1.reshape(sample_size, -1)
    C_1 = C_1.reshape(sample_size, -1, (input_rows - kernel_rows), (input_cols - kernel_cols))
    C_1 = convolution(C_1, C_1.shape[2], C_1.shape[3], kernel_rows, kernel_cols)
    C_2 = tanh(C_1 @ kernels_2)
    C_2 = C_2.reshape(sample_size, -1)

    Z_1 = C_2 @ weights_1
    A_1 = tanh(Z_1)
    Z_2 = A_1 @ weights_2
    A_2 = tanh(Z_2)
    Z_3 = A_2 @ weights_3
    A_3 = softmax(Z_3)

    delta_A_3 = (labels - A_3) / len(images)
    delta_A_2 = (delta_A_3 @ weights_3.T) * tanh2deriv(A_2)
    delta_A_1 = (delta_A_2 @ weights_2.T) * tanh2deriv(A_1)
    delta_C_2 = (delta_A_1 @ weights_1.T) * tanh2deriv(C_2)
    k_update_2 = C_1.reshape(kernel_rows*kernel_cols*num_kernels_1,-1) @ delta_C_2.reshape(-1, num_kernels_2)
    delta_C_1 = (delta_C_2.reshape(sample_size, -1, num_kernels_2) @ kernels_2.T) * tanh2deriv(C_1)
    k_update_1 = C_0.reshape(kernel_rows*kernel_cols*num_colors, -1) @ delta_C_1.reshape(-1, num_kernels_1)

    cost = np.sum((labels - A_3)**2) / len(images)

    weights_3 += alpha * (A_3.T @ delta_A_3)
    weights_2 += alpha * (A_2.T @ delta_A_2)
    weights_1 += alpha * (A_1.T @ delta_A_1)
    kernels_2 -= alpha * k_update_2
    kernels_1 -= alpha * k_update_1

    print(str(cost)[:8])
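The problematic line is the one where I compute k_update_1: C_0.reshape(kernel_rows*kernel_cols*num_colors, -1) has shape (48, 784000) and delta_C_1.reshape(-1, num_kernels_1) has shape (9216000, 15). I am trying to get a result of shape (48, 15), and these inner dimensions obviously don't match.
The helper functions are: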
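The two shapes quoted above can be reproduced with a short shape trace. This is only a sketch: `extract_patches` is a hypothetical stand-in that mirrors the patch layout of the question's `convolution()` helper, the arrays are zero-filled, and `sample_size` is reduced from 1000 to 2 purely to keep memory small.

```python
import numpy as np

# Sizes from the question; sample_size is shrunk (assumption for illustration).
sample_size = 2
input_rows = input_cols = 32
kernel_rows = kernel_cols = 4
num_colors, num_kernels_1, num_kernels_2 = 3, 15, 30

def extract_patches(data, kernel_rows, kernel_cols):
    """Hypothetical stand-in for convolution(): (N, C, H, W) -> (N, patches, C*kr*kc)."""
    n, c, h, w = data.shape
    sects = [data[:, :, r:r + kernel_rows, col:col + kernel_cols].reshape(n, 1, -1)
             for r in range(h - kernel_rows)
             for col in range(w - kernel_cols)]
    return np.concatenate(sects, axis=1)

images = np.zeros((sample_size, num_colors, input_rows, input_cols))
C_0 = extract_patches(images, kernel_rows, kernel_cols)   # (N, 784, 48)

# Output of the first convolution, one row per output position: (N, 784, 15)
first_out = np.zeros(C_0.shape[:2] + (num_kernels_1,))
maps = first_out.reshape(sample_size, -1,
                         input_rows - kernel_rows, input_cols - kernel_cols)
C_1 = extract_patches(maps, kernel_rows, kernel_cols)     # (N, 576, 240)

# delta_C_1 in the question has the same shape as the re-expanded C_1.
delta_C_1 = np.zeros_like(C_1)

lhs = C_0.reshape(kernel_rows * kernel_cols * num_colors, -1)
rhs = delta_C_1.reshape(-1, num_kernels_1)
print(lhs.shape, rhs.shape)
```

Per sample, the left operand contributes 784 columns (one per first-layer output position), while the right operand contributes 576 × 16 = 9216 rows (one per kernel position inside each second-layer patch). Scaled to 1000 samples, those are the 784000 and 9216000 reported above. One reading, which the question itself does not confirm, is that delta_C_1 is a gradient with respect to the expanded patches rather than the 28 × 28 feature maps.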
def convolution(data, input_rows, input_cols, kernel_rows, kernel_cols):
    sects = []
    for row_start in range(input_rows - kernel_rows):
        for col_start in range(input_cols - kernel_cols):
            section = get_image_section(data,
                                        row_start,
                                        row_start + kernel_rows,
                                        col_start,
                                        col_start + kernel_cols)
            sects.append(section)
    expanded_input = np.concatenate(sects, axis = 1)
    es = expanded_input.shape
    return expanded_input.reshape(es[0], es[1], -1)
and:
def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:,:, row_from:row_to, col_from:col_to]
    return np.expand_dims(section, axis = 1)
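Because the helpers above copy overlapping windows into separate rows, a gradient computed with respect to those rows would presumably need to be summed back onto the original feature map before it can be paired with an earlier layer. A minimal sketch of such an inverse step follows. `fold_patches` is a hypothetical name that does not appear in the question, and it assumes the same loop order and the same `range(rows - kernel_rows)` bounds as `convolution()`.

```python
import numpy as np

def fold_patches(patch_grads, channels, rows, cols, kernel_rows, kernel_cols):
    """Scatter-add (N, patches, C*kr*kc) values back onto an (N, C, rows, cols) map.

    Hypothetical helper: assumes patches were produced in row-major order
    over range(rows - kernel_rows) x range(cols - kernel_cols).
    """
    n = patch_grads.shape[0]
    out = np.zeros((n, channels, rows, cols))
    idx = 0
    for r in range(rows - kernel_rows):
        for c in range(cols - kernel_cols):
            block = patch_grads[:, idx].reshape(n, channels, kernel_rows, kernel_cols)
            # Overlapping windows accumulate into the same pixels.
            out[:, :, r:r + kernel_rows, c:c + kernel_cols] += block
            idx += 1
    return out

# Tiny check: 5x5 map, 2x2 kernel, so 3 * 3 = 9 patches of 4 values each.
grads = np.ones((2, 9, 1 * 2 * 2))
folded = fold_patches(grads, channels=1, rows=5, cols=5, kernel_rows=2, kernel_cols=2)
print(folded.shape)
print(folded[0, 0])
```

In the question's setting, the folded result would have shape (sample_size, 15, 28, 28), which has 784 positions per sample and therefore lines up with the 784 patches in C_0. Whether this is the fix the asker needs is an assumption, not something stated in the post.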
[Discussion]:
Tags: python deep-learning neural-network conv-neural-network backpropagation