【Question Title】: Understanding weights from a convolutional layer
【Posted】: 2020-12-04 11:20:28
【Question Description】:

I am trying to perform semantic segmentation of magnetic resonance images, which are single-channel images.

To get the encoder of a U-Net network, I use this function:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D

def get_encoder_unet(img_shape, k_init = 'glorot_uniform', bias_init='zeros'):

    inp = Input(shape=img_shape)
    conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv1_1')(inp)
    conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool1')(conv1)
    
    conv2 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv2_1')(pool1)
    conv2 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool2')(conv2)

    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv3_1')(pool2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv3_2')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool3')(conv3)

    conv4 = Conv2D(256, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv4_1')(pool3)
    conv4 = Conv2D(256, (4, 4), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv4_2')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool4')(conv4)

    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv5_2')(conv5)

    return conv5,conv4,conv3,conv2,conv1,inp
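
For reference, the later snippets use a model variable that the post never shows being built; presumably these outputs are wrapped in a Keras Model, roughly as in this sketch (an assumption, but one consistent with the "encoder" summary that follows):

from tensorflow.keras.models import Model

# Assumed construction (not shown in the post): a model from the input to conv5_2,
# which is what the "encoder" summary below describes.
conv5, conv4, conv3, conv2, conv1, inp = get_encoder_unet((200, 200, 1))
model = Model(inputs=inp, outputs=conv5, name='encoder')
model.summary()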

Its summary is:

Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 200, 200, 1)]     0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 200, 200, 64)      1664      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 200, 200, 64)      102464    
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 100, 100, 64)      0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 100, 100, 96)      55392     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 100, 100, 96)      83040     
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 50, 50, 96)        0         
_________________________________________________________________
conv3_1 (Conv2D)             (None, 50, 50, 128)       110720    
_________________________________________________________________
conv3_2 (Conv2D)             (None, 50, 50, 128)       147584    
_________________________________________________________________
pool3 (MaxPooling2D)         (None, 25, 25, 128)       0         
_________________________________________________________________
conv4_1 (Conv2D)             (None, 25, 25, 256)       295168    
_________________________________________________________________
conv4_2 (Conv2D)             (None, 25, 25, 256)       1048832   
_________________________________________________________________
pool4 (MaxPooling2D)         (None, 12, 12, 256)       0         
_________________________________________________________________
conv5_1 (Conv2D)             (None, 12, 12, 512)       1180160   
_________________________________________________________________
conv5_2 (Conv2D)             (None, 12, 12, 512)       2359808   
=================================================================
Total params: 5,384,832
Trainable params: 5,384,832
Non-trainable params: 0
_________________________________________________________________

I am trying to understand how the neural network works, and I have this code to display the shape of the weights and biases of the last layer.

layer_dict = dict([(layer.name, layer) for layer in model.layers])

layer_name = model.layers[-1].name
#layer_name = 'conv5_2'

filter_index = 0 # Which filter in this block would you like to visualise?

# Grab the filters and biases for that layer
filters, biases = layer_dict[layer_name].get_weights()

print("Filters")
print("\tType: ", type(filters))
print("\tShape: ", filters.shape)
print("Biases")
print("\tType: ", type(biases))
print("\tShape: ", biases.shape)

With this output:

Filters
    Type:  <class 'numpy.ndarray'>
    Shape:  (3, 3, 512, 512)
Biases
    Type:  <class 'numpy.ndarray'>
    Shape:  (512,)

I am trying to understand what the filters' shape of (3, 3, 512, 512) means. I think the last 512 is the number of filters in that layer, but what does (3, 3, 512) mean? My images have a single channel, so I don't understand the 3, 3 in the filters' shape (img_shape is (200, 200, 1)).

【Question Comments】:

  • Look at it this way: consider that your input is an RGB image and that you specify n filters of a particular size. What actually happens is that n×3 filters are used to convolve the same RGB image to produce an n-channel image. The same thing continues here, staying at 512 (filters) × 512 (channels).
  • @sai I am using single-channel images, and I think a kernel size of (3, 3) in the last convolutional layer Conv2D is correct. I am confused about whether my code only works for 3-channel images, because I don't understand why you say "... what actually happens is that n×3 filters are used to convolve the same RGB image to produce an n-channel image".
  • Your code works perfectly fine for 1 channel. The only reason I chose to explain with an RGB image is that it is easier to understand at the start. If your input here were a 3-channel image, the filter shape would be (5, 5, 3, 64), meaning 64 sets of 5x5 filters, one per channel (a short sketch after these comments illustrates this). Also, have a look here for more details about the dimensions: tensorflow.org/api_docs/python/tf/nn/conv2d
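
The point made in the last comment can be checked directly: in Keras the third dimension of a Conv2D kernel follows the number of input channels. A minimal sketch (the layer name conv1_1 and the 200x200 input size are simply taken from the question for illustration):

import tensorflow as tf

for channels in (1, 3):
    inp = tf.keras.Input(shape=(200, 200, channels))
    out = tf.keras.layers.Conv2D(64, (5, 5), padding='same', name='conv1_1')(inp)
    m = tf.keras.Model(inp, out)
    kernel, bias = m.get_layer('conv1_1').get_weights()
    print(channels, kernel.shape, bias.shape)
    # prints: 1 (5, 5, 1, 64) (64,)  then  3 (5, 5, 3, 64) (64,)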

Tags: python tensorflow keras conv-neural-network


【Solution 1】:

"I think the last 512 is the number of filters in this layer, but what does (3, 3, 512) mean?"

It is the overall size of the filters: they are themselves 3D. As input to conv5_2 you have a [batch, height', width', channels] tensor. In your case the filter size per channel is 3x3: you take every 3x3 region of the conv5_2 input, apply a 3x3 filter to it and get 1 value as output (see animation). But these 3x3 filters are different for each channel (512 of them in your case) (see this illustration for 1 channel). In the end you want to perform this Conv2D number_of_filters times, so you need 512 filters of size 3x3x512.
A good article to dive deeper into the CNN architecture and, in particular, the intuition behind Conv2D (see part 2).
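
To make this concrete, here is a small numpy sketch (random stand-in values, not the real trained weights) of how one output value of conv5_2 is formed, and how the (3, 3, 512, 512) kernel shape accounts for the parameter count in the summary:

import numpy as np

filters = np.random.randn(3, 3, 512, 512).astype(np.float32)  # (kh, kw, in_channels, n_filters)
biases = np.random.randn(512).astype(np.float32)
patch = np.random.randn(3, 3, 512).astype(np.float32)         # one 3x3 window of the 512-channel input

# Each output channel at this spatial location is the sum of the whole 3x3x512 patch
# weighted by one 3x3x512 filter, plus that filter's bias (value before the ReLU).
out_pixel = np.array([np.sum(patch * filters[..., f]) + biases[f] for f in range(512)])
print(out_pixel.shape)                 # (512,)

# The same shape explains the parameter count of conv5_2 in the summary:
print((3 * 3 * 512 + 1) * 512)         # 2359808

The same computation is repeated at every spatial location of the 12x12 input, which is why the output of conv5_2 is (None, 12, 12, 512).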

【Comments】:

  • Thank you very much for your answer. There is one thing I don't understand: why would I "... want to perform this Conv2D number_of_filters times"? Thanks again.
  • That depends on your model. The outputs of Conv2D are called "feature maps" (visualized); they are one way a CNN "understands" the input it receives. Later, the last layers of the CNN (e.g. fully connected ones) match these features to a specific class if a particular combination of features appears in the output. For example: more feature maps, a less robust network. So number_of_filters is chosen during the design of the model architecture to achieve the best performance (accuracy/robustness/speed) for the task.
  • I don't have a specific article to share. I think that once you already have a deep understanding of the ML/CNN basics, specific tasks such as semantic segmentation should be learned from the source: the papers of the models. They usually disclose some "features" such as residual blocks or dilated convolutions.