为什么使用预训练的 ResNet50 会导致训练和验证集的损失相互矛盾？答案

【问题标题】：Why using pretrained ResNet50 results in contradictory losses in training and validation set?为什么使用预训练的 ResNet50 会导致训练和验证集的损失相互矛盾？
【发布时间】：2018-10-16 00:30:19
【问题描述】：

我正在使用 ResNet50 预训练模型作为 Unet 的构建块：

def ResNet50(include_top=True, weights='imagenet',
         input_tensor=None, input_shape=None,
         pooling=None,
         classes=1000):
if weights not in {'imagenet', None}:
    raise ValueError('The `weights` argument should be either '
                     '`None` (random initialization) or `imagenet` '
                     '(pre-training on ImageNet).')

if weights == 'imagenet' and include_top and classes != 1000:
    raise ValueError('If using `weights` as imagenet with `include_top`'
                     ' as true, `classes` should be 1000')

if input_tensor is None:
    img_input = Input(shape=input_shape)
else:
    if not K.is_keras_tensor(input_tensor):
        img_input = Input(tensor=input_tensor, shape=input_shape)
    else:
        img_input = input_tensor
if K.image_data_format() == 'channels_last':
    bn_axis = 3
else:
    bn_axis = 1

x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', name='conv1')(img_input)
x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

# Ensure that the model takes into account
# any potential predecessors of `input_tensor`.
if input_tensor is not None:
    inputs = get_source_inputs(input_tensor)
else:
    inputs = img_input
# Create model.
model = Model(inputs, x, name='resnet50')

# load weights
if weights == 'imagenet':
    if include_top:
        weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels.h5',
                                WEIGHTS_PATH,
                                cache_subdir='models',
                                md5_hash='a7b3fe01876f51b976af0dea6bc144eb')
    else:
        weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5',
                                WEIGHTS_PATH_NO_TOP,
                                cache_subdir='models',
                                md5_hash='a268eb855778b3df3c7506639542a6af')
    model.load_weights(weights_path,by_name=True)
return model

创建 Unet：

def conv_block_simple(prevlayer, filters, prefix, strides=(1, 1)):
conv = Conv2D(filters, (3, 3), padding="same", kernel_initializer="he_normal", strides=strides, name=prefix + "_conv")(prevlayer)
conv = BatchNormalization(name=prefix + "_bn")(conv)
conv = Activation('relu', name=prefix + "_activation")(conv)
return conv 

def conv_block_simple_no_bn(prevlayer, filters, prefix, strides=(1, 1)):
conv = Conv2D(filters, (3, 3), padding="same", kernel_initializer="he_normal", strides=strides, name=prefix + "_conv")(prevlayer)
conv = Activation('relu', name=prefix + "_activation")(conv)
return conv

K.clear_session()

def get_unet_resnet(input_shape):
    resnet_base = ResNet50(input_shape=input_shape, include_top=False)

    for l in resnet_base.layers:
        l.trainable = False 


    conv1 = resnet_base.get_layer("activation_1").output
    conv2 = resnet_base.get_layer("activation_10").output
    conv3 = resnet_base.get_layer("activation_22").output
    conv4 = resnet_base.get_layer("activation_40").output
    conv5 = resnet_base.get_layer("activation_49").output

    up6 = concatenate([UpSampling2D()(conv5), conv4], axis=-1)
    conv6 = conv_block_simple(up6, 256, "conv6_1")
    conv6 = conv_block_simple(conv6, 256, "conv6_2")

    up7 = concatenate([UpSampling2D()(conv6), conv3], axis=-1)
    conv7 = conv_block_simple(up7, 192, "conv7_1")
    conv7 = conv_block_simple(conv7, 192, "conv7_2")

    up8 = concatenate([UpSampling2D()(conv7), conv2], axis=-1)
    conv8 = conv_block_simple(up8, 128, "conv8_1")
    conv8 = conv_block_simple(conv8, 128, "conv8_2")

    up9 = concatenate([UpSampling2D()(conv8), conv1], axis=-1)
    conv9 = conv_block_simple(up9, 64, "conv9_1")
    conv9 = conv_block_simple(conv9, 64, "conv9_2")

    up10 = UpSampling2D()(conv9)
    conv10 = conv_block_simple(up10, 32, "conv10_1")
    conv10 = conv_block_simple(conv10, 32, "conv10_2")
    conv10 = SpatialDropout2D(0.2)(conv10)
    x = Conv2D(1, (1, 1), activation="sigmoid", name="prediction")(conv10)

    model = Model(resnet_base.input, x)

    model.summary()

    return model

我冻结了几篇论文中提出的预训练 ResNet50 层：

for l in resnet_base.layers:
    l.trainable = False

如果没有冻结，网络可以正常工作，但往往会严重过度拟合，我降低了较高的 SpatialDropout2D() 值。但是，当我冻结它时，火车损失会减少，但验证损失会以一些奇怪的高值流通，但实际上却停滞不前。我不明白，为什么冻结的网络在训练集上起作用，而在验证集上却不起作用。我认为它没有合乎逻辑的理由（以我目前的知识水平）。

我尝试玩学习率，但没有成功。

可能是什么问题？任何帮助将不胜感激。谢谢。

【问题讨论】：

您找到解决方案了吗？我在 ResNet50 上的验证准确度也有类似的问题 - 尽管训练准确度非常好，但它一直很低。请分享

标签： python machine-learning keras computer-vision resnet

【解决方案1】：

Keras 中的 BN 层存在一定的问题（Keras 的创建者热衷于设计）。问题在于，在训练期间，BN 层实际上是在学习新数据集的参数，而在验证阶段，使用的是 ResNet 训练数据集的参数（例如 imagenet 或 cifar）。这导致良好的训练准确性，但验证准确性将不存在。

你可以找到更多here

【讨论】：

感谢您的信息和链接。虽然我仍然无法检查是否是我这边的情况，但我发现它很有用，所以我放弃了。稍后如果我进行测试并且它有效，我当然会接受答案。