How to freeze some layers when fine-tuning ResNet50: answers

【Question title】: How to freeze some layers when fine-tuning ResNet50
【Posted】: 2017-10-08 06:23:54
【Question】:

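I am trying to fine-tune ResNet50 with Keras. When I freeze all the layers in ResNet50, everything works fine. However, I want to freeze only some of the ResNet50 layers, not all of them, and when I do that I get an error. Here is my code: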

base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))

#this is where the error happens. The commented code works fine
"""
for layer in base_model.layers:
    layer.trainable = False
"""
for layer in base_model.layers[:-26]:
    layer.trainable = False
model.summary()
optimizer = Adam(lr=1e-4)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor='val_loss', patience=4, verbose=1, min_delta=1e-4),
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, cooldown=2, verbose=1),
    ModelCheckpoint(filepath='weights/renet50_best_weight.fold_' + str(fold_count) + '.hdf5', save_best_only=True,
                    save_weights_only=True)
    ]

model.load_weights(filepath="weights/renet50_best_weight.fold_1.hdf5")
model.fit_generator(generator=train_generator(), steps_per_epoch=len(df_train) // batch_size,  epochs=epochs, verbose=1,
                  callbacks=callbacks, validation_data=valid_generator(), validation_steps = len(df_valid) // batch_size)

The error is as follows:

Traceback (most recent call last):
  File "/home/jamesben/ai_challenger/src/train.py", line 184, in <module>
    model.load_weights(filepath="weights/renet50_best_weight.fold_" + str(fold_count) + '.hdf5')
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 719, in load_weights
    topology.load_weights_from_hdf5_group(f, layers)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 3095, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2193, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 944, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_72:0', which has shape '(3, 3, 128, 128)'

Can anyone help me figure out how many layers of ResNet50 I should freeze?

【Comments on the question】:

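I left out a line between the callbacks and model.fit_generator: model.load_weights(filepath="weights/renet50_best_weight.fold_1.hdf5")
This error has nothing to do with freezing layers. It is about trying to fit data whose shape differs from the model's input shape.
But when I freeze all the layers of ResNet50, no error occurs. Doesn't that mean the input shape is fine?
Something else in the code must have changed.
Thanks for your answer. Does that mean that if I want to fine-tune ResNet, I have to freeze all of its layers?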

Tags: neural-network keras resnet

【Solution 1】:

When you use load_weights() and save_weights() with a nested model, it is easy to get an error if the trainable settings differ between saving and loading.

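To fix this error, make sure the same layers are frozen before you call model.load_weights() as were frozen when the weights were saved. That is, if the weights file was saved with all layers frozen, the procedure is:

1. Recreate the model
2. Freeze all the layers in base_model
3. Load the weights
4. Unfreeze the layers you want to train (in this case, base_model.layers[-26:])

For example,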

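from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Flatten
from keras.models import Sequential

# Recreate the nested model exactly as it was when the weights were saved.
base_model = ResNet50(include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))

# Freeze every layer first so the weight order matches the saved file.
for layer in base_model.layers:
    layer.trainable = False
model.load_weights('all_layers_freezed.h5')

# Only after loading, unfreeze the layers to be fine-tuned.
for layer in base_model.layers[-26:]:
    layer.trainable = True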
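
The same four steps can be wrapped in a small helper so the order of operations cannot be shuffled by accident. This is only a sketch: the function name and its parameters are hypothetical, and it relies on nothing beyond the `layers`, `trainable` and `load_weights` members already used above.

```python
def load_weights_with_frozen_base(model, base_model, weights_path, n_trainable):
    """Load a checkpoint saved with every base layer frozen, then unfreeze the tail.

    Hypothetical helper: `model` must offer load_weights(path) and
    `base_model.layers` must be a list of objects with a `trainable` flag.
    """
    # Reproduce the trainable state the checkpoint was saved with.
    for layer in base_model.layers:
        layer.trainable = False

    # The weight order now matches the file, so loading is safe.
    model.load_weights(weights_path)

    # Only now unfreeze the last n_trainable layers for fine-tuning.
    # The guard matters because layers[-0:] would select every layer.
    if n_trainable > 0:
        for layer in base_model.layers[-n_trainable:]:
            layer.trainable = True
    return model
```

With the model above this would be called as load_weights_with_frozen_base(model, base_model, 'all_layers_freezed.h5', 26). As in the question's code, model.compile() should come after the trainable flags are set so that the change takes effect during training.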

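Root cause:

When you call model.load_weights(), the weights of each layer are loaded (roughly) through the following steps (in the function load_weights_from_hdf5_group() in topology.py):

1. Call layer.weights to get the weight tensors
2. Match each weight tensor with the corresponding weight value in the hdf5 file
3. Call K.batch_set_value() to assign the weight values to the weight tensors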

If your model is a nested model, you have to be careful with trainable because of step 1.
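
The positional matching in steps 2 and 3 can be imitated with plain Python. The sketch below is a simplified stand-in, not the Keras implementation: the function name and the weight names are made up for illustration, and only the two shapes are taken from the traceback in the question.

```python
def assign_by_position(weight_slots, saved_values):
    """Pair model weights with saved values purely by index (simplified).

    weight_slots: list of (name, shape) in the order given by layer.weights.
    saved_values: list of (name, shape) in the order stored in the file.
    Returns the (slot name, saved name) pairs that would be assigned.
    """
    if len(weight_slots) != len(saved_values):
        raise ValueError("weight count mismatch")
    assigned = []
    for (slot_name, slot_shape), (saved_name, saved_shape) in zip(
            weight_slots, saved_values):
        # Names are never compared; only the position decides the pairing,
        # so a reordered list is caught only if the shapes happen to differ.
        if slot_shape != saved_shape:
            raise ValueError(
                "Cannot feed value of shape %r for %s, which has shape %r"
                % (saved_shape, slot_name, slot_shape))
        assigned.append((slot_name, saved_name))
    return assigned


# Hypothetical weight names; shapes as in the traceback.
file_order = [("bn/gamma", (128,)), ("conv/kernel", (3, 3, 128, 128))]
model_order = [file_order[1], file_order[0]]  # same weights, different order

print(assign_by_position(file_order, file_order))
try:
    assign_by_position(model_order, file_order)
except ValueError as err:
    print("error:", err)
```

Because nothing but the index ties a tensor to its saved value, any change in the order returned by layer.weights between saving and loading pairs the wrong items.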

I'll explain it with an example. For the same Sequential model as in the example above, model.summary() gives:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
resnet50 (Model)             (None, 1, 1, 2048)        23587712
_________________________________________________________________
flatten_10 (Flatten)         (None, 2048)              0
_________________________________________________________________
dense_5 (Dense)              (None, 80)                163920
=================================================================
Total params: 23,751,632
Trainable params: 11,202,640
Non-trainable params: 12,548,992
_________________________________________________________________

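During weight loading, the inner ResNet50 model is treated as a single layer of model. When the layer resnet50 is loaded, calling layer.weights in step 1 is equivalent to calling base_model.weights, which collects and returns one list containing the weight tensors of all layers in the ResNet50 model.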

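The problem is that when this list of weight tensors is built, the trainable weights are placed before the non-trainable weights. In the definition of the Layer class: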

@property
def weights(self):
    return self.trainable_weights + self.non_trainable_weights

If all the layers in base_model are frozen, the weight tensors are in the following order:

for layer in base_model.layers:
    layer.trainable = False
print(base_model.weights)

[<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
 <tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
 <tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
 ...
 <tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
 <tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]

However, if some layers are trainable, the weight tensors of the trainable layers come before those of the frozen layers:

for layer in base_model.layers[-5:]:
    layer.trainable = True
print(base_model.weights)

[<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
 <tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
 <tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
 <tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
 ...
 <tf.Variable 'bn5c_branch2b/moving_mean:0' shape=(512,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2b/moving_variance:0' shape=(512,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]

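This change of order is why you get the error about tensor shapes. The weight values saved in the hdf5 file are matched with the wrong weight tensors in step 2 above. Everything works when you freeze all the layers because your model checkpoint was also saved with all layers frozen, so the order is the same.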
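
The reordering can be reproduced without Keras or any weight file. The toy classes below are assumptions made for illustration: they imitate only the "trainable first, non-trainable second" rule quoted above, applied to a container that gathers weights from its inner layers, and use three layer names borrowed from the listings.

```python
class ToyLayer:
    """Minimal stand-in for a layer that owns named weights."""

    def __init__(self, trainable_names, non_trainable_names=()):
        self.trainable = True
        self._trainable = list(trainable_names)
        self._non_trainable = list(non_trainable_names)

    @property
    def trainable_weights(self):
        return self._trainable if self.trainable else []

    @property
    def non_trainable_weights(self):
        if self.trainable:
            return self._non_trainable
        return self._trainable + self._non_trainable


class ToyNestedModel:
    """Gathers the weights of all inner layers, trainable ones first."""

    def __init__(self, layers):
        self.layers = layers

    @property
    def weights(self):
        trainable = [w for l in self.layers for w in l.trainable_weights]
        frozen = [w for l in self.layers for w in l.non_trainable_weights]
        return trainable + frozen


def build():
    return ToyNestedModel([
        ToyLayer(["conv1/kernel", "conv1/bias"]),
        ToyLayer(["bn_conv1/gamma", "bn_conv1/beta"],
                 ["bn_conv1/moving_mean", "bn_conv1/moving_variance"]),
        ToyLayer(["res5c_branch2c/kernel", "res5c_branch2c/bias"]),
    ])


# Order at save time: every layer frozen.
saved_model = build()
for layer in saved_model.layers:
    layer.trainable = False
saved_order = saved_model.weights

# Order at load time: the last layer left trainable.
loading_model = build()
for layer in loading_model.layers[:-1]:
    layer.trainable = False
loading_order = loading_model.weights

for slot, value in zip(loading_order, saved_order):
    status = "ok" if slot == value else "MISMATCH"
    print("%-28s <- %-28s %s" % (slot, value, status))
```

Both lists hold the same eight names, but the trainable res5c_branch2c weights move to the front of the second list, so a positional pairing no longer lines up.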

A possibly better solution:

You can avoid nested models by using the functional API. For example, the following code should work without errors:

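from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Flatten
from keras.models import Model

# input_size and input_channels are defined as in the question.
# Build a flat (non-nested) model and save it with all base layers frozen.
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)

for layer in base_model.layers:
    layer.trainable = False
model.save_weights("all_nontrainable.h5")

# Rebuild the same model, freeze only part of it, and load the same file.
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)

for layer in base_model.layers[:-26]:
    layer.trainable = False
model.load_weights("all_nontrainable.h5")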
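
One plausible reading of why the flat model is less fragile: the loader then walks the ResNet50 layers one at a time, and a single layer lists its own weights in the same order whether or not it is frozen. The sketch below illustrates that reading with a made-up function. It is an assumption about how a non-nested layer behaves, not something taken from the Keras source.

```python
def single_layer_weights(own_trainable, own_non_trainable, trainable):
    """Toy version of trainable_weights + non_trainable_weights for ONE layer.

    Assumption: a frozen layer reports its would-be trainable weights at the
    front of its non-trainable list, so the combined order does not move.
    """
    if trainable:
        trainable_part = list(own_trainable)
        frozen_part = list(own_non_trainable)
    else:
        trainable_part = []
        frozen_part = list(own_trainable) + list(own_non_trainable)
    return trainable_part + frozen_part


# Weight names of a batch-normalization layer, as in the listings above.
bn_trainable = ["gamma", "beta"]
bn_non_trainable = ["moving_mean", "moving_variance"]

order_when_trainable = single_layer_weights(bn_trainable, bn_non_trainable, True)
order_when_frozen = single_layer_weights(bn_trainable, bn_non_trainable, False)
print(order_when_trainable == order_when_frozen)
```

Under that assumption the per-layer order is stable, so the reshuffling can only occur when a whole model is loaded as if it were one layer.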

【Discussion】: