How to define a specific keras layer weight as non-trainable? Answers

【Question title】: How to define a specific keras layer weight as non-trainable?
【Posted】: 2021-09-02 09:56:15
【Question description】:

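Suppose we have a three-layer neural network: input > hidden > output, and let the weights between the hidden layer and the output layer be W, b, where W is a matrix of shape (N, M). By default, all components of W and b are trainable in keras. I know how to set the whole of W or b to non-trainable, as shown in the link below: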

How to set parameters in keras to be non-trainable?

What I want is to be able to set only specific components of W (for example) to non-trainable. For example, if:

W = [[W11, W12]
     [W21, W22]]

can be rewritten as:

W = [W1, W2] with W1 = [W11, W12] and W2 = [W21, W22]

and W1 and W2 are both of type tf.Variable,

how can I set, for example, W1 to non-trainable?

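I looked through some other threads, but none of them helped me get what I want. Some example links:

Link 1: https://keras.io/guides/transfer_learning/

Link 2: https://github.com/tensorflow/tensorflow/issues/47597

Can anyone help me solve this problem?

Thank you in advance.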

【Question discussion】:

Tags: python tensorflow variables keras neural-network

【Solution 1】:

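The tensor W is stored as a single tf.Variable (not four variables w11, w12, w21, w22), and tf.Variable.trainable controls the whole tensor, not a sub-tensor. Worse, in a keras layer all variables share the same trainable setting, because they are controlled by the tf.keras.layers.Layer.trainable attribute.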
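The layer-level switch can be checked with a small sketch. It assumes a stock Dense layer with 2 inputs and 2 units (sizes chosen only for illustration) and counts how many variables the layer reports as trainable before and after flipping Layer.trainable.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(2)
layer.build((None, 2))  # creates a kernel of shape (2, 2) and a bias of shape (2,)

# Before: both the kernel and the bias are listed as trainable.
n_trainable_before = len(layer.trainable_weights)

# The switch is per layer: it moves every variable of the layer at once,
# and there is no per-element flag inside the kernel.
layer.trainable = False
n_trainable_after = len(layer.trainable_weights)
n_frozen_after = len(layer.non_trainable_weights)

print(n_trainable_before, n_trainable_after, n_frozen_after)
```

The kernel moves between the two lists as one unit, which is why freezing only part of W requires splitting it into separate variables.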

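To do what you want, you need two variables W1 and W2, each wrapped in a separate layer instance. Apply each layer to the input to get half of the answer, then concat the two halves to get the full answer.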
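A minimal sketch of that idea, not a definitive implementation: the layer names w1_frozen and w2_trainable, the sizes (3 inputs, 2 + 2 output units), the SGD settings and the random data are all assumptions made for illustration. It runs one gradient step and checks which kernel moved.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(3,))
# Hypothetical split: the first 2 output units use a frozen kernel (W1),
# the remaining 2 use a trainable kernel (W2).
frozen = layers.Dense(2, name="w1_frozen", trainable=False)
free = layers.Dense(2, name="w2_trainable")
outputs = layers.Concatenate()([frozen(inputs), free(inputs)])
model = tf.keras.Model(inputs, outputs)

w1_before = frozen.get_weights()[0].copy()
w2_before = free.get_weights()[0].copy()

x = tf.random.normal((8, 3))  # dummy input
y = tf.ones((8, 4))           # dummy target
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# One training step over the trainable variables only.
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x, training=True) - y))
grads = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))

w1_unchanged = np.array_equal(w1_before, frozen.get_weights()[0])
w2_changed = not np.array_equal(w2_before, free.get_weights()[0])
print(w1_unchanged, w2_changed)
```

Because a Dense kernel has shape (inputs, units), splitting the units this way freezes a block of columns of the combined weight matrix.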

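【Discussion】:

Thank you for your answer. What you describe is one of the solutions I had in mind, but for compelling reasons I would rather not do it that way.

【Solution 2】:

You can create your own layer in keras. This lets you customize the weights inside the layer, for example whether they are trainable.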

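import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # suppress TensorFlow messages
import tensorflow as tf
# Import from tensorflow.keras so the layers match the tf.keras losses and optimizers used below
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.models import Sequential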

# Your custom layer
class Linear(Layer):
    def __init__(self, units=32,**kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=False
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

In Linear, the weight w is trainable while the bias b is not. Below I build a training loop on dummy data to visualize the weight updates.

batch_size = 10
input_shape = (5,)  # per-sample shape; Input() adds the batch dimension itself


## model
model = Sequential()
model.add(Input(shape=input_shape))
model.add(Linear(units=4, name='my_linear_layer'))
model.add(Dense(1))


## dummy dataset
x = tf.random.normal((batch_size,) + input_shape)  # dummy input
y = tf.ones((batch_size, 1))  # dummy output

## loss functions and optimizer
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)


### training loop 
epochs = 3
for epoch in range(epochs):
  print("\nStart of epoch %d" % (epoch,))

  tf.print(model.get_layer('my_linear_layer').get_weights())

  # Open a GradientTape to record the operations run
  # during the forward pass, which enables auto-differentiation.
  with tf.GradientTape() as tape:

    # Run the forward pass of the layer.
    # The operations that the layer applies
    # to its inputs are going to be recorded
    # on the GradientTape.
    logits = model(x, training=True)  # Logits for this minibatch

    # Compute the loss value for this minibatch.
    loss_value = loss_fn(y, logits)

  # Use the gradient tape to automatically retrieve
  # the gradients of the loss with respect to the trainable variables.
  grads = tape.gradient(loss_value, model.trainable_weights)

  # Run one step of gradient descent by updating
  # the value of the variables to minimize the loss.
  optimizer.apply_gradients(zip(grads, model.trainable_weights))

This loop prints the following:

Start of epoch 0
[array([[ 0.08920084, -0.04294993,  0.06111819,  0.08334437],
       [-0.0369432 , -0.05014499,  0.0305218 , -0.07486793],
       [-0.01227043,  0.09460627, -0.0560123 ,  0.01324316],
       [-0.00255878,  0.00214959, -0.02924518,  0.04721532],
       [-0.05532415, -0.02014978, -0.06785563, -0.07330619]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

Start of epoch 1
[array([[ 0.08961578, -0.04327399,  0.06152926,  0.08325274],
       [-0.03829437, -0.04908974,  0.02918325, -0.07456956],
       [-0.01417133,  0.09609085, -0.05789544,  0.01366292],
       [-0.00236284,  0.00199657, -0.02905108,  0.04717206],
       [-0.05536905, -0.02011472, -0.06790011, -0.07329627]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

Start of epoch 2
[array([[ 0.09001605, -0.04358549,  0.06192534,  0.08316355],
       [-0.03960795, -0.04806747,  0.02788337, -0.07427685],
       [-0.01599812,  0.09751251, -0.05970317,  0.01406999],
       [-0.00217021,  0.00184666, -0.02886046,  0.04712913],
       [-0.05540781, -0.02008455, -0.06793848, -0.07328764]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

As you can see, the weight w is updated while the bias b stays the same.
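The same add_weight mechanism might also be applied to blocks of the weight matrix itself. The following is only a sketch under assumptions: PartiallyFrozenLinear, frozen_units and free_units are hypothetical names that do not appear in this answer, and the split is along the columns of W (use axis=0 in tf.concat, with matching shapes, to split along rows instead).

```python
import numpy as np
import tensorflow as tf

class PartiallyFrozenLinear(tf.keras.layers.Layer):
    """Hypothetical layer: the first `frozen_units` columns of W are not trained."""

    def __init__(self, frozen_units=1, free_units=1, **kwargs):
        super().__init__(**kwargs)
        self.frozen_units = frozen_units
        self.free_units = free_units

    def build(self, input_shape):
        in_dim = input_shape[-1]
        # Two separate variables, each with its own trainable flag.
        self.w_frozen = self.add_weight(
            shape=(in_dim, self.frozen_units),
            initializer="random_normal", trainable=False)
        self.w_free = self.add_weight(
            shape=(in_dim, self.free_units),
            initializer="random_normal", trainable=True)
        self.b = self.add_weight(
            shape=(self.frozen_units + self.free_units,),
            initializer="zeros", trainable=True)

    def call(self, inputs):
        # Reassemble the full (in_dim, units) matrix from its two blocks.
        w = tf.concat([self.w_frozen, self.w_free], axis=1)
        return tf.matmul(inputs, w) + self.b

layer = PartiallyFrozenLinear(frozen_units=1, free_units=1)
x = tf.random.normal((8, 2))  # dummy input
y = tf.ones((8, 2))           # dummy target
_ = layer(x)                  # first call builds the weights
frozen_before = layer.w_frozen.numpy().copy()
free_before = layer.w_free.numpy().copy()

# One gradient step over the trainable variables only.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(layer(x) - y))
grads = tape.gradient(loss, layer.trainable_weights)
optimizer.apply_gradients(zip(grads, layer.trainable_weights))

frozen_unchanged = np.array_equal(frozen_before, layer.w_frozen.numpy())
free_changed = not np.array_equal(free_before, layer.w_free.numpy())
print(frozen_unchanged, free_changed)
```

This keeps a single layer in the model, at the cost of storing W as two variables rather than one.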

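【Discussion】:

Thank you for your answer. I agree this is one way, but my question is about freezing vector components of the weight matrix (not the bias). As @Yaoshiang explained above, W is stored as a single tf.Variable (not four variables w11, w12, w21, w22), and not as two variables for the vectors W1 and W2 either.

【Solution 3】:

I am currently trying to solve a similar problem. What you need to do is first use the keras functional API. Then put all the weights you want trainable into one layer and all the weights you want non-trainable into another layer. Feed the previous layer into both of these layers, then use the tensorflow Concatenate layer to join them back together. Say you have a hidden layer with 5 neurons, of which you want 3 to be trainable and 2 to be non-trainable.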

    from tensorflow.keras.layers import Concatenate, Dense

    X = Dense(5, activation='relu')(X)  # previous layer

    Y = Dense(3, activation='relu', name='trainable_layer')(X)
    # Freeze the layer itself: setting .trainable on its output tensor has no effect.
    Z = Dense(2, activation='relu', name='non_trainable_layer', trainable=False)(X)

    X = Concatenate()([Y, Z])

    X = Dense(5, activation='relu')(X)  # layer after the layer with mixed trainable weights

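【Discussion】:

If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context. - From Review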