Tying Autoencoder Weights in a Dense Keras Layer

【Question title】: Tying Autoencoder Weights in a Dense Keras Layer
【Posted】: 2025-09-01 10:20:02
【Question】:

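I am trying to create a custom Dense layer in Keras to tie weights in an autoencoder. I have tried following an example that does this for convolutional layers (here), but some of the steps did not seem to apply to the Dense layer (also, that code is more than two years old).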

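By tying weights, I mean that the decoding layer should use the transposed weight matrix of the encoding layer. This approach is also taken in this article (page 5). Below is the relevant quote from the article: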

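Here, we choose both the encoding and decoding activation functions to be the sigmoid function and only consider the tied-weights case, in which W′ = W^T (where W^T is the transpose of W), as most existing deep learning methods do.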

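In the quote above, W is the weight matrix of the encoding layer and W' (equal to the transpose of W) is the weight matrix of the decoding layer.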
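
As a minimal sketch of what the quoted tied-weights setup computes, the NumPy snippet below runs one forward pass using Keras's (input_dim, units) kernel layout, so the decoder multiplies by the transpose of the encoder kernel. The sizes (4 inputs, 2 hidden units), the bias names, and the random values are assumptions for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))       # encoder kernel, Keras layout (input_dim, units)
b_enc = np.zeros(2)               # encoder bias (hypothetical name)
b_dec = np.zeros(4)               # decoder bias (hypothetical name); biases are not tied

x = rng.random((3, 4))            # batch of 3 samples
h = sigmoid(x @ W + b_enc)        # encode: (3, 4) -> (3, 2)
x_hat = sigmoid(h @ W.T + b_dec)  # decode with W' = W^T: (3, 2) -> (3, 4)

print(h.shape, x_hat.shape)
```

Only W is stored; the decoder has no kernel of its own, which is the property the custom layer below is meant to reproduce.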

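I did not change much in the Dense layer. I added a tied_to parameter to the constructor, which lets you pass the layer you want to tie to. The only other change is in the build function; the snippet is below: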

def build(self, input_shape):
    assert len(input_shape) >= 2
    input_dim = input_shape[-1]

    if self.tied_to is not None:
        self.kernel = K.transpose(self.tied_to.kernel)
        self._non_trainable_weights.append(self.kernel)
    else:
        self.kernel = self.add_weight(shape=(input_dim, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
    if self.use_bias:
        self.bias = self.add_weight(shape=(self.units,),
                                    initializer=self.bias_initializer,
                                    name='bias',
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
    else:
        self.bias = None
    self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
    self.built = True

Below is the __init__ method; the only change here is the added tied_to parameter.

def __init__(self, units,
             activation=None,
             use_bias=True,
             kernel_initializer='glorot_uniform',
             bias_initializer='zeros',
             kernel_regularizer=None,
             bias_regularizer=None,
             activity_regularizer=None,
             kernel_constraint=None,
             bias_constraint=None,
             tied_to=None,
             **kwargs):
    if 'input_shape' not in kwargs and 'input_dim' in kwargs:
        kwargs['input_shape'] = (kwargs.pop('input_dim'),)
    super(Dense, self).__init__(**kwargs)
    self.units = units
    self.activation = activations.get(activation)
    self.use_bias = use_bias
    self.kernel_initializer = initializers.get(kernel_initializer)
    self.bias_initializer = initializers.get(bias_initializer)
    self.kernel_regularizer = regularizers.get(kernel_regularizer)
    self.bias_regularizer = regularizers.get(bias_regularizer)
    self.activity_regularizer = regularizers.get(activity_regularizer)
    self.kernel_constraint = constraints.get(kernel_constraint)
    self.bias_constraint = constraints.get(bias_constraint)
    self.input_spec = InputSpec(min_ndim=2)
    self.supports_masking = True
    self.tied_to = tied_to

The call function is unedited, but is included below for reference.

def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias, data_format='channels_last')
    if self.activation is not None:
        output = self.activation(output)
    return output

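In the build function above, I added a condition that checks whether the tied_to parameter is set and, if so, sets the layer's kernel to the transpose of the tied_to layer's kernel.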

Below is the code used to instantiate the model. It uses Keras's Sequential API, and DenseTied is my custom layer.

# encoder
#
encoded1 = Dense(2, activation="sigmoid")

decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1)

# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)

After training the model, below are the model summary and weights.

autoencoder.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_7 (Dense)              (None, 2)                 10        
_________________________________________________________________
dense_tied_7 (DenseTied)     (None, 4)                 12        
=================================================================
Total params: 22
Trainable params: 14
Non-trainable params: 8
________________________________________________________________

autoencoder.layers[0].get_weights()[0]
array([[-2.122982  ,  0.43029135],
       [-2.1772149 ,  0.16689162],
       [-1.0465667 ,  0.9828905 ],
       [-0.6830663 ,  0.0512633 ]], dtype=float32)


autoencoder.layers[-1].get_weights()[1]
array([[-0.6521988 , -0.7131109 ,  0.14814234,  0.26533198],
       [ 0.04387903, -0.22077179,  0.517225  , -0.21583867]],
      dtype=float32)

As you can see, the weights reported by autoencoder.get_weights() do not appear to be tied.
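
One quick way to confirm this is to compare the two printed arrays directly. The sketch below copies the values shown above and checks whether the decoder kernel equals the transpose of the encoder kernel; the variable names are illustrative.

```python
import numpy as np

# Encoder kernel, copied from autoencoder.layers[0].get_weights()[0] above.
encoder_kernel = np.array([[-2.122982, 0.43029135],
                           [-2.1772149, 0.16689162],
                           [-1.0465667, 0.9828905],
                           [-0.6830663, 0.0512633]], dtype=np.float32)

# Decoder kernel, copied from autoencoder.layers[-1].get_weights()[1] above.
decoder_kernel = np.array([[-0.6521988, -0.7131109, 0.14814234, 0.26533198],
                           [0.04387903, -0.22077179, 0.517225, -0.21583867]],
                          dtype=np.float32)

# Tied weights would make the decoder kernel the transpose of the encoder kernel.
tied = np.allclose(decoder_kernel, encoder_kernel.T)
print(tied)  # False
```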

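So, having shown my approach, my question is: is this a valid way to tie weights in a Dense Keras layer? I am able to run the code, and it is currently training. The loss also seems to be decreasing reasonably. My concern is that this only makes the weights equal when the model is built and does not actually tie them. I am hoping that the backend transpose function ties them by reference under the hood, but I am sure I am missing something.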

【Comments】:

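By "tie", do you mean having two Dense layers with exactly the same weights? If so, why not use a single Dense layer and apply it in different parts of the model?
Sorry, I have updated the question to explain what I mean by "tying" weights. Unfortunately it is not as simple as reusing the same layer, because the weight matrix has to be transposed.
I cannot test it, but I am fairly confident your approach is correct (although I am not sure self._trainable_weights.append(self.kernel) is strictly necessary, since the weight self.tied_to.kernel is, in theory, already trainable). I suggest checking the weights after training to make sure they are the same. You can also visualize the computation graph with TensorBoard.
@JamesMchugh I think you should not use self._trainable_weights.append(self.kernel) at all, because from the custom dense layer's point of view these weights are not trainable. Either remove that line entirely or use self._non_trainable_weights.append(self.kernel) instead, so that you can access the weights from the custom dense layer independently (i.e. with the get_weights() method).
For anyone interested, the problem was that by using k.variable(k.transpose(self.kernel)) I was breaking the tie. I had to use k.transpose(self.kernel) instead. However, this does cause some problems when trying to use autoencoder.load_weights(file), because self.kernel is then a tensor and has no assign method.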

Tags: python keras autoencoder

【Solution 1】:

Thanks to Mikhail Berlinkov. One important note: his code works under Keras, but not in eager mode in TF 2.0. It runs, but it trains poorly.

The key point is how the object stores the transposed weights: self.kernel = K.transpose(self.tied_to.kernel)

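In non-eager mode, this builds the graph correctly. In eager mode it fails, probably because the value of the transposed variable is stored at build time (== the first call) and then reused in subsequent calls.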
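
The suspected eager-mode behaviour can be illustrated in isolation. In this small sketch, assuming TensorFlow 2.x with eager execution enabled (the default), the result of tf.transpose is a concrete tensor computed once, so later updates to the variable do not reach it. The variable names are made up for illustration.

```python
import tensorflow as tf

w = tf.Variable([[1.0, 2.0], [3.0, 4.0]])

# In eager mode this is evaluated immediately and holds fixed values.
snapshot = tf.transpose(w)

# Simulate a training update to the underlying variable.
w.assign(w * 10.0)

print(snapshot.numpy())         # still the transpose of the old values
print(tf.transpose(w).numpy())  # recomputed, so it reflects the update
```

This is consistent with the explanation above: a transpose stored once in build keeps the initial values, while the encoder's variable keeps changing during training.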

However, the solution is to store the untransposed variable at build time and move the transpose operation into the call method.
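
A minimal sketch of that idea follows, assuming TensorFlow 2.x and tf.keras. It is not the answerer's exact code: the simplified constructor (tied_to as the first argument, no initializers, regularizers or constraints, bias always on) is an assumption made to keep the example short. The layer keeps only a reference to the encoder layer and transposes its kernel on every call.

```python
import numpy as np
import tensorflow as tf


class DenseTied(tf.keras.layers.Layer):
    """Dense-like layer that reuses the transposed kernel of another Dense layer."""

    def __init__(self, tied_to, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.tied_to = tied_to  # the (encoder) layer whose kernel is shared
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # Only the bias is created here; the kernel stays inside tied_to.
        # tied_to must already be built, i.e. called earlier in the model.
        units = int(self.tied_to.kernel.shape[0])
        self.bias = self.add_weight(name="bias", shape=(units,), initializer="zeros")

    def call(self, inputs):
        # Transpose on every call so the current encoder weights are always used.
        kernel = tf.transpose(self.tied_to.kernel)
        output = tf.matmul(inputs, kernel) + self.bias
        return self.activation(output)


encoder = tf.keras.layers.Dense(2, activation="sigmoid")
decoder = DenseTied(encoder, activation="sigmoid")

x = np.random.rand(8, 4).astype("float32")
h = encoder(x)      # builds the encoder, shape (8, 2)
x_hat = decoder(h)  # builds the decoder, shape (8, 4)
```

Because the transpose happens inside call, the same code path is used in graph and eager mode, and gradients from the decoder flow back into the encoder's kernel variable.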

It took me a few days to figure this out, and I am glad if this helps anyone.

【Comments】:

【Solution 2】:

So, having shown my approach, my question is: is this a valid way to tie weights in a Dense Keras layer?

Yes, it is valid.

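My concern is that this only makes the weights equal when the model is built and does not actually tie them. I am hoping that the backend transpose function ties them by reference under the hood, but I am sure I am missing something.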

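It does tie them within a single computation graph; when you print model.summary() you can check that there is only one copy of these trainable weights. In addition, after training the model you can check the weights of the corresponding layers with model.get_weights(). When the model is built there are no actual weights yet, only placeholders for them.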
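
The parameter counts in the question's summary can be reproduced by hand, which shows how the shared kernel is counted. This is a small arithmetic sketch; it assumes the 4-2-4 layer sizes from the question and that the tied kernel is registered through _non_trainable_weights, as in its build method.

```python
input_dim, hidden = 4, 2

encoder_params = input_dim * hidden + hidden  # kernel 8 + bias 2 = 10
tied_kernel = hidden * input_dim              # 8, listed as non-trainable in DenseTied
decoder_bias = input_dim                      # 4, the decoder's only trainable weights
decoder_params = tied_kernel + decoder_bias   # 12

trainable = encoder_params + decoder_bias     # 14
non_trainable = tied_kernel                   # 8
print(encoder_params, decoder_params, trainable, non_trainable)  # 10 12 14 8
```

The count alone cannot distinguish a shared kernel from an independent copy, so comparing the values after training is still needed.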

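# Imports assumed for standalone Keras 2.x, which this answer targets.
import random

import numpy as np
from keras import backend as K
from keras import activations, constraints, initializers, regularizers
from keras.layers import Dense, InputSpec, Layer
from keras.models import Sequential

random.seed(1)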

class DenseTied(Layer):
    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 tied_to=None,
                 **kwargs):
        self.tied_to = tied_to
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super().__init__(**kwargs)
        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.bias_constraint = constraints.get(bias_constraint)
        self.input_spec = InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        if self.tied_to is not None:
            self.kernel = K.transpose(self.tied_to.kernel)
            self._non_trainable_weights.append(self.kernel)
        else:
            self.kernel = self.add_weight(shape=(input_dim, self.units),
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None

        self.built = True

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        # The input width need not equal self.units (e.g. 2 -> 4), so only check that it is defined.
        assert input_shape[-1]
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def call(self, inputs):
        output = K.dot(inputs, self.kernel)
        if self.use_bias:
            output = K.bias_add(output, self.bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output


# input_ = Input(shape=(16,), dtype=np.float32)
# encoder
#
encoded1 = Dense(4, activation="sigmoid", input_shape=(4,), use_bias=True)
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1, use_bias=False)

# autoencoder
#
autoencoder = Sequential()
# autoencoder.add(input_)
autoencoder.add(encoded1)
autoencoder.add(decoded1)

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

print(autoencoder.summary())

# randint's upper bound is exclusive, so use 2 to draw both 0 and 1.
autoencoder.fit(x=np.random.rand(100, 4), y=np.random.randint(0, 2, size=(100, 4)))

print(autoencoder.layers[0].get_weights()[0])
print(autoencoder.layers[1].get_weights()[0])

【Comments】:

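I did use model.get_weights() after training the model, but the weights did not appear to be tied correctly. The decoder's weights do not seem to be the transpose of the encoder's. I have not tried model.summary() yet, but that is a good suggestion. I will update you once I have tested this. Thanks for your answer.
I used model.get_weights() and model.summary(), but nothing seems to indicate that the weights are tied.
Could you try removing self._trainable_weights.append(self.kernel)? These are not trainable weights of this layer but of the other layer. I think what is happening is that they get updated in two places in the graph, which is why they differ.
I did change it to self._non_trainable_weights.append(self.kernel), but the weights still appear to differ. If the kernel is not added to either of these lists, it is not printed when using model.get_weights().
Could you show your model.get_weights() and model.summary() output when you add the kernel to the non-trainable weights? Also, could you share the call method?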