【Question Title】:Unable to understand the behavior of the method `build` in tensorflow keras layers (tf.keras.layers.Layer)
【Posted】:2020-09-01 05:51:22
【Question】:

Layers in tensorflow keras have a method build that is used to defer weight creation until the time you have seen what the input looks like. a layer's build method
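A minimal sketch of what build defers (the `Linear` class and its sizes here are my own illustration, not from the linked docs): the kernel's first dimension depends on the input, so it can only be created once an input has been seen.

```python
import tensorflow as tf

class Linear(tf.keras.layers.Layer):
    """A minimal linear layer whose weights are created lazily in build()."""
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # input_shape is only known at the first call, so the kernel's
        # first dimension is inferred here instead of in __init__.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, x):
        return tf.matmul(x, self.w) + self.b

layer = Linear(4)
print(len(layer.weights))   # 0 -- nothing has been created yet
_ = layer(tf.ones((2, 3)))  # the first call triggers build()
print(len(layer.weights))   # 2 -- kernel and bias now exist
```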

I have a few questions I could not find answers to:

  1. here it is said

    If you assign a Layer instance as an attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

What does it mean to track a layer's weights?

  2. The same link also mentions

    We recommend creating such sublayers in the __init__ method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

Does this mean that when the build method of the subclass (self) runs, all attributes of self are traversed, and any of them found to be (instances of) subclasses of tf.keras.layers.Layer will have their build method run automatically?

  3. I can run this code:
class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)
  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)

but not this one:

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)
  def build(self,input_shape):
    super().build()
  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)

Why?

【Comments】:

    Tags: tensorflow keras tensorflow2.0 keras-layer tf.keras


    【Solution 1】:

    I would say what the quoted passage about build means is that, for example, when you construct a custom tf.keras.Model,

    net = Net()
    

    all the tf.keras.layers.Layer objects created in __init__ and stored as attributes of net (a callable object) are registered with it. The outer model then collects those sublayers' weights into its own variables / trainable_variables lists, so everything becomes one complete object for TF to train later; that is what it means by to track. The next time you call net(inputs), you get your output.

    As for why the second snippet fails: Layer.build takes a required input_shape argument (its signature is build(self, input_shape)), so super().build() with no argument will most likely raise a TypeError once the model gets built; super().build(input_shape) should work.
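To make the tracking concrete, here is a small sketch (using the Net class from the question, with an assumed input width of 3): the sublayer's weights do not exist until the model is first called, and after that they show up in the outer model's variables list.

```python
import tensorflow as tf

class Net(tf.keras.Model):
    """A simple linear model; the Dense sublayer is tracked as an attribute."""
    def __init__(self):
        super().__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)

net = Net()
print(net.variables)      # [] -- self.l1 is tracked but not yet built
_ = net(tf.ones((1, 3)))  # building the outer model builds self.l1 too
print([v.shape for v in net.variables])  # kernel (3, 5) and bias (5,)
```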

    Here is an example of a custom Tensorflow decoder with attention:

    class BahdanauAttention(tf.keras.layers.Layer):
      def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    
      def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)
    
        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
    
        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
    
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
    
        return context_vector, attention_weights
    
    class Decoder(tf.keras.Model):
      def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
    
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)
    
      def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
    
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
    
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
    
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
    
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
    
        # output shape == (batch_size, vocab)
        x = self.fc(output)
    
        return x, state, attention_weights
    

    I tried putting tf.keras.layers.Layer objects inside call and got very bad results; I guess that is because if you put them there, a brand-new layer gets created every time forward propagation happens.
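A minimal sketch of this pitfall (the BadNet/GoodNet names and sizes are mine, not from the question): a layer constructed inside call gets fresh random weights on every forward pass, so its output on the same input changes between calls and nothing can be trained.

```python
import tensorflow as tf

class BadNet(tf.keras.Model):
    """Anti-pattern: a new Dense layer (fresh random weights) per call."""
    def call(self, x):
        return tf.keras.layers.Dense(5)(x)

class GoodNet(tf.keras.Model):
    """The sublayer is created once in __init__ and reused on every call."""
    def __init__(self):
        super().__init__()
        self.d = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.d(x)

x = tf.ones((1, 3))
bad, good = BadNet(), GoodNet()
# BadNet re-initializes its weights every forward pass, so the same input
# gives different outputs; GoodNet's output is stable.
print(bool(tf.reduce_all(bad(x) == bad(x))))    # False (almost surely)
print(bool(tf.reduce_all(good(x) == good(x))))  # True
```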

    【Discussion】:
