【Question Title】:Unable to understand the behavior of the method `build` in tensorflow keras layers (tf.keras.layers.Layer)
【Posted】:2020-09-01 05:51:22
【Question】:

Layers in tensorflow keras have a method build that is used to defer weight creation until the time you have seen what the input looks like. a layer's build method
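A minimal sketch of what build defers (the `Linear` class and its sizes here are my own illustration, not from the linked docs): the kernel's first dimension depends on the input, so it can only be created once an input has been seen.

```python
import tensorflow as tf

class Linear(tf.keras.layers.Layer):
    """A minimal linear layer whose weights are created lazily in build()."""
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # input_shape is only known at the first call, so the kernel's
        # first dimension is inferred here instead of in __init__.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, x):
        return tf.matmul(x, self.w) + self.b

layer = Linear(4)
print(len(layer.weights))   # 0 -- nothing has been created yet
_ = layer(tf.ones((2, 3)))  # the first call triggers build()
print(len(layer.weights))   # 2 -- kernel and bias now exist
```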

I have a few questions I could not find answers to:

  1. here it is said

    If you assign a Layer instance as an attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

What does it mean to track a layer's weights?

  2. The same link also mentions

    We recommend creating such sublayers in the __init__ method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

Does this mean that when the build method of the subclass (self) runs, all attributes of self are traversed, and any of them found to be (instances of) subclasses of tf.keras.layers.Layer will have their build method run automatically?

  3. I can run this code:
class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)
  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)

but not this one:

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)
  def build(self,input_shape):
    super().build()
  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)

Why?

【Comments】:

    Tags: tensorflow keras tensorflow2.0 keras-layer tf.keras


    【Solution 1】:

    I would say what the quoted passage about build means is that, for example, when you construct a custom tf.keras.Model,

    net = Net()
    

    all the tf.keras.layers.Layer objects created in __init__ and stored as attributes of net (a callable object) are registered with it. The outer model then collects those sublayers' weights into its own variables / trainable_variables lists, so everything becomes one complete object for TF to train later; that is what it means by to track. The next time you call net(inputs), you get your output.

    As for why the second snippet fails: Layer.build takes a required input_shape argument (its signature is build(self, input_shape)), so super().build() with no argument will most likely raise a TypeError once the model gets built; super().build(input_shape) should work.
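To make the tracking concrete, here is a small sketch (using the Net class from the question, with an assumed input width of 3): the sublayer's weights do not exist until the model is first called, and after that they show up in the outer model's variables list.

```python
import tensorflow as tf

class Net(tf.keras.Model):
    """A simple linear model; the Dense sublayer is tracked as an attribute."""
    def __init__(self):
        super().__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)

net = Net()
print(net.variables)      # [] -- self.l1 is tracked but not yet built
_ = net(tf.ones((1, 3)))  # building the outer model builds self.l1 too
print([v.shape for v in net.variables])  # kernel (3, 5) and bias (5,)
```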

    Here is an example of a custom Tensorflow decoder with attention:

    class BahdanauAttention(tf.keras.layers.Layer):
      def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    
      def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)
    
        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
    
        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
    
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
    
        return context_vector, attention_weights
    
    class Decoder(tf.keras.Model):
      def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
    
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)
    
      def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
    
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
    
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
    
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
    
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
    
        # output shape == (batch_size, vocab)
        x = self.fc(output)
    
        return x, state, attention_weights
    

    I tried putting tf.keras.layers.Layer objects inside call and got very bad results; I guess that is because if you put them there, a brand-new layer gets created every time forward propagation happens.
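A minimal sketch of this pitfall (the BadNet/GoodNet names and sizes are mine, not from the question): a layer constructed inside call gets fresh random weights on every forward pass, so its output on the same input changes between calls and nothing can be trained.

```python
import tensorflow as tf

class BadNet(tf.keras.Model):
    """Anti-pattern: a new Dense layer (fresh random weights) per call."""
    def call(self, x):
        return tf.keras.layers.Dense(5)(x)

class GoodNet(tf.keras.Model):
    """The sublayer is created once in __init__ and reused on every call."""
    def __init__(self):
        super().__init__()
        self.d = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.d(x)

x = tf.ones((1, 3))
bad, good = BadNet(), GoodNet()
# BadNet re-initializes its weights every forward pass, so the same input
# gives different outputs; GoodNet's output is stable.
print(bool(tf.reduce_all(bad(x) == bad(x))))    # False (almost surely)
print(bool(tf.reduce_all(good(x) == good(x))))  # True
```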

    【Discussion】:
