【Question Title】: Tensorflow Neural Machine Translation Example - Loss Function
【Posted】: 2021-03-09 17:42:22
【Question Description】:

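I am stepping through the code here: https://www.tensorflow.org/tutorials/text/nmt_with_attention as a way of learning, and I am confused about when the loss function is called and what is passed to it. I added two print statements in loss_function, and when the training loop runs it only prints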

(64,) (64, 4935)

several times at the very start, and then nothing more. I am confused on two counts:

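Why isn't loss_function() called repeatedly throughout the training loop, printing the shapes each time? I expected the loss function to be called at the end of each batch of size 64.
I expected the shape of the actuals to be (batch size, time steps) and the shape of the predictions to be (batch size, time steps, vocabulary size). It looks like the loss is called separately for each time step (64 is the batch size and 4935 is the vocabulary size).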

I believe the relevant parts are reproduced below.

    optimizer = tf.keras.optimizers.Adam()
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
    
    def loss_function(real, pred):
      mask = tf.math.logical_not(tf.math.equal(real, 0))

      print(real.shape)
      print(pred.shape)

      loss_ = loss_object(real, pred)
      mask = tf.cast(mask, dtype=loss_.dtype)
      loss_ *= mask  # set padding entries to zero loss

      return tf.reduce_mean(loss_)

    @tf.function
    def train_step(inp, targ, enc_hidden):
      loss = 0
    
      with tf.GradientTape() as tape:
        enc_output, enc_hidden = encoder(inp, enc_hidden)
    
        dec_hidden = enc_hidden
    
        dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
    
        # Teacher forcing - feeding the target as the next input
        for t in range(1, targ.shape[1]):
          # passing enc_output to the decoder
          predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
          print(targ[:, t])
          print(predictions)
          loss += loss_function(targ[:, t], predictions)
    
          # using teacher forcing
          dec_input = tf.expand_dims(targ[:, t], 1)
    
      batch_loss = (loss / int(targ.shape[1]))
    
      variables = encoder.trainable_variables + decoder.trainable_variables
    
      gradients = tape.gradient(loss, variables)
    
      optimizer.apply_gradients(zip(gradients, variables))
    
      return batch_loss


    EPOCHS = 10
    
    for epoch in range(EPOCHS):
      start = time.time()
    
      enc_hidden = encoder.initialize_hidden_state()
      total_loss = 0
    
      for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
        #print(batch)    
        batch_loss = train_step(inp, targ, enc_hidden)
        total_loss += batch_loss
    
        if batch % 100 == 0:
          print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                       batch,
                                                       batch_loss.numpy()))
      # saving (checkpoint) the model every 2 epochs
      if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix = checkpoint_prefix)
    
      print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                          total_loss / steps_per_epoch))
      print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

【Discussion】:

Tags: tensorflow keras tensorflow2.0

【Solution 1】:

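The loss is treated like the rest of the graph. In tensorflow, when code is traced into a graph (as it is inside a @tf.function, which is the case here), calls like tf.keras.layers.Dense and tf.nn.conv2d do not actually perform the operation; they define a graph of operations. I have another post here, How do backpropagation works in tensorflow, that explains backpropagation and some of the motivation for why it works this way.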

Your loss function above is

def loss_function(real, pred):
      mask = tf.math.logical_not(tf.math.equal(real, 0))
      
      print(real.shape)
      print(pred.shape)


      loss_ = loss_object(real, pred)
      mask = tf.cast(mask, dtype=loss_.dtype) 
      loss_ *= mask #set padding entries to zero loss
 
      result = tf.reduce_mean(loss_)
      return result

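Think of this function as a generator that returns result. result defines the graph that computes the loss. Perhaps a better name for this function would be loss_function_graph_creator ... but that's another story.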

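result is a graph containing the weights, the biases, and the information about how to do forward and backward propagation, which is everything model.fit needs. It no longer needs this function, and it does not need to run this function on every loop iteration.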
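To see this in isolation, here is a minimal sketch, assuming TensorFlow 2.x. The names scaled_sum and trace_count are made up for illustration and are not part of the tutorial. It shows that Python-level side effects such as print run only while tf.function traces the function, whereas tf.print becomes part of the graph and runs on every call.

```python
import tensorflow as tf

trace_count = 0  # hypothetical counter: incremented only when Python runs the body


@tf.function
def scaled_sum(x):
    global trace_count
    trace_count += 1                      # Python side effect: happens while tracing
    print("tracing with shape", x.shape)  # Python print: happens while tracing
    tf.print("executing the graph")       # graph op: happens on every call
    return tf.reduce_sum(x) * 2.0


# Same shape and dtype on every call, so the traced graph is reused.
for _ in range(5):
    scaled_sum(tf.ones([64, 10]))

print("Python body ran", trace_count, "time(s)")
```

Replacing print with tf.print inside loss_function should therefore make the output appear on every training step, although tf.print takes tensors (for example tf.shape(pred)) rather than static Python shapes.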

Indeed, what happens behind the scenes is that, given your model (call it my_model), the compile line

model.compile(loss=loss_function, optimizer='sgd')

is effectively the following lines

input = tf.keras.Input()
output = my_model(input)
loss = loss_function(input,output)
opt = tf.keras.optimizers.SGD()
gradient = opt.minimize(loss)

get_gradient_model = tf.keras.Model(input,gradient)

Then you have the gradient operation, which can be used in a loop to get the gradients. Conceptually, this is what model.fit does.
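The lines above are conceptual rather than runnable. As a rough, runnable analogue of "build the gradient operation once, then call it in a loop", here is a sketch that assumes TensorFlow 2.x and uses a toy one-parameter model. The variable w, the data, and the learning rate of 0.1 are all made up for illustration, and the plain gradient-descent update stands in for a real optimizer.

```python
import tensorflow as tf

w = tf.Variable(0.0)  # hypothetical single trainable weight


@tf.function
def train_step(x, y):
    # Traced once for this input signature; later calls reuse the graph.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * x - y) ** 2)
    grad = tape.gradient(loss, w)
    w.assign_sub(0.1 * grad)  # plain gradient descent, learning rate assumed
    return loss


x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([2.0, 4.0, 6.0])  # generated with a true weight of 2.0

losses = [float(train_step(x, y)) for _ in range(20)]
print("first loss:", losses[0], "last loss:", losses[-1])
```

The Python code that defines the loss runs only while train_step is traced, yet the loss value still changes on every call because the graph is re-executed with the updated weight.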

Q&A

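Is the fact that this function: @tf.function def train_step(inp, targ, enc_hidden): has the tf.function decorator (and that the loss function is called inside it) what makes this code run as you describe rather than as normal python?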

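No. It is not "normal" python. It only defines the flow of tensors through a graph of matrix operations that will (hopefully) run on your GPU. All the tensorflow operations just set up the operations on the GPU (or on a simulated GPU if you don't have one).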
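This also suggests why the shapes in the question are printed several times at the start and then never again: the Python for loop over time steps is unrolled while train_step is being traced, so loss_function is called once per time step during that single trace. A minimal sketch, assuming TensorFlow 2.x; step_loss, total_loss, and the small tensor sizes are made up for illustration, and tf.nn.sparse_softmax_cross_entropy_with_logits is swapped in for the tutorial's loss_object to keep the example short.

```python
import tensorflow as tf

calls = []  # shapes recorded by Python, i.e. only while tracing


def step_loss(real, pred):
    calls.append((tuple(real.shape), tuple(pred.shape)))
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=real, logits=pred)
    return tf.reduce_mean(per_example)


@tf.function
def total_loss(targ, logits):
    loss = 0.0
    # range() over a Python int is a Python loop, so it is unrolled during tracing.
    for t in range(1, targ.shape[1]):
        loss += step_loss(targ[:, t], logits[:, t, :])
    return loss


targ = tf.zeros([4, 6], dtype=tf.int32)  # stand-in for (batch size, time steps)
logits = tf.zeros([4, 6, 11])            # stand-in for (batch, time steps, vocab)

for _ in range(3):
    total_loss(targ, logits)

print(len(calls), "Python-level calls to step_loss:", calls[0])
```

With 6 time steps the loop body runs 5 times during the trace, and the three calls to total_loss add no further Python-level calls.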

How can I tell the actual shapes being passed to loss_function (the second part of my question)?

No problem at all... just run this code

loss_function(y, y).shape

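This computes the loss function of your expected output compared with an exactly identical output. The loss will (hopefully) be zero, but actually computing the value of the loss is not the point. You want the shape, and this will give it to you.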
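As a complement, the shapes can also be inspected by calling the loss object eagerly (outside any tf.function) on dummy tensors. This is a sketch assuming TensorFlow 2.x. The sizes 4, 6, and 11 are small stand-ins for the batch size, the number of time steps, and the vocabulary size, and the all-zero tensors are placeholders rather than real data.

```python
import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

batch, steps, vocab = 4, 6, 11  # stand-ins for 64, the sequence length, 4935

targ = tf.zeros([batch, steps], dtype=tf.int32)
logits = tf.zeros([batch, steps, vocab])

# One time step, as in train_step: real is (batch,), pred is (batch, vocab).
per_step = loss_object(targ[:, 1], logits[:, 1, :])
print("per-step loss shape:", per_step.shape)

# Whole sequence at once: real is (batch, steps), pred is (batch, steps, vocab).
per_sequence = loss_object(targ, logits)
print("per-sequence loss shape:", per_sequence.shape)
```

With reduction='none' the loss keeps one value per label, so the per-step call yields a vector of length batch, which matches the (64,) and (64, 4935) inputs printed in the question. The tutorial's loop simply feeds one time step at a time.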

【Comments】:

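Is the fact that this function: @tf.function def train_step(inp, targ, enc_hidden): has the tf.function decorator (and that the loss function is called inside it) what makes this code run as you describe rather than as normal python?
How can I tell the actual shapes being passed to loss_function (the second part of my question)?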