Tensorflow - Decoder for Machine Translation

[Question title]: Tensorflow - Decoder for Machine Translation
[Posted]: 2021-05-18 04:16:27
[Question]:

I am going through Tensorflow's tutorial on neural machine translation with an attention mechanism.

It has the following code for the decoder:

class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc = tf.keras.layers.Dense(vocab_size)

    # used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights

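What I don't understand here is that the decoder's GRU cell is not connected to the encoder: it is never initialized with the encoder's last hidden state.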

output, state = self.gru(x)  

# Why is it not initialized with the hidden state of the encoder?

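As I understand it, the encoder and decoder are connected only when the decoder is initialized with the "thought vector", that is, the encoder's last hidden state.

Why is this missing from the official Tensorflow tutorial? Is it a bug, or am I missing something here?

Could someone help me understand?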


Tags: tensorflow lstm machine-translation encoder-decoder gated-recurrent-unit

[Solution 1]:

This is summarized well by the detailed NMT guide, which compares the classic seq2seq NMT architecture with the attention-based encoder-decoder NMT architecture.

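Vanilla seq2seq: The decoder also needs access to the source information, and one simple way to achieve that is to initialize it with the last hidden state of the encoder, encoder_state.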
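The vanilla handoff can be sketched without any framework. The toy RNN below is an assumption made purely for illustration (elementwise weights `w_x` and `w_h`, hypothetical helper names `rnn_step`, `encode`, `decode`); it is not the tutorial's GRU. It only shows that the decoder's sole link to the source sentence is its initial state.

```python
import math

def rnn_step(x, h, w_x, w_h):
    """One step of a toy elementwise RNN: h' = tanh(w_x * x + w_h * h)."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]

def encode(source, hidden_size, w_x=0.5, w_h=0.8):
    """Run the toy RNN over the source; return all states and the last one."""
    h = [0.0] * hidden_size
    states = []
    for x in source:
        h = rnn_step(x, h, w_x, w_h)
        states.append(h)
    return states, h

def decode(first_input, initial_state, steps, w_x=0.5, w_h=0.8):
    """Unroll the decoder, feeding each output back in as the next input."""
    h = list(initial_state)
    x = first_input
    outputs = []
    for _ in range(steps):
        h = rnn_step(x, h, w_x, w_h)
        outputs.append(h)
        x = h
    return outputs

source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enc_states, encoder_state = encode(source, hidden_size=2)

start = [0.0, 0.0]  # stand-in for an embedded start token
# Vanilla seq2seq: the decoder starts from the encoder's last hidden state.
with_handoff = decode(start, encoder_state, steps=2)
# Without the handoff, this decoder never sees anything about the source.
without_handoff = decode(start, [0.0, 0.0], steps=2)
```

In Keras, the corresponding handoff would be to pass `initial_state=...` when calling the GRU layer, which is the argument the question notices is absent from `self.gru(x)`.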

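Attention-based encoder-decoder: Remember that in the vanilla seq2seq model, we pass the last source state from the encoder to the decoder when starting the decoding process. This works well for short and medium-length sentences; for long sentences, however, the single fixed-size hidden state becomes an information bottleneck. Instead of discarding all of the hidden states computed in the source RNN, the attention mechanism gives the decoder a way to peek at them (treating them as a dynamic memory of the source information). By doing so, the attention mechanism improves the translation of longer sentences.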
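To make the "peek" concrete, here is a minimal plain-Python sketch of additive (Bahdanau-style) scoring. The weight matrices `w_dec`, `w_enc` and the vector `v` are hypothetical fixed values chosen for illustration; in the question's `Decoder` the learned equivalents would live inside `BahdanauAttention`, whose internals are not shown in the post.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(dec_hidden, enc_states, w_dec, w_enc, v):
    """score_t = v . tanh(W_dec h_dec + W_enc h_enc_t), then softmax over t."""
    projected_dec = matvec(w_dec, dec_hidden)
    scores = []
    for h_enc in enc_states:
        projected_enc = matvec(w_enc, h_enc)
        mixed = [math.tanh(d + e) for d, e in zip(projected_dec, projected_enc)]
        scores.append(sum(vi * mi for vi, mi in zip(v, mixed)))
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of ALL encoder states.
    size = len(enc_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, enc_states))
               for i in range(size)]
    return context, weights

enc_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # one state per source token
dec_hidden = [0.2, -0.1]                           # decoder state used as query
identity = [[1.0, 0.0], [0.0, 1.0]]                # assumed weights, not learned
context, weights = additive_attention(
    dec_hidden, enc_states, identity, identity, v=[1.0, 1.0])
```

The decoder receives `context`, a mixture of every encoder state, at each step. That is how source information reaches `self.gru(x)` in the question's code: `context_vector` is concatenated into `x` before the GRU call.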

In both cases, you can use teacher forcing to train the model better.
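Teacher forcing only changes what is fed to the decoder at the next step. The sketch below uses a deliberately wrong, hypothetical `toy_model` (it always predicts `token + 2`) so that the difference between the two feeding strategies is visible; it is not a trained network.

```python
def run_decoder(predict_next, start_token, target, teacher_forcing):
    """Unroll a decoder and record the input it receives at every step."""
    decoder_input = start_token
    inputs_seen, predictions = [], []
    for gold in target:
        inputs_seen.append(decoder_input)
        predicted = predict_next(decoder_input)
        predictions.append(predicted)
        # Teacher forcing: feed the ground-truth token, not the prediction.
        decoder_input = gold if teacher_forcing else predicted
    return inputs_seen, predictions

def toy_model(token):
    """Hypothetical stand-in for a decoder step; always overshoots by one."""
    return token + 2

target = [1, 2, 3]
forced_inputs, forced_preds = run_decoder(toy_model, 0, target, True)
free_inputs, free_preds = run_decoder(toy_model, 0, target, False)
```

With teacher forcing the inputs stay on the reference sequence (`[0, 1, 2]`), so one bad prediction does not derail the later steps. Without it, the model's own errors are fed back in and accumulate (`[0, 2, 4]`).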

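TL;DR: The attention mechanism is what lets the decoder "peek" into the encoder, instead of you explicitly passing what the encoder computed to the decoder.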
