Motivation

An RNN encodes sequence information through its hidden state.

[Figure: an RNN processing the sequence "I think therefore I am", updating its hidden state at each step]

The output for the second "I" differs from the output for the first "I" because the hidden states fed into them differ: by the time the second "I" is processed, the hidden state has already passed through "I think therefore", whereas for the first "I" it has only just been initialized. The RNN's hidden state therefore gives the same word different output representations at different positions. In stark contrast, a Transformer with self-attention (and no positional encoding) gives the same word at different positions identical output representations.
[Figure: the same sequence fed into a Transformer without positional encoding]

The figure above shows the result of feeding the input sequence "I think therefore I am" into such a Transformer.

 

Self-Attention
 

$$z_i = \sum_{j=1}^{n} \alpha_{ij} \left( x_j W^V \right)$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}$$

$$e_{ij} = \frac{\left( x_i W^Q \right) \left( x_j W^K \right)^{\top}}{\sqrt{d_z}}$$

where $x_i \in \mathbb{R}^{d_x}$ are the inputs, $z_i \in \mathbb{R}^{d_z}$ are the outputs, $W^Q, W^K, W^V \in \mathbb{R}^{d_x \times d_z}$ are the per-head projection matrices, and $h$ is the number of attention heads.
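To make the notation concrete, here is a minimal single-head sketch of these formulas in numpy. The function and variable names (`self_attention`, `Wq`, `Wk`, `Wv`) and the toy dimensions are mine, not from the paper, and the paper's actual model runs $h$ such heads in parallel with learned weights:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n, d_x) token inputs; Wq/Wk/Wv: (d_x, d_z) projections
    Q, K, V = X @ Wq, X @ Wk, X @ Wv      # (n, d_z) each
    d_z = Q.shape[-1]
    E = Q @ K.T / np.sqrt(d_z)            # e_ij = (x_i W^Q)(x_j W^K)^T / sqrt(d_z)
    A = softmax(E, axis=-1)               # alpha_ij, each row sums to 1
    return A @ V                          # z_i = sum_j alpha_ij (x_j W^V)

# toy usage: n=5 tokens, d_x=8, d_z=4, random stand-ins for learned weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = [rng.normal(size=(8, 4)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 4)
```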

 

Relation-aware Self-Attention

On top of vanilla self-attention, the paper Self-Attention with Relative Position Representations introduces two vectors tied to relative position: $a_{ij}^{V}$ and $a_{ij}^{K}$.

If the attention query is $x_i$, then when computing the attention between $x_i$ and $x_j$, the two position-dependent vectors associated with the pair, $a_{ij}^{K}$ and $a_{ij}^{V}$, are additionally taken into account.
With these two vectors, the self-attention computation above becomes:

$$z_i = \sum_{j=1}^{n} \alpha_{ij} \left( x_j W^V + a_{ij}^{V} \right)$$

$$e_{ij} = \frac{x_i W^Q \left( x_j W^K + a_{ij}^{K} \right)^{\top}}{\sqrt{d_z}}$$
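Continuing the numpy sketch above, the modified formulas might look as follows. The tensors `aK` and `aV` (shape $(n, n, d_z)$, my naming) hold $a_{ij}^{K}$ and $a_{ij}^{V}$ for every pair; how they are filled is the subject of the next section:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_self_attention(X, Wq, Wk, Wv, aK, aV):
    # aK, aV: (n, n, d_z) hold a_ij^K and a_ij^V for every pair (i, j)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_z = Q.shape[-1]
    # e_ij = x_i W^Q (x_j W^K + a_ij^K)^T / sqrt(d_z)
    E = (Q @ K.T + np.einsum('id,ijd->ij', Q, aK)) / np.sqrt(d_z)
    A = softmax(E, axis=-1)
    # z_i = sum_j alpha_ij (x_j W^V + a_ij^V)
    return A @ V + np.einsum('ij,ijd->id', A, aV)
```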

 

Relative Position Representations

The goal of the relative position representations is to specify how $a_{ij}^{K}$ and $a_{ij}^{V}$ are computed. The authors assume that once the distance between two elements in the sequence exceeds a threshold $k$, precise relative position information is no longer useful. Moreover, $a_{ij}^{K}$ and $a_{ij}^{V}$ should depend only on the relative position $j - i$, not on the absolute positions $i$ and $j$. The authors simply define $a_{ij}^{K}$ and $a_{ij}^{V}$ as trainable vectors; what is actually learned are the tables $w^{K} = (w^{K}_{-k}, \ldots, w^{K}_{k})$ and $w^{V} = (w^{V}_{-k}, \ldots, w^{V}_{k})$:

$$a_{ij}^{K} = w^{K}_{\mathrm{clip}(j-i,\,k)}$$

$$a_{ij}^{V} = w^{V}_{\mathrm{clip}(j-i,\,k)}$$

$$\mathrm{clip}(x, k) = \max(-k, \min(k, x))$$
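As an illustration, this lookup can be implemented with a clip plus fancy indexing. In the sketch below (function name mine), random arrays stand in for the learned tables $w^{K}$ and $w^{V}$:

```python
import numpy as np

def relative_position_vectors(n, k, wK, wV):
    # wK, wV: (2k+1, d_z) trainable tables for w_{-k} .. w_{+k}
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    idx = np.clip(j - i, -k, k) + k   # clip(j - i, k), shifted into 0..2k
    return wK[idx], wV[idx]           # each (n, n, d_z)

# toy usage with random stand-ins for the learned tables
rng = np.random.default_rng(0)
n, k, d_z = 5, 2, 4
wK = rng.normal(size=(2 * k + 1, d_z))
wV = rng.normal(size=(2 * k + 1, d_z))
aK, aV = relative_position_vectors(n, k, wK, wV)
print(aK.shape)  # (5, 5, 4)
```

Because each table has only $2k+1$ rows rather than one vector per $(i, j)$ pair, the number of extra parameters stays small; the resulting `aK` and `aV` plug directly into the `relation_aware_self_attention` sketch above.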
