https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/common_attention.py
Papers:
Self-Attention with Relative Position Representations
Music Transformer: Generating Music with Long-Term Structure
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context