Transformer
Variants
Title | Year | Author | Link | Memo |
---|---|---|---|---|
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View | 2019 | Yiping Lu et al. | | FFN + attention + FFN |
Lite Transformer with Long-Short Range Attention | ICLR 2020 | Zhanghao Wu et al. | | lightweight convolution (short-range) + self-attention (long-range) |
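
The "FFN + attention + FFN" memo in the first row describes a sublayer ordering, which the minimal sketch below tries to make concrete. It is not the paper's implementation: the function names are hypothetical, the attention uses identity projections with a single head, the feed-forward stand-in is a bare ReLU with no learned weights, layer normalization is omitted, and the 0.5 scaling on the two FFN residual branches is an assumption about how the two half-steps are weighted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), which keeps the sketch free of weights.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def ffn(x):
    """Position-wise feed-forward stand-in: element-wise ReLU (assumption)."""
    return [[max(0.0, v) for v in row] for row in x]


def add(x, y, scale=1.0):
    """Residual connection: x + scale * y, element-wise."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def ffn_attention_ffn_layer(x):
    """One layer in FFN -> attention -> FFN order.

    Assumption: each FFN residual branch is scaled by 0.5.
    """
    x = add(x, ffn(x), scale=0.5)   # first FFN sublayer
    x = add(x, self_attention(x))   # attention sublayer
    x = add(x, ffn(x), scale=0.5)   # second FFN sublayer
    return x


tokens = [[1.0, -2.0], [0.5, 0.25], [-1.0, 3.0]]
output = ffn_attention_ffn_layer(tokens)
print(len(output), len(output[0]))
```

The layer maps a sequence of token vectors to a sequence of the same shape, so several such layers can be stacked in the usual way.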