Attention Is All You Need - Paper Explained

In this video, I’ll present a comprehensive study of Ashish Vaswani and his coauthors’ renowned paper, “Attention Is All You Need.” This paper marks a major turning point in deep learning research: the Transformer architecture it introduced now powers a variety of state-of-the-art models in natural language processing and beyond.

📑 Chapters:
0:00 Abstract
0:39 Introduction
2:44 Model Details
3:20 Encoder
3:30 Input Embedding
5:22 Positional Encoding
11:05 Self-Attention
15:38 Multi-Head Attention
17:31 Add and Layer Normalization
20:38 Feed Forward NN
23:40 Decoder
23:44 Decoder in Training and Testing Phases
27:31 Masked Multi-Head Attention
30:03 Encoder-Decoder Self-Attention
33:19 Results
35:37 Conclusion

📝 Link to the paper:
👥 Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin