Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

A complete explanation of all the layers of a Transformer model, including Multi-Head Self-Attention and Positional Encoding, with all the matrix multiplications and a full description of the training and inference processes.

Slides PDF:

Chapters
00:00 - Intro
01:10 - RNNs and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
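As a companion to the chapters above, here is a minimal NumPy sketch (not code from the video) of two of the core pieces covered: sinusoidal positional encoding and scaled dot-product self-attention, with the causal mask used in the decoder's Masked Multi-Head Attention. The function names and toy dimensions are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from the paper:
    PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    """
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def scaled_dot_product_attention(Q, K, V, causal_mask=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    With causal_mask=True, each position is blocked from attending
    to later positions, as in decoder self-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (seq_q, seq_k)
    if causal_mask:
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)      # effectively -inf pre-softmax
    # Softmax over the key dimension (max subtracted for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: a single head over a 4-token sequence with d_model = 8.
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv, causal_mask=True)
print(out.shape)  # (4, 8)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into saturated regions with vanishing gradients; Multi-Head Attention runs several such heads in parallel on learned projections and concatenates the results.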