ERRATA:
- In slide 23, the indices are incorrect. The index of the key and value should match (j) and the index of the query should be different (i); see the corrected formulation sketched after this list.
- In slide 25, the diagram illustrating how multi-head self-attention is computed departs slightly from how it's usually done (the implementation in the subsequent slide is correct, but the two are not quite functionally equivalent). See the slides PDF below for an updated diagram.
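For reference, a sketch of the corrected indexing on slide 23, writing q, k and v for the query, key and value vectors (these symbols are chosen here for illustration; the notation in the slides may differ): the key and value share the index j, over which the softmax and the sum run, while the query carries the output index i:

w'_{ij} = q_i^T k_j,    w_{ij} = softmax_j(w'_{ij}),    y_i = \sum_j w_{ij} v_j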
In this video, we discuss the self-attention mechanism: a simple and powerful sequence-to-sequence layer that is at the heart of transformer architectures.
slides:
course website:
Lecturer: Peter Bloem
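As a rough companion to the lecture, here is a minimal sketch of basic (single-head, parameter-free) self-attention in PyTorch; the function name and tensor shapes are illustrative assumptions, not the lecture's exact implementation:

import torch
import torch.nn.functional as F

def basic_self_attention(x):
    # x: (batch, seq_len, emb) -- a sequence of input vectors
    # raw attention weights: dot product of every query i with every key j
    w_prime = torch.bmm(x, x.transpose(1, 2))   # (batch, seq_len, seq_len)
    # normalize over the key/value index j (the last dimension)
    w = F.softmax(w_prime, dim=2)
    # each output y_i is a weighted average of the value vectors x_j
    return torch.bmm(w, x)                      # (batch, seq_len, emb)

x = torch.randn(2, 5, 16)
y = basic_self_attention(x)
print(y.shape)  # torch.Size([2, 5, 16])

Each output y_i is a weighted average over all value vectors x_j, with weights given by how well query i matches key j, which is exactly the i/j indexing described in the erratum above.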