ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation
#alibi #transformers #attention
Transformers are essentially set models that need additional inputs to make sense of sequence data. The most widespread additional inputs are position encodings or position embeddings, which add sequence index information in various forms. However, this limits the resulting model: it cannot run inference on sequences longer than those it was trained on, as it would encounter unfamiliar position encodings. ALiBi solves this by replacing position encodings with simple, fixed linear biases on the attention scores, adding negligible overhead in time and memory. Surprisingly, the resulting model can handle inference on sequences many times longer than its training sequences.
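As a rough illustration, here is a minimal sketch (in PyTorch, assuming a power-of-two head count and a causal setting) of how per-head linear distance biases could be built and added to attention scores; the helper names `alibi_slopes` and `alibi_bias` are hypothetical, not from the paper:

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence of head-specific slopes, e.g. 1/2, 1/4, ..., 1/256 for 8 heads.
    start = 2 ** (-8 / num_heads)
    return torch.tensor([start ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Relative distance (i - j) between query position i and key position j.
    positions = torch.arange(seq_len)
    distance = positions[:, None] - positions[None, :]   # (seq_len, seq_len)
    slopes = alibi_slopes(num_heads)                      # (num_heads,)
    # Penalize attention to distant keys linearly, with one fixed slope per head.
    # Future positions (negative distance) are clamped to zero here; in a causal
    # model they would be masked out anyway.
    bias = -slopes[:, None, None] * distance[None, :, :].clamp(min=0)
    return bias                                           # (num_heads, seq_len, seq_len)

# The bias is simply added to the pre-softmax attention scores, e.g.:
# scores = q @ k.transpose(-2, -1) / d_head**0.5 + alibi_bias(num_heads, seq_len)
```

Because the bias depends only on the query-key distance and not on an absolute position table, it can be computed for any sequence length at inference time, which is what enables the extrapolation.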
OUTLINE:
0:00 - Intro & Overview
1:40 - Position Encodings in Transformers
4:55 - Sinusoidal Position Encodings
11:50 - ALiBi Position Encodings
20:50 - How to choose the slope parameter
23:55 - Experimental Results
29:10 - Comments & Conclusion
Paper: https://of