Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)

#expirespan #nlp #facebookai

Facebook AI (FAIR) researchers present Expire-Span, a variant of Transformer XL that dynamically assigns expiration dates to previously encountered signals. Because of this, Expire-Span can handle sequences of many thousands of tokens while keeping memory and compute requirements at a manageable level. It matches or outperforms baseline systems while consuming far fewer resources. We discuss its architecture, advantages, and shortcomings.

OUTLINE:
0:00 - Intro & Overview
2:30 - Remembering the past in sequence models
5:45 - Learning to expire past memories
8:30 - Difference to local attention
10:00 - Architecture overview
13:45 - Comparison to Transformer XL
18:50 - Predicting expiration masks
32:30 - Experimental Results
40:00 - Conclusion & Comments

Paper:
Code:

ADDENDUM: I mention several times that the gradient signal of the e quantity only occurs inside the R ramp. By th
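To make "the R ramp" concrete, here is a minimal sketch of the soft expiration mask as we read it from the paper. A memory at position i receives a predicted span e_i = L·sigmoid(w·h_i + b). When it is queried at step t, its mask is m = clamp(1 + (e_i - (t - i)) / R, 0, 1). The function names (`predict_expiration`, `expire_span_mask`, `mask_grad_wrt_e`) and the scalar pre-activation `z` are hypothetical illustrations, not the official Expire-Span code. The sketch covers only the gradient that flows through this mask and ignores any regularization term.

```python
import math


def predict_expiration(z: float, max_span: float) -> float:
    """Hypothetical: span e_i = L * sigmoid(z), where z stands in for w . h_i + b."""
    return max_span / (1.0 + math.exp(-z))


def expire_span_mask(e_i: float, t: int, i: int, ramp: float) -> float:
    """Soft mask for memory i when queried at step t (our reading of the paper)."""
    r = e_i - (t - i)  # remaining span; r >= 0 means the memory has not expired yet
    return max(0.0, min(1.0, 1.0 + r / ramp))  # linear ramp of width R, clamped to [0, 1]


def mask_grad_wrt_e(e_i: float, t: int, i: int, ramp: float) -> float:
    """d(mask)/d(e_i): nonzero only strictly inside the ramp, where it equals 1/R."""
    r = e_i - (t - i)
    if -ramp < r < 0:
        return 1.0 / ramp
    return 0.0  # fully kept (mask = 1) or fully expired (mask = 0): clamp is flat


if __name__ == "__main__":
    e = 10.0
    for t in (12, 17, 20):  # memory at i = 5, ramp R = 4
        print(t, expire_span_mask(e, t, 5, 4.0), mask_grad_wrt_e(e, t, 5, 4.0))
```

With e_i = 10, i = 5 and R = 4, the memory is fully kept at t = 12, half-faded at t = 17 (mask 0.5, gradient 0.25), and fully expired at t = 20. Only the middle case passes a gradient back to e_i through the mask. This is the ramp behavior the addendum refers to.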