Seminar #3. When BERT plays the lottery, all tickets are winning!

In the talk, Anna Rogers presented the paper: When BERT plays the lottery, all tickets are winning! The lottery ticket hypothesis was originally developed for randomly initialized models, but might it also apply to pre-trained Transformers? If the “good” subnetworks exist, can they tell us anything about how BERT achieves its performance? Full article:
Back to Top