We reproduce GPT-2 (124M) from scratch. This video covers the whole process: first we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run and come back the next morning to see our results and enjoy some amusing model generations. Keep in mind that in some places this video builds on knowledge from earlier videos in the Zero to Hero playlist (see my channel). You could also see this video as building my nanoGPT repo: by the end, the code is about 90% similar.
Links:
- build-nanogpt GitHub repo, with all the changes in this video as individual commits:
- nanoGPT repo:
- llm.c repo:
- my website:
- my twitter:
- our Discord channel:
Supplementary links: