CS25 | Stanford Seminar - Transformers in Language: The Development of GPT Models, Including GPT-3

While the Transformer architecture is used in a variety of applications across many domains, it first found success in natural language. Today, Transformers remain the de facto architecture in language: they achieve state-of-the-art results on most natural language benchmarks and can generate text coherent enough to deceive human readers. In this talk, we will review recent progress in neural language modeling, discuss the link between generating text and solving downstream tasks, and explore how this led to the development of GPT models at OpenAI. Next, we will see how the same approach can be used to produce generative models and strong representations in other domains such as images, text-to-image, and code. Finally, we will dive into the recently released code-generating model, Codex, and examine this particularly interesting domain of study.

Mark Chen is a research scientist at OpenAI, where he manages the Algorithms Team. His research interests include generative modeling and representation learning, especially in the image and multimodal domains. Prior to OpenAI, Mark worked in high-frequency trading and graduated from MIT. Mark is also a coach for the USA Computing Olympiad team.