Efficient Fine-Tuning for Llama-v2-7b on a Single GPU

The first problem you’re likely to encounter when fine-tuning an LLM is the “host out of memory” error, and it only gets harder with the 7B-parameter Llama-2 model, which requires even more memory. In this talk, Piero Molino and Travis Addair from the open-source Ludwig project show you how to tackle this problem. The good news is that, with an optimized LLM training framework like Ludwig, you can bring the host memory overhead back down to a reasonable level, even when training on multiple GPUs. In this hands-on workshop, we’ll discuss the unique challenges in fine-tuning LLMs and show, through a demo, how you can tackle them with open-source tools. By the end of this session, attendees will understand:

- How to fine-tune LLMs like Llama-2-7b on a single GPU
- Techniques like parameter-efficient tuning and quantization, and how they can help
- How to train a 7b-parameter model on a single T4 GPU with QLoRA (see the sketch after this list)
- How to deploy tuned models
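
To make the QLoRA setup concrete, here is a minimal sketch of what such a fine-tuning run might look like with Ludwig’s Python API. It assumes a Ludwig 0.8-style LLM config; the dataset path `train.csv`, its `instruction`/`output` columns, and the hyperparameter values are placeholders to adapt to your own setup.

```python
# Hypothetical example: QLoRA-style fine-tuning of Llama-2-7b with Ludwig.
# Config keys follow the Ludwig 0.8 LLM config style; adjust for your environment.
import yaml
from ludwig.api import LudwigModel

config = yaml.safe_load(
    """
model_type: llm
base_model: meta-llama/Llama-2-7b-hf   # assumes you have access to the HF weights

quantization:
  bits: 4            # 4-bit quantization (QLoRA-style) to fit on a single T4

adapter:
  type: lora         # parameter-efficient tuning: train small LoRA adapters only

input_features:
  - name: instruction
    type: text

output_features:
  - name: output
    type: text

trainer:
  type: finetune
  batch_size: 1
  gradient_accumulation_steps: 16   # keep per-step memory low, accumulate gradients
  learning_rate: 0.0001
  epochs: 3
"""
)

model = LudwigModel(config=config)
# `train.csv` is a placeholder dataset with `instruction` and `output` columns.
results = model.train(dataset="train.csv")
```

Combining 4-bit quantization of the frozen base model with LoRA adapters is what keeps the memory footprint small enough for a single 16 GB T4; only the adapter weights are updated during training.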