RAG for long context LLMs

This is a talk that @rlancemartin gave at a few recent meetups on RAG in the era of long context LLMs. With context windows growing to 1M tokens, there have been many questions about whether RAG is “dead.” We pull together threads from a few recent projects to take a stab at answering this. We review some current limitations in long context LLM fact reasoning and retrieval (using a multi-needle-in-a-haystack analysis), and also discuss likely shifts in the RAG landscape as context windows expand (approaches for doc-centric indexing and RAG “flow engineering”); rough sketches of both ideas follow the timepoints below.

Slides:

Highlighted references:
1/ Multi-needle analysis w/ @GregKamradt
2/ RAPTOR (@parthsarthi03 et al.)
3/ Dense-X / multi-representation indexing (@tomchen0 et al.)
4/ Long context embeddings (@JonSaadFalcon, @realDanFu, @simran_s_arora)
5/ Self-RAG (@AkariAsai et al.), C-RAG (Shi-Qi Yan et al.)

Timepoints:
0:20 - Context windows are getting longer
2:10 - Multi-needle in a haystack
9:30 - How might RAG change?
12:00 - Query analysis
13:07 - Document-centric indexing
16:23 - Self-reflective RAG
19:40 - Summary
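To make the multi-needle idea concrete, here is a minimal toy sketch of the setup: plant several unrelated facts (“needles”) at random depths in a long filler context, ask one question that requires all of them, and score how many the model recalls. The needles, prompt, and `ask_llm` helper are illustrative assumptions, not the exact harness used in the talk.

```python
import random

NEEDLES = [
    "The secret ingredient in the pizza is figs.",
    "The secret ingredient in the pasta is saffron.",
    "The secret ingredient in the salad is sumac.",
]
KEYWORDS = ["figs", "saffron", "sumac"]

def build_haystack(filler_paragraphs: list[str], needles: list[str]) -> str:
    """Insert each needle at a random position in the filler text."""
    chunks = list(filler_paragraphs)
    for needle in needles:
        chunks.insert(random.randrange(len(chunks) + 1), needle)
    return "\n\n".join(chunks)

def recall_score(answer: str) -> float:
    """Fraction of planted facts mentioned in the model's answer."""
    return sum(k in answer.lower() for k in KEYWORDS) / len(KEYWORDS)

# Usage (ask_llm is a stand-in for whatever chat-model call you use):
# context = build_haystack(filler, NEEDLES)
# answer = ask_llm(f"{context}\n\nWhat are all the secret ingredients?")
# print(recall_score(answer))
```

Retrieval typically degrades as more needles are planted and as they sit earlier in the context, which is the limitation the talk probes.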
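And here is a rough, library-free sketch of multi-representation (doc-centric) indexing, the pattern behind the Dense-X reference: search over a compact summary of each document, but return the full document to the long-context LLM. The `summarize` and `search` helpers are assumed stand-ins for a summarization call and an embedding-similarity lookup.

```python
import uuid

summary_index: dict[str, str] = {}  # summary text -> doc id (stand-in for a vector store)
docstore: dict[str, str] = {}       # doc id -> full parent document

def add_document(doc: str, summarize) -> None:
    """Index a short summary for search; keep the full doc for generation."""
    doc_id = str(uuid.uuid4())
    docstore[doc_id] = doc
    summary_index[summarize(doc)] = doc_id

def retrieve(query: str, search) -> str:
    """Match the query against summaries, then hand back the whole doc."""
    best_summary = search(query, list(summary_index))
    return docstore[summary_index[best_summary]]
```

The design point: retrieval operates over small, precise representations, while the expanding context window lets the LLM consume whole documents rather than fragmented chunks.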