RAG (Retrieval-Augmented Generation)
Also known as: Retrieval-Augmented Generation, RAG
In one sentence
A technique where AI searches your documents for relevant information first, then uses what it finds to generate accurate, grounded answers.
Explain like I'm 12
Instead of the AI guessing from memory, it looks up the answer in your notes first, then writes a response based on what it actually found — like an open-book exam instead of a closed-book one.
In context
RAG powers customer support chatbots that search company knowledge bases, research assistants that pull from internal documents, and enterprise tools that need to cite specific policies. A typical RAG pipeline splits documents into chunks, converts each chunk into an embedding, stores the embeddings in a vector database, retrieves the chunks most relevant to a user's question, and feeds those chunks to an LLM as context. Because the knowledge lives in the document store rather than in the model's weights, updating the documents updates the answers, with no retraining of the model.
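The pipeline steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names `embed`, `VectorStore`, and `build_prompt` are hypothetical, a bag-of-words count vector stands in for a real embedding model, an in-memory list stands in for a vector database, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self):
        self.items = []  # list of (chunk_text, embedding) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Place the retrieved chunks in the prompt as context for the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Index: embed each chunk and store it.
store = VectorStore()
for chunk in [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday through Friday.",
    "Standard shipping takes one week.",
]:
    store.add(chunk)

# Retrieve: find the chunk closest to the question.
question = "Are refunds available after purchase?"
top_chunks = store.search(question, k=1)

# Augment: this prompt is what would be sent to the LLM.
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real system, `embed` would call an embedding model and `VectorStore.search` would be an approximate nearest-neighbor query, but the flow of index, retrieve, then augment the prompt stays the same.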
Related Guides
Learn more about RAG (Retrieval-Augmented Generation) in these guides:
Fine-Tuning vs RAG: Which Should You Use? (Intermediate, 12 min read)
Compare fine-tuning and RAG to customize AI. Learn when each approach works best, how they differ, and how to combine them.
Embeddings & RAG Explained (Plain English) (Intermediate, 11 min read)
How AI tools search and retrieve information from documents. Understand embeddings and Retrieval-Augmented Generation without the math.
Advanced RAG Techniques (Advanced, 9 min read)
Go beyond basic RAG: hybrid search, reranking, query expansion, HyDE, and multi-hop retrieval for better context quality.