Retrieval Strategies for RAG Systems
By Marcin Piekarski · builtweb.com.au · Last Updated: 11 February 2026
TL;DR
RAG (Retrieval-Augmented Generation) finds relevant documents before answering questions. Key strategies: semantic search with embeddings, reranking top results, and optimizing chunk size.
What is RAG?
Definition:
RAG combines retrieval (searching a document collection) with generation (an LLM that writes the answer from what it finds).
Flow:
- User asks question
- Search relevant documents
- Add documents to LLM prompt
- LLM generates answer using context
Benefits:
- Answers based on your data
- No fine-tuning needed
- Easy to update knowledge
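The four-step flow above can be sketched end to end. Everything here is a toy stand-in: `embed` and `search` fake retrieval with word overlap, and the final LLM call is left as a comment, since the point is the control flow, not the models.

```python
# Minimal RAG flow sketch. `embed` and `search` are toy placeholders for a
# real embedding model and vector store; the LLM call is stubbed out.
def embed(text):
    # Toy "embedding": just the lowercased words (real systems use a model).
    return text.lower().split()

def search(query, documents, top_k=3):
    # Toy retrieval: rank documents by word overlap with the query.
    q_words = set(embed(query))
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(embed(d))),
                    reverse=True)
    return scored[:top_k]

def answer(query, documents):
    # Steps 2-3: retrieve, then add the documents to the prompt.
    context = "\n\n".join(search(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # step 4 in a real system: return llm(prompt)

docs = ["Paris is the capital of France.",
        "RAG retrieves documents before answering."]
print(answer("What is the capital of France?", docs))
```

Swapping `embed` for a real embedding model and the stub for an actual LLM call turns this skeleton into a working pipeline.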
Retrieval methods
Semantic search:
- Convert query and documents to embeddings
- Find closest matches (cosine similarity)
- Captures meaning, not just keywords
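"Closest match" in semantic search usually means cosine similarity between embedding vectors. A minimal pure-Python version, with made-up three-dimensional vectors standing in for real model embeddings:

```python
import math

# Cosine similarity: how aligned two embedding vectors are, ignoring length.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings have hundreds or thousands of dimensions.
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {"doc_a": [0.1, 0.8, 0.3], "doc_b": [0.9, 0.1, 0.0]}
best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
print(best)  # doc_a points in nearly the same direction as the query
```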
Keyword search:
- Traditional full-text search
- BM25 algorithm
- Good for exact matches
Hybrid:
- Combine semantic + keyword
- Best of both worlds
- Weighted fusion
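One common fusion method is Reciprocal Rank Fusion (RRF), which combines ranked lists without having to normalize semantic and BM25 scores onto the same scale. The constant k=60 comes from the original RRF paper; the ranked lists below are illustrative:

```python
# Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document,
# so documents ranked well by both searches float to the top.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]   # order from embedding search
keyword  = ["d1", "d4", "d3"]   # order from BM25
print(rrf_fuse([semantic, keyword]))
```

Note that d1 wins: it ranks reasonably in both lists, beating d3, which tops only one.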
Chunking strategies
Fixed size:
- Split at N characters/tokens
- Simple but may break sentences
Sentence-based:
- Keep sentences intact
- More coherent chunks
Paragraph-based:
- Split by paragraphs
- Better semantic units
Sliding window:
- Overlapping chunks
- Ensures no context loss at boundaries
Optimal chunk size:
- Too small: Lacks context
- Too large: Dilutes relevance
- Sweet spot: 200-500 tokens
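The sliding-window strategy is short to implement. This sketch counts words rather than tokens for simplicity; a production system would measure chunk size with the embedding model's tokenizer:

```python
# Sliding-window chunking: fixed-size chunks that overlap, so content at a
# boundary always appears intact in at least one chunk.
def chunk_text(text, chunk_size=200, overlap=50):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(doc)
print(len(chunks), "chunks")  # 3 chunks of up to 200 words, overlapping by 50
```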
Reranking
Problem:
- First-pass retrieval may miss nuances
- Top results not always best
Solution:
- Retrieve top 20-50 candidates
- Use more sophisticated model to rerank
- Return top 3-5 to LLM
Reranking models:
- Cross-encoders (more accurate, slower)
- Cohere rerank
- Custom scoring functions
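The retrieve-then-rerank pipeline looks like this in outline. `score_pair` here is a toy word-overlap scorer standing in for a real cross-encoder (for example, a sentence-transformers CrossEncoder), which scores the query and document jointly:

```python
# Two-stage pipeline: cheap first-pass retrieval produces candidates, then a
# stronger pairwise scorer reorders them and keeps only the best few.
def score_pair(query, doc):
    # Toy relevance score; replace with a cross-encoder in a real system.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def rerank(query, candidates, top_n=3):
    scored = sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)
    return scored[:top_n]

candidates = [  # imagine these are the top 20-50 from first-pass retrieval
    "how to bake sourdough bread",
    "bread baking temperatures and times",
    "sourdough starter feeding schedule",
    "history of french cuisine",
]
print(rerank("sourdough bread baking", candidates, top_n=2))
```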
Query optimization
Query expansion:
- Generate multiple versions of query
- "Fix leaky faucet" → "repair dripping tap," "plumbing leak"
HyDE (Hypothetical Document Embeddings):
- Generate hypothetical answer
- Search for docs similar to that answer
- Often finds better matches
Query decomposition:
- Break complex queries into sub-questions
- Retrieve for each
- Combine results
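Expansion and decomposition share the same control flow: generate variants of the query, retrieve for each, then merge. In this sketch the `llm` function returns canned output so the example runs standalone; a real system would call an actual model API:

```python
# Multi-query retrieval sketch. `llm` is a fake stand-in for a real model
# call; its canned outputs just demonstrate the control flow.
def llm(prompt):
    canned = {
        "expand": ["repair dripping tap", "plumbing leak under sink"],
        "decompose": ["What causes faucet leaks?", "How do I replace a washer?"],
    }
    return canned["expand" if "rephrase" in prompt else "decompose"]

def retrieve_multi(query, search_fn):
    # Retrieve for the original query plus each LLM-generated variant.
    sub_queries = [query] + llm(f"rephrase this query: {query}")
    results = []
    for q in sub_queries:
        results.extend(search_fn(q))
    # Deduplicate while preserving order before passing to reranking.
    return list(dict.fromkeys(results))

hits = retrieve_multi("fix leaky faucet", lambda q: [f"doc for: {q}"])
print(hits)
```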
Metadata filtering
- Filter by date, category, author
- Combine with semantic search
- "Recent product docs" + semantic match
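Combining the two typically means pre-filtering on metadata and running semantic scoring only on the survivors. A sketch with made-up documents and a toy word-overlap score in place of embeddings:

```python
from datetime import date

# Metadata pre-filtering: narrow the pool by category and date first, then
# rank what remains by (toy) semantic relevance.
docs = [
    {"text": "v2 API guide", "category": "product-docs", "updated": date(2025, 11, 1)},
    {"text": "v1 API guide", "category": "product-docs", "updated": date(2023, 3, 5)},
    {"text": "company picnic recap", "category": "blog", "updated": date(2025, 12, 1)},
]

def filtered_search(query, docs, category, after):
    pool = [d for d in docs if d["category"] == category and d["updated"] >= after]
    # Toy relevance score: word overlap (a real system uses embeddings here).
    q = set(query.lower().split())
    return sorted(pool,
                  key=lambda d: len(q & set(d["text"].lower().split())),
                  reverse=True)

hits = filtered_search("API guide", docs,
                       category="product-docs", after=date(2025, 1, 1))
print([d["text"] for d in hits])  # only the recent product doc survives
```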
Evaluation
Metrics:
- Recall: % of relevant docs retrieved
- Precision: % of retrieved docs relevant
- MRR (Mean Reciprocal Rank): Position of first relevant result
Process:
- Build a test set of queries + expected docs
- Measure retrieval quality
- Iterate on strategy
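All three metrics are a few lines each. This sketch evaluates a retrieval function against a hand-built test set; `fake_retrieve` stands in for your real search pipeline:

```python
# Retrieval evaluation: recall, precision, and MRR averaged over a test set
# of (query, set-of-relevant-chunk-ids) pairs.
def evaluate(test_set, retrieve_fn, top_k=5):
    recalls, precisions, rrs = [], [], []
    for query, relevant in test_set:
        retrieved = retrieve_fn(query)[:top_k]
        hits = [d for d in retrieved if d in relevant]
        recalls.append(len(set(hits)) / len(relevant))
        precisions.append(len(hits) / len(retrieved) if retrieved else 0.0)
        rr = 0.0  # reciprocal rank of the first relevant result
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                rr = 1.0 / rank
                break
        rrs.append(rr)
    n = len(test_set)
    return {"recall": sum(recalls) / n,
            "precision": sum(precisions) / n,
            "mrr": sum(rrs) / n}

test_set = [("q1", {"c1", "c2"}), ("q2", {"c7"})]
fake_retrieve = {"q1": ["c1", "c9", "c2"], "q2": ["c3", "c7"]}.get
print(evaluate(test_set, fake_retrieve))
```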
Common issues
Poor retrieval:
- Relevant documents exist but are not returned
- Solution: Adjust chunk size, try hybrid search, add reranking
Context overload:
- Too many docs in prompt
- Exceeds context window
- Solution: Better reranking, fewer but better docs
Recency bias:
- Newer docs not indexed yet
- Solution: Regular re-indexing
Best practices
- Experiment with chunk sizes
- Use hybrid search when possible
- Always rerank top results
- Add metadata for filtering
- Monitor and iterate
What's next
- Vector Databases
- Embeddings Explained
- Building RAG Applications
Frequently Asked Questions
What is the best retrieval method for RAG systems?
Hybrid search, which combines semantic (embedding-based) and keyword (BM25) search, consistently outperforms either method alone. Semantic search captures meaning and synonyms while keyword search handles exact terms, product names, and acronyms. Use weighted fusion to combine results.
How do I choose the right chunk size for my documents?
The sweet spot is typically 200-500 tokens. Smaller chunks are more precise but may lack context. Larger chunks provide more context but dilute relevance. Experiment with your specific documents and queries. Use overlapping windows (sliding window strategy) to avoid losing information at chunk boundaries.
What is reranking and when should I use it?
Reranking is a second pass that uses a more sophisticated model to re-score and reorder your initial retrieval results. Retrieve 20-50 candidates with fast search, then rerank to find the best 3-5 for the LLM. Use reranking when answer accuracy is critical, as it significantly improves precision.
How do I measure if my RAG retrieval is working well?
Build a test set of 50-100 questions with known correct source chunks. Measure recall (did you retrieve the right chunks), precision (are retrieved chunks relevant), and MRR (how high do relevant results rank). Iterate on chunking, search strategy, and reranking based on these metrics.
About the Authors
Marcin Piekarski · Frontend Lead & AI Educator
Marcin is a Frontend Lead with 20+ years in tech. Currently building headless ecommerce at Harvey Norman (Next.js, Node.js, GraphQL). He created Field Guide to AI to help others understand AI tools practically—without the jargon.
Credentials & Experience:
- 20+ years web development experience
- Frontend Lead at Harvey Norman (10 years)
- Worked with: Gumtree, CommBank, Woolworths, Optus, M&C Saatchi
- Runs AI workshops for teams
- Founder of builtweb.com.au
- Daily AI tools user: ChatGPT, Claude, Gemini, AI coding assistants
- Specializes in React ecosystem: React, Next.js, Node.js
Prism AI · AI Research & Writing Assistant
Prism AI is the AI ghostwriter behind Field Guide to AI—a collaborative ensemble of frontier models (Claude, ChatGPT, Gemini, and others) that assist with research, drafting, and content synthesis. Like light through a prism, human expertise is refracted through multiple AI perspectives to create clear, comprehensive guides. All AI-generated content is reviewed, fact-checked, and refined by Marcin before publication.
Key Terms Used in This Guide
RAG (Retrieval-Augmented Generation)
A technique where AI searches your documents for relevant information first, then uses what it finds to generate accurate, grounded answers.
Context Window
The maximum amount of text an AI model can process at once—including both what you send and what it generates. Once the window fills up, the AI loses access to earlier parts of the conversation.
Beam Search
A text generation strategy where the AI explores multiple possible word sequences simultaneously and keeps the best few at each step, resulting in higher-quality but slower output than greedy generation.
Related Guides
- Semantic Search: Search by Meaning, Not Keywords (Intermediate · 6 min read): Semantic search finds results based on meaning, not exact keyword matches. Learn how it works and how to implement it.
- Fine-Tuning Fundamentals: Customizing AI Models (Intermediate · 8 min read): Fine-tuning adapts pre-trained models to your specific use case. Learn when to fine-tune, how it works, and alternatives.
- Vector Database Fundamentals (Intermediate · 7 min read): Vector databases store and search embeddings efficiently. Learn how they work, when to use them, and popular options.