Advanced RAG Techniques
Go beyond basic RAG: hybrid search, reranking, query expansion, HyDE, and multi-hop retrieval for better context quality.
TL;DR
Advanced RAG uses hybrid search (semantic + keyword), reranking, query expansion, HyDE (hypothetical answers), and multi-hop retrieval to improve context quality and answer accuracy.
Hybrid search
Combine semantic (embeddings) and keyword (BM25) search:
- Semantic: Captures meaning, handles synonyms
- Keyword: Matches exact terms, including acronyms
- Fusion: Weighted combination or reciprocal rank fusion (RRF)
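The fusion step can be sketched with reciprocal rank fusion, which needs only the rank positions from each retriever and not their raw scores. This is a minimal sketch: the document IDs and the two result lists are made up for illustration, and `k=60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a larger k flattens
            # the advantage of the very top positions.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the two retrievers (illustration only).
semantic_hits = ["doc_a", "doc_b", "doc_c"]  # embedding search
keyword_hits = ["doc_c", "doc_a", "doc_d"]   # BM25 search

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents found by both retrievers (`doc_a`, `doc_c`) rise above documents found by only one, which is the behavior hybrid search is after.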
Reranking
Two-stage retrieval:
- Fast retrieval: Fetch the top 50-100 candidates with a cheap retriever
- Slow reranking: Score those candidates with a cross-encoder and keep the top 3-5
Reranking options: Cohere Rerank, cross-encoders, custom scoring
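The two-stage pattern might look like the sketch below. To keep it runnable without any model downloads, `toy_pair_score` is a hypothetical stand-in for a cross-encoder: it scores a (query, document) pair jointly, as a cross-encoder would, but uses plain token overlap. The corpus and function names are assumptions for illustration.

```python
def first_stage(query, corpus, top_n=50):
    """Cheap candidate retrieval: rank docs by number of shared query terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored = [(s, d) for s, d in scored if s > 0]  # drop docs with no overlap
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored[:top_n]]

def toy_pair_score(query, doc):
    """Stand-in for a cross-encoder: Jaccard overlap of query and doc tokens."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query, candidates, score_fn, top_k=3):
    """Expensive second stage: score each (query, doc) pair, keep the best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_k]

corpus = [
    "reranking improves retrieval precision",
    "retrieval with bm25 is fast but retrieval quality varies by corpus "
    "size and many other unrelated factors",
    "cats sleep most of the day",
    "cross-encoder reranking scores query and document together",
]
query = "reranking retrieval precision"

candidates = first_stage(query, corpus)               # wide, cheap net
top = rerank(query, candidates, toy_pair_score, top_k=2)  # narrow, careful pick
```

In a real pipeline, `toy_pair_score` would be replaced by a call to a reranking model; the surrounding structure stays the same.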
Query expansion
Techniques:
- Multi-query: Generate variations, retrieve for each
- HyDE (Hypothetical Document Embeddings): Generate a hypothetical answer, embed it, and search for docs similar to it
- Decomposition: Break complex queries into sub-queries
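All three techniques share one shape: turn a single query into one or more search texts, retrieve for each, and merge the results. The sketch below assumes that shape. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `multi_query` and `hyde` are hypothetical stubs returning hard-coded text where a real system would call an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call a model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(text, corpus, top_k=2):
    """Return up to top_k docs with non-zero similarity to `text`."""
    q = embed(text)
    scored = [(cosine(q, embed(doc)), doc) for doc in corpus]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def retrieve_expanded(query, expand_fn, corpus, top_k=2):
    """Search once per expanded text; merge results in first-seen order."""
    merged = []
    for text in expand_fn(query):
        for doc in search(text, corpus, top_k):
            if doc not in merged:
                merged.append(doc)
    return merged

# Hypothetical stubs: in practice an LLM would generate these texts.
def multi_query(query):
    return [query, "how to reduce llm response latency"]

def hyde(query):
    return ["caching frequent prompts and streaming tokens lowers response latency"]

corpus = [
    "caching frequent prompts lowers latency and cost",
    "streaming tokens improves perceived response latency",
    "vector databases store embeddings",
]
query = "why is my chatbot slow"

print(search(query, corpus))                          # [] (no shared wording)
print(retrieve_expanded(query, multi_query, corpus))  # finds both latency docs
print(retrieve_expanded(query, hyde, corpus))         # finds both latency docs
```

The raw query shares no vocabulary with the corpus, so it retrieves nothing, while the expanded texts bridge the gap. Decomposition fits the same `expand_fn` slot by returning one text per sub-query.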
Contextual compression
Extract only the parts of each retrieved document that are relevant to the query.
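A minimal sketch of this idea, assuming sentence-level filtering by term overlap with the query. Production systems often use an LLM or a trained extractor instead; the `compress` function and its `min_overlap` parameter are hypothetical names chosen for illustration.

```python
import re

def compress(query, doc, min_overlap=1):
    """Keep only sentences sharing at least `min_overlap` terms with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    kept = [
        s for s in sentences
        if len(terms & set(re.findall(r"\w+", s.lower()))) >= min_overlap
    ]
    return " ".join(kept)

doc = (
    "BM25 ranks documents by term frequency. "
    "The office is closed on Fridays. "
    "BM25 also normalizes for document length."
)
compressed = compress("how does BM25 rank documents", doc)
print(compressed)
# BM25 ranks documents by term frequency. BM25 also normalizes for document length.
```

The off-topic middle sentence is dropped, so fewer tokens reach the model for the same amount of useful context.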
Multi-hop retrieval
For complex questions requiring multiple documents:
- Retrieve documents for the original question
- Generate a follow-up query from the first results
- Retrieve additional context with the follow-up query
- Combine all retrieved context and answer
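The steps above can be sketched as a short loop. Here `retrieve` is a toy term-overlap retriever and `fake_follow_up` is a hypothetical rule standing in for an LLM that reads the context so far and proposes the next query. Answer generation is left out; the function returns the combined context that would be passed to the model.

```python
def retrieve(query, corpus, top_k=1):
    """Toy retriever: rank docs by shared terms, skipping docs with no overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def multi_hop(question, corpus, follow_up_fn, max_hops=2):
    """Retrieve, derive a follow-up query from the results, and retrieve again."""
    context = []
    query = question
    for _ in range(max_hops):
        hits = [d for d in retrieve(query, corpus) if d not in context]
        if not hits:
            break  # nothing new was found, so stop early
        context.extend(hits)
        query = follow_up_fn(question, context)
        if query is None:
            break  # the follow-up step judged the context sufficient
    return context

def fake_follow_up(question, context):
    """Hypothetical stand-in for an LLM that writes the next query."""
    if len(context) == 1 and "acme" in context[-1]:
        return "acme headquarters city"
    return None

corpus = [
    "the widget was invented by acme labs",
    "acme labs headquarters is in the city of oslo",
    "oslo is the capital of norway",
]
question = "where is the company that invented the widget based"
context = multi_hop(question, corpus, fake_follow_up)
```

The first hop finds who invented the widget; only then can the second hop ask where that company is located. The `max_hops` cap keeps the loop from running away when follow-up queries keep finding new documents.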
Metadata filtering
Pre-filter documents on metadata before running semantic search:
- Date ranges
- Categories
- User permissions
- Custom attributes
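A pre-filter along these lines could look like the sketch below. The `Doc` fields and the `prefilter` parameters are assumptions for illustration; most vector databases expose an equivalent filter in their own query syntax, which should be preferred over filtering in application code.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    text: str
    category: str
    published: date
    allowed_groups: frozenset  # groups permitted to read this doc

def prefilter(docs, *, category=None, since=None, user_groups=frozenset()):
    """Drop docs that fail any metadata check before semantic search runs."""
    kept = []
    for doc in docs:
        if category is not None and doc.category != category:
            continue  # wrong category
        if since is not None and doc.published < since:
            continue  # too old
        if not (doc.allowed_groups & user_groups):
            continue  # user shares no group with the doc's access list
        kept.append(doc)
    return kept

docs = [
    Doc("2024 pricing update", "billing", date(2024, 3, 1), frozenset({"staff", "sales"})),
    Doc("2021 pricing archive", "billing", date(2021, 5, 1), frozenset({"staff"})),
    Doc("incident postmortem", "engineering", date(2024, 6, 1), frozenset({"engineering"})),
    Doc("payroll details", "billing", date(2024, 4, 1), frozenset({"finance"})),
]

candidates = prefilter(
    docs, category="billing", since=date(2023, 1, 1), user_groups={"sales"}
)
print([d.text for d in candidates])  # ['2024 pricing update']
```

Applying the permission check before retrieval, not after, means restricted documents never enter the candidate set and cannot leak into the prompt.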
Evaluation metrics