TL;DR

Advanced RAG improves context quality and answer accuracy by combining hybrid search (semantic + keyword), reranking, query expansion (including HyDE, which searches with a hypothetical answer), contextual compression, metadata filtering, and multi-hop retrieval.

Hybrid search

Combine semantic (embedding-based) and keyword (BM25) search:

  • Semantic: Captures meaning and handles synonyms and paraphrases
  • Keyword: Matches exact terms, such as acronyms and identifiers
  • Fusion: Merge the two result lists with a weighted score combination or reciprocal rank fusion (RRF)
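The fusion step can be sketched as follows. This is a minimal illustration of reciprocal rank fusion, assuming each retriever returns a ranked list of document IDs. The function name, the sample IDs, and the constant k=60 (a commonly used default) are choices made for this example, not part of any specific library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranked list (RRF)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every document it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a semantic retriever and a keyword retriever.
semantic = ["d3", "d1", "d2"]
keyword = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

RRF uses only ranks, so it avoids having to normalize embedding similarities and BM25 scores onto a common scale. A weighted score combination needs that normalization first.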

Reranking

Two-stage retrieval:

  1. Fast retrieval: Get the top 50-100 candidates with a cheap retriever
  2. Slow reranking: Score those candidates with a cross-encoder and keep the top 3-5

Reranking options: Cohere Rerank, cross-encoder models, custom scoring functions
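A rough sketch of the two-stage pattern, with both scorers passed in as plain functions. The names `two_stage_retrieve`, `overlap`, and `toy_cross_score` are hypothetical. The toy scorers only stand in for a real first-stage retriever and a real cross-encoder, which would be called in their place.

```python
def two_stage_retrieve(query, docs, fast_score, slow_score,
                       n_candidates=50, n_final=5):
    """Cheap scoring over all docs, expensive scoring over the survivors."""
    # Stage 1: rank the whole corpus with the cheap scorer, keep a wide pool.
    candidates = sorted(docs, key=lambda d: fast_score(query, d),
                        reverse=True)[:n_candidates]
    # Stage 2: rerank only the pool with the expensive scorer.
    reranked = sorted(candidates, key=lambda d: slow_score(query, d),
                      reverse=True)
    return reranked[:n_final]


def overlap(query, doc):
    """Toy first-stage scorer: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def toy_cross_score(query, doc):
    """Toy stand-in for a cross-encoder: overlap normalized by doc length."""
    words = doc.lower().split()
    return overlap(query, doc) / len(words) if words else 0.0


docs = [
    "reset your password from the account settings page",
    "password reset",
    "billing and invoices overview",
]
top = two_stage_retrieve("password reset", docs, overlap, toy_cross_score,
                         n_candidates=2, n_final=1)
print(top)
```

The wide first-stage pool protects recall, while the small final cut keeps the expensive scorer's cost bounded per query.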

Query expansion

Techniques:

  • Multi-query: Generate variations of the query, retrieve for each, and merge the results
  • HyDE (Hypothetical Document Embeddings): Generate a hypothetical answer, then search for documents similar to it
  • Decomposition: Break complex queries into sub-queries
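The multi-query and HyDE techniques above can be sketched with the LLM and the retriever passed in as callables. Everything here is illustrative: `expand`, `generate_answer`, and `search` are hypothetical hooks, and the toy implementations are lookups into a hard-coded dictionary rather than real model or index calls.

```python
def multi_query_retrieve(query, expand, search, top_k=5):
    """Retrieve for the original query and each variation, then merge."""
    seen, merged = set(), []
    for q in [query] + list(expand(query)):
        for doc_id in search(q):
            # Keep the first occurrence of each document, in retrieval order.
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:top_k]


def hyde_retrieve(query, generate_answer, search):
    """HyDE: search with a hypothetical answer instead of the raw query."""
    hypothetical = generate_answer(query)
    return search(hypothetical)


# Toy stand-ins (assumptions): a fixed index and canned LLM outputs.
toy_index = {
    "how do I reset my password": ["d1", "d2"],
    "password recovery steps": ["d2", "d3"],
    "forgot login credentials": ["d4"],
    "To reset your password, open Settings and choose Reset.": ["d1", "d5"],
}


def toy_search(text):
    return toy_index.get(text, [])


def toy_expand(query):
    return ["password recovery steps", "forgot login credentials"]


def toy_answer(query):
    return "To reset your password, open Settings and choose Reset."


merged = multi_query_retrieve("how do I reset my password", toy_expand, toy_search)
hyde_hits = hyde_retrieve("how do I reset my password", toy_answer, toy_search)
print(merged, hyde_hits)
```

Decomposition follows the same shape as multi-query retrieval, with `expand` returning sub-questions instead of paraphrases.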

Contextual compression

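Extract only the relevant parts of retrieved docs before passing them to the model:

  • Use LLM-based extraction to pull out the passages that bear on the query
  • Reduce noise in the prompt
  • Fit more relevant context in the window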
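As a minimal sketch of the idea, the function below splits a document into sentences and keeps only those a relevance check accepts. The check is pluggable. In an LLM-based setup it would be a model call, while here `shares_term` is a hypothetical word-overlap stand-in so the example runs on its own.

```python
import re


def compress(query, doc, is_relevant):
    """Keep only the sentences of `doc` that `is_relevant` accepts."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    kept = [s for s in sentences if is_relevant(query, s)]
    return " ".join(kept)


def shares_term(query, sentence):
    """Toy relevance check (assumption): any shared lowercase word."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    s_terms = set(re.findall(r"\w+", sentence.lower()))
    return bool(q_terms & s_terms)


doc = ("Our office opened in 2010. Refunds are issued within 14 days. "
       "Contact support for help.")
compressed = compress("refunds policy", doc, shares_term)
print(compressed)
```

Dropping two of the three sentences here is what frees window space for passages from other documents.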

Multi-hop retrieval

For complex questions requiring multiple documents:

  1. Retrieve based on question
  2. Generate follow-up query from first results
  3. Retrieve additional context
  4. Combine and answer
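The four steps above can be sketched as a short loop. The `search`, `make_followup`, and `answer` callables are hypothetical hooks for a retriever and an LLM. The toy versions below use a two-entry dictionary so the flow can be followed end to end.

```python
def multi_hop_answer(question, search, make_followup, answer, max_hops=2):
    """Retrieve, derive a follow-up query from the results, retrieve again."""
    context = list(search(question))                   # step 1
    for _ in range(max_hops - 1):
        followup = make_followup(question, context)    # step 2
        if not followup:
            break  # nothing further to look up
        for doc in search(followup):                   # step 3
            if doc not in context:
                context.append(doc)
    return answer(question, context)                   # step 4


# Toy stand-ins (assumptions): a tiny corpus and rule-based "LLM" hooks.
toy_corpus = {
    "Where was the author of Dune born?": ["Dune was written by Frank Herbert."],
    "Frank Herbert birthplace": ["Frank Herbert was born in Tacoma, Washington."],
}


def toy_search(query):
    return toy_corpus.get(query, [])


def toy_followup(question, context):
    if any("Frank Herbert" in c for c in context):
        return "Frank Herbert birthplace"
    return None


def toy_answer(question, context):
    # A real system would prompt an LLM with the combined context.
    return " ".join(context)


result = multi_hop_answer("Where was the author of Dune born?",
                          toy_search, toy_followup, toy_answer)
print(result)
```

The `max_hops` cap is a design choice for this sketch. It bounds latency and cost, since every extra hop adds a generation call and a retrieval call.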

Metadata filtering

Narrow the candidate set by metadata before running semantic search:

  • Date ranges
  • Categories
  • User permissions
  • Custom attributes
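A small sketch of pre-filtering on the attributes listed above. The document layout (a `meta` dictionary with `date`, `category`, and `allowed_groups` keys) is an assumption made for this example. Real vector databases usually expose their own filter syntax, which would replace this loop.

```python
from datetime import date


def prefilter(docs, after=None, categories=None, user_groups=None):
    """Return only the docs whose metadata passes every supplied filter."""
    kept = []
    for doc in docs:
        meta = doc["meta"]
        if after is not None and meta["date"] < after:
            continue  # too old
        if categories is not None and meta["category"] not in categories:
            continue  # wrong category
        if user_groups is not None and not set(meta["allowed_groups"]) & set(user_groups):
            continue  # user has no group that may read this doc
        kept.append(doc)
    return kept


# Hypothetical documents with the assumed metadata layout.
docs = [
    {"id": "a", "meta": {"date": date(2024, 5, 1), "category": "hr",
                         "allowed_groups": ["staff"]}},
    {"id": "b", "meta": {"date": date(2022, 1, 10), "category": "hr",
                         "allowed_groups": ["staff"]}},
    {"id": "c", "meta": {"date": date(2024, 6, 3), "category": "finance",
                         "allowed_groups": ["finance-team"]}},
]

visible = prefilter(docs, after=date(2023, 1, 1), categories={"hr", "finance"},
                    user_groups=["staff"])
print([d["id"] for d in visible])
```

Applying the permission check before the semantic search, rather than after it, means restricted documents never enter the candidate set.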

Evaluation metrics

  • Retrieval accuracy (Recall@K, MRR)
  • Answer quality (human eval, LLM-as-judge)
  • Latency
  • Cost per query
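The two retrieval metrics named above can be computed as in the sketch below, assuming each query comes with a ranked list of retrieved IDs and a set of IDs labeled relevant. The function names and sample data are illustrative.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant docs that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(runs):
    """Average of 1 / rank of the first relevant doc, over all queries."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        relevant = set(relevant)
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)


recall = recall_at_k(["a", "b", "c"], ["b", "d"], k=2)
mrr = mean_reciprocal_rank([
    (["a", "b"], ["b"]),   # first relevant doc at rank 2
    (["c", "d"], ["c"]),   # first relevant doc at rank 1
])
print(recall, mrr)
```

Recall@K measures whether the relevant documents reach the context window at all, while MRR rewards placing the first relevant document near the top.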

What's next

  • RAG Retrieval Strategies
  • Vector Databases
  • Production RAG Systems