Vector Database Fundamentals
Vector databases store and search embeddings efficiently. Learn how they work, when to use them, and popular options.
TL;DR
Vector databases store embeddings and enable fast similarity search. Essential for RAG, recommendations, and semantic search at scale.
What is a vector database?
Definition:
A database optimized for storing and searching high-dimensional vectors (embeddings).
Why not regular databases?
- Regular DBs: Exact match, keyword search
- Vector DBs: Similarity search, semantic matching
- Vector DBs: Optimized for high-dimensional data
How they work
Index creation:
- Generate embeddings for documents
- Store vectors with metadata
- Build efficient index (HNSW, IVF, etc.)
Search:
- Convert query to embedding
- Find k nearest neighbors
- Return similar items
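The two phases above can be sketched end to end with plain NumPy. This is only a brute-force illustration: the embed() helper is a stand-in for a real embedding model, and a real vector database replaces the linear scan with an index such as HNSW.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: returns a random unit vector. Swap in a real embedding model
    # (OpenAI, sentence-transformers, ...) to get meaningful similarities.
    v = np.random.default_rng(abs(hash(text)) % 2**32).normal(size=384)
    return v / np.linalg.norm(v)

# Phase 1 (index creation): embed documents and store the vectors
docs = ["Reset your password from the login page", "Fix a dripping kitchen faucet"]
doc_vectors = np.stack([embed(d) for d in docs])

# Phase 2 (search): embed the query and return the k nearest neighbors
query_vec = embed("forgot my password")
scores = doc_vectors @ query_vec              # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]          # indices of the closest documents
print([(docs[i], round(float(scores[i]), 3)) for i in top_k])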
Vector similarity search
Nearest neighbor search:
- Find items closest to query vector
- Measure: cosine similarity, dot product, L2 distance
Example:
- Query: "How to fix a leak?"
- Returns: Documents about plumbing repairs
- Even if exact words don't match
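The three measures above differ only in how "closeness" is computed. A quick NumPy sketch (the vectors are made up):

import numpy as np

a = np.array([0.1, 0.8, 0.3])   # e.g. embedding of the query
b = np.array([0.2, 0.7, 0.4])   # e.g. embedding of a document

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0 = same direction
dot = np.dot(a, b)                                               # larger = more similar (length-sensitive)
l2 = np.linalg.norm(a - b)                                       # smaller = more similar

print(cosine, dot, l2)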
Popular vector databases
Pinecone:
- Managed service
- Easy to use
- Auto-scaling
- Paid
Weaviate:
- Open source or managed
- Multi-modal support
- Hybrid search
- Self-host or cloud
Qdrant:
- Open source
- Rust-based (fast)
- Good filtering
- Self-host or cloud
Chroma:
- Lightweight
- Great for prototyping
- Open source
- Embedded mode
Milvus:
- Open source
- Highly scalable
- Enterprise features
pgvector (Postgres extension):
- Add vectors to existing Postgres
- Familiar SQL interface
- Good for small-medium scale
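For prototyping, Chroma's embedded mode keeps everything in-process. A minimal sketch (the collection name and documents are made up, and Chroma's default embedding function is assumed):

import chromadb

client = chromadb.Client()                        # in-memory, embedded mode
collection = client.create_collection(name="help-articles")

# Chroma embeds the documents with its default embedding function
collection.add(
    ids=["doc1", "doc2"],
    documents=["How to reset your password", "How to fix a dripping faucet"],
)

# The query text is embedded the same way, then the nearest documents are returned
results = collection.query(query_texts=["forgot my password"], n_results=1)
print(results["documents"])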
Key features
Metadata filtering:
- Search within filtered subset
- "Find similar docs from 2024"
Hybrid search:
- Combine vector + keyword search
- Best of both worlds
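One common way to merge the vector and keyword result lists is reciprocal rank fusion. A minimal sketch (document IDs are made up):

from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc3", "doc1", "doc7"]   # from similarity search
keyword_hits = ["doc1", "doc9", "doc3"]   # from BM25 / keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))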
Multi-tenancy:
- Isolate data by user/org
- Important for SaaS
Scalability:
- Millions-billions of vectors
- Distributed architecture
When to use vector databases
Good for:
- RAG systems (document retrieval)
- Semantic search
- Recommendation engines
- Duplicate detection
- Anomaly detection
Overkill for:
- Small datasets (< 10K items)
- Exact match search
- Simple keyword search
Implementation example
from pinecone import Pinecone
from openai import OpenAI

# Initialize clients
pc = Pinecone(api_key="...")
index = pc.Index("my-index")
openai_client = OpenAI()

# Index a document: embed the text, then upsert the vector with its metadata
text = "How to reset password..."
embedding = openai_client.embeddings.create(
    input=text,
    model="text-embedding-3-small"
).data[0].embedding
index.upsert(vectors=[("doc1", embedding, {"text": text})])

# Search: embed the query, then retrieve the k nearest neighbors
query = "forgot my password"
query_emb = openai_client.embeddings.create(
    input=query,
    model="text-embedding-3-small"
).data[0].embedding
results = index.query(vector=query_emb, top_k=3, include_metadata=True)
Performance optimization
Indexing algorithms:
- HNSW: Fast, accurate (most popular)
- IVF: Good for large datasets
- Product Quantization: Compress vectors
Trade-offs:
- Accuracy vs speed
- Memory vs disk
- Indexing time vs query time
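Libraries such as FAISS expose the HNSW knobs behind these trade-offs directly. A small sketch (dimensions and parameter values are illustrative only):

import numpy as np
import faiss

dim = 128
vectors = np.random.random((10_000, dim)).astype("float32")
queries = np.random.random((5, dim)).astype("float32")

# M controls graph connectivity: higher = more accurate, more memory, slower build
index = faiss.IndexHNSWFlat(dim, 32)
index.hnsw.efConstruction = 200   # effort spent while building the graph
index.add(vectors)

# efSearch trades accuracy for query speed
index.hnsw.efSearch = 64
distances, ids = index.search(queries, 10)   # 10 nearest neighbors per query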
Cost considerations
Managed services:
- Pay per vector stored
- Pay per query
- $50-500+/month typical
Self-hosted:
- Server costs
- Maintenance effort
- More control
Best practices
- Choose embedding model carefully
- Experiment with index parameters
- Use metadata filtering
- Monitor query performance
- Implement caching for common queries
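For the last point, even a small in-process cache on the embedding call avoids repeated model requests for common queries. A minimal sketch (assumes the OpenAI embedding call from the implementation example above; caching full search results is another option):

from functools import lru_cache
from openai import OpenAI

client = OpenAI()

@lru_cache(maxsize=1024)
def embed_query(text: str) -> tuple:
    """Cache embeddings for repeated queries; tuples are hashable and reusable."""
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    return tuple(response.data[0].embedding)

# The second identical call is served from the cache, not the API
vec = embed_query("forgot my password")
vec_again = embed_query("forgot my password")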
What's next
- Embeddings Explained
- RAG Systems
- Semantic Search
Key Terms Used in This Guide
Embedding
A list of numbers that represents the meaning of text. Similar meanings have similar numbers, so computers can compare by 'closeness'.
Embeddings
Collections of numerical representations that capture meaning. When you have multiple embeddings, you can compare them to find similar content, power search systems, and enable AI to understand relationships between concepts.
Vector Database
A database optimized for storing and searching embeddings (number lists). Finds similar items by comparing their vectors.
Related Guides
Fine-Tuning Fundamentals: Customizing AI Models
Intermediate · Fine-tuning adapts pre-trained models to your specific use case. Learn when to fine-tune, how it works, and alternatives.
Retrieval Strategies for RAG Systems
Intermediate · RAG systems retrieve relevant context before generating responses. Learn retrieval strategies, ranking, and optimization techniques.
Semantic Search: Search by Meaning, Not Keywords
Intermediate · Semantic search finds results based on meaning, not exact keyword matches. Learn how it works and how to implement it.