Vector Database Fundamentals
Vector databases store and search embeddings efficiently. Learn how they work, when to use them, and popular options.
TL;DR
Vector databases store embeddings and enable fast similarity search. Essential for RAG, recommendations, and semantic search at scale.
What is a vector database?
Definition:
A database optimized for storing and searching high-dimensional vectors (embeddings).
Why not regular databases?
- Regular DBs: Exact match, keyword search
- Vector DBs: Similarity search, semantic matching
- Vector DBs: Optimized for high-dimensional data
How they work
Index creation:
- Generate embeddings for documents
- Store vectors with metadata
- Build efficient index (HNSW, IVF, etc.)
Search:
- Convert query to embedding
- Find k nearest neighbors
- Return similar items
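The two phases above can be sketched with a toy in-memory store (a hypothetical `TinyVectorStore`, not a real library) that skips index-building and brute-forces cosine similarity against every stored vector:

```python
import numpy as np

# Toy in-memory vector store (hypothetical, for illustration only):
# brute-force cosine similarity instead of a real HNSW/IVF index.
class TinyVectorStore:
    def __init__(self):
        self.ids, self.vectors, self.metadata = [], [], []

    def upsert(self, doc_id, vector, meta=None):
        # Index creation: store the vector together with its id and metadata.
        self.ids.append(doc_id)
        self.vectors.append(np.asarray(vector, dtype=float))
        self.metadata.append(meta or {})

    def query(self, vector, top_k=3):
        # Search: compare the query embedding against every stored vector.
        q = np.asarray(vector, dtype=float)
        mat = np.stack(self.vectors)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
        top = np.argsort(-sims)[:top_k]
        return [(self.ids[i], float(sims[i]), self.metadata[i]) for i in top]

store = TinyVectorStore()
store.upsert("doc1", [1.0, 0.0, 0.0], {"text": "plumbing repairs"})
store.upsert("doc2", [0.0, 1.0, 0.0], {"text": "pasta recipes"})
hits = store.query([0.9, 0.1, 0.0], top_k=1)  # nearest: "doc1"
```

Real vector databases replace the brute-force loop with an approximate index, which is what makes them fast at scale.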
Vector similarity search
Nearest neighbor search:
- Find items closest to query vector
- Measure: cosine similarity, dot product, L2 distance
Example:
- Query: "How to fix a leak?"
- Returns: Documents about plumbing repairs
- Even if exact words don't match
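The three similarity measures above, computed with NumPy on a pair of toy vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length

dot = float(a @ b)                                       # dot product
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # direction only
l2 = float(np.linalg.norm(a - b))                        # Euclidean distance

# cosine is 1.0 (identical direction) even though the L2 distance is nonzero,
# which is why cosine similarity is the usual choice for text embeddings.
```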
Popular vector databases
Pinecone:
- Managed service
- Easy to use
- Auto-scaling
- Paid
Weaviate:
- Open source or managed
- Multi-modal support
- Hybrid search
- Self-host or cloud
Qdrant:
- Open source
- Rust-based (fast)
- Good filtering
- Self-host or cloud
Chroma:
- Lightweight
- Great for prototyping
- Open source
- Embedded mode
Milvus:
- Open source
- Highly scalable
- Enterprise features
pgvector (Postgres extension):
- Add vectors to existing Postgres
- Familiar SQL interface
- Good for small-medium scale
Key features
Metadata filtering:
- Search within filtered subset
- "Find similar docs from 2024"
Hybrid search:
- Combine vector + keyword search
- Best of both worlds
Multi-tenancy:
- Isolate data by user/org
- Important for SaaS
Scalability:
- Millions to billions of vectors
- Distributed architecture
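As one illustration of hybrid search, a common way to combine a keyword ranking with a vector ranking is Reciprocal Rank Fusion (RRF). The document lists here are hypothetical stand-ins for real BM25 and ANN results:

```python
# Hybrid search sketch: fuse a keyword ranking and a vector ranking
# with Reciprocal Rank Fusion. score(doc) = sum of 1 / (k + rank)
# over each ranking the doc appears in.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. BM25 order (hypothetical)
vector_hits  = ["doc1", "doc5", "doc3"]   # e.g. ANN order (hypothetical)
fused = rrf([keyword_hits, vector_hits])
# doc1 and doc3 appear in both lists, so they rise to the top
```

RRF needs no score normalization, which is why several vector databases use it as their default fusion method.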
When to use vector databases
Good for:
- RAG systems (document retrieval)
- Semantic search
- Recommendation engines
- Duplicate detection
- Anomaly detection
Overkill for:
- Small datasets (< 10K items)
- Exact match search
- Simple keyword search
Implementation example
from openai import OpenAI
from pinecone import Pinecone
# Initialize clients (API keys elided)
pc = Pinecone(api_key="...")
index = pc.Index("my-index")
openai_client = OpenAI()
# Index a document: embed the text, then upsert the vector with metadata
text = "How to reset password..."
embedding = openai_client.embeddings.create(
    input=text,
    model="text-embedding-3-small",
).data[0].embedding
index.upsert(vectors=[{"id": "doc1", "values": embedding, "metadata": {"text": text}}])
# Search: embed the query, then fetch the 3 nearest neighbors
query = "forgot my password"
query_emb = openai_client.embeddings.create(
    input=query,
    model="text-embedding-3-small",
).data[0].embedding
results = index.query(vector=query_emb, top_k=3, include_metadata=True)
Performance optimization
Indexing algorithms:
- HNSW (Hierarchical Navigable Small World): fast and accurate (most popular)
- IVF (Inverted File index): partitions vectors into clusters; good for large datasets
- Product Quantization (PQ): compresses vectors to reduce memory
Trade-offs:
- Accuracy vs speed
- Memory vs disk
- Indexing time vs query time
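The accuracy-vs-speed trade-off shows up even in a toy IVF-style search: partition vectors into cells around a few centroids (real IVF trains them with k-means; here they are simply sampled from the data), then probe only the cell nearest the query:

```python
import numpy as np

# IVF-style sketch: search only the cluster nearest the query.
# Fewer comparisons, but neighbors near a cell boundary can be missed,
# which is exactly the accuracy-vs-speed trade-off.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 8))

# "Train": use the first few vectors as centroids (real IVF uses k-means).
centroids = data[:4]
assignments = np.argmin(
    np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2), axis=1
)

def ivf_search(query, top_k=3):
    # Probe only the single nearest cell (nprobe=1).
    cell = int(np.argmin(np.linalg.norm(centroids - query, axis=1)))
    members = np.where(assignments == cell)[0]
    dists = np.linalg.norm(data[members] - query, axis=1)
    return members[np.argsort(dists)[:top_k]]

query = data[10]  # query with a vector we know is stored
hits = ivf_search(query)
# data[10] falls in the probed cell, so index 10 comes back first
```

Probing more cells (a larger `nprobe`) recovers accuracy at the cost of more comparisons per query.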
Cost considerations
Managed services:
- Pay per vector stored
- Pay per query
- $50-500+/month typical
Self-hosted:
- Server costs
- Maintenance effort
- More control
Best practices
- Choose embedding model carefully
- Experiment with index parameters
- Use metadata filtering
- Monitor query performance
- Implement caching for common queries
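A minimal sketch of the caching point: memoize query embeddings with `functools.lru_cache` so repeated queries skip the embedding call. `fake_embed` is a hypothetical stand-in for a real (paid, slow) embedding API:

```python
from functools import lru_cache

calls = {"count": 0}  # track how often the "API" is actually hit

def fake_embed(text):
    # Deterministic stand-in for a real embedding API call (hypothetical).
    calls["count"] += 1
    return tuple(ord(c) % 7 for c in text[:4])

@lru_cache(maxsize=1024)
def embed_query(text):
    # Repeated identical queries are served from the cache.
    return fake_embed(text)

embed_query("forgot my password")
embed_query("forgot my password")  # cache hit: no second embedding call
```

The same idea applies one level up: caching final search results for common queries avoids both the embedding call and the vector lookup.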
What's next
- Embeddings Explained
- RAG Systems
- Semantic Search