TL;DR

Vector databases store embeddings and enable fast similarity search. Essential for RAG, recommendations, and semantic search at scale.

What is a vector database?

Definition:
A database optimized for storing and searching high-dimensional vectors (embeddings).

Why not regular databases?

  • Regular DBs: exact match and keyword search
  • Vector DBs: similarity search and semantic matching, optimized for high-dimensional data

How they work

Index creation:

  1. Generate embeddings for documents
  2. Store vectors with metadata
  3. Build efficient index (HNSW, IVF, etc.)

Search:

  1. Convert query to embedding
  2. Find k nearest neighbors
  3. Return the most similar items (sketched below)
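
Both flows in a minimal in-memory sketch: a brute-force scan stands in for a real index, and embed() is a toy placeholder for an actual embedding model (which is what makes matches semantic rather than lexical).

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: hash words into a unit 64-d vector
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Index (steps 1-2): embed each document and store vector + metadata together
docs = ["fixing a dripping faucet", "transformer attention explained"]
store = [{"text": d, "vector": embed(d)} for d in docs]

# Search (steps 1-3): embed the query, rank by similarity, return top k
def search(query: str, k: int = 1) -> list[str]:
    q = embed(query)  # unit vectors, so dot product == cosine similarity
    ranked = sorted(store, key=lambda item: float(q @ item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:k]]

print(search("fixing a faucet"))  # -> ['fixing a dripping faucet']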

Nearest neighbor search:

  • Find the stored items whose vectors are closest to the query vector
  • Measured with cosine similarity, dot product, or L2 distance (computed below)
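
The three measures, computed directly:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 4.0])

dot = a @ b                                          # dot product: 20.0
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity: ~0.993
l2 = np.linalg.norm(a - b)                           # L2 (Euclidean) distance: ~1.732

For unit-normalized embeddings, cosine similarity and dot product give the same ranking.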

Example:

  • Query: "How to fix a leak?"
  • Returns: Documents about plumbing repairs
  • Even if exact words don't match

Popular vector databases

Pinecone:

  • Managed service
  • Easy to use
  • Auto-scaling
  • Paid

Weaviate:

  • Open source or managed
  • Multi-modal support
  • Hybrid search
  • Self-host or cloud

Qdrant:

  • Open source
  • Rust-based (fast)
  • Good filtering
  • Self-host or cloud

Chroma:

  • Lightweight
  • Great for prototyping
  • Open source
  • Embedded mode

Milvus:

  • Open source
  • Highly scalable
  • Enterprise features

Pgvector (Postgres extension):

  • Add vectors to existing Postgres
  • Familiar SQL interface
  • Good for small-to-medium scale (sketch below)
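
A sketch of the pgvector workflow through psycopg2 (connection string, table name, and dimension are placeholders; <=> is pgvector's cosine-distance operator, <-> is L2):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
cur = conn.cursor()

# One-time setup: enable the extension, create a table with a vector column
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS docs "
            "(id serial PRIMARY KEY, body text, embedding vector(1536))")

def to_vector_literal(emb: list[float]) -> str:
    # pgvector accepts '[x,y,...]' text literals
    return "[" + ",".join(map(str, emb)) + "]"

emb = [0.1] * 1536  # stand-in for a real embedding
cur.execute("INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
            ("How to reset password...", to_vector_literal(emb)))

# k-NN in plain SQL: order by cosine distance to the query vector, take k
cur.execute("SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 3",
            (to_vector_literal(emb),))
print(cur.fetchall())
conn.commit()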

Key features

Metadata filtering:

  • Restrict the similarity search to a filtered subset
  • e.g. "find similar docs from 2024" (see the sketch below)

Hybrid search:

  • Combine vector and keyword search results
  • Best of both worlds (one fusion recipe is sketched below)
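
A common fusion recipe is reciprocal rank fusion (RRF), which needs only the ranked ID lists from each search; a minimal sketch:

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from BM25
vector_hits = ["doc1", "doc4", "doc3"]   # e.g. from the vector index
print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3 lead: both searches found them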

Multi-tenancy:

  • Isolate data by user or org
  • Important for SaaS (namespace sketch below)
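
In Pinecone, for instance, namespaces scope every write and read to one tenant (reusing names from the implementation example below):

# All operations are confined to tenant_a's namespace
index.upsert(
    vectors=[{"id": "doc1", "values": embedding, "metadata": {"text": text}}],
    namespace="tenant_a",
)
results = index.query(vector=query_emb, top_k=3, namespace="tenant_a")
# A query against tenant_a can never return tenant_b's vectors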

Scalability:

  • Millions to billions of vectors
  • Distributed architecture

When to use vector databases

Good for:

  • RAG systems (document retrieval)
  • Semantic search
  • Recommendation engines
  • Duplicate detection
  • Anomaly detection

Overkill for:

  • Small datasets (< 10K items; a brute-force scan is fast enough)
  • Exact match search
  • Simple keyword search

Implementation example

A minimal end-to-end flow with Pinecone and OpenAI, using the current client APIs (API key and index name are placeholders; assumes a 1536-dimension index already exists):

from openai import OpenAI
from pinecone import Pinecone

# Initialize clients (replaces the deprecated pinecone.init() API)
pc = Pinecone(api_key="...")
index = pc.Index("my-index")
openai_client = OpenAI()

# Index a document: embed it, then upsert the vector with metadata
text = "How to reset password..."
embedding = openai_client.embeddings.create(
    input=text,
    model="text-embedding-3-small"
).data[0].embedding

index.upsert(vectors=[{"id": "doc1", "values": embedding, "metadata": {"text": text}}])

# Search: embed the query the same way, then fetch the 3 nearest neighbors
query = "forgot my password"
query_emb = openai_client.embeddings.create(
    input=query,
    model="text-embedding-3-small"
).data[0].embedding

results = index.query(vector=query_emb, top_k=3, include_metadata=True)
for match in results.matches:
    print(match.id, match.score, match.metadata["text"])

Performance optimization

Indexing algorithms:

  • HNSW: fast and accurate; the most widely used (example below)
  • IVF: good for very large datasets
  • Product quantization (PQ): compresses vectors to save memory
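
A small hnswlib sketch showing where the knobs live: M and ef_construction control graph density and build effort, ef sets the query-time accuracy/speed balance (sizes here are arbitrary):

import numpy as np
import hnswlib

dim, n = 128, 10_000
data = np.random.rand(n, dim).astype(np.float32)

# Build: higher M / ef_construction -> better recall, slower build, more memory
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=200)
index.add_items(data, np.arange(n))

# Query: higher ef -> better recall, slower queries
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=5)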

Trade-offs:

  • Accuracy vs speed
  • Memory vs disk
  • Indexing time vs query time

Cost considerations

Managed services:

  • Pay per vector stored
  • Pay per query
  • $50-500+/month typical

Self-hosted:

  • Server costs
  • Maintenance effort
  • More control

Best practices

  1. Choose embedding model carefully
  2. Experiment with index parameters
  3. Use metadata filtering
  4. Monitor query performance
  5. Cache embeddings for common queries (see the sketch below)
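
For practice 5, even a small in-process cache on query embeddings avoids repeated embedding calls for popular queries (a sketch reusing openai_client and index from the example above; use Redis or similar in production):

from functools import lru_cache

@lru_cache(maxsize=10_000)
def embed_query(query: str) -> tuple[float, ...]:
    # Return a tuple so the cached value cannot be mutated by callers
    response = openai_client.embeddings.create(
        input=query,
        model="text-embedding-3-small",
    )
    return tuple(response.data[0].embedding)

# The second identical call is served from the cache, not the embeddings API
results = index.query(vector=list(embed_query("forgot my password")), top_k=3)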

What&#39;s next

  • Embeddings Explained
  • RAG Systems
  • Semantic Search