Module 4 · 25 minutes

Vector Databases and Embeddings

Work with vector databases for semantic search. Choose and implement the right solution.

Tags: vector-databases, embeddings, semantic-search

Learning Objectives

  • Understand what vector embeddings are and how they capture meaning
  • Choose a vector database that fits your use case
  • Implement semantic search end to end
  • Optimize indexing and query performance

Why Regular Databases Can't Search by Meaning

Traditional databases are brilliant at exact matches. Ask for "all orders from customer #1234" and you'll get a precise answer in milliseconds. But ask "find documents about our return policy" and a traditional database is lost — it can only look for the exact words "return policy," not the concept behind them. A document titled "Refund and Exchange Guidelines" wouldn't match, even though it's exactly what you need.

This is the problem vector databases solve. They search by meaning, not by keywords. They understand that "return policy," "refund guidelines," and "how to send something back" all refer to the same concept.

How Vectors Represent Meaning

To understand vector databases, you need to understand vectors — but don't worry, the concept is simpler than it sounds.

Think of a map with coordinates. London might be at position (51.5, -0.1) and Paris at (48.9, 2.3). Cities that are close together on the map have similar coordinates. Vectors work the same way, but instead of a 2D map, they use a space with hundreds or thousands of dimensions.

When you create an "embedding" of a piece of text, you're converting that text into coordinates in this high-dimensional space. Texts with similar meanings end up near each other. "How do I return a product?" and "What's your refund process?" would have very similar coordinates, even though they share almost no words.

This is what makes semantic search possible. Instead of matching keywords, you find the nearest neighbours in this meaning-space.

from openai import OpenAI
client = OpenAI()

# Turn text into a vector (list of numbers)
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I return a product?"
).data[0].embedding

# This returns a list of 1536 numbers that represent the meaning
print(len(embedding))  # 1536
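"Near each other" needs a concrete measure, and for embeddings that measure is usually cosine similarity: the cosine of the angle between two vectors, close to 1.0 when they point the same way and lower as meanings diverge. Here's a minimal sketch in plain Python — the toy 3-dimensional vectors stand in for real 1536-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors, divided by the product of their lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" — real ones from the API have 1536 dimensions
returns_q = [0.9, 0.1, 0.2]   # "How do I return a product?"
refund_q  = [0.8, 0.2, 0.3]   # "What's your refund process?"
shipping  = [0.1, 0.9, 0.1]   # "Do you ship internationally?"

print(cosine_similarity(returns_q, refund_q))  # high: similar meaning
print(cosine_similarity(returns_q, shipping))  # low: different topic
```

Vector databases run exactly this kind of comparison (or a close cousin like dot product or Euclidean distance) at scale.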

The Main Vector Database Options

There are several solid options, each with different strengths. Here's how to think about them.

Pinecone — Managed and Simple

Pinecone is a fully managed cloud service. You don't run any infrastructure — just send vectors in and query them back. It scales automatically and handles all the complexity behind the scenes. The tradeoff is cost: it can get expensive at large scale, and your data lives on their servers. Best for: teams that want the fastest path from zero to working product without managing infrastructure.

Chroma — Lightweight and Local

Chroma runs locally and can even be embedded directly in your application. It's open source, free, and perfect for prototyping or smaller projects. You can run it in memory during development and switch to a persistent store for production. Best for: prototyping, small-to-medium datasets, and situations where you want everything running on your own machine.

pgvector — Postgres Extension

If you're already using PostgreSQL (and many teams are), pgvector adds vector search capabilities to your existing database. No new infrastructure required. The performance isn't as specialised as dedicated vector databases, but for many use cases it's more than good enough. Best for: teams already running Postgres who don't want to add another database to their stack.

Qdrant — Fast and Open Source

Qdrant is an open-source vector database built for performance. It offers advanced filtering capabilities and can be self-hosted or used as a cloud service. Best for: teams that need high performance, advanced filtering, and want the option to self-host.

Weaviate — Feature-Rich

Weaviate includes built-in vectorisation (it can create embeddings for you), a GraphQL API, and hybrid search that combines keyword and semantic search. Best for: teams that want an all-in-one solution with built-in embedding generation.

Choosing the Right One for Your Use Case

Here's a practical decision framework:

Just prototyping or learning? Start with Chroma. It's free, runs locally, and takes minutes to set up.

Already using Postgres? Try pgvector first. Fewer moving parts means fewer things that can break.

Building for production scale with minimal ops work? Pinecone handles the infrastructure so you can focus on your product.

Need maximum control and performance? Qdrant or Weaviate give you the flexibility to self-host and tune for your specific workload.

The honest answer: for most teams starting out, any of these will work. Pick the one that fits your existing tech stack and don't over-think it. You can always migrate later.

Indexing Strategies: The Basics

How you index your vectors affects both search quality and speed.

Flat indexing compares your query against every single vector. It's perfectly accurate but slow at large scale. Fine for up to about 100,000 vectors.

Approximate nearest neighbour (ANN) indexing uses clever shortcuts to find "close enough" results much faster. You sacrifice a tiny bit of accuracy for a massive speed improvement. Most vector databases use ANN by default once your data grows beyond a certain size.

Metadata filtering lets you narrow your search before comparing vectors. For example, "find similar documents, but only from the 2024 product catalogue." This reduces the search space and improves both speed and relevance.
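To make these ideas concrete, here is a flat (brute-force) search with an optional metadata pre-filter, sketched in plain Python. This mirrors what a vector database does conceptually before ANN indexes enter the picture; the toy 2-dimensional vectors and the `year` field are illustrative:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Each record: (vector, document text, metadata)
index = [
    ([0.9, 0.1], "2023 return policy",  {"year": 2023}),
    ([0.8, 0.2], "2024 return policy",  {"year": 2024}),
    ([0.1, 0.9], "2024 shipping rates", {"year": 2024}),
]

def flat_search(query_vec, records, where=None, n_results=1):
    # Metadata filtering: narrow the search space before comparing vectors
    if where:
        records = [r for r in records
                   if all(r[2].get(k) == v for k, v in where.items())]
    # Flat indexing: compare the query against every remaining vector
    scored = sorted(records,
                    key=lambda r: cosine_similarity(query_vec, r[0]),
                    reverse=True)
    return [doc for _, doc, _ in scored[:n_results]]

print(flat_search([0.85, 0.15], index, where={"year": 2024}))
# ['2024 return policy']
```

ANN indexes such as HNSW replace the exhaustive `sorted` pass with a smarter traversal, which is where the speed-for-accuracy tradeoff comes from.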

Practical tip: Start with the default indexing settings your vector database provides. They're tuned for common use cases. Only optimise when you have a specific performance problem — premature optimisation here is a real time sink.

Putting It Together

Here's a minimal working example that creates embeddings, stores them, and queries them:

import chromadb
from openai import OpenAI

ai = OpenAI()
db = chromadb.Client()
collection = db.create_collection("my_docs")

# Add documents
docs = ["Our return policy allows 30-day returns",
        "Free shipping on orders over $50",
        "Contact support at help@example.com"]

for i, doc in enumerate(docs):
    embedding = ai.embeddings.create(
        model="text-embedding-3-small", input=doc
    ).data[0].embedding
    collection.add(ids=[f"doc_{i}"], embeddings=[embedding],
                    documents=[doc])

# Search by meaning
query = "How can I send something back?"
query_vec = ai.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding

results = collection.query(query_embeddings=[query_vec], n_results=1)
print(results["documents"])
# [['Our return policy allows 30-day returns']]

Notice how the query "How can I send something back?" found the return policy document, even though they share no keywords. That's the power of semantic search.

Key Takeaways

  • Embeddings convert text to vectors for semantic similarity
  • Choose vector DB based on scale and hosting needs
  • Use metadata for filtering results
  • Batch operations for efficiency
  • Monitor index size and costs

Practice Exercises

Apply what you've learned with these practical exercises:

  1. Create embeddings for sample documents
  2. Set up a vector database
  3. Implement semantic search
  4. Compare different embedding models
