TL;DR

Embeddings are numerical representations of text (or images, audio, and other data) that capture meaning as arrays of numbers called vectors. Words and sentences with similar meanings end up with similar numbers. This makes it possible to search by meaning, build recommendation systems, and power RAG pipelines — all with simple maths.

Why it matters

Traditional search is keyword-based. If you search for "how to fix a leaky tap" but the article says "plumbing repair for dripping faucet," keyword search misses it entirely. Embeddings solve this by representing both phrases as numbers that are close together in mathematical space, because their meanings are similar.

This is not a niche technical trick. Embeddings are the foundation of nearly every modern AI application. Semantic search, product recommendations, document clustering, duplicate detection, and RAG (Retrieval-Augmented Generation) all depend on embeddings. If you are building anything that involves finding, comparing, or organising information with AI, you will use embeddings.

What are embeddings, exactly?

An embedding converts a piece of text into an array of numbers, typically hundreds or thousands of them. These numbers represent the meaning of the text in a way that a computer can work with.

Here is a simplified example:

  • "king" might become [0.2, 0.8, -0.3, 0.5, ...]
  • "queen" might become [0.19, 0.79, -0.25, 0.48, ...] (very similar!)
  • "banana" might become [-0.5, 0.1, 0.9, -0.2, ...] (very different)

The key insight is that similar meanings produce similar numbers. "King" and "queen" are both royalty, so their embeddings are close together. "Banana" is unrelated, so its embedding is far away. This closeness is measurable using mathematical distance functions.

Each number in the array represents some aspect of meaning — though not a human-interpretable one. One dimension might loosely correspond to "royalty," another to "food," another to "emotion." But in practice, these dimensions are abstract and learned automatically during training.
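To make "closeness" concrete, here is a sketch using the truncated toy vectors above. The numbers are illustrative, not from a real model, and real embeddings have hundreds of dimensions:

```python
import math

# Toy 4-dimensional "embeddings" (illustrative numbers, not from a real model)
king = [0.2, 0.8, -0.3, 0.5]
queen = [0.19, 0.79, -0.25, 0.48]
banana = [-0.5, 0.1, 0.9, -0.2]

# Euclidean distance: smaller means more similar
print(math.dist(king, queen))   # ≈ 0.056 — very close
print(math.dist(king, banana))  # ≈ 1.706 — far apart
```

The same comparison works unchanged in 1536 dimensions; only the arithmetic gets longer.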

How embedding models learn meaning

Embedding models are trained on billions of sentences. During training, the model sees words in context — "the king sat on the throne" and "the queen sat on the throne" — and learns that "king" and "queen" appear in similar contexts, so they should have similar embeddings.

The famous demonstration of this is the "king - man + woman = queen" analogy. If you take the embedding for "king," subtract the embedding for "man," and add the embedding for "woman," the result is very close to the embedding for "queen." The model has learned the relationship between gender and royalty purely from seeing how these words are used.
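The analogy can be sketched with hand-crafted toy vectors in which one dimension loosely stands for "royalty" and the other for "gender". Real models learn thousands of abstract dimensions; these values are made up purely for illustration:

```python
# Toy 2-d vectors: [royalty, gender] — made-up values for illustration
king  = [0.9,  1.0]
queen = [0.9, -1.0]
man   = [0.0,  1.0]
woman = [0.0, -1.0]

# king - man + woman, computed element-wise
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)  # [0.9, -1.0] — exactly the toy vector for "queen"
```

Subtracting "man" removes the male gender component while leaving royalty intact; adding "woman" supplies the female component, landing on "queen".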

Modern embedding models go far beyond single words. They encode entire sentences and paragraphs, capturing not just word-level meaning but the overall intent and topic. "I need to fix my plumbing" and "my pipes are broken and I need a repair person" will produce similar embeddings because the model understands they are about the same thing.

Measuring similarity between embeddings

Once you have two embeddings, you need a way to measure how similar they are. The most common method is cosine similarity, which measures the angle between two vectors.

  • A cosine similarity of 1.0 means the embeddings are identical in direction (same meaning).
  • A cosine similarity of 0 means they are completely unrelated.
  • A cosine similarity of -1.0 means they are opposite in meaning.

In practice, most text embeddings range from about 0.3 (somewhat related) to 0.95+ (very similar). You set a threshold based on your use case — 0.8 might work for semantic search, while 0.95 might be needed for duplicate detection.

Other distance metrics include Euclidean distance (straight-line distance between points) and dot product (similar to cosine similarity but not normalised). Cosine similarity is the most popular for text because it focuses on direction rather than magnitude, making it robust across different text lengths.
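Cosine similarity is simple enough to implement from scratch, and doing so makes the "direction, not magnitude" point concrete:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Doubling a vector changes its Euclidean distance from the original,
# but not its cosine similarity — same direction, different magnitude.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))            # ≈ 1.0 (identical direction)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal / unrelated)
```

In production you would use a vectorised implementation (NumPy or your vector database's built-in scoring), but the formula is the same.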

Choosing an embedding model

Several providers offer embedding models, each with different trade-offs between cost, quality, and speed:

OpenAI embeddings are the most widely used. Their text-embedding-3-small model (1536 dimensions) offers a good balance of quality and cost. The text-embedding-3-large model (3072 dimensions) is more accurate but slower and more expensive.

Sentence Transformers are open-source models you can run locally. The all-MiniLM-L6-v2 model (384 dimensions) is fast and free but less accurate than commercial options. The all-mpnet-base-v2 (768 dimensions) offers better quality while still being free to run.

Cohere's embed-english-v3.0 is a strong commercial option with excellent benchmark performance, and it supports task-specific embeddings (optimised for search, classification, or clustering).

Google's models, including the Universal Sentence Encoder and Vertex AI embedding models, offer good quality, especially if you are already in the Google Cloud ecosystem.

Embedding dimensions: size versus quality

Embeddings come in different sizes, measured in dimensions. More dimensions generally mean a more accurate representation, but also more storage, memory, and computation.

Small models (384 dimensions) are fast, cheap, and use little storage. Each embedding takes about 1.5 KB. Good enough for many applications, especially when speed matters more than perfect accuracy.

Medium models (768-1024 dimensions) offer a balanced trade-off. They capture more nuance while remaining practical for most applications.

Large models (1536-3072 dimensions) provide the highest quality. Each embedding takes 6-12 KB. Best for applications where accuracy is critical, like legal document search or medical research retrieval.

For most applications, start with a medium-sized model and upgrade only if your retrieval quality is not meeting requirements. The difference between 768 and 3072 dimensions is often smaller than you might expect.
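The storage figures above are straightforward arithmetic: each dimension is typically stored as a 32-bit (4-byte) float, so storage scales linearly with dimensions and document count. A quick sanity check:

```python
def embedding_storage_kb(dimensions, bytes_per_float=4):
    """Storage for one float32 embedding, in KB."""
    return dimensions * bytes_per_float / 1024

print(embedding_storage_kb(384))   # 1.5 KB  (small model)
print(embedding_storage_kb(1536))  # 6.0 KB  (text-embedding-3-small)
print(embedding_storage_kb(3072))  # 12.0 KB (text-embedding-3-large)

# One million documents at 1536 dimensions:
print(embedding_storage_kb(1536) * 1_000_000 / 1024 / 1024)  # ≈ 5.7 GB
```

Some databases reduce this further with quantisation (e.g. 1-byte integers per dimension), trading a little accuracy for a 4x storage saving.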

Real-world use cases

Semantic search replaces keyword matching with meaning matching. Instead of requiring exact word matches, users can search naturally and find relevant results even when the exact terms differ. This is how modern documentation search, customer support knowledge bases, and product discovery work.

RAG (Retrieval-Augmented Generation) is perhaps the most important use case for embeddings today. When you ask a chatbot a question about your company's documentation, embeddings are used to find the most relevant document sections, which are then fed to the language model as context. Without embeddings, the model would have no way to know which documents are relevant.

Recommendation systems use embeddings to find "similar items." If a user liked a particular article, you can find other articles with similar embeddings and recommend those. This works for products, movies, music, and any other content.

Document clustering groups similar documents together automatically. Feed thousands of customer support tickets through an embedding model and you can automatically identify the most common topics and issues.

Duplicate detection finds near-duplicate content even when the wording is different. This is used for plagiarism detection, deduplicating databases, and identifying similar questions in FAQ systems.

Anomaly detection identifies outliers. If most of your customer support tickets cluster together but a few have very different embeddings, those outliers might indicate new issues or unusual requests worth investigating.
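The outlier idea can be sketched as follows. The vectors and the 0.8 threshold are illustrative; a real pipeline would use model-generated embeddings and a threshold tuned on your own data:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy ticket embeddings: three cluster together, one is very different
tickets = {
    "ticket_1": [0.9, 0.1, 0.0],
    "ticket_2": [0.85, 0.15, 0.05],
    "ticket_3": [0.88, 0.12, 0.02],
    "ticket_4": [0.0, 0.1, 0.95],   # the odd one out
}

# Centroid (mean vector) of all embeddings
dims = len(next(iter(tickets.values())))
centroid = [sum(v[i] for v in tickets.values()) / len(tickets) for i in range(dims)]

# Flag tickets whose similarity to the centroid falls below the threshold
outliers = [name for name, v in tickets.items()
            if cosine_similarity(v, centroid) < 0.8]
print(outliers)  # ['ticket_4']
```

More robust approaches (distance to k nearest neighbours, clustering-based methods) follow the same principle: far from everything else means worth a look.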

How to use embeddings in practice

A basic workflow looks like this:

  1. Generate embeddings for your documents using an embedding model. Store these vectors alongside the original text.
  2. Store embeddings in a vector database (like Pinecone, Weaviate, or Qdrant) that is optimised for similarity search.
  3. Query by generating an embedding for the search query and finding the stored embeddings closest to it.
  4. Return results ranked by similarity score.
Step 3 in code, using the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI()

# Generate embedding for a query
response = client.embeddings.create(
  model="text-embedding-3-small",
  input="How do I fix a leaky faucet?"
)

query_embedding = response.data[0].embedding
# Now search your vector database for similar embeddings

The heavy lifting is done by the vector database, which uses clever algorithms (like HNSW or IVF) to search through millions of embeddings in milliseconds.
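For small collections you may not need a vector database at all — an exact brute-force scan over a few thousand embeddings is fast enough. Here is a sketch of what the database is doing for you, with toy vectors standing in for model output:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy document embeddings (a real system stores model-generated vectors)
documents = {
    "fixing a dripping faucet": [0.9, 0.2, 0.1],
    "banana bread recipe":      [0.1, 0.9, 0.3],
    "unclogging a drain":       [0.8, 0.1, 0.3],
}

def search(query_embedding, top_k=2):
    # Score every document, then keep the top_k — this exact scan is what
    # HNSW/IVF indexes approximate at much larger scale.
    scored = sorted(documents.items(),
                    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                    reverse=True)
    return [title for title, _ in scored[:top_k]]

query = [0.85, 0.15, 0.2]  # stands in for the embedding of "leaky tap repair"
print(search(query))  # the two plumbing documents rank above the recipe
```

The approximate indexes trade a small amount of recall for this scan becoming sub-linear, which is what makes millisecond search over millions of vectors possible.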

Common mistakes

Using the wrong embedding model for your task. An embedding model trained for semantic similarity might not work well for classification. Some models offer task-specific embeddings — use them when available.

Mixing embedding models. If you generate your document embeddings with one model and your query embeddings with another, the similarity scores will be meaningless. Always use the same model for both.

Not chunking long documents. Most embedding models have a maximum input length (typically 512-8192 tokens). If your document exceeds this, it will be truncated. Split long documents into meaningful chunks before embedding.
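A minimal chunking sketch, splitting on words with an overlap so context is not lost at chunk boundaries. Real pipelines usually split on tokens, sentences, or headings, and the sizes here are illustrative:

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks with overlapping context."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ("word " * 500).strip()  # a stand-in for a long document
chunks = chunk_words(doc, chunk_size=200, overlap=50)
print(len(chunks))  # 3 chunks, each embedded separately
```

Each chunk becomes its own embedding, so a search can point to the specific section of a long document rather than the whole thing.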

Ignoring the quality of your source text. Embeddings capture meaning, but they cannot create meaning that is not there. If your documents are poorly written, ambiguous, or outdated, the embeddings will faithfully represent that poor quality.

Over-indexing on benchmark scores. A model that scores 2% higher on a benchmark may not perform 2% better on your specific data. Test with your actual use case before committing.

What's next?

Continue building your understanding with these related guides: