Embeddings: Turning Words into Math
By Marcin Piekarski · builtweb.com.au · Last Updated: 11 February 2026
TL;DR
Embeddings are numerical representations of text (or images, audio, and other data) that capture meaning as arrays of numbers called vectors. Words and sentences with similar meanings end up with similar numbers. This makes it possible to search by meaning, build recommendation systems, and power RAG pipelines — all with simple maths.
Why it matters
Traditional search is keyword-based. If you search for "how to fix a leaky tap" but the article says "plumbing repair for dripping faucet," keyword search misses it entirely. Embeddings solve this by representing both phrases as numbers that are close together in mathematical space, because their meanings are similar.
This is not a niche technical trick. Embeddings are the foundation of nearly every modern AI application. Semantic search, product recommendations, document clustering, duplicate detection, and RAG (Retrieval-Augmented Generation) all depend on embeddings. If you are building anything that involves finding, comparing, or organising information with AI, you will use embeddings.
What are embeddings, exactly?
An embedding converts a piece of text into an array of numbers, typically hundreds or thousands of them. These numbers represent the meaning of the text in a way that a computer can work with.
Here is a simplified example:
- "king" might become [0.2, 0.8, -0.3, 0.5, ...]
- "queen" might become [0.19, 0.79, -0.25, 0.48, ...] (very similar!)
- "banana" might become [-0.5, 0.1, 0.9, -0.2, ...] (very different)
The key insight is that similar meanings produce similar numbers. "King" and "queen" are both royalty, so their embeddings are close together. "Banana" is unrelated, so its embedding is far away. This closeness is measurable using mathematical distance functions.
Each number in the array represents some aspect of meaning — though not a human-interpretable one. One dimension might loosely correspond to "royalty," another to "food," another to "emotion." But in practice, these dimensions are abstract and learned automatically during training.
How embedding models learn meaning
Embedding models are trained on billions of sentences. During training, the model sees words in context — "the king sat on the throne" and "the queen sat on the throne" — and learns that "king" and "queen" appear in similar contexts, so they should have similar embeddings.
The famous demonstration of this is the "king - man + woman = queen" analogy. If you take the embedding for "king," subtract the embedding for "man," and add the embedding for "woman," the result is very close to the embedding for "queen." The model has learned the relationship between gender and royalty purely from seeing how these words are used.
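The arithmetic can be sketched in plain Python. The four-dimensional vectors below are invented for illustration (real embeddings have hundreds or thousands of learned dimensions), but they show the mechanics: subtract "man", add "woman", and the result lands next to "queen".

```python
import math

# Toy 4-dimensional "embeddings" -- numbers invented for illustration;
# real models learn hundreds or thousands of dimensions automatically.
king  = [0.8, 0.9, 0.1, 0.3]
man   = [0.1, 0.9, 0.1, 0.2]
woman = [0.1, 0.1, 0.1, 0.2]
queen = [0.8, 0.1, 0.1, 0.3]

# king - man + woman, element by element
result = [k - m + w for k, m, w in zip(king, man, woman)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine(result, queen), 3))  # 1.0 -- result lands on "queen"
print(round(cosine(result, man), 3))    # much lower
```

In these toy vectors the second dimension plays the role of "maleness" and the first the role of "royalty"; real models learn such structure implicitly rather than having it assigned.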
Modern embedding models go far beyond single words. They encode entire sentences and paragraphs, capturing not just word-level meaning but the overall intent and topic. "I need to fix my plumbing" and "my pipes are broken and I need a repair person" will produce similar embeddings because the models understand they are about the same thing.
Measuring similarity between embeddings
Once you have two embeddings, you need a way to measure how similar they are. The most common method is cosine similarity, which measures the angle between two vectors.
- A cosine similarity of 1.0 means the embeddings are identical in direction (same meaning).
- A cosine similarity of 0 means they are completely unrelated.
- A cosine similarity of -1.0 means the vectors point in opposite directions (in practice, text embeddings rarely score this low).
In practice, most text embeddings range from about 0.3 (somewhat related) to 0.95+ (very similar). You set a threshold based on your use case — 0.8 might work for semantic search, while 0.95 might be needed for duplicate detection.
Other distance metrics include Euclidean distance (straight-line distance between points) and dot product (similar to cosine similarity but not normalised). Cosine similarity is the most popular for text because it focuses on direction rather than magnitude, making it robust across different text lengths.
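The three metrics above take only a few lines of plain Python each. The vectors here are toy three-dimensional examples, not real embeddings:

```python
import math

def dot(a, b):
    # Unnormalised similarity: larger when vectors point the same way
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Angle-based: ignores vector length and compares direction only
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance between the two points (lower = more similar)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0.2, 0.8, -0.3]    # "king" from the earlier example
b = [0.19, 0.79, -0.25] # "queen": similar direction
c = [-0.5, 0.1, 0.9]    # "banana": very different

print(cosine_similarity(a, b))   # close to 1.0
print(cosine_similarity(a, c))   # much lower (can be negative)
print(euclidean_distance(a, b))  # small
```

Note that dot product and cosine similarity coincide when vectors are normalised to length 1, which is why some providers ship pre-normalised embeddings.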
Popular embedding models
Several providers offer embedding models, each with different trade-offs between cost, quality, and speed:
OpenAI embeddings are the most widely used. Their text-embedding-3-small model (1536 dimensions) offers a good balance of quality and cost. The text-embedding-3-large model (3072 dimensions) is more accurate but slower and more expensive.
Sentence Transformers are open-source models you can run locally. The all-MiniLM-L6-v2 model (384 dimensions) is fast and free but less accurate than commercial options. The all-mpnet-base-v2 (768 dimensions) offers better quality while still being free to run.
Cohere's embed-english-v3.0 is a strong commercial option that performs well on benchmarks and supports task-specific embeddings (optimised for search, classification, or clustering).
Google's models including the Universal Sentence Encoder and Vertex AI embedding models offer good quality, especially if you are already in the Google Cloud ecosystem.
Embedding dimensions: size versus quality
Embeddings come in different sizes, measured in dimensions. More dimensions generally means more accurate representations but also more storage, memory, and computation.
Small models (384 dimensions) are fast, cheap, and use little storage. Each embedding takes about 1.5 KB. Good enough for many applications, especially when speed matters more than perfect accuracy.
Medium models (768-1024 dimensions) offer a balanced trade-off. They capture more nuance while remaining practical for most applications.
Large models (1536-3072 dimensions) provide the highest quality. Each embedding takes 6-12 KB. Best for applications where accuracy is critical, like legal document search or medical research retrieval.
For most applications, start with a medium-sized model and upgrade only if your retrieval quality is not meeting requirements. The difference between 768 and 3072 dimensions is often smaller than you might expect.
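The storage figures above follow directly from the vector size: a float32 value takes 4 bytes, so an embedding needs dimensions × 4 bytes.

```python
def embedding_storage_bytes(dimensions, bytes_per_value=4):
    # float32 = 4 bytes per value; int8 quantisation would use 1
    return dimensions * bytes_per_value

for dims in (384, 768, 1536, 3072):
    kb = embedding_storage_bytes(dims) / 1024
    print(f"{dims} dimensions: {kb:.1f} KB per embedding")
```

Multiply by your document count (and chunk count per document) to estimate total vector storage before choosing a model.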
Real-world use cases
Semantic search replaces keyword matching with meaning matching. Instead of requiring exact word matches, users can search naturally and find relevant results even when the exact terms differ. This is how modern documentation search, customer support knowledge bases, and product discovery work.
RAG (Retrieval-Augmented Generation) is perhaps the most important use case for embeddings today. When you ask a chatbot a question about your company's documentation, embeddings are used to find the most relevant document sections, which are then fed to the language model as context. Without embeddings, the model would have no way to know which documents are relevant.
Recommendation systems use embeddings to find "similar items." If a user liked a particular article, you can find other articles with similar embeddings and recommend those. This works for products, movies, music, and any other content.
Document clustering groups similar documents together automatically. Feed thousands of customer support tickets through an embedding model and you can automatically identify the most common topics and issues.
Duplicate detection finds near-duplicate content even when the wording is different. This is used for plagiarism detection, deduplicating databases, and identifying similar questions in FAQ systems.
Anomaly detection identifies outliers. If most of your customer support tickets cluster together but a few have very different embeddings, those outliers might indicate new issues or unusual requests worth investigating.
How to use embeddings in practice
A basic workflow looks like this:
- Generate embeddings for your documents using an embedding model. Store these vectors alongside the original text.
- Store embeddings in a vector database (like Pinecone, Weaviate, or Qdrant) that is optimised for similarity search.
- Query by generating an embedding for the search query and finding the stored embeddings closest to it.
- Return results ranked by similarity score.
```python
from openai import OpenAI

client = OpenAI()

# Generate an embedding for the search query
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I fix a leaky faucet?",
)
query_embedding = response.data[0].embedding  # list of 1536 floats

# Now search your vector database for similar embeddings
```
The heavy lifting is done by the vector database, which uses clever algorithms (like HNSW or IVF) to search through millions of embeddings in milliseconds.
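At small scale you do not need those algorithms at all: a brute-force scan over stored embeddings shows what the database is doing, minus the indexing. This sketch uses hand-written toy vectors as a stand-in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy document store: (text, embedding) pairs. In a real system the
# vectors come from an embedding model, not from hand-written numbers.
documents = [
    ("Plumbing repair for dripping faucets", [0.9, 0.1, 0.0]),
    ("Chocolate cake recipes",               [0.0, 0.9, 0.1]),
    ("Fixing leaky taps at home",            [0.85, 0.15, 0.05]),
]

def search(query_embedding, documents, top_k=2):
    # Score every stored embedding against the query, best first
    scored = [(cosine_similarity(query_embedding, emb), text)
              for text, emb in documents]
    return sorted(scored, reverse=True)[:top_k]

query = [0.88, 0.12, 0.02]  # pretend embedding of "how to fix a leaky tap"
for score, text in search(query, documents):
    print(f"{score:.3f}  {text}")
```

This linear scan is O(n) per query; vector databases exist precisely because HNSW- or IVF-style indexes cut that to near-logarithmic time over millions of vectors.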
Common mistakes
Using the wrong embedding model for your task. An embedding model trained for semantic similarity might not work well for classification. Some models offer task-specific embeddings — use them when available.
Mixing embedding models. If you generate your document embeddings with one model and your query embeddings with another, the similarity scores will be meaningless. Always use the same model for both.
Not chunking long documents. Most embedding models have a maximum input length (typically 512-8192 tokens). If your document exceeds this, it will be truncated. Split long documents into meaningful chunks before embedding.
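A minimal chunking sketch, splitting on words with a fixed overlap so context carries across chunk boundaries (real pipelines usually split on sentences or paragraphs and count tokens rather than words):

```python
def chunk_text(text, chunk_size=200, overlap=40):
    # Word-based chunks; each chunk repeats the last `overlap` words
    # of the previous one so meaning is not cut mid-thought.
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ("word " * 500).strip()  # stand-in for a long document
chunks = chunk_text(doc)
print(len(chunks))  # 3 chunks, each overlapping the previous by 40 words
```

Each chunk is then embedded separately, and search results point back to the chunk (and its parent document) rather than the whole file.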
Ignoring the quality of your source text. Embeddings capture meaning, but they cannot create meaning that is not there. If your documents are poorly written, ambiguous, or outdated, the embeddings will faithfully represent that poor quality.
Over-indexing on benchmark scores. A model that scores 2% higher on a benchmark may not perform 2% better on your specific data. Test with your actual use case before committing.
What's next?
Continue building your understanding with these related guides:
- Vector Database Fundamentals for storing and searching embeddings at scale
- RAG: Retrieval-Augmented Generation for the most popular embedding use case
- Semantic Search Fundamentals for building search powered by meaning
- Token Economics for understanding the cost implications of embedding operations
Frequently Asked Questions
Can I generate embeddings for free?
Yes. Open-source models like Sentence Transformers can run on your own hardware at no cost beyond electricity. The all-MiniLM-L6-v2 model works well for many applications and runs on a standard laptop CPU. Commercial APIs like OpenAI's embedding models are inexpensive (fractions of a cent per embedding) but not free.
How much storage do embeddings require?
Each embedding is an array of floating-point numbers. A 1536-dimension embedding (like OpenAI's text-embedding-3-small) takes about 6 KB in float32 format. For 1 million documents, that is about 6 GB of vector storage. Using quantisation (reducing precision from float32 to int8) can reduce this by 4x with minimal quality loss.
Do embeddings work for languages other than English?
Yes. Many modern embedding models are multilingual. OpenAI's embedding models support over 100 languages. Multilingual Sentence Transformers models can even match text across languages — a query in English can find a relevant document in French. Quality varies by language, with better results for widely spoken languages.
What is the difference between word embeddings and sentence embeddings?
Word embeddings (like Word2Vec) represent individual words as vectors. Sentence embeddings represent entire sentences or paragraphs as single vectors. Sentence embeddings are far more useful for most applications because they capture the overall meaning of a passage, not just individual words. Modern embedding models almost always produce sentence-level embeddings.
About the Authors
Marcin Piekarski · Frontend Lead & AI Educator
Marcin is a Frontend Lead with 20+ years in tech. Currently building headless ecommerce at Harvey Norman (Next.js, Node.js, GraphQL). He created Field Guide to AI to help others understand AI tools practically—without the jargon.
Credentials & Experience:
- 20+ years web development experience
- Frontend Lead at Harvey Norman (10 years)
- Worked with: Gumtree, CommBank, Woolworths, Optus, M&C Saatchi
- Runs AI workshops for teams
- Founder of builtweb.com.au
- Daily AI tools user: ChatGPT, Claude, Gemini, AI coding assistants
- Specializes in React ecosystem: React, Next.js, Node.js
Prism AI · AI Research & Writing Assistant
Prism AI is the AI ghostwriter behind Field Guide to AI—a collaborative ensemble of frontier models (Claude, ChatGPT, Gemini, and others) that assist with research, drafting, and content synthesis. Like light through a prism, human expertise is refracted through multiple AI perspectives to create clear, comprehensive guides. All AI-generated content is reviewed, fact-checked, and refined by Marcin before publication.
Key Terms Used in This Guide
Embedding
A list of numbers that represents the meaning of text, images, or other data. Similar meanings produce similar numbers, so computers can measure how 'close' two concepts are.
RAG (Retrieval-Augmented Generation)
A technique where AI searches your documents for relevant information first, then uses what it finds to generate accurate, grounded answers.
Related Guides
- Natural Language Processing: How AI Understands Text (Intermediate, 8 min read). NLP is how AI reads, understands, and generates human language. Learn the techniques behind chatbots, translation, and text analysis.
- AI Model Architectures: A High-Level Overview (Intermediate, 7 min read). From transformers to CNNs to diffusion models—understand the different AI architectures and what they're good at.
- Context Windows: How Much AI Can Remember (Intermediate, 8 min read). Context windows determine how much text an AI can process at once. Learn how they work, their limits, and how to work within them.