TL;DR

A context window is the maximum amount of text (measured in tokens) that an AI model can process in a single interaction. It includes everything: your prompt, conversation history, system instructions, and the model's response. Understanding context windows helps you build better AI applications, avoid confusing errors, and manage costs effectively.

Why it matters

Every time you interact with an AI model, you are working within an invisible boundary. Go beyond it and the model either drops older parts of your conversation, returns an error, or produces degraded output. For casual chatting, this rarely matters. But if you are building an application that analyses documents, maintains long conversations, or processes codebases, understanding context windows is the difference between a product that works and one that breaks unpredictably.

Context windows also directly impact your costs. Larger contexts mean more tokens processed, which means higher API bills. Knowing how to work efficiently within context limits saves real money at scale.

What exactly is a context window?

Think of a context window as the AI's working memory. It is not long-term storage — it is the notepad the model has open right now. Everything the model can "see" during a single interaction must fit on this notepad.

The notepad holds a fixed number of tokens. A token is roughly three-quarters of a word in English. Short common words like "I" are a single token, while a longer word like "hamburger" may be split into several pieces (e.g. "ham", "bur", "ger"), depending on the tokeniser. A typical English sentence is about 15-20 tokens.
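For quick budgeting, many developers use the rough rule of thumb that English text averages about four characters per token. This is only a ballpark (real counts come from your provider's tokeniser), but a minimal sketch looks like:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    heuristic for English text. Use the provider's real tokeniser for
    anything that matters; this is only for back-of-envelope budgeting."""
    return max(1, len(text) // 4)

sentence = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(sentence))  # ~11 tokens for this 44-character sentence
```

The heuristic overestimates for code and underestimates for languages with longer tokenised words, so treat it as a floor-plan, not a measurement.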

Critically, the context window is shared between input and output. If a model has a 128K token context window and you send 100K tokens of input, the model only has 28K tokens left for its response. This shared allocation catches many developers off guard.
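The arithmetic above is worth making explicit in your application code. A minimal sketch of the shared-budget calculation (the 128K figure is just an example window size):

```python
CONTEXT_WINDOW = 128_000  # example: a 128K-token model

def output_budget(input_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the model's response after the input is counted.
    Input and output share the same window, so this can reach zero."""
    return max(0, context_window - input_tokens)

print(output_budget(100_000))  # 100K of input leaves only 28K for the reply
```

Checking this number before every call is cheaper than discovering it from a truncated response.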

How context window sizes compare

Context windows have grown dramatically over the past few years:

Small windows (4K-8K tokens) were the standard in early 2023. GPT-3.5's 4K version could handle about 3,000 words — roughly a few pages of text. Enough for quick questions and short conversations, but not much more.

Medium windows (32K-64K tokens) expanded what was possible. GPT-4's 32K variant could process about 24,000 words, enough for medium-length documents or extended conversations.

Large windows (128K-200K tokens) are now common. Claude 4.5's 200K token window can handle roughly 150,000 words — an entire novel. GPT-4o offers 128K tokens. These sizes enable analysis of entire codebases, long legal contracts, or comprehensive research papers in a single pass.

Extra-large windows (1M+ tokens) are emerging. Google's Gemini models offer windows up to 1 million tokens, and the trend is clearly toward even larger capacities. However, bigger is not always better — cost and latency both increase with context size.

What happens when you exceed the limit

When your total tokens (input plus expected output) approach or exceed the context window, one of three things happens:

Truncation. In chat applications, the oldest messages are quietly dropped to make room for new ones. The model "forgets" what you discussed earlier in the conversation. This happens silently, which is why chatbots sometimes seem to lose track of what you told them.

Errors. When making API calls, you will receive an error (usually with a clear message about token limits) if your request exceeds the maximum. Your application needs to handle this gracefully.

Quality degradation. Even within the window, models sometimes struggle with very long contexts. Important information buried in the middle of a long document may receive less attention than information at the beginning or end. Researchers call this the "lost in the middle" problem.
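The silent truncation described above is usually implemented as a sliding window over the chat history. A minimal sketch (the `count_tokens` callable stands in for a real tokeniser, which is an assumption):

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Drop the oldest messages until the history fits the token budget.
    `count_tokens` is any callable returning a message's token count;
    a real implementation would use the provider's tokeniser."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message is silently forgotten
    return kept
```

This is exactly why chatbots "forget": nothing warns the user that `pop(0)` just happened.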

Practical strategies for working within limits

Be selective about what you include. Do not dump your entire database into the prompt. Include only the information the model needs to answer the current question. This is the single most impactful strategy.

Summarise conversation history. Instead of including every message from a long conversation, periodically summarise the key points and replace the full history with the summary. This compresses dozens of messages into a few sentences while preserving the important context.
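One way to structure this is to keep the most recent messages verbatim and collapse everything older into a single summary message. In the sketch below, the `summarise` parameter stands in for a real model call (an assumption; here it just joins the messages):

```python
def compact_history(messages, keep_last=4,
                    summarise=lambda msgs: "Summary: " + " / ".join(msgs)):
    """Replace all but the most recent messages with one summary entry.
    `summarise` is a placeholder for an actual summarisation call."""
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [summarise(older)] + recent
```

Keeping the last few messages verbatim matters: summaries preserve facts but lose tone and phrasing, which the model needs to continue the conversation naturally.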

Use chunking for long documents. If you need to process a 500-page document, split it into chunks (say, 10 pages each), process each chunk separately, and then combine the results. This is more work to implement but works reliably with any context window size.
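A common refinement is to make the chunks overlap slightly, so a sentence cut at a boundary still appears whole in at least one chunk. A minimal character-based sketch (production systems often chunk by tokens or by paragraphs instead):

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping chunks. The overlap preserves context
    across chunk boundaries at the cost of some duplicated processing."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then processed independently and the per-chunk results are merged in a final step.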

Implement RAG (Retrieval-Augmented Generation). Instead of including everything in the context, store your knowledge in a vector database and retrieve only the most relevant pieces for each query. This is how most production AI applications handle large knowledge bases.
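The core idea can be shown without any infrastructure. The toy retriever below ranks documents by word overlap with the query; a real RAG system would use embeddings and a vector database, so treat this purely as an illustration of the retrieve-then-prompt pattern:

```python
def retrieve(query, documents, k=2):
    """Toy retrieval: rank documents by shared words with the query and
    return the top k. Production systems use embedding similarity and a
    vector store instead of word overlap."""
    query_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]
```

Only the retrieved snippets go into the context window, so the knowledge base can be arbitrarily large while each prompt stays small.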

Choose the right model for the job. Do not pay for a 200K context window when your task only needs 4K. Conversely, do not try to cram a 50-page document into a 4K window. Match the model to the task.

Context window versus long-term memory

This distinction confuses many people. A context window is temporary — it exists only for the duration of a single conversation or API call. When the conversation ends, the context is gone. The model does not remember you or what you discussed.

Long-term memory, on the other hand, is built externally. You store information in databases, vector stores, or files, and retrieve it when needed. Some AI applications simulate "memory" by storing conversation summaries and loading them into the context window at the start of each new session. The model is not actually remembering — you are reminding it.

Some newer AI products (like ChatGPT's "memory" feature) automate this process, saving key facts about you and loading them into system prompts. But under the hood, it is still context window management, not true persistent memory.

How context windows affect cost and performance

Every token in your context window costs money and takes time to process. The relationship is roughly linear — twice the tokens means roughly twice the cost and twice the latency.

For a concrete example: processing a 200K token context on Claude Opus 4.5 costs significantly more than processing a 10K token context for the same query. If your application handles thousands of requests per day, the difference can be hundreds or thousands of dollars per month.

Cost optimisation tips:

  • Set appropriate max_tokens limits on responses to prevent the model from rambling.
  • Use smaller models for tasks that do not need large contexts.
  • Cache results for repeated queries instead of reprocessing them.
  • Measure your actual token usage and set up cost alerts in your provider's dashboard.
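Because pricing is per token, cost estimation is simple arithmetic. A sketch, using hypothetical per-million-token prices (check your provider's current price list; these numbers are placeholders):

```python
def estimate_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request, given per-million-token prices.
    Prices here are assumptions, not any provider's actual rates."""
    return (input_tokens / 1e6 * in_price_per_m
            + output_tokens / 1e6 * out_price_per_m)

# e.g. a hypothetical $3 input / $15 output per million tokens
print(estimate_cost(100_000, 2_000, 3.0, 15.0))  # $0.33 for one request
```

Multiply that per-request figure by your daily request volume and the scale argument above becomes concrete quickly.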

The "lost in the middle" problem

Research has shown that large language models pay the most attention to information at the beginning and end of the context window, and less attention to information in the middle. If you place a critical fact on page 30 of a 60-page document, the model may miss it even though it is technically within the context window.

How to work around this:

  • Place the most important information at the beginning or end of your prompt.
  • Use explicit instructions like "Pay special attention to the section about pricing."
  • For long documents, consider processing them in smaller chunks rather than all at once.
  • Test your application with important facts at different positions to check for this issue.
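The first two workarounds can be combined when assembling the prompt: put the instructions and the critical fact at the edges, and the bulk material in the middle. A minimal sketch of that layout (the field names are illustrative, not any library's API):

```python
def build_prompt(instructions, documents, key_fact):
    """Assemble a prompt with the instructions and critical fact at the
    start and end, where models attend most reliably, and the bulk
    documents in the middle."""
    body = "\n\n".join(documents)
    return (f"{instructions}\n\nKey fact: {key_fact}\n\n"
            f"{body}\n\n"
            f"Reminder: {key_fact}\n\n{instructions}")
```

Repeating the key fact costs a few tokens but measurably reduces the chance of it being overlooked in long contexts.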

Common mistakes

Assuming the model remembers previous conversations. Each API call starts fresh. If you need continuity, you must explicitly include the conversation history in your request.

Including everything "just in case." Stuffing the context window with every piece of data you have is wasteful, expensive, and can actually reduce quality. Be intentional about what you include.

Ignoring token limits until something breaks. Count your tokens proactively. Most provider SDKs include tokeniser functions that let you check the size of your input before sending it.

Forgetting that output tokens count too. If you are at 95% of the context window with your input, the model has almost no room to respond. Always leave headroom for the output.
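Both of these mistakes can be caught with a preflight check before the API call: count the input, add the output budget you intend to request, and fail early if the total cannot fit. A minimal sketch:

```python
def preflight(input_tokens, max_output_tokens, context_window):
    """Raise before making the API call if input plus the requested
    output budget cannot fit in the context window."""
    needed = input_tokens + max_output_tokens
    if needed > context_window:
        raise ValueError(
            f"Request needs {needed} tokens but the window is "
            f"{context_window}; trim the input or lower max_output_tokens."
        )
```

Failing in your own code with a clear message is far easier to debug than a provider-side error or a silently clipped response.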

Not testing with long contexts during development. Your application may work perfectly with short inputs during testing but fail in production when users send longer texts. Test with realistic input sizes.

What's next?

Deepen your understanding of related topics: