Context Windows: How Much AI Can Remember
Context windows determine how much text an AI can process at once. Learn how they work, their limits, and how to work within them.
TL;DR
A context window is the maximum amount of text (in tokens) an AI can process at once, including both input and output. Larger windows enable longer conversations and document analysis.
What is a context window?
Definition:
The total number of tokens a model can consider at one time.
Includes:
- Your prompt
- Conversation history
- System instructions
- Model's response
When exceeded (behavior depends on the system):
- Oldest messages are dropped (truncation)
- An error is returned
- Response quality degrades
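One way to make the budget concrete: everything in the window (system instructions, history, your prompt, and room for the reply) shares one token budget. A minimal sketch, using the common rough heuristic of ~4 characters per token for English (the function names and the 4-chars-per-token ratio are illustrative assumptions, not any particular API):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, history: list[str], system: str,
                   window: int = 8192, reply_budget: int = 1024) -> bool:
    """Check whether system + history + prompt, plus room for the
    model's reply, fits inside the context window."""
    used = sum(estimate_tokens(t) for t in [system, prompt, *history])
    return used + reply_budget <= window
```

Real tokenizers (e.g. BPE-based ones) give exact counts; the heuristic is only for quick budgeting.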
Common context window sizes
Small (4K-8K tokens):
- GPT-3.5 (4K version): ~3,000 words
- Good for: Quick questions, short chats
Medium (32K-64K tokens):
- GPT-4 (32K): ~24,000 words
- Good for: Long conversations, medium documents
Large (128K-200K tokens):
- Claude 3 (200K): ~150,000 words (a novel!)
- GPT-4 Turbo (128K)
- Good for: Entire codebases, books, long analysis
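The word counts above follow a common rule of thumb: 1 token ≈ 0.75 English words. A tiny sketch of that conversion (the ratio is an approximation, and varies by language and tokenizer):

```python
def tokens_to_words(tokens: int) -> int:
    # Rule of thumb for English: 1 token is roughly 0.75 words.
    # Actual ratios vary by tokenizer and language.
    return int(tokens * 0.75)

# 200K tokens is roughly 150,000 words -- novel length.
novel_length = tokens_to_words(200_000)
```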
Why context size matters
Use cases:
- Small: Customer support, simple Q&A
- Medium: Summarizing articles, code review
- Large: Analyzing contracts, searching entire codebases
Cost:
- Larger context = more expensive
- Process only what's needed
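Because most APIs bill per token, context size translates directly into cost. A sketch of the arithmetic (the per-1K prices below are hypothetical placeholders, not any provider's actual pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one request, with separate input/output rates
    quoted per 1,000 tokens (typical API billing shape)."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Hypothetical rates: $0.01 per 1K input tokens, $0.03 per 1K output.
# Stuffing 100K tokens of context into one request costs about $1.03.
cost = request_cost(100_000, 1_000, 0.01, 0.03)
```

This is why "process only what's needed" matters: trimming 90K irrelevant tokens from that request would cut the cost by roughly 90%.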
How models "forget"
With conversation history:
- Each message adds tokens
- When limit reached, oldest messages dropped
- Model "forgets" early conversation
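The drop-the-oldest behavior can be sketched in a few lines. This is a toy version, assuming the same rough ~4-characters-per-token estimate as above (real chat frameworks do this with exact token counts):

```python
def truncate_history(messages: list[str], limit: int) -> list[str]:
    """Drop the oldest messages until the estimated token total fits.
    Tokens are estimated at ~4 characters each (rough heuristic)."""
    estimate = lambda m: max(1, len(m) // 4)
    kept = list(messages)
    while len(kept) > 1 and sum(estimate(m) for m in kept) > limit:
        kept.pop(0)  # the model "forgets" the earliest message first
    return kept
```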
Strategies to extend memory:
- Summarize old conversation periodically
- Use external storage (databases, vector stores)
- RAG: Retrieve only relevant context
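The RAG idea can be illustrated with a deliberately simple retriever. This toy version ranks documents by word overlap with the query; real systems use embeddings and a vector store, but the principle (fetch only the relevant slice, then put that in the window) is the same:

```python
def retrieve_relevant(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by shared words with the query,
    return the top k. Stand-in for embedding-based vector search."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]
```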
Working within limits
Optimize context usage:
- Include only relevant information
- Summarize long documents before analysis
- Use RAG for large knowledge bases
Chunking long documents:
- Split into smaller pieces
- Process separately
- Combine results
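A minimal chunking sketch (character-based for simplicity; production pipelines usually chunk by tokens, sentences, or paragraphs). The overlap keeps a little shared context across chunk boundaries so sentences split in two are still interpretable:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so no piece exceeds the
    window. Each chunk shares `overlap` characters with the previous."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then processed separately, and the per-chunk results are combined (e.g. summaries of summaries).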
System vs user messages:
- System messages set behavior (use wisely)
- User messages: actual requests
- Both count toward limit
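A quick illustration that both roles draw from the same budget, using the chat-message shape common to most LLM APIs (the token estimate is the same rough ~4-chars-per-token heuristic as above):

```python
def total_tokens(messages: list[dict]) -> int:
    """System, user, and assistant messages all count toward the window."""
    return sum(max(1, len(m["content"]) // 4) for m in messages)

chat = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize this article."},
]
```

A verbose system prompt spends tokens on every single request, which is why "use wisely" is worth taking literally.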
Context window vs. memory
Context window:
- Short-term, per-conversation
- Disappears when chat ends
Long-term memory:
- Stored externally
- Retrieved as needed
- Built with databases, vector stores
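The split between the two can be sketched as a toy external store: the context window holds the live conversation, while anything worth keeping long-term is saved outside it and pulled back in on demand. Keyword matching here stands in for a real database or vector store:

```python
class LongTermMemory:
    """Toy external memory: save notes, recall them by shared keywords.
    Production systems use databases or vector stores instead."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [n for n in self.notes if words & set(n.lower().split())]
```

The recalled notes get injected into the context window of a new conversation, giving the model an appearance of memory beyond a single chat.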
Performance considerations
- Larger context = slower processing
- Costs scale with context size
- Quality may degrade near limits
Future trends
- Context windows growing rapidly
- 1M+ token windows in development
- Eventually: entire databases as context
Key Terms Used in This Guide
Context Window
How much text an AI can 'see' or 'remember' at once. Older messages fall off when the window fills up.
AI (Artificial Intelligence)
Making machines perform tasks that typically require human intelligence—like understanding language, recognizing patterns, or making decisions.
Token
A chunk of text (usually a word or part of a word) that AI processes. 'Chatbot' might be one token or split into 'chat' and 'bot'.