TL;DR

A context window is the maximum amount of text (in tokens) an AI can process at once, including both input and output. Larger windows enable longer conversations and document analysis.

What is a context window?

Definition:
The total number of tokens a model can consider at one time.
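
The unit is tokens, not words or characters. As a quick way to see how many tokens a piece of text consumes, here is a small sketch using OpenAI's tiktoken library (the cl100k_base encoding is the one used by GPT-3.5/GPT-4-era models):

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/GPT-4-era encoding
  text = "A context window is measured in tokens, not words."
  print(len(enc.encode(text)))  # number of tokens this text consumes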

Includes:

  • Input: the system message, conversation history, and any documents you provide
  • Output: the tokens the model generates in response
When exceeded (behavior depends on the system):

  • The oldest messages are dropped (truncated)
  • An error is returned (a pre-flight guard is sketched below)
  • Output quality degrades as relevant context gets pushed out
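
A minimal pre-flight guard for the error case might look like this (the limit and reserve values are illustrative; real APIs enforce their own limits server-side):

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  MAX_TOKENS = 4096          # e.g. a 4K model (illustrative)
  RESERVED_FOR_OUTPUT = 512  # leave room for the model's reply

  def check_fits(prompt: str) -> None:
      used = len(enc.encode(prompt))
      if used + RESERVED_FOR_OUTPUT > MAX_TOKENS:
          raise ValueError(
              f"Prompt uses {used} tokens; only "
              f"{MAX_TOKENS - RESERVED_FOR_OUTPUT} are available"
          )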

Common context window sizes

Small (4K-8K tokens):

  • GPT-3.5 (4K version): ~3,000 words
  • Good for: Quick questions, short chats

Medium (32K-64K tokens):

  • GPT-4 (32K): ~24,000 words
  • Good for: Long conversations, medium documents

Large (128K-200K tokens):

  • Claude 3 (200K): ~150,000 words (a novel! See the conversion sketch below)
  • GPT-4 Turbo (128K)
  • Good for: Entire codebases, books, long analysis
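
The word counts above use the common rule of thumb that one token is roughly 0.75 English words. A quick sketch of that conversion (the ratio is approximate and varies with language and content):

  # 1 token ≈ 0.75 English words (rule of thumb)
  WINDOWS = {
      "GPT-3.5 (4K)": 4_096,
      "GPT-4 (32K)": 32_768,
      "GPT-4 Turbo (128K)": 128_000,
      "Claude 3 (200K)": 200_000,
  }
  for model, tokens in WINDOWS.items():
      print(f"{model}: ~{int(tokens * 0.75):,} words")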

Why context size matters

Use cases:

  • Small: Customer support, simple Q&A
  • Medium: Summarizing articles, code review
  • Large: Analyzing contracts, searching entire codebases

Cost:

  • Larger context = more expensive (most APIs bill per token; worked example below)
  • Process only what's needed
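
To make the cost point concrete, a worked example with a hypothetical rate (real per-token prices vary by model and provider):

  PRICE_PER_1K = 0.01  # hypothetical: $0.01 per 1,000 input tokens

  def prompt_cost(n_tokens: int) -> float:
      return n_tokens / 1_000 * PRICE_PER_1K

  print(prompt_cost(2_000))    # short prompt      -> $0.02
  print(prompt_cost(100_000))  # 100K-token prompt -> $1.00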

How models "forget"

With conversation history:

  1. Each message adds tokens
  2. When limit reached, oldest messages dropped
  3. Model "forgets" early conversation (see the sketch below)
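
A toy sliding window makes the mechanism visible. The token counter here is a stand-in; a real system would count with the model's tokenizer:

  def slide(history: list[str], count, budget: int) -> list[str]:
      # history is ordered oldest-first; count(msg) returns its token cost
      while history and sum(count(m) for m in history) > budget:
          history = history[1:]  # the model "forgets" the oldest message
      return history

  # Word count as a rough proxy for tokens:
  chat_history = [f"message {i}: " + "word " * 200 for i in range(30)]
  trimmed = slide(chat_history, lambda m: len(m.split()), budget=3_000)
  print(len(chat_history), "->", len(trimmed))  # early messages are gone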

Strategies to extend memory:

  • Periodically summarize the old conversation (sketched below)
  • Use external storage (databases, vector stores)
  • RAG: Retrieve only relevant context
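
The first strategy can be sketched like this; summarize() is a hypothetical stand-in for a model call that condenses the older turns:

  def summarize(text: str) -> str:
      # stand-in for a real model call that condenses text
      return text[:200] + "..."

  def compact(history: list[str], count, budget: int) -> list[str]:
      # history is ordered oldest-first; count(msg) returns its token cost
      if sum(count(m) for m in history) <= budget:
          return history
      older, recent = history[:-4], history[-4:]  # keep the last few verbatim
      summary = summarize("\n".join(older))       # hypothetical model call
      return ["Summary of earlier conversation: " + summary] + recent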

Working within limits

Optimize context usage:

  • Include only relevant information
  • Summarize long documents before analysis
  • Use RAG for large knowledge bases

Chunking long documents:

  • Split into smaller pieces
  • Process separately
  • Combine the results (a simple chunker is sketched below)
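
A naive word-based chunker, with a little overlap so meaning isn't lost at the boundaries (sizes are illustrative; splitting on tokens with a tokenizer is more precise):

  def chunk_words(text: str, size: int = 1_000, overlap: int = 100) -> list[str]:
      # Overlapping chunks; process each separately, then combine the results
      words = text.split()
      step = size - overlap
      return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]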

System vs user messages:

  • System messages set behavior (use wisely)
  • User messages: actual requests
  • Both count toward the limit (counted together below)
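
Both kinds of message draw from the same token budget. A rough count over a chat-style message list (APIs add a little per-message overhead, so treat this as a lower bound):

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  messages = [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Summarize this report in three bullets."},
  ]
  # System and user content alike count against the window
  print(sum(len(enc.encode(m["content"])) for m in messages))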

Context window vs. memory

Context window:

  • Short-term, per-conversation
  • Disappears when chat ends

Long-term memory:

  • Stored externally
  • Retrieved as needed
  • Built with databases or vector stores (toy version below)
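
As a toy stand-in for that external store (keyword lookup here; a real vector store would rank by embedding similarity):

  memory: dict[str, str] = {}

  def remember(key: str, fact: str) -> None:
      memory[key] = fact

  def recall(query: str) -> list[str]:
      # naive keyword match; matches get re-injected into the next prompt
      return [fact for key, fact in memory.items()
              if query.lower() in key.lower()]

  remember("user name", "The user's name is Priya.")
  print(recall("name"))  # survives across conversations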

Performance considerations

Trade-offs:

  • Larger context = slower processing
  • Costs scale with context size
  • Quality may degrade near the limit

Looking ahead:

  • Context windows are growing rapidly
  • 1M+ token windows in development
  • Eventually: entire databases as context

What's next

  • Token Economics
  • RAG for Long Documents
  • Prompt Engineering