TL;DR

Tokens are chunks of text (roughly 4 chars or ¾ of a word). AI APIs charge per token for both input and output. Understanding tokens helps you estimate costs and optimize usage.

What is a token?

Not a word—a sub-word unit:

  • "Hello" = 1 token
  • "ChatGPT" = 2 tokens (Chat + GPT)
  • "Internationalization" = 5 tokens

Rule of thumb:

  • 100 tokens ≈ 75 words
  • 1 token ≈ 4 characters
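Those two rules of thumb can be combined into a quick estimator. This is a sketch, not a real tokenizer — the 4-characters and ¾-of-a-word ratios are the approximations above, and real counts will vary by text and model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: average of the chars/4 and words*4/3 heuristics."""
    by_chars = len(text) / 4                # ~4 characters per token
    by_words = len(text.split()) * 4 / 3    # ~3/4 of a word per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("I'm learning about AI tokens."))  # → 7
```

For this sentence the heuristic happens to match the real tokenizer's count of 7 shown below; don't expect that level of agreement in general — use a real tokenizer when precision matters.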

How tokenization works

  1. Text is split using a tokenizer
  2. Common words = 1 token
  3. Rare words split into parts
  4. Punctuation and spaces count

Example:
"I'm learning about AI tokens."
→ ["I", "'m", " learning", " about", " AI", " tokens", "."]
= 7 tokens

Why tokens matter

Pricing:

  • Most APIs charge per 1000 tokens
  • Both input (prompt) and output count
  • Longer conversations = higher cost

Context limits:

  • Models have token limits (4k, 8k, 128k)
  • Includes prompt + response
  • Going over = error or truncation

Typical pricing (as of 2024)

GPT-4:

  • Input: $0.03 per 1K tokens
  • Output: $0.06 per 1K tokens
  • A 10K-token conversation runs roughly $0.30–$0.60, depending on the input/output split

GPT-3.5:

  • Input: $0.0005 per 1K tokens
  • Output: $0.0015 per 1K tokens
  • Roughly 40–60x cheaper than GPT-4 at these rates

Claude (Anthropic):

  • Opus: Top tier, in the same price range as GPT-4
  • Sonnet: Mid-tier pricing
  • Haiku: Cheapest option

Estimating costs

Simple calculation:

  1. Count tokens in your prompt (use a tokenizer tool, e.g. OpenAI's tiktoken library)
  2. Estimate output length
  3. Multiply by price per 1K tokens

Example:

  • Prompt: 500 tokens
  • Response: 1000 tokens
  • Total: 1500 tokens
  • Cost (GPT-4): 0.5 × $0.03 + 1.0 × $0.06 = $0.075
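The calculation above is easy to wrap in a small price sheet. A sketch using the 2024 rates listed earlier — prices change often, so treat the numbers as a snapshot, not a reference:

```python
# USD per 1K tokens, from the rates listed above (2024 snapshot)
PRICES = {
    "gpt-4":   {"input": 0.03,   "output": 0.06},
    "gpt-3.5": {"input": 0.0005, "output": 0.0015},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD: input and output are billed at different per-1K rates."""
    p = PRICES[model]
    return input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"]

print(f"{cost('gpt-4', 500, 1000):.5f}")    # ≈ 0.07500
print(f"{cost('gpt-3.5', 500, 1000):.5f}")  # ≈ 0.00175
```

Running the same 1,500-token request through both models makes the price gap concrete: the GPT-3.5 call costs well under a cent.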

How to reduce token usage

Shorter prompts:

  • Be concise
  • Remove unnecessary context
  • Use system messages efficiently

Limit output:

  • Set max_tokens parameter
  • Request shorter responses

Batch processing:

  • Process multiple items in one call
  • Amortize prompt overhead
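The amortization effect is easy to quantify. A sketch with illustrative numbers — a 400-token shared instruction prompt and ~50 tokens per item are assumptions, not measurements:

```python
PROMPT_OVERHEAD = 400  # shared instructions (hypothetical size)
TOKENS_PER_ITEM = 50   # per-item payload (hypothetical size)

def total_tokens(n_items: int, batched: bool) -> int:
    """Batched: pay the instruction overhead once. Unbatched: once per item."""
    if batched:
        return PROMPT_OVERHEAD + n_items * TOKENS_PER_ITEM
    return n_items * (PROMPT_OVERHEAD + TOKENS_PER_ITEM)

print(total_tokens(20, batched=False))  # 9000 tokens: 20 calls, 20x the overhead
print(total_tokens(20, batched=True))   # 1400 tokens: 1 call, overhead paid once
```

With these assumptions, batching 20 items cuts token usage by more than 6x. The trade-off is that one malformed item can affect the whole batch, and very large batches may hit the context limit.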

Choose the right model:

  • GPT-3.5 for simple tasks
  • GPT-4 only when needed

Cache conversations:

  • Reuse responses when possible
  • Don't re-generate identical content
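A minimal cache sketch, keyed on the exact prompt text — a production version would also include the model name and sampling parameters in the key, and add expiry:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, generate) -> str:
    """Return a cached response for identical prompts; call `generate` on a miss."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate(prompt)  # tokens are only paid for on a miss
    return _cache[key]

calls = []
def fake_generate(p):  # stand-in for a real, billed API call
    calls.append(p)
    return p.upper()

cached_completion("hello", fake_generate)
cached_completion("hello", fake_generate)
print(len(calls))  # 1: the second identical request was served from cache
```

Caching only helps for exact repeats; even a one-character difference in the prompt produces a different key and a fresh (billed) call.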

Hidden costs

  • Retries and failures
  • Testing and debugging
  • Discarded prompt-engineering iterations
  • Context accumulation in conversations

Monitoring costs

  • Track API usage daily
  • Set spending limits
  • Monitor per-user or per-feature costs
  • Alert on anomalies
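These monitoring practices can start as something very simple. A sketch of a per-feature tracker with a spending alert — the budget figure and feature names are illustrative:

```python
from collections import defaultdict

class UsageTracker:
    """Accumulate per-feature API spend and flag when a daily budget is exceeded."""

    def __init__(self, daily_budget_usd: float = 10.0):  # illustrative budget
        self.daily_budget_usd = daily_budget_usd
        self.spend = defaultdict(float)  # feature name -> USD total

    def record(self, feature: str, cost_usd: float) -> None:
        self.spend[feature] += cost_usd

    def over_budget(self) -> bool:
        return sum(self.spend.values()) > self.daily_budget_usd

tracker = UsageTracker(daily_budget_usd=1.0)
tracker.record("chat", 0.60)
tracker.record("search", 0.55)
print(tracker.over_budget())  # True: $1.15 > $1.00
```

In practice you would reset the totals daily and wire `over_budget` into an alerting channel; the per-feature breakdown is what tells you which part of the product is driving the bill.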

What's next

  • Context Windows
  • Prompt Engineering for Cost
  • Model Selection Guide