Token Economics: Understanding AI Costs
By Marcin Piekarski · builtweb.com.au · Last Updated: 11 February 2026
TL;DR
AI APIs charge by the token, not by the word. A token is roughly four characters or three-quarters of a word. Both your input (the prompt) and the output (the response) count toward your bill. Understanding how tokens work helps you estimate costs accurately, stay within context limits, and optimize your spending without sacrificing quality.
Why it matters
If you are using AI through APIs to build products, automate workflows, or process large amounts of text, tokens directly translate to money. A single API call might cost a fraction of a cent, but thousands of calls per day add up quickly. Teams have been surprised by bills in the thousands of dollars because they did not understand how token counting works.
Beyond cost, tokens determine what your AI can even process. Every model has a context window, a maximum number of tokens it can handle in a single conversation. If your prompt plus the expected response exceeds that limit, you get an error or truncated output. Understanding tokens helps you design prompts that fit within these limits and use the available space efficiently.
For businesses building AI-powered features, token economics directly affects your profit margins. The difference between a well-optimized prompt and a wasteful one can be a 5x to 10x cost difference at scale.
What is a token?
A token is not a word. It is a sub-word unit that the model's tokenizer creates when breaking text into pieces it can process. Common words are usually a single token. Less common words get split into multiple tokens. Punctuation and spaces also count.
Here are some examples to build your intuition. The word "Hello" is 1 token. "ChatGPT" is 2 tokens: "Chat" and "GPT." A long word like "Internationalization" might be 5 tokens because it gets broken into common sub-word pieces.
The general rule of thumb is that 100 tokens equal roughly 75 English words, or that 1 token is approximately 4 characters. This varies by language. Languages that use longer words or non-Latin scripts, like German, Japanese, or Arabic, often use more tokens per word.
You can check exact token counts using tools like OpenAI's tokenizer (available online) or the tiktoken library in Python. These show you exactly how a specific piece of text gets split into tokens by a particular model.
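For exact counts you need the model's own tokenizer, but the rules of thumb above are easy to sketch in plain Python. This is a rough heuristic only, not a substitute for a real tokenizer like tiktoken, and it can be off by 20 percent or more for code or non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    For billing-accurate counts, use the specific model's tokenizer
    (e.g. the tiktoken library for OpenAI models); this heuristic is
    only a ballpark figure.
    """
    return max(1, round(len(text) / 4))


def estimate_tokens_from_words(word_count: int) -> int:
    """Alternative estimate: roughly 75 English words per 100 tokens."""
    return round(word_count / 0.75)
```

Either function is good enough for back-of-envelope budgeting; switch to the real tokenizer before committing to a cost model.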
How tokenization works
When you send text to an AI model, the first thing that happens is tokenization. A tokenizer is an algorithm that splits your text into a fixed vocabulary of sub-word pieces. The most common approach is called Byte Pair Encoding (BPE).
BPE starts with individual characters and iteratively merges the most frequently occurring pairs. After training on a large text corpus, the tokenizer has a vocabulary of typically 50,000 to 100,000 tokens. Very common words like "the" or "is" become single tokens. Rare words get split into smaller pieces that do appear in the vocabulary.
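The merge loop at the heart of BPE can be shown with a toy example. This is an illustrative sketch on a three-word corpus, not a real tokenizer (production BPE works on bytes, tracks a learned merge table, and trains on billions of words):

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most common."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged


# Start from individual characters and apply three merges, BPE-style.
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(3):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
```

After three merges the corpus becomes `[["lowe", "r"], ["lowe", "s", "t"], ["low"]]`: the frequent stem has collapsed into larger units while the rare suffixes stay as small pieces, which is exactly the behavior described above.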
For example, the sentence "I'm learning about AI tokens." gets tokenized into something like: ["I", "'m", " learning", " about", " AI", " tokens", "."], giving you 7 tokens. Notice that spaces are often attached to the following word and that punctuation gets its own token.
Different models use different tokenizers, which means the same text produces different token counts depending on which model you use. GPT-4 and Claude use different tokenizers, so a 1,000-word document might be 1,300 tokens in one model and 1,400 in another. Always count tokens using the specific model's tokenizer for accurate cost estimates.
How token pricing works
Most AI APIs charge separately for input tokens (your prompt) and output tokens (the model's response). Output tokens are typically more expensive because they require more computation to generate. Pricing is quoted per million tokens or per thousand tokens, depending on the provider.
As of early 2026, pricing varies dramatically between models and providers. The most capable models like GPT-4o and Claude Opus cost more per token than smaller models like GPT-4o-mini or Claude Haiku. The price difference can be 10x to 50x between the cheapest and most expensive options.
Here is a concrete example. Say you have a prompt that uses 500 input tokens and the model generates a 1,000-token response. At a rate of $3 per million input tokens and $15 per million output tokens, that single call costs: (500 / 1,000,000 * $3) + (1,000 / 1,000,000 * $15) = $0.0015 + $0.015 = $0.0165, or roughly 1.7 cents. That seems tiny, but if you make 100,000 such calls per month, you are spending about $1,650.
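The arithmetic above generalizes to a one-line cost function, using per-million-token prices as quoted on most pricing pages:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call at per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# The example from the text: 500 input and 1,000 output tokens
# at $3 / $15 per million tokens.
single = call_cost(500, 1_000, 3.0, 15.0)   # ≈ $0.0165 per call
monthly = single * 100_000                   # ≈ $1,650 at 100k calls/month
```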
Pricing changes frequently as providers compete and release new models. Always check the current pricing page for your provider before budgeting.
Estimating and budgeting costs
To estimate costs for a project, you need three numbers: average input tokens per request, average output tokens per request, and expected request volume.
Start by running your typical prompts through a tokenizer to count input tokens. Then test with a few real requests to see how many output tokens the model generates on average. Multiply by your expected daily or monthly volume and apply the pricing formula.
Build in a buffer. Real-world usage almost always exceeds initial estimates. Retries after failures, longer-than-expected responses, and growing user adoption all push costs up. A 30 to 50 percent buffer is reasonable for initial budgeting.
For applications with variable-length inputs, like document summarization, test with your shortest and longest expected documents to understand the range. Your average cost will fall somewhere in between, but your peak cost matters for budgeting.
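Putting the three numbers and the buffer together, a budgeting sketch might look like this (the function name and defaults are illustrative, not from any provider's SDK):

```python
def monthly_budget(avg_input_tokens: int, avg_output_tokens: int,
                   requests_per_month: int,
                   input_price_per_m: float, output_price_per_m: float,
                   buffer: float = 0.4) -> float:
    """Estimated monthly spend in dollars, with a safety buffer.

    The default 40% buffer sits in the 30-50% range suggested above
    for retries, longer-than-expected responses, and adoption growth.
    """
    per_call = (avg_input_tokens * input_price_per_m
                + avg_output_tokens * output_price_per_m) / 1_000_000
    return per_call * requests_per_month * (1 + buffer)
```

Run it with your shortest and longest expected inputs as well as the average to see the full cost range before committing to a price point.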
How to reduce token usage
The most effective optimization is writing concise prompts. Remove unnecessary context, instructions the model already follows by default, and verbose phrasing. A prompt that says "Please analyze the following text and provide a detailed summary including the main points, key themes, and any notable details" can often be shortened to "Summarize this text" with the same results and half the tokens.
Use system messages efficiently. System messages persist across an entire conversation, so every word in them costs tokens on every single request. Keep system messages focused and concise.
Set the max_tokens parameter to limit output length. If you only need a one-sentence answer, do not let the model generate a five-paragraph essay. This saves both tokens and latency.
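As a sketch, a request body with a capped output might look like the following. The field names follow the OpenAI-style chat completions shape; the model name is a hypothetical choice, and other providers name the limit differently (for example, Anthropic's Messages API also uses `max_tokens`), so check your provider's documentation:

```python
# Illustrative request body; field names assume an OpenAI-style
# chat completions endpoint and may differ for your provider.
payload = {
    "model": "gpt-4o-mini",  # hypothetical model choice for a cheap task
    "messages": [
        {"role": "system", "content": "Answer in one sentence."},
        {"role": "user", "content": "What is a token?"},
    ],
    "max_tokens": 60,  # hard cap on output tokens: saves cost and latency
}
```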
Choose the right model for each task. Do not use your most expensive model for simple classification or extraction tasks. A smaller, cheaper model handles routine work perfectly well. Reserve your premium model for tasks that genuinely require advanced reasoning.
Implement caching for repeated or similar queries. If ten users ask the same question within an hour, serve the cached response instead of making ten API calls. Even simple caching strategies can reduce costs by 30 to 60 percent for many applications.
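A minimal version of that caching strategy fits in a few lines. This sketch keys on the exact prompt text with a time-to-live; a production system would also consider cache size limits, normalization of near-identical prompts, and provider-side prompt caching where available:

```python
import hashlib
import time

_cache: dict = {}


def cached_completion(prompt: str, call_api, ttl_seconds: float = 3600.0):
    """Serve a cached response for identical prompts seen within the TTL.

    `call_api` is whatever function actually hits the AI API; it is
    only invoked on a cache miss, so repeated questions cost nothing.
    """
    key = hashlib.sha256(prompt.encode()).hexdigest()
    hit = _cache.get(key)
    now = time.time()
    if hit is not None and now - hit[0] < ttl_seconds:
        return hit[1]                # cache hit: zero tokens spent
    response = call_api(prompt)      # cache miss: pay for the call
    _cache[key] = (now, response)
    return response
```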
For batch processing, combine multiple items into a single API call when possible. Instead of making 100 separate calls to classify 100 support tickets, send them in batches of 10 or 20 with instructions to process all of them.
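The batching idea can be sketched like this; the prompt wording and batch size of 20 are illustrative, and in practice you would tune the batch size so each combined prompt stays well inside the context window:

```python
def batch(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_prompt(tickets):
    """One prompt asking the model to classify several tickets at once."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(tickets))
    return ("Classify each support ticket below as billing, technical, "
            "or other. Reply with one label per line.\n" + numbered)


tickets = [f"ticket {n}" for n in range(100)]
prompts = [build_batch_prompt(b) for b in batch(tickets, 20)]  # 5 calls, not 100
```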
Hidden costs to watch for
Several costs are easy to overlook when budgeting. Retries and failures can double your effective cost if your system retries aggressively. Testing and debugging during development burns tokens that do not produce user value. Prompt engineering iterations, where you try dozens of prompt variations, add up quickly.
The biggest hidden cost in conversational applications is context accumulation. In a multi-turn conversation, the entire conversation history is sent with every new message. By turn 20, your input tokens might be 10x what they were at turn 1. Implement conversation summarization or sliding window strategies to keep context manageable.
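A sliding window can be sketched as follows. This simplified version keeps only the most recent messages that fit a token budget; a real implementation would also pin the system message and perhaps a running summary of the dropped turns:

```python
def sliding_window(messages, max_history_tokens, count_tokens):
    """Keep the most recent messages whose combined size fits the budget.

    `messages` is a list of message strings, oldest first;
    `count_tokens` is any token-counting function (exact or heuristic).
    """
    kept, total = [], 0
    for msg in reversed(messages):           # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_history_tokens:
            break                            # budget exhausted: drop the rest
        kept.append(msg)
        total += cost
    return list(reversed(kept))              # restore chronological order
```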
Image and audio inputs, for multimodal models, use significantly more tokens than text. A single high-resolution image can cost the equivalent of thousands of text tokens. Factor this into your pricing if your application handles visual content.
Monitoring and controlling costs
Set up monitoring from day one. Track API usage per feature, per user, and per day. Most providers offer usage dashboards, but build your own monitoring too so you can correlate costs with specific application behaviors.
Set hard spending limits in your API provider's dashboard. These prevent runaway costs from bugs, abuse, or unexpected traffic spikes. An infinite loop that calls the API can burn through hundreds of dollars in minutes.
Alert on anomalies. If your daily spend suddenly doubles, you want to know immediately, not at the end of the month. Set up alerts at 80 percent and 100 percent of your expected daily budget.
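The threshold check behind those alerts is simple; the alert strings here are placeholders for whatever your monitoring stack actually sends (email, Slack, PagerDuty):

```python
def budget_alerts(spend_today: float, daily_budget: float):
    """Return the alert levels crossed, per the 80% / 100% thresholds above."""
    alerts = []
    if spend_today >= 0.8 * daily_budget:
        alerts.append("warning: 80% of daily budget")
    if spend_today >= daily_budget:
        alerts.append("critical: daily budget exceeded")
    return alerts
```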
Review your costs weekly and look for optimization opportunities. Often, a small change to a frequently-used prompt or switching one feature to a cheaper model can save hundreds of dollars per month.
Common mistakes
The most common mistake is not counting tokens before building. People design prompts, build features, and then discover their costs are 5x what they expected. Always prototype and measure token usage before committing to an approach.
Another mistake is sending the entire document when only part of it is relevant. If a user asks about chapter 3 of a book, do not send the entire book as context. Extract the relevant section first using search or chunking.
Teams frequently ignore output token costs, which are often 2x to 5x higher than input token costs. Letting the model ramble with no output limit is expensive. Be specific about the format and length you want.
Finally, many people use a single model for everything. Using GPT-4-class models for tasks that GPT-4o-mini handles perfectly is like taking a helicopter to the corner store. Match the model to the task complexity.
What's next?
- Understand the limits tokens impose in Context Windows
- Learn to write efficient prompts in Prompt Engineering Basics
- Compare model options in Choosing AI Tools
- See how costs factor into business decisions in AI Cost Management
Frequently Asked Questions
How many tokens is a typical ChatGPT conversation?
A casual back-and-forth conversation of about 10 messages might use 2,000 to 5,000 tokens total. A detailed conversation with long prompts and responses could easily reach 10,000 to 30,000 tokens. Remember that in a conversation, all previous messages are resent with each new message, so token usage grows faster than you might expect.
Why are output tokens more expensive than input tokens?
Input tokens are processed in parallel, which is computationally efficient. Output tokens must be generated one at a time, with each new token requiring a full forward pass through the model. This sequential generation is much more computationally expensive, which is why providers charge more for output.
Do spaces and punctuation count as tokens?
Yes. Spaces are typically merged with the following word into a single token, and punctuation marks like periods, commas, and question marks each count as their own token. Even empty lines and formatting characters consume tokens.
How can I count tokens before sending a request?
Most providers offer tokenizer tools. OpenAI has an online tokenizer and the tiktoken Python library. Anthropic provides token counting in their API. For quick estimates, divide your character count by 4 or your word count by 0.75. For accurate counts, always use the specific model's tokenizer.
About the Authors
Marcin Piekarski · Frontend Lead & AI Educator
Marcin is a Frontend Lead with 20+ years in tech. Currently building headless ecommerce at Harvey Norman (Next.js, Node.js, GraphQL). He created Field Guide to AI to help others understand AI tools practically—without the jargon.
Credentials & Experience:
- 20+ years web development experience
- Frontend Lead at Harvey Norman (10 years)
- Worked with: Gumtree, CommBank, Woolworths, Optus, M&C Saatchi
- Runs AI workshops for teams
- Founder of builtweb.com.au
- Daily AI tools user: ChatGPT, Claude, Gemini, AI coding assistants
- Specializes in React ecosystem: React, Next.js, Node.js
Prism AI · AI Research & Writing Assistant
Prism AI is the AI ghostwriter behind Field Guide to AI—a collaborative ensemble of frontier models (Claude, ChatGPT, Gemini, and others) that assist with research, drafting, and content synthesis. Like light through a prism, human expertise is refracted through multiple AI perspectives to create clear, comprehensive guides. All AI-generated content is reviewed, fact-checked, and refined by Marcin before publication.
Key Terms Used in This Guide
Token
A chunk of text — usually a word or part of a word — that AI models process as a single unit. Most English words are one token, but longer or uncommon words get split into pieces.
AI (Artificial Intelligence)
Making machines perform tasks that typically require human intelligence—like understanding language, recognizing patterns, or making decisions.
Related Guides
- A/B Testing AI Outputs: Measure What Works (Intermediate · 6 min read): How do you know if your AI changes improved outcomes? Learn to A/B test prompts, models, and parameters scientifically.
- Batch Processing with AI: Efficiency at Scale (Intermediate · 8 min read): Process thousands of items efficiently with batch AI operations. Learn strategies for large-scale AI tasks.
- AI API Integration Basics (Intermediate · 8 min read): Learn how to integrate AI APIs into your applications. Authentication, requests, error handling, and best practices.