Inference
Also known as: Model Inference, Prediction
In one sentence
The process by which a trained AI model takes new input and produces an output (a prediction, an answer, or generated text); it is the 'using' phase that follows training.
Explain like I'm 12
Training is like studying for an exam. Inference is taking the exam—you use everything you learned to answer new questions you haven't seen before.
In context
Every time you type a message into ChatGPT and get a response, that's inference happening. The model applies patterns it learned during training to generate an answer to your specific prompt. Cloud providers like AWS, Google Cloud, and Azure charge for inference by the token or by compute time. Companies running AI at scale often spend far more on inference than on training, since training happens once but inference runs millions of times per day.
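The train-once, infer-many-times split can be sketched in a few lines of plain Python. This is a toy illustration, not a real ML library: the names (toy_train, infer) and the threshold-based "model" are made up for the example.

```python
def toy_train(examples):
    """'Training': derive a decision threshold from labeled data (runs once)."""
    positives = [x for x, label in examples if label == 1]
    negatives = [x for x, label in examples if label == 0]
    # Learned parameter: the midpoint between the two classes.
    return (min(positives) + max(negatives)) / 2

def infer(threshold, x):
    """'Inference': apply the learned parameter to a brand-new input."""
    return 1 if x >= threshold else 0

# Training happens once...
threshold = toy_train([(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)])

# ...then inference runs on every new input that arrives.
print(infer(threshold, 2.5))  # → 1
print(infer(threshold, 0.5))  # → 0
```

Real systems swap the toy threshold for billions of learned weights, but the shape is the same: the expensive learning step produces fixed parameters, and each inference call just applies them.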
Related Guides
Learn more about Inference in these guides:
Efficient Inference Optimization (Advanced, 8 min read)
Optimize AI inference for speed and cost: batching, caching, model serving, KV cache, speculative decoding, and more.

Machine Learning Fundamentals: How Machines Learn from Data (Beginner, 11 min read)
Understand the basics of machine learning. From training to inference, a practical introduction to how ML systems work without deep math or coding.

Supervised vs Unsupervised Learning: When to Use Which (Beginner, 9 min read)
Understand the difference between supervised and unsupervised learning. Learn when to use each approach with practical examples and decision frameworks.