Inference
Also known as: Model Inference, Prediction
In one sentence
When a trained AI model processes new input and generates a prediction or response—the 'using' phase after training is done.
Explain like I'm 12
After you've trained AI by showing it examples, inference is when you actually use it—like asking a question and getting an answer.
In context
Every time you chat with ChatGPT, you're triggering inference. The model uses what it learned during training to generate responses to your prompts.
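To make the split between the two phases concrete, here is a minimal sketch using scikit-learn (an illustrative choice, not something named above): `fit` is the training phase, and `predict` is inference on new input.

```python
# Minimal sketch of training vs. inference, assuming scikit-learn.
# The data and model here are illustrative, not from the text above.
from sklearn.linear_model import LogisticRegression

# Training phase: the model learns from labeled examples.
X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y_train = [0, 1, 0, 1]
model = LogisticRegression()
model.fit(X_train, y_train)

# Inference phase: the trained model handles input it has never seen.
new_input = [[2.5, 2.5]]
prediction = model.predict(new_input)
print(prediction)  # the model's output for this new input
```

A chatbot works the same way at a much larger scale: training happens once ahead of time, while inference runs every time a user sends a prompt.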
Related Guides
Learn more about Inference in these guides:
Efficient Inference Optimization (Advanced, 8 min read)
Optimize AI inference for speed and cost: batching, caching, model serving, KV cache, speculative decoding, and more.

Deployment Patterns: Serverless, Edge, and Containers (Intermediate, 13 min read)
How to deploy AI systems in production. Compare serverless, edge, container, and self-hosted options.

Model Compression: Smaller, Faster AI (Advanced, 7 min read)
Compress AI models with quantization, pruning, and distillation. Deploy faster, cheaper models without sacrificing much accuracy.