Inference
Also known as: Model Inference, Prediction
In one sentence
The process by which a trained AI model takes new input and produces an output (a prediction, an answer, or generated text); it is the 'using' phase that follows training.
Explain like I'm 12
Training is like studying for an exam. Inference is taking the exam—you use everything you learned to answer new questions you haven't seen before.
In context
Every time you type a message into ChatGPT and get a response, that's inference happening. The model applies patterns it learned during training to generate an answer to your specific prompt. Cloud providers like AWS, Google Cloud, and Azure charge for inference by the token or by compute time. Companies running AI at scale often spend far more on inference than on training, since training happens once but inference runs millions of times per day.
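The train-once, infer-many-times split can be sketched in a few lines of plain Python. This is a toy illustration, not a real ML library: the names (toy_train, infer) and the threshold-based "model" are made up for the example.

```python
def toy_train(examples):
    """'Training': derive a decision threshold from labeled data (runs once)."""
    positives = [x for x, label in examples if label == 1]
    negatives = [x for x, label in examples if label == 0]
    # Learned parameter: the midpoint between the two classes.
    return (min(positives) + max(negatives)) / 2

def infer(threshold, x):
    """'Inference': apply the learned parameter to a brand-new input."""
    return 1 if x >= threshold else 0

# Training happens once...
threshold = toy_train([(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)])

# ...then inference runs on every new input that arrives.
print(infer(threshold, 2.5))  # → 1
print(infer(threshold, 0.5))  # → 0
```

Real systems swap the toy threshold for billions of learned weights, but the shape is the same: the expensive learning step produces fixed parameters, and each inference call just applies them.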
Related Guides
Learn more about Inference in these guides:
Efficient Inference Optimization (Advanced, 8 min read)
Optimize AI inference for speed and cost: batching, caching, model serving, KV cache, speculative decoding, and more.

Machine Learning Fundamentals: How Machines Learn from Data (Beginner, 11 min read)
Understand the basics of machine learning. From training to inference, a practical introduction to how ML systems work without deep math or coding.

Supervised vs Unsupervised Learning: When to Use Which (Beginner, 9 min read)
Understand the difference between supervised and unsupervised learning. Learn when to use each approach with practical examples and decision frameworks.