Latency
Also known as: Response Time, Inference Time
In one sentence
How long it takes for an AI model to generate a response after you send a request.
Explain like I'm 12
The waiting time between when you ask AI a question and when it starts answering—like the delay before someone replies to your text.
In context
GPT-4 might take 2-5 seconds to respond to a complex prompt. Lower-latency models respond faster but may be less capable. Streaming reduces perceived latency by showing tokens as soon as they are generated instead of waiting for the full response.
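To see why streaming helps, it is useful to measure two numbers separately: time to first token (what the user perceives as the wait) and total response time. Below is a minimal sketch in Python; the streaming source is simulated with a generator, since any real model API and its response format would differ.

```python
import time

def fake_stream(chunks, delay=0.05):
    """Simulated streaming model response (stand-in for a real API)."""
    for chunk in chunks:
        time.sleep(delay)  # pretend the model is generating
        yield chunk

def measure_latency(stream):
    """Return (time to first token, total time, full text)."""
    start = time.monotonic()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            # Perceived latency: the user sees output from this moment on
            first = time.monotonic() - start
        parts.append(chunk)
    total = time.monotonic() - start
    return first, total, "".join(parts)

ttft, total, text = measure_latency(fake_stream(["Hel", "lo", "!"]))
print(f"time to first token: {ttft:.2f}s, total: {total:.2f}s")
```

With streaming, the user starts reading after `ttft` rather than after `total`, which is why streaming feels faster even though the model takes the same time overall.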
Related Guides
Learn more about Latency in these guides:
AI Latency Optimization: Making AI Faster
Intermediate · 10 min read
Learn to reduce AI response times. From model optimization to infrastructure tuning—practical techniques for building faster AI applications.

Cost & Latency: Making AI Fast and Affordable
Advanced · 13 min read
Optimize AI systems for speed and cost. Techniques for reducing latency, controlling API costs, and scaling efficiently.

Deployment Patterns: Serverless, Edge, and Containers
Intermediate · 13 min read
How to deploy AI systems in production. Compare serverless, edge, container, and self-hosted options.