Latency
Also known as: Response Time, Inference Time
In one sentence
How long it takes for an AI model to generate a response after you send a request.
Explain like I'm 12
The waiting time between when you ask AI a question and when it starts answering—like the delay before someone replies to your text.
In context
GPT-4 might have 2-5 second latency for complex prompts. Lower-latency models respond faster but may be less capable. Streaming reduces perceived latency by showing tokens as they are generated instead of waiting for the full response.
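The difference between perceived and total latency can be made concrete by timing a streaming response. This is a minimal sketch using a simulated model (`fake_model_stream` is a stand-in for illustration, not a real API): time-to-first-token is what the user perceives, while total latency is how long the full answer takes.

```python
import time

def fake_model_stream(tokens, delay=0.05):
    # Simulated streaming model: yields one token at a time,
    # pausing briefly to mimic per-token generation time.
    for tok in tokens:
        time.sleep(delay)
        yield tok

def measure_latency(stream):
    # Returns (time to first token, total time, full text).
    start = time.perf_counter()
    first_token_time = None
    parts = []
    for tok in stream:
        if first_token_time is None:
            first_token_time = time.perf_counter() - start
        parts.append(tok)
    total_time = time.perf_counter() - start
    return first_token_time, total_time, "".join(parts)

ttft, total, text = measure_latency(fake_model_stream(["Hello", ", ", "world"]))
```

With streaming, the user starts reading after `ttft` seconds even though the complete response takes `total` seconds, which is why streaming makes the same model feel faster.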
Related Guides
Learn more about Latency in these guides:
Cost & Latency: Making AI Fast and Affordable (Advanced, 13 min read)
Optimize AI systems for speed and cost. Techniques for reducing latency, controlling API costs, and scaling efficiently.

Deployment Patterns: Serverless, Edge, and Containers (Intermediate, 13 min read)
How to deploy AI systems in production. Compare serverless, edge, container, and self-hosted options.

Monitoring AI Systems in Production (Advanced, 20 min read)
Enterprise-grade monitoring, alerting, and observability for production AI systems. Learn to track performance, costs, quality, and security at scale.