Latency
Also known as: Response Time, Inference Time
In one sentence
The time delay between sending a request to an AI model and receiving the first part of its response; lower latency means faster replies.
Explain like I'm 12
The waiting time between asking a question and getting an answer—like the pause after you text a friend before those three typing dots finally turn into a message.
In context
Latency varies dramatically across AI models and use cases. GPT-4 might take 2-5 seconds for complex prompts, while smaller models like GPT-3.5 respond in under a second. Factors affecting latency include model size, prompt length, server load, and geographic distance to the API server. Streaming helps by showing words as they're generated rather than waiting for the full response, reducing perceived latency even when total generation time stays the same.
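The distinction between perceived and total latency can be made concrete with a small sketch. The snippet below uses a simulated token generator (a stand-in for a real streaming API; the function name and delays are illustrative assumptions, not any particular vendor's interface) to show that the time to the first token is much shorter than the time for the full response:

```python
import time

def generate_tokens(n_tokens=20, per_token_delay=0.01):
    """Simulated model: yields one token at a time, like a streaming API.

    per_token_delay is an illustrative stand-in for per-token generation time.
    """
    for i in range(n_tokens):
        time.sleep(per_token_delay)
        yield f"token{i} "

start = time.perf_counter()
first_token_latency = None
chunks = []
for chunk in generate_tokens():
    if first_token_latency is None:
        # Perceived latency: the user sees output from this moment on.
        first_token_latency = time.perf_counter() - start
    chunks.append(chunk)
# Total latency: how long the complete answer actually took.
total_latency = time.perf_counter() - start

print(f"time to first token: {first_token_latency:.3f}s")
print(f"total generation:    {total_latency:.3f}s")
```

With streaming, the user starts reading after `first_token_latency` rather than waiting for `total_latency`, which is why streaming improves perceived responsiveness even though the total generation time is unchanged.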
Related Guides
Learn more about Latency in these guides:
- AI Latency Optimization: Making AI Faster (Intermediate, 10 min read). Learn to reduce AI response times. From model optimization to infrastructure tuning—practical techniques for building faster AI applications.
- Cost & Latency: Making AI Fast and Affordable (Advanced, 13 min read). Optimize AI systems for speed and cost. Techniques for reducing latency, controlling API costs, and scaling efficiently.
- Efficient Inference Optimization (Advanced, 8 min read). Optimize AI inference for speed and cost: batching, caching, model serving, KV cache, speculative decoding, and more.