
Latency

Also known as: Response Time, Inference Time

In one sentence

How long it takes for an AI model to generate a response after you send a request.

Explain like I'm 12

The waiting time between when you ask an AI a question and when it starts answering, like the delay before someone replies to your text.

In context

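GPT-4 might have 2-5 second latency for complex prompts. Lower-latency models respond faster but might be less capable. Streaming reduces perceived latency: the output appears piece by piece as it is generated, so you see the first words before the full response is finished.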
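
To see why streaming changes perceived latency, it helps to measure two numbers separately: the time until the first piece of output arrives and the time until the whole response is done. The sketch below is one rough way to do that. It does not call any real model API; `fake_stream` is a hypothetical stand-in that simulates a streaming response with made-up delays, and `measure_latency` is an illustrative helper name, not part of any library.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_stream(tokens: Iterable[str],
                first_delay: float = 0.05,
                per_token_delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response.

    The delays are invented for illustration; real values depend on
    the model, the prompt, and the network.
    """
    time.sleep(first_delay)  # simulated wait before the first token
    for i, token in enumerate(tokens):
        if i > 0:
            time.sleep(per_token_delay)  # simulated gap between tokens
        yield token


def measure_latency(stream: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Return (time to first chunk, total time, full text) for a stream.

    Time to first chunk is None if the stream yields nothing.
    """
    start = time.perf_counter()
    first_chunk_time: Optional[float] = None
    chunks = []
    for chunk in stream:
        if first_chunk_time is None:
            # Roughly what a user perceives as latency when streaming.
            first_chunk_time = time.perf_counter() - start
        chunks.append(chunk)
    total_time = time.perf_counter() - start
    return first_chunk_time, total_time, "".join(chunks)


if __name__ == "__main__":
    first, total, text = measure_latency(fake_stream(["Hel", "lo", "!"]))
    print(f"first chunk after {first:.3f}s, full response after {total:.3f}s: {text!r}")
```

Without streaming, the user waits for the full response time before seeing anything. With streaming, the wait that matters most is the shorter time to the first chunk, even though the total time is unchanged.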
