Skip to main content

Evaluation (Evals)

Also known as: Evals, Model Evaluation, Testing

In one sentence

Systematically testing an AI system to measure how well it performs on specific tasks or criteria.

Explain like I'm 12

Like giving AI a report card—running lots of tests to see if it gives good answers, stays safe, and does what you want.

In context

Evals might test accuracy (does it give correct answers?), safety (does it refuse harmful requests?), or consistency (does it always format responses correctly?).

See also

Related Guides

Learn more about Evaluation (Evals) in these guides: