Evaluation (Evals)
Also known as: Evals, Model Evaluation, Testing
In one sentence
Systematically testing an AI system to measure how well it performs on specific tasks, criteria, or safety requirements.
Explain like I'm 12
Like giving an AI a report card: running lots of tests to see if it gives good answers, stays safe, and does what you want. Just like exams at school, evals show where the AI is strong and where it needs to improve.
In context
Evaluations are essential throughout the AI lifecycle. Before deploying a chatbot, a company might run hundreds of test conversations to check accuracy, tone, and safety. Common eval types include automated benchmarks (standardised tests like MMLU that compare models), human evaluation (people rate outputs for quality), and A/B testing (comparing two model versions with real users). Tools like OpenAI's Evals framework and LangSmith help teams run evals at scale. Companies building AI products typically run evals after every model update to catch regressions.
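The simplest automated eval is a "golden set": a list of prompts with known correct answers, graded by exact match. Below is a minimal sketch of that idea in Python; `fake_model` and the three test cases are hypothetical stand-ins for a real model call and a real dataset, not part of any particular framework.

```python
def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (e.g. an API request).
    canned = {
        "What is 2 + 2?": "4",
        "Capital of France?": "Paris",
    }
    return canned.get(prompt, "I don't know")

# A tiny illustrative golden set: prompts paired with expected answers.
golden_set = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
    {"prompt": "Largest planet?", "expected": "Jupiter"},
]

def run_eval(model, cases):
    """Return the fraction of cases where the model's output exactly
    matches the expected answer (exact-match grading)."""
    passed = sum(1 for c in cases if model(c["prompt"]) == c["expected"])
    return passed / len(cases)

score = run_eval(fake_model, golden_set)
print(f"exact-match accuracy: {score:.0%}")  # 2 of 3 cases pass
```

Real eval suites replace exact match with fuzzier grading (keyword checks, human ratings, or a second model as judge), but the loop is the same: run every case, score each output, and track the aggregate number across model updates to catch regressions.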
Related Guides
Learn more about Evaluation (Evals) in these guides:
What Are AI Evals? Understanding AI Evaluation (Beginner, 7 min read)
Learn what AI evaluations (evals) are, why they matter, and how companies test AI systems to make sure they work correctly and safely.

AI Safety Testing Basics: Finding Problems Before Users Do (Intermediate, 10 min read)
Learn how to test AI systems for safety issues. From prompt injection to bias detection: practical testing approaches that help catch problems before deployment.

Evaluations 201: Golden Sets, Rubrics, and Automated Eval (Advanced, 14 min read)
Build rigorous evaluation systems for AI. Create golden datasets, define rubrics, automate testing, and measure improvements.