Evaluation (Evals)
Also known as: Evals, Model Evaluation, Testing
In one sentence
Systematically testing an AI system to measure how well it performs on specific tasks or criteria.
Explain like I'm 12
Like giving AI a report card—running lots of tests to see if it gives good answers, stays safe, and does what you want.
In context
Evals might test accuracy (does it give correct answers?), safety (does it refuse harmful requests?), or consistency (does it always format responses correctly?).
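To make this concrete, here is a minimal sketch of an accuracy eval in Python: the system is scored against a small "golden set" of questions with known answers. `fake_model` is a hypothetical stand-in used only for illustration; a real eval would call an actual AI system in its place.

```python
def fake_model(question: str) -> str:
    # Hypothetical model: canned answers for illustration only.
    canned = {
        "What is 2 + 2?": "4",
        "Capital of France?": "Paris",
        "Largest planet?": "Saturn",  # deliberately wrong, so the eval catches it
    }
    return canned.get(question, "I don't know")

def run_eval(golden_set):
    """Score the model on (question, expected_answer) pairs."""
    results = []
    for question, expected in golden_set:
        answer = fake_model(question)
        results.append((question, answer == expected))
    passed = sum(ok for _, ok in results)
    return passed / len(results), results

golden_set = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest planet?", "Jupiter"),
]

accuracy, results = run_eval(golden_set)
print(f"accuracy: {accuracy:.0%}")  # 2 of 3 answers match the golden set
```

The same loop structure extends to safety or consistency checks: swap the exact-match comparison for a check that the response refuses a harmful request, or that it follows the required format.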
Related Guides
Learn more about Evaluation (Evals) in these guides:
- What Are AI Evals? Understanding AI Evaluation (Beginner, 7 min read): Learn what AI evaluations (evals) are, why they matter, and how companies test AI systems to make sure they work correctly and safely.
- AI Safety Testing Basics: Finding Problems Before Users Do (Intermediate, 10 min read): Learn how to test AI systems for safety issues. From prompt injection to bias detection, practical testing approaches that help catch problems before deployment.
- Evaluations 201: Golden Sets, Rubrics, and Automated Eval (Advanced, 14 min read): Build rigorous evaluation systems for AI. Create golden datasets, define rubrics, automate testing, and measure improvements.