Evaluation (Evals)
Also known as: Evals, Model Evaluation, Testing
In one sentence
Systematically testing an AI system to measure how well it performs on specific tasks or criteria.
Explain like I'm 12
Like giving AI a report card—running lots of tests to see if it gives good answers, stays safe, and does what you want.
In context
Evals might test accuracy (does it give correct answers?), safety (does it refuse harmful requests?), or consistency (does it always format responses correctly?).
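The accuracy check described above can be sketched as a tiny eval loop: run each question through the model, compare against a known-good answer, and report the pass rate. Everything here is illustrative — `model_answer` is a hypothetical stand-in for a real model call, and the golden set is made up for the example.

```python
def model_answer(question: str) -> str:
    # Hypothetical stand-in for a real model API call;
    # returns canned answers so the sketch is runnable.
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned.get(question, "I don't know.")

# Golden set: question -> expected answer.
golden_set = {
    "What is 2 + 2?": "4",
    "Capital of France?": "Paris",
}

def run_accuracy_eval(answer_fn, cases) -> float:
    """Score exact-match accuracy over a golden set of cases."""
    correct = sum(
        1 for question, expected in cases.items()
        if answer_fn(question).strip() == expected
    )
    return correct / len(cases)

score = run_accuracy_eval(model_answer, golden_set)
print(f"accuracy: {score:.0%}")
```

Real eval harnesses extend the same pattern with fuzzier scoring (semantic similarity, rubric grading, LLM-as-judge) since exact string matching misses correct answers phrased differently.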
Related Guides
Learn more about Evaluation (Evals) in these guides:
What Are AI Evals? Understanding AI Evaluation
Beginner · 7 min read
Learn what AI evaluations (evals) are, why they matter, and how companies test AI systems to make sure they work correctly and safely.

Evaluations 201: Golden Sets, Rubrics, and Automated Eval
Advanced · 14 min read
Build rigorous evaluation systems for AI. Create golden datasets, define rubrics, automate testing, and measure improvements.

A/B Testing AI Outputs: Measure What Works
Intermediate · 6 min read
How do you know if your AI changes improved outcomes? Learn to A/B test prompts, models, and parameters scientifically.