Constitutional AI
Also known as: CAI, Constitutional Training, Self-Critique
In one sentence
A safety technique where an AI is trained using a set of principles (a 'constitution') to critique and revise its own outputs, making them more helpful, honest, and harmless without human feedback on every response.
Explain like I'm 12
The AI has a rulebook—like a school code of conduct—that says 'be helpful, be honest, don't be harmful.' Before showing you its answer, it reads its own work, checks if it follows the rules, and rewrites anything that breaks them.
In context
Developed by Anthropic, Constitutional AI gives the model a written set of principles such as 'choose the response that is most respectful' and 'avoid helping with illegal activities.' During training, the AI generates responses, then critiques them against these principles and produces improved versions. This self-improvement cycle reduces the need for thousands of human reviewers to individually rate every response. Claude, Anthropic's AI assistant, uses Constitutional AI as a core part of its safety training.
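The generate-critique-revise cycle described above can be sketched in code. This is a minimal illustration, not Anthropic's actual implementation: the `model` function is a toy stand-in for a real language-model call, and the constitution and prompt wording are assumptions chosen for clarity.

```python
# Minimal sketch of the Constitutional AI critique-and-revise loop.
# The `model` function is a hypothetical stand-in for a real LLM API call;
# the constitution and prompt templates below are illustrative only.

CONSTITUTION = [
    "Choose the response that is most respectful.",
    "Avoid helping with illegal activities.",
]

def model(prompt: str) -> str:
    """Toy stand-in: a real system would call a language model here."""
    if "Rewrite" in prompt:
        return "Here is a revised, more respectful response."
    if "Critique" in prompt:
        return "The response could be more respectful."
    return "Initial draft response."

def constitutional_revision(user_prompt: str, rounds: int = 1) -> str:
    """Generate a draft, then critique and revise it against each principle."""
    response = model(user_prompt)
    for _ in range(rounds):
        for principle in CONSTITUTION:
            # Ask the model to critique its own draft against one principle.
            critique = model(
                f"Critique this response against the principle "
                f"'{principle}':\n{response}"
            )
            # Ask the model to rewrite the draft to address the critique.
            response = model(
                f"Rewrite the response to address this critique:\n"
                f"Critique: {critique}\nOriginal: {response}"
            )
    return response

revised = constitutional_revision("How should I reply to a rude email?")
print(revised)
```

In the full method, pairs of original and revised responses like these are then used as training data, so the model learns to produce the revised behavior directly rather than needing the critique loop at inference time.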
Related Guides
Learn more about Constitutional AI in these guides:
Constitutional AI: Teaching Models to Self-Critique (Advanced, 7 min read)
Constitutional AI trains models to follow principles, self-critique, and revise harmful outputs without human feedback on every example.

AI Safety and Alignment: Building Helpful, Harmless AI (Intermediate, 9 min read)
AI alignment ensures models do what we want them to do safely. Learn about RLHF, safety techniques, and responsible deployment.

RLHF Explained: Training AI from Human Feedback (Intermediate, 9 min read)
Understand Reinforcement Learning from Human Feedback, and how modern AI systems learn from human preferences to become more helpful, harmless, and honest.