Constitutional AI
Also known as: CAI, Constitutional Training, Self-Critique
In one sentence
A safety technique where an AI is trained using a set of principles (a 'constitution') to critique and revise its own outputs, making them more helpful, honest, and harmless without human feedback on every response.
Explain like I'm 12
The AI has a rulebook (constitution) like 'be helpful, be honest, don't be harmful.' It reads its own answer, checks if it follows the rules, and rewrites it if needed—all before showing you. It's like having a built-in editor.
In context
Example: Anthropic's Claude uses Constitutional AI with principles like 'choose the response that is most respectful' and 'avoid helping with illegal activities.' The AI generates responses, critiques them against these principles, and refines them automatically.
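The generate-critique-revise loop described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's actual implementation: `generate` is a hypothetical stub standing in for a real language-model call, and the two principles are the examples quoted above.

```python
# Minimal sketch of the Constitutional AI critique-and-revise loop.
# Assumption: `generate` is a placeholder for a real LLM API call.

CONSTITUTION = [
    "Choose the response that is most respectful.",
    "Avoid helping with illegal activities.",
]

def generate(prompt: str) -> str:
    # Stub: a real system would send `prompt` to a language model here.
    return f"[model output for: {prompt[:40]}]"

def constitutional_revision(user_prompt: str) -> str:
    """Generate a response, then critique and revise it once per principle."""
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        # Step 1: ask the model to critique its own response against a principle.
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        # Step 2: ask the model to rewrite the response using that critique.
        response = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nOriginal response: {response}"
        )
    return response
```

In the full method, pairs of original and revised responses are also used as training data (reinforcement learning from AI feedback), so the deployed model internalizes the constitution rather than running this loop at inference time.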
Related Guides
Learn more about Constitutional AI in these guides:
Constitutional AI: Teaching Models to Self-Critique (Advanced, 7 min read)
Constitutional AI trains models to follow principles, self-critique, and revise harmful outputs without human feedback on every example.

AI Safety and Alignment: Building Helpful, Harmless AI (Intermediate, 7 min read)
AI alignment ensures models do what we want them to do safely. Learn about RLHF, safety techniques, and responsible deployment.

Guardrails & Policy Design for AI (Intermediate, 14 min read)
Design policies and guardrails to keep AI safe, compliant, and aligned with your values. Prevent harm, bias, and misuse.