
Constitutional AI

Also known as: CAI, Constitutional Training, Self-Critique

In one sentence

A safety technique where an AI is trained using a set of principles (a 'constitution') to critique and revise its own outputs, making them more helpful, honest, and harmless without human feedback on every response.

Explain like I'm 12

The AI has a rulebook—like a school code of conduct—that says 'be helpful, be honest, don't be harmful.' Before showing you its answer, it reads its own work, checks if it follows the rules, and rewrites anything that breaks them.

In context

Developed by Anthropic, Constitutional AI gives the model a written set of principles such as 'choose the response that is most respectful' and 'avoid helping with illegal activities.' During training, the AI generates responses, critiques them against these principles, and produces improved versions; the revised responses are then used as training data. This self-critique cycle reduces the need for thousands of human reviewers to individually rate every response. Claude, Anthropic's AI assistant, uses Constitutional AI as a core part of its safety training.
