
Constitutional AI

Also known as: CAI, Constitutional Training, Self-Critique

In one sentence

A safety training technique in which an AI model uses a written set of principles (a 'constitution') to critique and revise its own outputs, so that it becomes more helpful, honest, and harmless without a human having to give feedback on every response.

Explain like I'm 12

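The AI is given a rulebook (a constitution) with rules like 'be helpful, be honest, don't be harmful.' While it is being trained, it reads its own answer, checks whether the answer follows the rules, and rewrites it if needed. Practicing on those rewrites teaches it to give better answers the first time. It's like learning from a built-in editor.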

In context

Example: Anthropic's Claude is trained with Constitutional AI, using principles like 'choose the response that is most respectful' and 'avoid helping with illegal activities.' During training, the model generates responses, critiques them against these principles, and revises them. The revised responses then serve as training data, so no human has to rewrite each one.
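
The generate, critique, and refine cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `toy_model` is a hypothetical stand-in for a real language model, the prompt wording is invented for the example, and the two principles are simply the ones quoted above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Principles quoted in the example above; a real constitution is longer.
CONSTITUTION = [
    "Choose the response that is most respectful.",
    "Avoid helping with illegal activities.",
]

# Assumption: a "model" is any function that maps a text prompt to text.
Model = Callable[[str], str]


@dataclass
class Revision:
    principle: str
    critique: str
    response: str


def critique_and_revise(
    model: Model, prompt: str, constitution: List[str]
) -> Tuple[str, List[Revision]]:
    """Draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    history: List[Revision] = []
    for principle in constitution:
        # Ask the model to judge its own draft against one principle.
        critique = model(
            f"CRITIQUE\nPrinciple: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = model(
            f"REVISE\nCritique: {critique}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        history.append(Revision(principle, critique, response))
    return response, history


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a language model, so the sketch runs."""
    if text.startswith("CRITIQUE"):
        return "The draft could follow the principle more closely."
    if text.startswith("REVISE"):
        return "Revised answer."
    return "Draft answer."


final, history = critique_and_revise(toy_model, "Say hello.", CONSTITUTION)
print(final)
for step in history:
    print(step.principle, "->", step.critique)
```

In a training setup, each (prompt, final response) pair produced this way would be collected as fine-tuning data, so the loop runs while building the dataset, not while answering a user.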
