Skip to main content
BETAThis is a new design — give feedback

AI Alignment

Getting AI to do what we actually want, not just what we literally ask, is one of the biggest challenges in the field. These guides explore AI alignment, the discipline of making sure AI systems reliably follow human intentions, values, and goals. You will learn about core techniques like reinforcement learning from human feedback and constitutional AI that shape how modern chatbots and assistants behave. The topic also covers the alignment problem at a deeper level, including reward hacking, goal misspecification, and the difficulty of encoding complex human values into mathematical objectives. You will understand why alignment matters not just for researchers building frontier models, but for anyone deploying AI in high-stakes settings like healthcare, finance, or hiring. Whether you are evaluating the safety properties of an AI tool, setting organisational guardrails, or simply trying to understand why AI sometimes behaves in surprising ways, these guides give you the essential background to think clearly about alignment.