
RLHF (Reinforcement Learning from Human Feedback)

Also known as: Reinforcement Learning from Human Feedback, RLHF, Human Feedback Training

In one sentence

A training method where humans rate AI outputs to teach the model which responses are helpful, harmless, and accurate.

Explain like I'm 12

Training AI by having real people give it grades on its answers—like a teacher marking homework—so it learns what good responses look like.

In context

Used to train ChatGPT, Claude, and other assistants to be more helpful and safe. Human reviewers compare pairs of model outputs and pick the better one; the model is then trained to prefer the kinds of responses humans chose.
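
To make the "compare two outputs" idea concrete, here is a minimal sketch (not from this glossary) of the pairwise preference step commonly used to train an RLHF reward model: the model is pushed to score the human-preferred response higher than the rejected one with a Bradley-Terry style loss. The function name and the toy numbers are illustrative assumptions; PyTorch is assumed as the framework.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push scores for human-preferred
    responses above scores for rejected responses."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar scores a reward model assigned to three pairs of
# responses that humans compared side by side.
chosen = torch.tensor([1.2, 0.8, 2.0])     # responses humans preferred
rejected = torch.tensor([0.5, 1.0, -0.3])  # responses humans rejected
print(preference_loss(chosen, rejected))   # lower loss = better separation
```

The trained reward model then stands in for the human raters, scoring new outputs so a reinforcement learning step can fine-tune the assistant toward higher-scoring responses.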
