Training Data

Also known as: Training Set, Training Dataset, Training Corpus

In one sentence

The collection of examples an AI system learns from. The quality, quantity, and diversity of the training data largely determine what the AI can and cannot do.

Explain like I'm 12

If the AI is a student, training data is all the textbooks and examples it studies. Give it bad textbooks, and it learns the wrong things. Give it great ones, and it becomes really smart at that subject.

In context

The models behind ChatGPT were trained on large volumes of text from web pages, books, and articles; that text is their training data. Image generators like DALL-E learned from hundreds of millions of image-caption pairs. A company building a customer service bot might use its past support tickets as training data.
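
To make the customer service example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tickets, the labels, and the `train` and `predict` helpers are invented, and the word-counting "model" is a toy stand-in, not how production systems are built.

```python
from collections import Counter, defaultdict

# Hypothetical training data: past support tickets paired with the category
# a human agent assigned. The tickets and labels are invented for this sketch.
TRAINING_DATA = [
    ("I was charged twice this month", "billing"),
    ("Please refund my last payment", "billing"),
    ("The app crashes when I log in", "technical"),
    ("I get an error when I open the app", "technical"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(word_counts, text):
    """Pick the label whose training tickets share the most words with text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(predict(model, "Refund my payment please"))  # billing
print(predict(model, "The app shows an error"))    # technical
print(predict(model, "Where is my package"))       # billing (no shipping examples)
```

The last call shows the limit set by the training data: no ticket was ever labeled as a shipping question, so the toy model can only answer with one of the two categories it has seen.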
