
Transformer

Also known as: Transformer Architecture, Transformer Model

In one sentence

A neural network architecture that revolutionised AI by using attention mechanisms to understand relationships between all words in a text simultaneously, enabling modern LLMs like GPT and Claude.

Explain like I'm 12

Older AI read sentences one word at a time, like reading with a magnifying glass. Transformers can see the whole page at once and work out how every word connects to every other word — that's why they're so much better at understanding language.

In context

Introduced in Google's landmark 2017 paper 'Attention Is All You Need', the transformer architecture is the foundation of virtually every modern language model including GPT-4, Claude, Gemini, and Llama. The key innovation is the 'self-attention' mechanism, which lets the model weigh how important each word is relative to every other word in the input. This parallel processing also makes transformers much faster to train on GPUs compared to older sequential architectures like RNNs.
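The self-attention idea described above can be sketched in a few lines of NumPy. This is a deliberately simplified toy (it omits the learned query, key, and value projection matrices that a real transformer uses, treating the input itself as all three), but it shows the core mechanism: score every token against every other token, softmax the scores into weights, and mix the tokens accordingly:

```python
import numpy as np

def toy_self_attention(X):
    """Simplified self-attention: X is a (seq_len, d) array of token vectors.
    Real transformers project X into separate queries, keys, and values;
    here we skip those learned projections for clarity."""
    d = X.shape[-1]
    # Score each token against every other token (scaled dot product)
    scores = X @ X.T / np.sqrt(d)
    # Softmax each row so the weights for one token sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all token vectors
    return weights @ X, weights

X = np.random.rand(4, 8)          # 4 "words", each an 8-dim vector
out, weights = toy_self_attention(X)
print(out.shape)                  # (4, 8): one output per input token
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed at once — this is the parallelism that makes transformers so GPU-friendly compared with RNNs, which must step through tokens one at a time.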
