Transformer
Also known as: Transformer Architecture, Transformer Model
In one sentence
A neural network architecture that revolutionised AI by using attention mechanisms to understand relationships between all words in a text simultaneously, enabling modern LLMs like GPT and Claude.
Explain like I'm 12
Older AI read sentences one word at a time, like reading with a magnifying glass. Transformers can see the whole page at once and understand how every word connects to every other word — that's why they're so much smarter.
In context
Introduced in Google's landmark 2017 paper 'Attention Is All You Need', the transformer architecture is the foundation of virtually every modern language model including GPT-4, Claude, Gemini, and Llama. The key innovation is the 'self-attention' mechanism, which lets the model weigh how important each word is relative to every other word in the input. This parallel processing also makes transformers much faster to train on GPUs compared to older sequential architectures like RNNs.
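The self-attention mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration, not production code: the weight matrices, toy dimensions, and variable names are all assumptions chosen for clarity, and real transformers add multiple heads, masking, and learned parameters.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers to others
    v = x @ w_v  # values: the content each token carries
    d_k = q.shape[-1]
    # Score every token against every other token, all at once (the
    # "whole page" view, rather than one word at a time).
    scores = q @ k.T / np.sqrt(d_k)
    # Softmax each row so the scores become attention weights summing to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is an attention-weighted mix of all value vectors.
    return weights @ v

# Toy sizes: 4 tokens, 8-dimensional embeddings (hypothetical values).
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, d_model))
w_v = rng.normal(size=(d_model, d_model))

out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # one updated vector per token: (4, 8)
```

Because the score matrix is computed for all token pairs in one matrix multiplication, nothing here is sequential, which is what makes the architecture so well suited to GPUs.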
Related Guides
Learn more about Transformer in these guides:
- AI Model Architectures: A High-Level Overview (Intermediate, 7 min read). From transformers to CNNs to diffusion models: understand the different AI architectures and what they're good at.
- Natural Language Processing: How AI Understands Text (Intermediate, 8 min read). NLP is how AI reads, understands, and generates human language. Learn the techniques behind chatbots, translation, and text analysis.
- Designing Custom AI Architectures (Advanced, 7 min read). Design specialized AI architectures for unique problems. When and how to go beyond pre-trained models and build custom solutions.