What is a Transformer?
The neural network architecture that powers modern AI language models.
Definition
The Transformer is a neural network architecture introduced in 2017 that revolutionized natural language processing. It uses a mechanism called "attention" that lets the model weigh the importance of different parts of the input when generating each output token. Almost all modern LLMs (GPT, Claude, Gemini, Llama) are built on the transformer architecture.
💡 Example
When a transformer model reads "The cat sat on the ___", the attention mechanism helps it focus on "cat" and "sat" to predict that the next word is likely "mat" or "chair", rather than being distracted by less relevant words.
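To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer, using NumPy. The token embeddings are random toy values for illustration only; real models learn separate query, key, and value projections, which are omitted here for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores measure how relevant each key token is to each query token.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # The output for each token is a weighted average of the value vectors.
    return weights @ V, weights

# Toy example: 4 tokens with 3-dimensional embeddings (random, illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
output, attn_weights = scaled_dot_product_attention(X, X, X)
print(attn_weights.sum(axis=-1))  # each row sums to 1
```

Each row of `attn_weights` shows how strongly one token "attends" to every other token; this is the mechanism that lets the model emphasize "cat" and "sat" when filling in the blank.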