
What is a Transformer?

The neural network architecture that powers modern AI language models.

Definition

The Transformer is a neural network architecture introduced in 2017 that revolutionized natural language processing. It uses a mechanism called "attention" that allows the model to weigh the importance of different parts of the input when generating each output token. Almost all modern LLMs, including GPT, Claude, Gemini, and Llama, are built on the transformer architecture.
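The attention mechanism described above is, in its most common form, "scaled dot-product attention": each token's query is compared against every token's key, the scores are normalized with a softmax, and the result is used to take a weighted average of the values. The following is a minimal NumPy sketch of that idea, not a full transformer; the shapes and variable names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted average of the value rows,
    # weighted by how strongly that query matches each key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy input: 3 tokens with 4-dimensional embeddings (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
```

In a real transformer, Q, K, and V are produced from the token embeddings by learned linear projections, and many such attention "heads" run in parallel; this sketch omits those details to show only the core weighting step.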

💡 Example

When a transformer model reads "The cat sat on the ___", the attention mechanism helps it focus on "cat" and "sat" to predict that the next word is likely "mat" or "chair", rather than being distracted by less relevant words.
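The intuition in this example can be made concrete with a toy calculation: given raw relevance scores for each earlier word, a softmax turns them into weights that concentrate on the most relevant words. The scores below are invented purely for illustration, not produced by a real model.

```python
import numpy as np

# Hypothetical raw relevance scores of each earlier word to the blank;
# the numbers are invented for illustration only.
tokens = ["The", "cat", "sat", "on", "the"]
scores = np.array([0.5, 3.0, 2.5, 0.8, 0.5])

weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the scores
for tok, wgt in zip(tokens, weights):
    print(f"{tok:>4}: {wgt:.2f}")
```

After the softmax, "cat" and "sat" receive most of the total weight, which is the sense in which attention lets the model "focus" on them when predicting the next word.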

Related concepts

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language.

GPT (Generative Pre-trained Transformer)

OpenAI's family of language models that power ChatGPT.

