Architecture

What is Attention Mechanism?

The core technique that allows transformers to focus on relevant parts of the input.

Definition

The attention mechanism allows a neural network to weigh the importance of different parts of the input when processing each element. In transformers, "self-attention" lets every token in a sequence attend to every other token, capturing long-range dependencies. Multi-head attention runs several attention computations in parallel for richer representations.
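The definition above can be sketched in a few lines of numpy. This is a minimal, illustrative implementation of single-head scaled dot-product self-attention; the projection matrices `Wq`, `Wk`, `Wv` and the toy dimensions are arbitrary choices for the sketch, not values from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token embedding into a query, key, and value vector.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Compare every token's query against every token's key,
    # scaled by sqrt(d_k) to keep the dot products well behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row is one token's attention distribution over all tokens.
    weights = softmax(scores, axis=-1)
    # Each token's output is a weighted mix of all value vectors.
    return weights @ V, weights

# Toy input: 5 tokens, 8-dimensional embeddings (illustrative sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq = rng.normal(size=(8, 4))
Wk = rng.normal(size=(8, 4))
Wv = rng.normal(size=(8, 4))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)             # (5, 4): one output vector per token
print(weights.sum(axis=-1))  # each row sums to 1
```

Multi-head attention simply runs several such computations in parallel with different projection matrices and concatenates the results.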

💡 Example

In the sentence "The cat sat on the mat because it was tired," attention helps the model understand that "it" refers to "the cat" by assigning high attention weights between those tokens, even though they are far apart.

Related concepts

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language.

Token

The basic unit of text that AI models process, roughly 4 characters or 0.75 words.

Transformer

The neural network architecture that powers modern AI language models.

