What is Attention Mechanism?
The core technique that allows transformers to focus on relevant parts of the input.
Definition
The attention mechanism allows a neural network to weigh the importance of different parts of the input when processing each element. In transformers, "self-attention" lets every token in a sequence attend to every other token, capturing long-range dependencies. Multi-head attention runs several attention computations in parallel for richer representations.
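The definition above can be sketched in code. Below is a minimal NumPy implementation of single-head scaled dot-product self-attention; the weight matrices, dimensions, and function names are illustrative assumptions, not part of any particular library's API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X:  (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    """
    # Project each token embedding into queries, keys, and values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # scores[i, j] = how much token i attends to token j
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example with random embeddings and projections
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, attn_weights = self_attention(X, Wq, Wk, Wv)
```

Multi-head attention simply runs several of these computations in parallel with different projection matrices and concatenates the results.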
Example
In the sentence "The cat sat on the mat because it was tired," attention helps the model understand that "it" refers to "the cat" by assigning high attention weights between those tokens, even though they are far apart.
Related concepts
Large language model (LLM): A type of AI trained on massive text datasets to understand and generate human language.
Token: The basic unit of text that AI models process, roughly 4 characters or 0.75 words.
Transformer: The neural network architecture that powers modern AI language models.