What is Mixture of Experts (MoE)?
An architecture that activates only a subset of model parameters for each input, improving efficiency.
Definition
Mixture of Experts (MoE) is a neural network architecture where the model contains multiple "expert" sub-networks, but only activates a small subset for each input token. A routing network decides which experts are most relevant. This allows models to have more total parameters (more knowledge) while using less computation per request.
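The routing step can be sketched in a few lines of Python. This is a toy illustration with made-up sizes and random weights, not any production implementation: a gating network scores all experts, and only the top-k actually run.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k = 8, 2   # e.g. 8 experts, 2 active per token
d_model = 16              # toy hidden size

# Each "expert" is a simple linear layer (weights only, for illustration).
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # gating network

def moe_forward(x):
    """Route one token vector x through its top-k experts."""
    logits = x @ router                    # one relevance score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only k of the n expert networks run; the other n-k cost no compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (16,)
```

Note how the model stores all 8 experts' parameters but multiplies the input through only 2 of them, which is exactly the "more knowledge, less compute per request" trade-off described above.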
💡 Example
GPT-4 is rumored to use MoE with 8 expert networks, activating 2 per token. This would give it the knowledge of a much larger model while running at the speed of a smaller one. Mixtral, by Mistral AI, is an openly released MoE model with the same design: 8 experts, 2 active per token.