Architecture

What is Mixture of Experts (MoE)?

An architecture that activates only a subset of model parameters for each input, improving efficiency.

Definition

Mixture of Experts (MoE) is a neural network architecture where the model contains multiple "expert" sub-networks, but only activates a small subset for each input token. A routing network decides which experts are most relevant. This allows models to have more total parameters (more knowledge) while using less computation per request.
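The routing step described above can be sketched in a few lines. This is a minimal illustration, not any production implementation: the expert networks are stand-in linear maps, and all names (`moe_layer`, `router_weights`, `k`) are hypothetical. It shows the core idea: score all experts, run only the top-k, and mix their outputs by the renormalized routing weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(token, experts, router_weights, k=2):
    """Route one token through its top-k experts and mix their outputs.

    token: (d,) input vector; experts: list of callables (d,) -> (d,);
    router_weights: (num_experts, d) routing matrix. All names hypothetical.
    """
    logits = router_weights @ token      # one relevance score per expert
    top_k = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    gates = softmax(logits[top_k])       # renormalize weights over chosen experts
    # Only the k selected experts run; the rest are skipped entirely,
    # which is where the compute savings come from.
    return sum(g * experts[i](token) for g, i in zip(gates, top_k))

# Toy demo: 4 experts (simple linear maps), 2 active per token
rng = np.random.default_rng(0)
d, n_experts = 8, 4
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in expert_mats]
router_weights = rng.normal(size=(n_experts, d))
out = moe_layer(rng.normal(size=d), experts, router_weights, k=2)
print(out.shape)  # (8,)
```

Note that the layer holds `n_experts` full parameter sets but only ever computes with `k` of them per token, which is the "more parameters, less compute per request" trade-off in the definition.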

💡 Example

GPT-4 is rumored to use MoE with 8 expert networks, activating 2 per token. This would give it the knowledge capacity of a much larger model while running at roughly the speed of a smaller one. Mixtral by Mistral AI is an openly documented MoE model.

Related concepts

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language.

Transformer

The neural network architecture that powers modern AI language models.

Inference

The process of running a trained AI model to generate predictions or outputs.
