Architecture

What is Mixture of Experts (MoE)?

An architecture that activates only a subset of model parameters for each input, improving efficiency.

Definition

Mixture of Experts (MoE) is a neural network architecture where the model contains multiple "expert" sub-networks, but only activates a small subset for each input token. A routing network decides which experts are most relevant. This allows models to have more total parameters (more knowledge) while using less computation per request.
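The routing step described above can be sketched in a few lines. This is a minimal illustration, not any production implementation: the expert networks are stand-in linear maps, and all names (`moe_layer`, `router_weights`, `k`) are hypothetical. It shows the core idea: score all experts, run only the top-k, and mix their outputs by the renormalized routing weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(token, experts, router_weights, k=2):
    """Route one token through its top-k experts and mix their outputs.

    token: (d,) input vector; experts: list of callables (d,) -> (d,);
    router_weights: (num_experts, d) routing matrix. All names hypothetical.
    """
    logits = router_weights @ token      # one relevance score per expert
    top_k = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    gates = softmax(logits[top_k])       # renormalize weights over chosen experts
    # Only the k selected experts run; the rest are skipped entirely,
    # which is where the compute savings come from.
    return sum(g * experts[i](token) for g, i in zip(gates, top_k))

# Toy demo: 4 experts (simple linear maps), 2 active per token
rng = np.random.default_rng(0)
d, n_experts = 8, 4
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in expert_mats]
router_weights = rng.normal(size=(n_experts, d))
out = moe_layer(rng.normal(size=d), experts, router_weights, k=2)
print(out.shape)  # (8,)
```

Note that the layer holds `n_experts` full parameter sets but only ever computes with `k` of them per token, which is the "more parameters, less compute per request" trade-off in the definition.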

💡 Example

GPT-4 is rumored to use MoE with 8 expert networks, activating 2 per token. This would give it the knowledge capacity of a much larger model while running at roughly the speed of a smaller one. Mixtral by Mistral AI is an openly documented MoE model.

Related concepts

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language.

Transformer

The neural network architecture that powers modern AI language models.

Inference

The process of running a trained AI model to generate predictions or outputs.
