Skip to content
Techniques

What is LoRA (Low-Rank Adaptation)?

Last updated May 2026

An efficient method to fine-tune AI models using much less compute and memory.

Definition

LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning large AI models by only modifying a small number of additional parameters rather than updating the entire model. This dramatically reduces the computational cost, memory requirements, and training time needed for customization while achieving results comparable to full fine-tuning.

💡 Example

Instead of fine-tuning all 70 billion parameters of Llama 3, LoRA might only train 10 million additional parameters (0.01%) to adapt the model for medical diagnosis. This can be done on a single GPU instead of requiring a cluster.

Related concepts

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language.

Why this matters

LoRA makes fine-tuning affordable. Instead of retraining an entire model (expensive), LoRA trains a small adapter layer (cheap). This democratized custom AI — even small teams can now create specialized models without massive GPU budgets.

Real-world example

Fine-tuning Llama 3.1 70B normally requires 8+ high-end GPUs. With LoRA, you can fine-tune it on a single GPU in hours. The resulting model performs nearly as well as full fine-tuning for most tasks. Hugging Face and Ollama both support LoRA adapters.

Fine-Tuning

Training a pre-trained AI model on specialized data to improve performance on specific tasks.

Explore AI tools

Find tools that use lora (low-rank adaptation) in practice.

Browse all tools → Back to glossary
What is LoRA (Low-Rank Adaptation)?

LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning large AI models by only modifying a small number of additional parameters rather than updating the entire model. This dramatically reduces the computational cost, memory requirements, and training time needed for customization while achieving results comparable to full fine-tuning.

How does LoRA (Low-Rank Adaptation) work in practice?

Instead of fine-tuning all 70 billion parameters of Llama 3, LoRA might only train 10 million additional parameters (0.01%) to adapt the model for medical diagnosis. This can be done on a single GPU instead of requiring a cluster.

What are the advantages of LoRA over full fine-tuning?

LoRA requires far less GPU memory and compute than full fine-tuning because it only trains small adapter matrices rather than all model parameters. This makes it practical to fine-tune large models on consumer hardware and to maintain multiple specialized adapters that can be swapped in and out.

What are common use cases for LoRA?

LoRA is widely used for creating custom image generation styles in Stable Diffusion, adapting language models to specific domains or writing styles, and building specialized chatbots. It is especially popular in the open-source AI community where users fine-tune models on personal hardware.

Can you use multiple LoRA adapters at once?

Yes, multiple LoRA adapters can be combined or swapped on a single base model. For example, in image generation, you might combine a style LoRA with a subject LoRA. This modularity is a key advantage, as you can mix and match capabilities without retraining the full model.