Definition

LoRA (Low-Rank Adaptation)

A fine-tuning technique that trains a small set of adapter weights instead of the full model — making fine-tuning dramatically cheaper and faster.

The Full Definition

LoRA — Low-Rank Adaptation — is a fine-tuning technique that freezes the original model weights and trains a small number of additional adapter weights instead. Because the adapter is tiny (often <1% the size of the full model), LoRA fine-tuning runs on modest GPU hardware in hours instead of days, costs hundreds of dollars instead of tens of thousands, and produces an adapter that can be swapped in or out without retraining. QLoRA combines LoRA with quantization to make this even more accessible — fine-tuning a 70B-parameter model on a single GPU is now routine.
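The core idea above can be sketched in a few lines. This is an illustrative NumPy toy, not a production implementation: the dimensions, rank `r`, and scaling factor `alpha` are hypothetical, and real LoRA training happens in a framework like PyTorch. The frozen weight `W` is left untouched; only the two small factors `A` and `B` would be trained.

```python
import numpy as np

# Minimal LoRA sketch (illustrative toy; dimensions and hyperparameters are
# hypothetical). A frozen weight W of shape (d_out, d_in) gets a trainable
# low-rank update B @ A, with A of shape (r, d_in) and B of shape (d_out, r).
rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8              # rank r is much smaller than d
alpha = 16                                 # LoRA scaling hyperparameter

W = rng.standard_normal((d_out, d_in))     # frozen base weight (not trained)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init: adapter starts as a no-op

def lora_forward(x):
    # Output = frozen path + scaled low-rank adapter path
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapter contributes nothing before training:
assert np.allclose(lora_forward(x), W @ x)

# Parameter comparison: the adapter is a small fraction of the full matrix
full_params = W.size
adapter_params = A.size + B.size
print(f"adapter is {adapter_params / full_params:.2%} of the layer")
```

At these toy sizes the adapter is about 3% of the layer; at realistic hidden dimensions (thousands) with low rank, the fraction drops well under 1%, which is where the "<1% the size of the full model" figure comes from.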

Why It Matters

LoRA cut the cost of fine-tuning by roughly two orders of magnitude. It's the technique that turned fine-tuning from a research-lab activity into something a competent engineering team can do as part of a normal product cycle.


How This Shows Up in Practice

A legal team fine-tunes a 13B-parameter model on 3,000 of its house-style memos using LoRA. Total compute cost: about $80. Total turnaround: 6 hours. The resulting adapter is loaded alongside the base model at inference time — and the team can re-train monthly as the corpus grows.
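"Loaded alongside the base model" can mean two things at inference time, and both are worth seeing side by side. The sketch below (hypothetical shapes and values, NumPy for illustration) shows the two standard deployment options: apply the adapter on the fly so it stays swappable, or merge it into the base weight once so serving is a plain matrix multiply with zero extra latency.

```python
import numpy as np

# Two ways to serve a trained LoRA adapter (toy example; all values hypothetical).
rng = np.random.default_rng(1)
d, r, alpha = 256, 4, 8
W = rng.standard_normal((d, d))        # frozen base weight, shared across tasks
A = rng.standard_normal((r, d)) * 0.1  # trained adapter factor
B = rng.standard_normal((d, r)) * 0.1  # trained adapter factor

x = rng.standard_normal(d)

# Option 1: keep the adapter separate (easy to swap out or retrain monthly)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Option 2: merge once into the base weight, then serve a single matmul
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

# Both paths compute the same function, up to floating-point error:
assert np.allclose(y_adapter, y_merged)
```

The legal team in the example above would likely pick option 1: the adapter file stays tiny and independent, so next month's retrain just replaces it without touching the base model.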

Common Questions

Is LoRA worse than full fine-tuning?

For most practical tasks, the quality gap is negligible — and LoRA's flexibility and cost advantages typically outweigh any small quality difference. Full fine-tuning is rarely worth the extra cost and complexity in production.

Can I stack multiple LoRAs?

Yes — multiple adapters trained for different tasks or styles can be loaded together, and modern inference engines support hot-swapping adapters per request.
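Because each adapter is just an additive low-rank term on the same frozen weight, stacking is straightforward: the contributions sum. The sketch below (hypothetical adapter names and toy shapes, NumPy for illustration) shows per-request selection of which adapters to apply.

```python
import numpy as np

# Toy sketch of stacking adapters on one frozen base weight.
# Adapter names ("legal_style", "summarizer") are hypothetical examples.
rng = np.random.default_rng(2)
d, r, alpha = 128, 4, 8
W = rng.standard_normal((d, d))  # frozen base weight

adapters = {
    "legal_style": (rng.standard_normal((r, d)) * 0.1, rng.standard_normal((d, r)) * 0.1),
    "summarizer":  (rng.standard_normal((r, d)) * 0.1, rng.standard_normal((d, r)) * 0.1),
}

def forward(x, active):
    # Base path plus the sum of whichever adapters are active for this request
    y = W @ x
    for name in active:
        A, B = adapters[name]
        y = y + (alpha / r) * (B @ (A @ x))
    return y

x = rng.standard_normal(d)
y_base = forward(x, [])                               # base model only
y_both = forward(x, ["legal_style", "summarizer"])    # both adapters stacked
assert not np.allclose(y_base, y_both)
```

Production inference engines implement the same idea more efficiently, batching requests that use different adapters against one copy of the base weights.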

Want to put this to work?

A free process audit maps where LoRA (Low-Rank Adaptation) — and the rest of the modern AI stack — actually moves the needle in your business.