LoRA, QLoRA & Hyperparameters

type: concept | confidence: high | updated: 2026-06-19 | sources: 2

LoRA and QLoRA

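LoRA (Low-Rank Adaptation) freezes the base weights and trains small low-rank adapter matrices injected into the model instead of updating all weights. The memory footprint is tiny, training is fast, and the adapter is a small file (a few MB up to a few hundred MB, depending on rank and target modules) that you can swap or share.
QLoRA is LoRA on a 4-bit quantized base model. It uses even less VRAM, which makes it the standard way to fine-tune big models on small GPUs, and Unsloth aims to keep accuracy high despite the quantization.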
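
To make the savings concrete, here is a rough back-of-the-envelope sketch in Python. The function names, the 4096 × 4096 layer shape, and the 7B parameter count are illustrative assumptions, not values from Unsloth. The memory estimate covers weight storage only and ignores activations, optimizer state, and quantization overhead.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in).

    The adapter is two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * d_in + d_out * r


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Hypothetical 4096 x 4096 projection with rank 16.
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, r=16)
print(f"full layer: {full:,} params, adapter: {adapter:,} params "
      f"({100 * adapter / full:.2f}% of the layer)")

# Hypothetical 7B-parameter base model: 16-bit (LoRA) vs 4-bit (QLoRA) weights.
print(f"16-bit base: {weight_memory_gb(7e9, 16):.1f} GB")
print(f"4-bit base:  {weight_memory_gb(7e9, 4):.1f} GB")
```

Under these assumptions the adapter holds 131,072 parameters against 16,777,216 in the full layer (under 1%), and the 4-bit base needs roughly a quarter of the weight memory of the 16-bit one.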

LoRA rank (r): adapter capacity. Higher r is more expressive but uses more memory and raises overfitting risk. Common values are 8–64; 16 or 32 is typical.
LoRA alpha: scaling factor for the adapter update (standard LoRA scales the update by alpha / r). A common heuristic is alpha = r or alpha = 2×r.
target_modules: which projections get adapters (attention q/k/v/o plus MLP gate/up/down). Targeting all linear layers is the strong default.
Learning rate: around 2e-4 for LoRA, which is higher than for full fine-tuning (FT). Too high a value destabilizes training.
Epochs: usually 1–3. More epochs risk overfitting on small datasets.
Batch size & gradient accumulation: effective batch size = batch_size × grad_accum × #GPUs. Raise grad_accum to simulate a big batch within VRAM limits. The effective batch size directly affects training stability.
Sequence length: set the maximum to fit the lengths in your data. Unsloth enables long context efficiently.
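
As a hedged illustration of how these settings fit together, the sketch below collects the values mentioned above in a plain dictionary and derives a few quantities from them. The dictionary keys, the helper names, the Llama-style module names (`q_proj` and so on), and the 1,000-example dataset are assumptions for illustration only; this is not Unsloth's configuration format, and a real trainer may count steps slightly differently.

```python
import math

# Illustrative starting point built from the ranges above (keys are hypothetical).
hparams = {
    "r": 16,
    "lora_alpha": 16,                # heuristic: alpha = r (or 2 * r)
    # Module names assume a Llama-style architecture; other models may differ.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "learning_rate": 2e-4,
    "num_epochs": 1,
    "batch_size": 2,                 # per-GPU micro-batch
    "grad_accum": 4,
    "num_gpus": 1,
}


def lora_scale(r: int, alpha: int) -> float:
    """Scaling applied to the adapter update in standard LoRA: alpha / r."""
    return alpha / r


def effective_batch_size(batch_size: int, grad_accum: int, num_gpus: int) -> int:
    """Examples contributing to each optimizer step."""
    return batch_size * grad_accum * num_gpus


def optimizer_steps(num_examples: int, eff_batch: int, epochs: int) -> int:
    """Optimizer steps for a run, assuming the last partial batch is kept."""
    return math.ceil(num_examples / eff_batch) * epochs


eff = effective_batch_size(hparams["batch_size"], hparams["grad_accum"],
                           hparams["num_gpus"])
print("LoRA scale:", lora_scale(hparams["r"], hparams["lora_alpha"]))
print("effective batch size:", eff)
print("optimizer steps for 1,000 examples:",
      optimizer_steps(1000, eff, hparams["num_epochs"]))
```

With these numbers, a micro-batch of 2 and 4 accumulation steps on one GPU behaves like a batch of 8, so doubling `grad_accum` is a way to double the effective batch without using more VRAM per step.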

Start from an Unsloth notebook's defaults; change rank/LR/epochs only as needed.
Watch eval loss for overfitting: if it rises while train loss keeps falling, reduce epochs or rank, or add data.
Try QLoRA first, since it fits larger models in the same VRAM; move to LoRA or full FT if you have the VRAM and need maximum quality.
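
One simple way to automate the overfitting check is to flag the point where eval loss starts rising while train loss continues to fall. The sketch below is a minimal pure-Python heuristic; the function name, the `patience` parameter, and the sample loss values are made up for illustration and are not part of Unsloth.

```python
def overfitting_step(train_loss, eval_loss, patience=2):
    """Return the index where eval loss starts a run of `patience` consecutive
    increases while train loss decreases over the same evaluations, or None."""
    run = 0
    for i in range(1, min(len(train_loss), len(eval_loss))):
        diverging = (eval_loss[i] > eval_loss[i - 1]
                     and train_loss[i] < train_loss[i - 1])
        run = run + 1 if diverging else 0
        if run >= patience:
            return i - patience + 1  # first evaluation of the diverging run
    return None


# Made-up loss curves, one value per evaluation.
train = [2.10, 1.60, 1.30, 1.10, 0.90, 0.70]
evals = [2.20, 1.80, 1.60, 1.55, 1.62, 1.70]

print(overfitting_step(train, evals))
```

For these sample curves the function returns 4, the first evaluation where eval loss turned upward, which suggests the checkpoint from evaluation 3 is the one to keep.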

Adapters can be hot-swapped at inference, so multiple LoRAs can share one base model (see docs-catalog). After training, save or export the model.
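
Hot-swapping works because the base weights never change; only the small adapter pair differs between tasks. The toy sketch below illustrates the idea with plain Python lists. It is a conceptual model only: the class, method, and adapter names are hypothetical, and it does not use Unsloth's or any serving library's API.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ToyLoRALinear:
    """One frozen base matrix W plus named (A, B, scale) adapters."""

    def __init__(self, w):
        self.w = w              # frozen base weights, shape (d_out, d_in)
        self.adapters = {}      # name -> (A, B, scale)
        self.active = None      # name of the adapter in use, or None

    def add_adapter(self, name, a, b, alpha):
        r = len(a)              # rank = number of rows in A, shape (r, d_in)
        self.adapters[name] = (a, b, alpha / r)

    def set_adapter(self, name):
        self.active = name      # swapping only changes which adapter is read

    def forward(self, x):
        y = matvec(self.w, x)
        if self.active is not None:
            a, b, scale = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))   # B (A x), never forms B @ A
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


layer = ToyLoRALinear([[1.0, 0.0], [0.0, 1.0]])   # identity base weights
layer.add_adapter("task_a", a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=1)
layer.add_adapter("task_b", a=[[1.0, -1.0]], b=[[0.0], [2.0]], alpha=1)

x = [3.0, 1.0]
print(layer.forward(x))          # base model only
layer.set_adapter("task_a")
print(layer.forward(x))          # base + task_a adapter
layer.set_adapter("task_b")
print(layer.forward(x))          # base + task_b adapter
```

The same base matrix serves all three calls; switching tasks only changes which small `(A, B)` pair is added on top.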