LoRA, QLoRA & Hyperparameters
LoRA and QLoRA
- LoRA (Low-Rank Adaptation) trains small adapter matrices injected into the model instead of all weights — tiny memory footprint, fast, and the adapter is a few MB you can swap/share.
- QLoRA = LoRA on a 4-bit quantized base model — even less VRAM (the standard way to fine-tune big models on small GPUs), with Unsloth keeping accuracy high.
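To make the "tiny memory footprint" claim concrete, here is a rough arithmetic sketch. It assumes the standard LoRA formulation, in which a frozen weight of shape d_out × d_in gets two trainable matrices of shapes r × d_in and d_out × r. The 4096-wide layer and the 7-billion-parameter model are illustrative assumptions, and the memory estimate covers base weights only (no activations, optimizer state, or quantization overhead).

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA trains A (r x d_in) and B (d_out x r) and leaves the
    original weight frozen.
    """
    return r * (d_in + d_out)


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough size of the base weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# One hypothetical 4096 x 4096 projection with a rank-16 adapter.
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, r=16)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {full // adapter}x")

# Hypothetical 7B-parameter base model: 16-bit weights vs 4-bit (QLoRA).
print(weight_memory_gb(7e9, 16), "GB in 16-bit")
print(weight_memory_gb(7e9, 4), "GB in 4-bit")
```

Under these assumptions the adapter for that layer is 128 times smaller than the weight it adapts, and quantizing the base to 4-bit cuts its weight memory to a quarter of the 16-bit size.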
Key hyperparameters
- LoRA rank (r): adapter capacity; higher = more expressive but more memory/overfit risk. Common 8–64; 16/32 typical.
- LoRA alpha: scaling; a common heuristic is alpha = r or 2×r.
- target_modules: which projections get adapters (attention q/k/v/o + MLP gate/up/down); targeting all linear layers is the strong default.
- Learning rate: e.g. ~2e-4 for LoRA (higher than full FT); too high a rate destabilizes training.
- Epochs: usually 1–3; more risks overfitting on small data.
- Batch size & gradient accumulation: effective batch size = batch_size × grad_accum × number of GPUs; raise grad-accum to simulate a big batch within VRAM limits. The effective batch size directly affects training stability.
- Sequence length: set it to fit your data; Unsloth enables long context efficiently.
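The hyperparameters above can be gathered into one place and the two derived quantities computed from them. This is a minimal sketch, not Unsloth's API: the `LoraHyperparams` class and its field names are hypothetical, the batch size, accumulation steps, and sequence length are placeholder values, and the module names follow the Llama-style naming (other architectures may name their projections differently). The `alpha / r` factor is the scaling used in the standard LoRA formulation.

```python
from dataclasses import dataclass


@dataclass
class LoraHyperparams:
    """Hypothetical container for the settings listed above."""
    r: int = 16                       # LoRA rank
    lora_alpha: int = 16              # heuristic: r or 2 * r
    target_modules: tuple = (         # assumed Llama-style module names
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )
    learning_rate: float = 2e-4
    num_epochs: int = 1
    per_device_batch_size: int = 2    # placeholder value
    grad_accum_steps: int = 4         # placeholder value
    num_gpus: int = 1
    max_seq_length: int = 2048        # placeholder; set to fit your data

    @property
    def effective_batch_size(self) -> int:
        # batch_size x grad_accum x number of GPUs
        return self.per_device_batch_size * self.grad_accum_steps * self.num_gpus

    @property
    def lora_scale(self) -> float:
        # Standard LoRA multiplies the adapter update by alpha / r.
        return self.lora_alpha / self.r


cfg = LoraHyperparams()
print(cfg.effective_batch_size)   # 2 * 4 * 1 = 8
print(cfg.lora_scale)             # 16 / 16 = 1.0
```

Seen this way, the alpha = r heuristic keeps the adapter scale at 1, and alpha = 2×r doubles it. Doubling `grad_accum_steps` doubles the effective batch size without increasing per-step memory.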
Practical guidance
- Start from an Unsloth notebook's defaults; change rank/LR/epochs only as needed.
- Watch eval loss for overfitting; reduce epochs/rank or add data if it diverges from train loss.
- QLoRA first (fits more); move to LoRA/full FT if you have VRAM and need maximum quality.
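One way to turn "watch eval loss" into a check is a simple heuristic over logged losses. The sketch below is an assumption-laden illustration, not part of Unsloth: the function name and the `patience` parameter are hypothetical, and it only flags the pattern where eval loss has risen for several consecutive evaluations while train loss kept falling.

```python
def eval_diverging(train_losses, eval_losses, patience=2):
    """Return True if, over the last `patience` evaluation intervals,
    eval loss rose every time while train loss fell every time."""
    if len(train_losses) != len(eval_losses):
        raise ValueError("expected one train loss per eval loss")
    if len(eval_losses) <= patience:
        return False  # not enough history yet

    recent_train = train_losses[-(patience + 1):]
    recent_eval = eval_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    eval_rising = all(b > a for a, b in zip(recent_eval, recent_eval[1:]))
    return train_falling and eval_rising


# Made-up loss curves for illustration only.
train = [1.2, 0.9, 0.7, 0.5, 0.4]
overfit_eval = [1.3, 1.0, 0.95, 1.0, 1.1]
healthy_eval = [1.3, 1.0, 0.9, 0.85, 0.8]

print(eval_diverging(train, overfit_eval))   # True
print(eval_diverging(train, healthy_eval))   # False
```

When the check fires, the remedies are the ones listed above: fewer epochs, a lower rank, or more data.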
Adapters can be hot-swapped at inference (multiple LoRAs on one base — see docs-catalog). After training, save/export.
