# Unsloth — full corpus


<!-- ===== unsloth/README.md ===== -->

# LLM Wiki

An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions.

## How It Works

The system has three layers:

```
raw/              Sources you collect (articles, transcripts, notes, PDFs)
wiki/             LLM-written & maintained pages (summaries, concepts, entities, syntheses)
CLAUDE.md         Schema that tells the LLM how to structure everything
```

Three operations drive the workflow:

| Operation | Trigger | What happens |
|-----------|---------|--------------|
| **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, updates the index and log |
| **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, optionally creates a synthesis page for novel insights |
| **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest |

## Quick Start

1. **Clone this repo**
   ```bash
   git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base
   cd my-knowledge-base
   ```

2. **Customize CLAUDE.md** for your domain
   - Update the Purpose section with your topic
   - Replace the placeholder tagging taxonomy with your own categories
   - Adjust confidence level descriptions if needed
   - Everything else (workflows, page formats, linking rules) works as-is

3. **Drop sources into `raw/`**
   - Text files, transcripts, articles, notes — any plain text
   - These are immutable once added; the LLM never modifies them

4. **Tell the LLM to ingest**
   ```
   ingest raw/my-first-source.txt
   ```
   The LLM will create summary pages, concept pages, entity pages, cross-links, and update the index.

5. **Ask questions**
   ```
   What are the key differences between X and Y?
   ```
   The LLM answers from the wiki, citing specific pages.

6. **Run health checks**
   ```
   lint
   ```
   The LLM audits the wiki and fixes issues.

## Directory Structure

```
.
├── CLAUDE.md                      # Schema — the LLM's instructions
├── raw/                           # Your source documents (immutable)
└── wiki/
    ├── index.md                   # Master catalog of all pages
    ├── log.md                     # Append-only activity log
    ├── dashboard.md               # Dataview dashboard (Obsidian)
    ├── analytics.md               # Charts View analytics (Obsidian)
    ├── flashcards.md              # Spaced repetition cards
    ├── summaries/                 # One page per source document
    ├── concepts/                  # Concept and framework pages
    ├── entities/                  # People, tools, organizations, etc.
    ├── syntheses/                 # Cross-cutting analyses and comparisons
    ├── journal/                   # Research/session journal entries
    │   └── template.md            # Journal entry template
    └── presentations/             # Marp slide decks
```

## Enhancements

This template includes several extras beyond the core wiki pattern:

### Dataview Dashboard (`wiki/dashboard.md`)
Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin.

### Charts View Analytics (`wiki/analytics.md`)
Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin.

### Mermaid Diagrams
Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Native support in Obsidian and GitHub.

### Marp Slides (`wiki/presentations/`)
Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory.

### Research Journal (`wiki/journal/`)
Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries.

### Spaced Repetition (`wiki/flashcards.md`)
Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page.

### MCP Server
This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically.

## Customizing for Your Domain

The schema in `CLAUDE.md` is domain-agnostic. To adapt it:

1. **Purpose** — Describe your knowledge domain in one paragraph
2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`)
3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards
4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.)
5. **Journal template** — Customize `wiki/journal/template.md` for your workflow

Everything else — page format, linking conventions, workflows, rules — is universal and works across domains.

## Example Domains

This template works for any knowledge-intensive topic:

- **Research notes** — papers, experiments, methodologies
- **Book analysis** — themes, characters, author techniques
- **Competitive analysis** — companies, products, market trends
- **Course notes** — lectures, readings, key concepts
- **Personal development** — frameworks, habits, book summaries
- **Technical documentation** — APIs, architectures, design patterns
- **Hobby deep-dives** — any subject you want to master

## License

MIT


<!-- ===== unsloth/wiki/index.md ===== -->

---
title: "Knowledge Base Index"
type: index
updated: 2026-06-19
---

# Knowledge Base Index

Master catalog of all wiki pages. Every page in the wiki must have an entry here.

## Concepts

| Page | Tags | Confidence | Updated |
|------|------|------------|---------|
| [what-is-unsloth](concepts/what-is-unsloth.md) | unsloth, fine-tuning, overview | high | 2026-06-19 |
| [fine-tuning-basics](concepts/fine-tuning-basics.md) | fine-tuning, rag, when-to-use | high | 2026-06-19 |
| [installation](concepts/installation.md) | installation, pip, requirements | high | 2026-06-19 |
| [datasets](concepts/datasets.md) | datasets, chat-template, data-prep | high | 2026-06-19 |
| [lora-and-hyperparameters](concepts/lora-and-hyperparameters.md) | lora, qlora, hyperparameters | high | 2026-06-19 |
| [reinforcement-learning](concepts/reinforcement-learning.md) | rl, grpo, reasoning | high | 2026-06-19 |
| [saving-and-exporting](concepts/saving-and-exporting.md) | saving, gguf, ollama, export | high | 2026-06-19 |
| [inference-and-deployment](concepts/inference-and-deployment.md) | inference, vllm, lm-studio | high | 2026-06-19 |
| [unsloth-studio](concepts/unsloth-studio.md) | unsloth-studio, web-ui | medium | 2026-06-19 |

## Entities

| Page | Tags | Updated |
|------|------|---------|
| [unsloth-library](entities/unsloth-library.md) | library, fastlanguagemodel, trl | 2026-06-19 |

## Summaries

| Page | Source | Key Topics | Created |
|------|--------|------------|---------|
| [model-catalog-and-notebooks](summaries/model-catalog-and-notebooks.md) | model/notebook index | supported models, notebooks | 2026-06-19 |
| [docs-catalog](summaries/docs-catalog.md) | docs.unsloth.ai llms.txt | area map | 2026-06-19 |

## Syntheses

| Page | Pages Compared | Created |
|------|----------------|---------|
| [end-to-end-fine-tune](syntheses/end-to-end-fine-tune.md) | full workflow + pitfalls | 2026-06-19 |

## Statistics

- **Total pages**: 13
- **Concepts**: 9
- **Entities**: 1
- **Summaries**: 2
- **Syntheses**: 1
- **Sources ingested**: 131 (docs.unsloth.ai llms.txt: 1 index + 130 pages; many per-model guides catalogued)
- **High confidence**: 12
- **Medium confidence**: 1


<!-- ===== unsloth/wiki/concepts/datasets.md ===== -->

---
title: "Datasets & Chat Templates"
type: concept
tags: [datasets, chat-template, formatting, data-prep]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-datasets-guide.md, raw/llms_txt_doc-fine-tuning-llms-guide.md]
---

# Datasets & Chat Templates

Data quality is the biggest lever in fine-tuning — Unsloth's datasets guide covers preparing data for SFT and [RL](reinforcement-learning.md).

## Dataset formats

- **Instruction / chat format** — the common case: examples as messages (`system`/`user`/`assistant`) or instruction–input–output triples, rendered through the model's **chat template** so training matches how the model is prompted at inference.
- **Completion / raw text** — continued-pretraining-style raw text.
- **Preference data** — for RL/DPO-style methods (prompt + chosen/rejected).

## Chat templates

Each model family has a specific **chat template** (special tokens, role markers). Unsloth provides helpers to apply the correct template so your formatted data exactly matches the model's expected structure — a frequent source of bad results when mismatched. Train on the **assistant turns** (mask the prompt) so the model learns to *produce* responses, not echo prompts.

## Handling missing/empty fields

The guide shows a neat technique: wrap optional columns in `[[ ]]` so rows with **empty values** drop that text entirely rather than emitting "EMPTY" — keeping prompts clean across heterogeneous rows.

## Practical tips

- **Quality > quantity** — a few hundred to a few thousand clean, on-task examples often beat huge noisy sets.
- **Match inference format** — format training data exactly as you'll prompt the model.
- **Hold out** a small eval set to check for overfitting.
- Standard sources: Hugging Face datasets, your own JSON/CSV. Then set [hyperparameters](lora-and-hyperparameters.md) and train.


<!-- ===== unsloth/wiki/concepts/fine-tuning-basics.md ===== -->

---
title: "Fine-tuning Basics: Is It Right for You?"
type: concept
tags: [fine-tuning, basics, rag, lora, when-to-use]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-fine-tuning-for-beginners.md, raw/llms_txt_doc-faq-is-fine-tuning-right-for-me.md, raw/llms_txt_doc-what-model-should-i-use-for-fine-tuning.md]
---

# Fine-tuning Basics: Is It Right for You?

**Fine-tuning** adapts a pretrained model to your data/task — teaching it a style, domain, format, or behavior it doesn't have out of the box.

## Fine-tuning vs RAG (a common misconception)

They solve different problems and often combine:

- **RAG** injects *knowledge* at query time (good for facts that change, large/refreshing corpora).
- **Fine-tuning** changes *behavior/skill/format* and can bake in domain knowledge, lower latency/cost (smaller model, no retrieval), and enforce a consistent style or output structure.

Unsloth's docs explicitly bust the "fine-tuning can't add knowledge" myth — it can, and pairs well with RAG.

## When fine-tuning is worth it

- You need a consistent **tone/format/persona** or structured outputs.
- You want a **smaller/cheaper** model to match a bigger one on your task.
- You have **task-specific data** and prompting alone isn't enough.
- You need an on-prem/local model specialized to you.

## Choosing a base model

Pick by size vs your VRAM, license, and task ([what-model-should-i-use]): smaller models (1–8B) fine-tune fast on consumer GPUs; instruct vs base depends on whether you're teaching format (instruct) or raw capability (base). Newer architectures (Llama, Qwen, Gemma, DeepSeek) are well supported ([model-catalog](../summaries/model-catalog-and-notebooks.md)).

## LoRA vs full fine-tuning

Most users do **LoRA/QLoRA** — train small adapter matrices instead of all weights: dramatically less VRAM, fast, and composable. Full fine-tuning is for when you need to change the whole model. Details: [lora-and-hyperparameters](lora-and-hyperparameters.md).


<!-- ===== unsloth/wiki/concepts/inference-and-deployment.md ===== -->

---
title: "Inference & Deployment"
type: concept
tags: [inference, deployment, vllm, lm-studio, serving]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-inference-deployment.md, raw/llms_txt_doc-vllm-deployment-inference-guide.md, raw/llms_txt_doc-deploying-models-to-lm-studio.md, raw/llms_txt_doc-how-to-run-local-llms-with-claude-code.md, raw/llms_txt_doc-how-to-run-local-llms-with-openai-codex.md, raw/llms_txt_doc-how-to-run-local-llms-with-docker-step-by-step-guide.md, raw/llms_txt_doc-how-to-run-and-deploy-llms-on-your-ios-or-android-phone.md, raw/llms_txt_doc-how-to-use-mcp-servers-with-local-llms.md, raw/llms_txt_doc-how-to-use-unsloth-as-an-api-endpoint.md]
---

# Inference & Deployment

Unsloth isn't only for training — it runs models for inference too, and exports to the major serving runtimes.

## In-framework inference

Use Unsloth's fast inference (`FastLanguageModel`/`FastModel` with native generation, or 2x-faster inference paths) to test a fine-tune right after training, without exporting — handy for quick eval in the same [notebook](../summaries/model-catalog-and-notebooks.md).

## Serving runtimes

- **vLLM** — the high-throughput production path: export merged 16-bit ([saving-and-exporting](saving-and-exporting.md)) and serve with vLLM. Unsloth documents vLLM deployment and engine arguments; it also integrates vLLM for fast RL rollouts.
- **Ollama** — `ollama run` from the exported GGUF + Modelfile (local/dev).
- **LM Studio** — load the GGUF in LM Studio's local server (OpenAI-compatible) for desktop use.
- **llama.cpp** — GGUF runs anywhere llama.cpp does (CPU/edge).

## Cross-references

Unsloth's docs include guides for running local LLMs with Docker, Claude Code, OpenAI Codex, MCP servers, and on iOS/Android — i.e. consuming your fine-tuned model from many clients (mapped in [docs-catalog](../summaries/docs-catalog.md)). It can also act as an API endpoint directly.

## Choosing

- **Dev/test** → in-framework or Ollama/LM Studio from GGUF.
- **Production throughput** → vLLM with merged 16-bit.
- **Edge/CPU/laptop** → GGUF via llama.cpp/Ollama. Quantization choices: [saving-and-exporting](saving-and-exporting.md).


<!-- ===== unsloth/wiki/concepts/installation.md ===== -->

---
title: "Installation & Requirements"
type: concept
tags: [installation, pip, requirements, gpu, docker]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-install-unsloth-via-pip-and-uv.md, raw/llms_txt_doc-unsloth-requirements.md, raw/llms_txt_doc-install-unsloth-via-docker.md, raw/llms_txt_doc-conda-install.md, raw/llms_txt_doc-install-unsloth-on-macos.md, raw/llms_txt_doc-fine-tuning-llms-on-amd-gpus-with-unsloth-guide.md, raw/llms_txt_doc-fine-tuning-llms-on-intel-gpus-with-unsloth.md, raw/llms_txt_doc-how-to-fine-tune-llms-on-windows-with-unsloth-step-by-step-g.md, raw/llms_txt_doc-google-colab.md]
---

# Installation & Requirements

## Requirements

- **GPU** — an NVIDIA GPU is the primary target (CUDA); minimum useful VRAM depends on model size and QLoRA (4-bit) usage — small models fine-tune on ~8GB, larger ones need more. AMD (ROCm) and Intel GPUs are supported via dedicated guides; macOS support exists.
- **Toolchain** — Python, plus build tools (Git, CMake, a C++ compiler) for some dependencies; on Windows these are installed via the setup script (`winget` / Visual Studio Build Tools).

## Installing

- **pip / uv** — the standard path: `pip install unsloth` (uv works too). Match your CUDA/PyTorch; the docs give exact commands and the recommended pinned install.
- **Docker** — an official container avoids dependency wrangling; recommended when the local environment is fussy.
- **Conda**, **macOS**, **AMD**, **Intel**, **Windows**, **Google Colab**, and **VS Code + Colab** each have their own guide ([docs-catalog](../summaries/docs-catalog.md)).
- **Updating** — update to the latest (or pin an old version) per the updating guide; Unsloth ships frequently to support new models.

## Fastest start

The **zero-install path** is a [notebook](../summaries/model-catalog-and-notebooks.md): open an Unsloth Colab/Kaggle notebook, which has everything preinstalled — change the dataset and run. Local install matters when you need your own GPU, private data, or production training. Next: prepare a [dataset](datasets.md) and set [hyperparameters](lora-and-hyperparameters.md).


<!-- ===== unsloth/wiki/concepts/lora-and-hyperparameters.md ===== -->

---
title: "LoRA, QLoRA & Hyperparameters"
type: concept
tags: [lora, qlora, hyperparameters, rank, learning-rate]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-lora-fine-tuning-hyperparameters-guide.md, raw/llms_txt_doc-fine-tuning-llms-guide.md]
---

# LoRA, QLoRA & Hyperparameters

## LoRA and QLoRA

- **LoRA** (Low-Rank Adaptation) trains small adapter matrices injected into the model instead of all weights — tiny memory footprint, fast, and the adapter is a few MB you can swap/share.
- **QLoRA** = LoRA on a **4-bit quantized** base model — even less VRAM (the standard way to fine-tune big models on small GPUs), with Unsloth keeping accuracy high.

## Key hyperparameters

- **LoRA rank (`r`)** — adapter capacity; higher = more expressive but more memory/overfit risk. Common 8–64; 16/32 typical.
- **LoRA alpha** — scaling; a common heuristic is `alpha = r` or `2×r`.
- **target_modules** — which projections get adapters (attention q/k/v/o + MLP gate/up/down); targeting all linear layers is the strong default.
- **Learning rate** — e.g. ~2e-4 for LoRA (higher than full FT); too high destabilizes.
- **Epochs** — usually 1–3; more risks overfitting on small data.
- **Batch size & gradient accumulation** — *effective batch size* = `batch_size × grad_accum × #GPUs`; raise grad-accum to simulate a big batch within VRAM limits. It directly affects training stability.
- **Sequence length** — set to your data; Unsloth enables long context efficiently.

## Practical guidance

- Start from an Unsloth [notebook](../summaries/model-catalog-and-notebooks.md)'s defaults; change rank/LR/epochs only as needed.
- Watch eval loss for overfitting; reduce epochs/rank or add data if it diverges from train loss.
- QLoRA first (fits more); move to LoRA/full FT if you have VRAM and need maximum quality.

Adapters can be **hot-swapped** at inference (multiple LoRAs on one base — see [docs-catalog](../summaries/docs-catalog.md)). After training, [save/export](saving-and-exporting.md).


<!-- ===== unsloth/wiki/concepts/reinforcement-learning.md ===== -->

---
title: "Reinforcement Learning (GRPO)"
type: concept
tags: [rl, grpo, gspo, reasoning, reward]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-reinforcement-learning-rl-guide.md, raw/llms_txt_doc-rl-reward-hacking.md, raw/llms_txt_doc-advanced-reinforcement-learning-documentation.md]
---

# Reinforcement Learning (GRPO)

Unsloth supports **reinforcement learning** to train **reasoning models** (DeepSeek-R1-style) efficiently — most notably **GRPO** (Group Relative Policy Optimization), plus newer methods (GSPO, FP8 RL).

## What RL adds over SFT

SFT ([lora-and-hyperparameters](lora-and-hyperparameters.md)) imitates examples; **RL optimizes against a reward function**, letting the model discover better reasoning/behaviors than the demonstrations alone — the technique behind reasoning models that "think" before answering.

## GRPO

GRPO samples multiple completions per prompt, scores them with a **reward function**, and pushes the policy toward higher-reward outputs relative to the group — no separate value model needed (cheaper than PPO). Unsloth makes GRPO run with far less VRAM and **much longer context** (its "7x longer context" GRPO), and supports **vision (VLM) RL**.

## Reward functions

You define rewards encoding what "good" means: correctness (e.g. matches a verifier/answer), format adherence, length, etc. Reward design is the crux — see **reward hacking**: models exploit poorly-specified rewards (gaming the metric without the intended behavior), so rewards must be robust and ideally verifiable.

## Practical

- Start from a **GRPO notebook** ([model-catalog](../summaries/model-catalog-and-notebooks.md)) for a model that fits your GPU.
- Combine with QLoRA for VRAM efficiency.
- Newer options: **GSPO**, **FP8 RL**, and long-context GRPO for harder reasoning tasks (mapped in [docs-catalog](../summaries/docs-catalog.md)).
- Watch for reward hacking; use held-out evals, not just training reward.


<!-- ===== unsloth/wiki/concepts/saving-and-exporting.md ===== -->

---
title: "Saving & Exporting Models"
type: concept
tags: [saving, gguf, ollama, export, adapters, merging]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-saving-to-gguf.md, raw/llms_txt_doc-saving-models-to-ollama.md, raw/llms_txt_doc-unsloth-dynamic-2-0-ggufs.md]
---

# Saving & Exporting Models

After training, export the model into the format your runtime needs.

## What you can save

- **LoRA adapters** — small (few MB); load on top of the base model later. Best for iteration and swapping ([lora-and-hyperparameters](lora-and-hyperparameters.md)).
- **Merged weights** — merge the adapter into the base to get a standalone model: **16-bit** (full quality) or **4-bit** (smaller). Use merged 16-bit for further serving/quantization.
- **GGUF** — for `llama.cpp`/Ollama/LM Studio CPU+GPU inference.
- **Push to Hugging Face Hub** — upload adapters or merged models.

## GGUF & quantization

`save_pretrained_gguf` / `push_to_hub_gguf` convert to **GGUF** and quantize in one step (q4_k_m, q5_k_m, q8_0, etc.). Unsloth also ships **Dynamic 2.0 GGUFs** — a smarter, per-layer quantization that preserves quality better than naive uniform quantization at the same size (used for its model uploads). Pick the quant by your quality/size/VRAM trade-off.

## Ollama

`save to Ollama` produces a GGUF plus a **Modelfile** so you can `ollama run` your fine-tune immediately — the fastest path to a usable local chatbot from a fresh fine-tune (the classic "fine-tune Llama, run in Ollama" tutorial).

## Choosing a target

- Iterating / multiple variants → **adapters** (hot-swappable).
- Local chat / laptop → **GGUF** (Ollama / [LM Studio](inference-and-deployment.md)).
- Server / throughput → merged 16-bit for **[vLLM](inference-and-deployment.md)**.
- Sharing → push to the Hub.


<!-- ===== unsloth/wiki/concepts/unsloth-studio.md ===== -->

---
title: "Unsloth Studio"
type: concept
tags: [unsloth-studio, web-ui, no-code]
updated: 2026-06-19
confidence: medium
sources: [raw/llms_txt_doc-introducing-unsloth-studio.md, raw/llms_txt_doc-get-started-with-unsloth-studio.md, raw/llms_txt_doc-how-to-run-models-with-unsloth-studio.md]
---

# Unsloth Studio

**Unsloth Studio** is a **web UI** for training and running open models locally — a graphical front-end over the Unsloth framework, lowering the barrier from writing notebook code to clicking through a UI. (Confidence medium — Studio is a newer addition; verify specifics against the live docs.)

## What it offers

- **Run models** — load and chat with open models (Gemma, Qwen, DeepSeek, gpt-oss, etc.) locally through the UI ([how-to-run-models-with-unsloth-studio]).
- **Train / fine-tune** — drive [fine-tuning](lora-and-hyperparameters.md) and [RL](reinforcement-learning.md) without hand-writing the training script.
- **Export** — produce GGUF/other formats from the UI ([saving-and-exporting](saving-and-exporting.md)) for use in [Ollama/LM Studio/vLLM](inference-and-deployment.md).

## When to use Studio vs notebooks/code

- **Studio** — you want a no-/low-code, local GUI to run and fine-tune models.
- **Notebooks / Python** — you want full control, reproducibility, custom datasets/reward functions, or to embed training in a pipeline ([model-catalog-and-notebooks](../summaries/model-catalog-and-notebooks.md)).

Both sit on the same Unsloth engine, so the underlying concepts ([fine-tuning-basics](fine-tuning-basics.md), [datasets](datasets.md), [hyperparameters](lora-and-hyperparameters.md)) carry over. Installation and getting-started for Studio are in [docs-catalog](../summaries/docs-catalog.md).


<!-- ===== unsloth/wiki/concepts/what-is-unsloth.md ===== -->

---
title: "What is Unsloth"
type: concept
tags: [unsloth, fine-tuning, overview, lora]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-fine-tuning-for-beginners.md, raw/llms_txt_doc-fine-tuning-llms-guide.md, raw/llms_txt_doc-inference-deployment.md, raw/llms_txt_doc-reinforcement-learning-rl-guide.md, raw/llms_txt_doc-introducing-unsloth-studio.md, raw/llms_txt_doc-unsloth-model-catalog.md, raw/llms_txt_doc-export-models-with-unsloth-studio.md]
---

# What is Unsloth

**Unsloth** is an open-source framework for **fast, memory-efficient fine-tuning and running of LLMs**. It makes training accessible on a single consumer GPU (and free Colab/Kaggle), with large speedups and lower VRAM versus standard Hugging Face training — achieved via custom Triton kernels and an optimized backprop path, with **no loss of accuracy**.

## What it does

- **Fine-tuning** — LoRA / QLoRA adapters on top of base models, and full fine-tuning ([lora-and-hyperparameters](lora-and-hyperparameters.md)).
- **Reinforcement learning** — GRPO and related RL to train reasoning models ([reinforcement-learning](reinforcement-learning.md)).
- **Running & exporting** — run models for inference and export to GGUF/Ollama/vLLM/LM Studio ([saving-and-exporting](saving-and-exporting.md), [inference-and-deployment](inference-and-deployment.md)).
- **Broad model support** — Llama, Mistral, Gemma, Qwen, DeepSeek, gpt-oss, Phi, and more ([model-catalog](../summaries/model-catalog-and-notebooks.md)).

## Why people use it

- **2x+ faster training, far less VRAM** — fine-tune models that otherwise wouldn't fit.
- **Beginner-friendly** — ready-to-run [notebooks](../summaries/model-catalog-and-notebooks.md) for Colab/Kaggle; you change the dataset and run.
- **Drop-in** — `FastLanguageModel`/`FastModel` wrap Hugging Face + TRL, so existing training code mostly works.

## How a project flows

Decide if fine-tuning fits ([fine-tuning-basics](fine-tuning-basics.md)) → install ([installation](installation.md)) → prepare a [dataset](datasets.md) → set [LoRA hyperparameters](lora-and-hyperparameters.md) → train (SFT or [RL](reinforcement-learning.md)) → [save/export](saving-and-exporting.md) → [run/deploy](inference-and-deployment.md). A newer **[Unsloth Studio](unsloth-studio.md)** web UI wraps this. Full map: [docs-catalog](../summaries/docs-catalog.md).


<!-- ===== unsloth/wiki/entities/unsloth-library.md ===== -->

---
title: "The unsloth Library (FastLanguageModel)"
type: entity
tags: [library, api, fastlanguagemodel, trl]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-fine-tuning-llms-guide.md, raw/llms_txt_doc-lora-fine-tuning-hyperparameters-guide.md]
---

# The unsloth Library (FastLanguageModel)

The `unsloth` Python package is the core API — a drop-in acceleration layer over Hugging Face Transformers + TRL.

## The typical training script

1. **Load** — `FastLanguageModel.from_pretrained(model_name, max_seq_length, load_in_4bit=...)` (or `FastModel` for vision/multimodal). This returns the model + tokenizer with Unsloth's optimizations applied.
2. **Add LoRA** — `FastLanguageModel.get_peft_model(model, r=..., lora_alpha=..., target_modules=...)` to attach [LoRA adapters](../concepts/lora-and-hyperparameters.md).
3. **Format data** — apply the [chat template](../concepts/datasets.md) to your [dataset](../concepts/datasets.md).
4. **Train** — hand the model to TRL's `SFTTrainer` (or `GRPOTrainer` for [RL](../concepts/reinforcement-learning.md)) with `TrainingArguments`; Unsloth patches make it 2x+ faster and lower-VRAM.
5. **Save/export** — `save_pretrained` (adapters), `save_pretrained_merged`, `save_pretrained_gguf`, or push to the Hub ([saving-and-exporting](../concepts/saving-and-exporting.md)).
6. **Infer** — `FastLanguageModel.for_inference(model)` for fast generation ([inference-and-deployment](../concepts/inference-and-deployment.md)).

## Why drop-in

Because it wraps standard Transformers/TRL objects, existing training code mostly works — you swap the model-loading lines and keep your `SFTTrainer`/dataset code. That compatibility (plus the kernels) is Unsloth's core value. Idiomatic usage lives in the [notebooks](../summaries/model-catalog-and-notebooks.md); exact current signatures are in the source docs ([docs-catalog](../summaries/docs-catalog.md)).


<!-- ===== unsloth/wiki/log.md ===== -->

---
title: "Activity Log"
type: log
---

# Activity Log

Append-only record of all wiki changes.

## Format

Each entry follows this format:
```
### YYYY-MM-DD HH:MM — [Action Type]
- **Source/Trigger**: what initiated the action
- **Pages created**: list of new pages
- **Pages updated**: list of updated pages
- **Notes**: any contradictions flagged, decisions made
```

---

### 2026-04-08 00:00 — Setup

- **Source/Trigger**: Repository initialized
- **Pages created**: index.md, log.md, dashboard.md, analytics.md, flashcards.md
- **Pages updated**: none
- **Notes**: Empty knowledge base ready for first source ingestion

### 2026-06-19 00:00 — Initial curation (medium rung)

- **Source/Trigger**: 131 docs.unsloth.ai pages (llms.txt)
- **Pages created**: 9 concepts, 1 entity (unsloth-library), 2 summaries (model-catalog-and-notebooks, docs-catalog), 1 synthesis (end-to-end-fine-tune)
- **Pages updated**: index.md
- **Notes**: Unsloth = fast/memory-efficient LLM fine-tuning (LoRA/QLoRA, GRPO RL, GGUF/Ollama/vLLM export). Many per-model run/fine-tune guides catalogued not ingested. unsloth-studio confidence:medium (newer feature). Category: inference.


<!-- ===== unsloth/wiki/summaries/docs-catalog.md ===== -->

---
title: "Docs Catalog"
type: summary
tags: [catalog, map, reference]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt-llms-txt-index.md, raw/llms_txt_doc-fine-tuning-llms-guide.md]
---

# Docs Catalog

Map of the Unsloth docs (docs.unsloth.ai llms.txt; ~130 pages mirrored in `raw/` as `llms_txt_doc-<slug>.md`).

| Area | Raw slugs (selection) | Wiki coverage |
|---|---|---|
| Get started | fine-tuning-for-beginners, faq-is-fine-tuning-right-for-me, what-model-should-i-use-for-fine-tuning, unsloth-requirements, unsloth-notebooks, unsloth-model-catalog | [what-is-unsloth](../concepts/what-is-unsloth.md), [fine-tuning-basics](../concepts/fine-tuning-basics.md) |
| Install | install-unsloth-via-pip-and-uv, install-unsloth-on-macos, install-unsloth-via-docker, windows/amd/intel/conda/google-colab/vs-code | [installation](../concepts/installation.md) |
| Fine-tuning | fine-tuning-llms-guide, datasets-guide, lora-fine-tuning-hyperparameters-guide, lora-hot-swapping-guide | [datasets](../concepts/datasets.md), [lora-and-hyperparameters](../concepts/lora-and-hyperparameters.md) |
| RL | reinforcement-learning-rl-guide, grpo-long-context, vision-rl, gspo, fp8-rl, rl-reward-hacking, advanced-reinforcement-learning | [reinforcement-learning](../concepts/reinforcement-learning.md) |
| Saving/export | saving-to-gguf, saving-models-to-ollama, unsloth-dynamic-2-0-ggufs | [saving-and-exporting](../concepts/saving-and-exporting.md) |
| Inference/deploy | inference-deployment, vllm-deployment-inference-guide, vllm-engine-arguments, deploying-models-to-lm-studio, run-with-docker/claude-code/codex/mcp, ios-android | [inference-and-deployment](../concepts/inference-and-deployment.md) |
| Studio | introducing-unsloth-studio, get-started-with-unsloth-studio, unsloth-studio-installation, how-to-run/export-with-unsloth-studio | [unsloth-studio](../concepts/unsloth-studio.md) |
| Per-model guides | `<model>-how-to-run-locally`, `<model>-fine-tune` (Llama, Qwen, Gemma, DeepSeek, GLM, Kimi, gpt-oss, Granite, Devstral, …) | [model-catalog-and-notebooks](model-catalog-and-notebooks.md) |
| Troubleshooting | troubleshooting-inference | [what-is-unsloth](../concepts/what-is-unsloth.md) |

## Notes

- **Per-model guides not ingested individually** — model-specific VRAM/quant/template details change per release; find the exact `*-how-to-run`/`*-fine-tune` page in `raw/`.
- The docs are a GitBook with an **`ask` query** capability (HTTP GET with `?ask=`) for live Q&A.
- Unsloth ships very frequently to support new models — verify version/model specifics against docs.unsloth.ai.


<!-- ===== unsloth/wiki/summaries/model-catalog-and-notebooks.md ===== -->

---
title: "Model Catalog & Notebooks"
type: summary
tags: [models, notebooks, catalog, colab]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-unsloth-model-catalog.md, raw/llms_txt_doc-unsloth-notebooks.md, raw/llms_txt_doc-reinforcement-learning-rl-guide.md, raw/llms_txt_doc-unsloth-dynamic-2-0-ggufs.md]
---

# Model Catalog & Notebooks

Unsloth provides ready-to-run **notebooks** (the recommended starting point) and supports a large, frequently-updated **model catalog**. This page maps the space; per-model run/fine-tune guides live in `raw/` as `llms_txt_doc-<model>-how-to-run...`.

## Notebooks

Open a Colab/Kaggle notebook with everything preinstalled, change the dataset, and run ([installation](../concepts/installation.md)). Categories:

- **SFT fine-tuning** — per model family (Llama, Qwen, Gemma, Mistral, Phi, …).
- **GRPO / RL** — reasoning training (Qwen, gpt-oss GSPO, Llama, Phi-4, DeepSeek-R1-distill, vision GRPO) ([reinforcement-learning](../concepts/reinforcement-learning.md)).
- **Vision / multimodal**, **continued pretraining**, **embedding model** fine-tuning, **TTS**, and more.

## Model families covered (run + fine-tune guides)

Llama, **Qwen / Qwen-Image**, **Gemma 3 / 3n / 4 (incl. QAT)**, **DeepSeek (R1, V3, OCR)**, **GLM (4.x–5.x)**, **Kimi K2**, **gpt-oss**, **IBM Granite 4**, **Devstral**, **Grok-2**, Phi, Mistral, and others. Each has a "how to run locally" and often a "fine-tune" guide.

## Dynamic GGUF uploads

Unsloth publishes its own quantized models as **Dynamic 2.0 GGUFs** (better quality per byte — [saving-and-exporting](../concepts/saving-and-exporting.md)), including hard-to-quantize large models (e.g. dynamic 1.58-bit DeepSeek-R1).

## Using this catalog

To fine-tune or run a specific model, find its `*-how-to-run` / `*-fine-tune` page in `raw/` for model-specific VRAM, quant, and template notes — those specifics change per model and aren't reproduced here. General workflow is in the concept pages ([what-is-unsloth](../concepts/what-is-unsloth.md)).


<!-- ===== unsloth/wiki/syntheses/end-to-end-fine-tune.md ===== -->

---
title: "End-to-End: Fine-tune to Deployment"
type: synthesis
tags: [workflow, end-to-end, synthesis]
updated: 2026-06-19
confidence: high
sources: [raw/llms_txt_doc-fine-tuning-llms-guide.md, raw/llms_txt_doc-datasets-guide.md, raw/llms_txt_doc-lora-fine-tuning-hyperparameters-guide.md, raw/llms_txt_doc-saving-to-gguf.md]
---

# End-to-End: Fine-tune to Deployment

The complete Unsloth workflow, tying the concept pages together.

## 1. Decide & pick a model

Confirm fine-tuning fits (vs RAG/prompting — [fine-tuning-basics](../concepts/fine-tuning-basics.md)). Choose a base model sized to your GPU and task.

## 2. Set up

[Install](../concepts/installation.md) locally, or just open the matching [notebook](../summaries/model-catalog-and-notebooks.md) (zero setup).

## 3. Prepare data

Build a clean [dataset](../concepts/datasets.md) in chat/instruction format; apply the model's **chat template**; hold out an eval slice. Quality beats quantity.

## 4. Configure

Load with `FastLanguageModel` ([unsloth-library](../entities/unsloth-library.md)), add **QLoRA** (4-bit) adapters, set [hyperparameters](../concepts/lora-and-hyperparameters.md): rank/alpha, target_modules, LR ~2e-4, 1–3 epochs, effective batch size via grad-accum.

## 5. Train

Run `SFTTrainer` for supervised fine-tuning, or [GRPO](../concepts/reinforcement-learning.md) for reasoning/RL (with a robust reward function — watch for reward hacking). Monitor train vs eval loss for overfitting.

## 6. Evaluate

Test the fine-tune with in-framework [inference](../concepts/inference-and-deployment.md) on held-out prompts before exporting.

## 7. Save & export

[Save](../concepts/saving-and-exporting.md) adapters (iterate), or merge to 16-bit/4-bit, or export **GGUF** (Dynamic 2.0 quant) / Ollama Modelfile / push to the Hub.

## 8. Deploy

[Run/serve](../concepts/inference-and-deployment.md): Ollama or LM Studio (local), **vLLM** (production throughput), llama.cpp (edge/CPU). Or use the [Studio](../concepts/unsloth-studio.md) UI for the whole loop.

## Common pitfalls

Wrong/missing [chat template](../concepts/datasets.md) (garbled outputs), too many epochs (overfit), LR too high (instability/loss spikes), OOM (drop to QLoRA, lower batch/seq-len, raise grad-accum), and RL [reward hacking](../concepts/reinforcement-learning.md). Model-specific gotchas: the per-model guides ([model-catalog-and-notebooks](../summaries/model-catalog-and-notebooks.md)).