# Unsloth — full corpus # LLM Wiki An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions. ## How It Works The system has three layers: ``` raw/ Sources you collect (articles, transcripts, notes, PDFs) wiki/ LLM-written & maintained pages (summaries, concepts, entities, syntheses) CLAUDE.md Schema that tells the LLM how to structure everything ``` Three operations drive the workflow: | Operation | Trigger | What happens | |-----------|---------|--------------| | **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, updates the index and log | | **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, optionally creates a synthesis page for novel insights | | **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest | ## Quick Start 1. **Clone this repo** ```bash git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base cd my-knowledge-base ``` 2. **Customize CLAUDE.md** for your domain - Update the Purpose section with your topic - Replace the placeholder tagging taxonomy with your own categories - Adjust confidence level descriptions if needed - Everything else (workflows, page formats, linking rules) works as-is 3. **Drop sources into `raw/`** - Text files, transcripts, articles, notes — any plain text - These are immutable once added; the LLM never modifies them 4. **Tell the LLM to ingest** ``` ingest raw/my-first-source.txt ``` The LLM will create summary pages, concept pages, entity pages, cross-links, and update the index. 5. **Ask questions** ``` What are the key differences between X and Y? ``` The LLM answers from the wiki, citing specific pages. 6. **Run health checks** ``` lint ``` The LLM audits the wiki and fixes issues. ## Directory Structure ``` . ├── CLAUDE.md # Schema — the LLM's instructions ├── raw/ # Your source documents (immutable) └── wiki/ ├── index.md # Master catalog of all pages ├── log.md # Append-only activity log ├── dashboard.md # Dataview dashboard (Obsidian) ├── analytics.md # Charts View analytics (Obsidian) ├── flashcards.md # Spaced repetition cards ├── summaries/ # One page per source document ├── concepts/ # Concept and framework pages ├── entities/ # People, tools, organizations, etc. ├── syntheses/ # Cross-cutting analyses and comparisons ├── journal/ # Research/session journal entries │ └── template.md # Journal entry template └── presentations/ # Marp slide decks ``` ## Enhancements This template includes several extras beyond the core wiki pattern: ### Dataview Dashboard (`wiki/dashboard.md`) Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin. ### Charts View Analytics (`wiki/analytics.md`) Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin. ### Mermaid Diagrams Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Native support in Obsidian and GitHub. ### Marp Slides (`wiki/presentations/`) Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory. ### Research Journal (`wiki/journal/`) Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries. ### Spaced Repetition (`wiki/flashcards.md`) Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page. ### MCP Server This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically. ## Customizing for Your Domain The schema in `CLAUDE.md` is domain-agnostic. To adapt it: 1. **Purpose** — Describe your knowledge domain in one paragraph 2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`) 3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards 4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.) 5. **Journal template** — Customize `wiki/journal/template.md` for your workflow Everything else — page format, linking conventions, workflows, rules — is universal and works across domains. ## Example Domains This template works for any knowledge-intensive topic: - **Research notes** — papers, experiments, methodologies - **Book analysis** — themes, characters, author techniques - **Competitive analysis** — companies, products, market trends - **Course notes** — lectures, readings, key concepts - **Personal development** — frameworks, habits, book summaries - **Technical documentation** — APIs, architectures, design patterns - **Hobby deep-dives** — any subject you want to master ## License MIT --- title: "Knowledge Base Index" type: index updated: 2026-06-19 --- # Knowledge Base Index Master catalog of all wiki pages. Every page in the wiki must have an entry here. ## Concepts | Page | Tags | Confidence | Updated | |------|------|------------|---------| | [what-is-unsloth](concepts/what-is-unsloth.md) | unsloth, fine-tuning, overview | high | 2026-06-19 | | [fine-tuning-basics](concepts/fine-tuning-basics.md) | fine-tuning, rag, when-to-use | high | 2026-06-19 | | [installation](concepts/installation.md) | installation, pip, requirements | high | 2026-06-19 | | [datasets](concepts/datasets.md) | datasets, chat-template, data-prep | high | 2026-06-19 | | [lora-and-hyperparameters](concepts/lora-and-hyperparameters.md) | lora, qlora, hyperparameters | high | 2026-06-19 | | [reinforcement-learning](concepts/reinforcement-learning.md) | rl, grpo, reasoning | high | 2026-06-19 | | [saving-and-exporting](concepts/saving-and-exporting.md) | saving, gguf, ollama, export | high | 2026-06-19 | | [inference-and-deployment](concepts/inference-and-deployment.md) | inference, vllm, lm-studio | high | 2026-06-19 | | [unsloth-studio](concepts/unsloth-studio.md) | unsloth-studio, web-ui | medium | 2026-06-19 | ## Entities | Page | Tags | Updated | |------|------|---------| | [unsloth-library](entities/unsloth-library.md) | library, fastlanguagemodel, trl | 2026-06-19 | ## Summaries | Page | Source | Key Topics | Created | |------|--------|------------|---------| | [model-catalog-and-notebooks](summaries/model-catalog-and-notebooks.md) | model/notebook index | supported models, notebooks | 2026-06-19 | | [docs-catalog](summaries/docs-catalog.md) | docs.unsloth.ai llms.txt | area map | 2026-06-19 | ## Syntheses | Page | Pages Compared | Created | |------|----------------|---------| | [end-to-end-fine-tune](syntheses/end-to-end-fine-tune.md) | full workflow + pitfalls | 2026-06-19 | ## Statistics - **Total pages**: 13 - **Concepts**: 9 - **Entities**: 1 - **Summaries**: 2 - **Syntheses**: 1 - **Sources ingested**: 131 (docs.unsloth.ai llms.txt: 1 index + 130 pages; many per-model guides catalogued) - **High confidence**: 12 - **Medium confidence**: 1 --- title: "Datasets & Chat Templates" type: concept tags: [datasets, chat-template, formatting, data-prep] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-datasets-guide.md, raw/llms_txt_doc-fine-tuning-llms-guide.md] --- # Datasets & Chat Templates Data quality is the biggest lever in fine-tuning — Unsloth's datasets guide covers preparing data for SFT and [RL](reinforcement-learning.md). ## Dataset formats - **Instruction / chat format** — the common case: examples as messages (`system`/`user`/`assistant`) or instruction–input–output triples, rendered through the model's **chat template** so training matches how the model is prompted at inference. - **Completion / raw text** — continued-pretraining-style raw text. - **Preference data** — for RL/DPO-style methods (prompt + chosen/rejected). ## Chat templates Each model family has a specific **chat template** (special tokens, role markers). Unsloth provides helpers to apply the correct template so your formatted data exactly matches the model's expected structure — a frequent source of bad results when mismatched. Train on the **assistant turns** (mask the prompt) so the model learns to *produce* responses, not echo prompts. ## Handling missing/empty fields The guide shows a neat technique: wrap optional columns in `[[ ]]` so rows with **empty values** drop that text entirely rather than emitting "EMPTY" — keeping prompts clean across heterogeneous rows. ## Practical tips - **Quality > quantity** — a few hundred to a few thousand clean, on-task examples often beat huge noisy sets. - **Match inference format** — format training data exactly as you'll prompt the model. - **Hold out** a small eval set to check for overfitting. - Standard sources: Hugging Face datasets, your own JSON/CSV. Then set [hyperparameters](lora-and-hyperparameters.md) and train. --- title: "Fine-tuning Basics: Is It Right for You?" type: concept tags: [fine-tuning, basics, rag, lora, when-to-use] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-fine-tuning-for-beginners.md, raw/llms_txt_doc-faq-is-fine-tuning-right-for-me.md, raw/llms_txt_doc-what-model-should-i-use-for-fine-tuning.md] --- # Fine-tuning Basics: Is It Right for You? **Fine-tuning** adapts a pretrained model to your data/task — teaching it a style, domain, format, or behavior it doesn't have out of the box. ## Fine-tuning vs RAG (a common misconception) They solve different problems and often combine: - **RAG** injects *knowledge* at query time (good for facts that change, large/refreshing corpora). - **Fine-tuning** changes *behavior/skill/format* and can bake in domain knowledge, lower latency/cost (smaller model, no retrieval), and enforce a consistent style or output structure. Unsloth's docs explicitly bust the "fine-tuning can't add knowledge" myth — it can, and pairs well with RAG. ## When fine-tuning is worth it - You need a consistent **tone/format/persona** or structured outputs. - You want a **smaller/cheaper** model to match a bigger one on your task. - You have **task-specific data** and prompting alone isn't enough. - You need an on-prem/local model specialized to you. ## Choosing a base model Pick by size vs your VRAM, license, and task ([what-model-should-i-use]): smaller models (1–8B) fine-tune fast on consumer GPUs; instruct vs base depends on whether you're teaching format (instruct) or raw capability (base). Newer architectures (Llama, Qwen, Gemma, DeepSeek) are well supported ([model-catalog](../summaries/model-catalog-and-notebooks.md)). ## LoRA vs full fine-tuning Most users do **LoRA/QLoRA** — train small adapter matrices instead of all weights: dramatically less VRAM, fast, and composable. Full fine-tuning is for when you need to change the whole model. Details: [lora-and-hyperparameters](lora-and-hyperparameters.md). --- title: "Inference & Deployment" type: concept tags: [inference, deployment, vllm, lm-studio, serving] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-inference-deployment.md, raw/llms_txt_doc-vllm-deployment-inference-guide.md, raw/llms_txt_doc-deploying-models-to-lm-studio.md, raw/llms_txt_doc-how-to-run-local-llms-with-claude-code.md, raw/llms_txt_doc-how-to-run-local-llms-with-openai-codex.md, raw/llms_txt_doc-how-to-run-local-llms-with-docker-step-by-step-guide.md, raw/llms_txt_doc-how-to-run-and-deploy-llms-on-your-ios-or-android-phone.md, raw/llms_txt_doc-how-to-use-mcp-servers-with-local-llms.md, raw/llms_txt_doc-how-to-use-unsloth-as-an-api-endpoint.md] --- # Inference & Deployment Unsloth isn't only for training — it runs models for inference too, and exports to the major serving runtimes. ## In-framework inference Use Unsloth's fast inference (`FastLanguageModel`/`FastModel` with native generation, or 2x-faster inference paths) to test a fine-tune right after training, without exporting — handy for quick eval in the same [notebook](../summaries/model-catalog-and-notebooks.md). ## Serving runtimes - **vLLM** — the high-throughput production path: export merged 16-bit ([saving-and-exporting](saving-and-exporting.md)) and serve with vLLM. Unsloth documents vLLM deployment and engine arguments; it also integrates vLLM for fast RL rollouts. - **Ollama** — `ollama run` from the exported GGUF + Modelfile (local/dev). - **LM Studio** — load the GGUF in LM Studio's local server (OpenAI-compatible) for desktop use. - **llama.cpp** — GGUF runs anywhere llama.cpp does (CPU/edge). ## Cross-references Unsloth's docs include guides for running local LLMs with Docker, Claude Code, OpenAI Codex, MCP servers, and on iOS/Android — i.e. consuming your fine-tuned model from many clients (mapped in [docs-catalog](../summaries/docs-catalog.md)). It can also act as an API endpoint directly. ## Choosing - **Dev/test** → in-framework or Ollama/LM Studio from GGUF. - **Production throughput** → vLLM with merged 16-bit. - **Edge/CPU/laptop** → GGUF via llama.cpp/Ollama. Quantization choices: [saving-and-exporting](saving-and-exporting.md). --- title: "Installation & Requirements" type: concept tags: [installation, pip, requirements, gpu, docker] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-install-unsloth-via-pip-and-uv.md, raw/llms_txt_doc-unsloth-requirements.md, raw/llms_txt_doc-install-unsloth-via-docker.md, raw/llms_txt_doc-conda-install.md, raw/llms_txt_doc-install-unsloth-on-macos.md, raw/llms_txt_doc-fine-tuning-llms-on-amd-gpus-with-unsloth-guide.md, raw/llms_txt_doc-fine-tuning-llms-on-intel-gpus-with-unsloth.md, raw/llms_txt_doc-how-to-fine-tune-llms-on-windows-with-unsloth-step-by-step-g.md, raw/llms_txt_doc-google-colab.md] --- # Installation & Requirements ## Requirements - **GPU** — an NVIDIA GPU is the primary target (CUDA); minimum useful VRAM depends on model size and QLoRA (4-bit) usage — small models fine-tune on ~8GB, larger ones need more. AMD (ROCm) and Intel GPUs are supported via dedicated guides; macOS support exists. - **Toolchain** — Python, plus build tools (Git, CMake, a C++ compiler) for some dependencies; on Windows these are installed via the setup script (`winget` / Visual Studio Build Tools). ## Installing - **pip / uv** — the standard path: `pip install unsloth` (uv works too). Match your CUDA/PyTorch; the docs give exact commands and the recommended pinned install. - **Docker** — an official container avoids dependency wrangling; recommended when the local environment is fussy. - **Conda**, **macOS**, **AMD**, **Intel**, **Windows**, **Google Colab**, and **VS Code + Colab** each have their own guide ([docs-catalog](../summaries/docs-catalog.md)). - **Updating** — update to the latest (or pin an old version) per the updating guide; Unsloth ships frequently to support new models. ## Fastest start The **zero-install path** is a [notebook](../summaries/model-catalog-and-notebooks.md): open an Unsloth Colab/Kaggle notebook, which has everything preinstalled — change the dataset and run. Local install matters when you need your own GPU, private data, or production training. Next: prepare a [dataset](datasets.md) and set [hyperparameters](lora-and-hyperparameters.md). --- title: "LoRA, QLoRA & Hyperparameters" type: concept tags: [lora, qlora, hyperparameters, rank, learning-rate] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-lora-fine-tuning-hyperparameters-guide.md, raw/llms_txt_doc-fine-tuning-llms-guide.md] --- # LoRA, QLoRA & Hyperparameters ## LoRA and QLoRA - **LoRA** (Low-Rank Adaptation) trains small adapter matrices injected into the model instead of all weights — tiny memory footprint, fast, and the adapter is a few MB you can swap/share. - **QLoRA** = LoRA on a **4-bit quantized** base model — even less VRAM (the standard way to fine-tune big models on small GPUs), with Unsloth keeping accuracy high. ## Key hyperparameters - **LoRA rank (`r`)** — adapter capacity; higher = more expressive but more memory/overfit risk. Common 8–64; 16/32 typical. - **LoRA alpha** — scaling; a common heuristic is `alpha = r` or `2×r`. - **target_modules** — which projections get adapters (attention q/k/v/o + MLP gate/up/down); targeting all linear layers is the strong default. - **Learning rate** — e.g. ~2e-4 for LoRA (higher than full FT); too high destabilizes. - **Epochs** — usually 1–3; more risks overfitting on small data. - **Batch size & gradient accumulation** — *effective batch size* = `batch_size × grad_accum × #GPUs`; raise grad-accum to simulate a big batch within VRAM limits. It directly affects training stability. - **Sequence length** — set to your data; Unsloth enables long context efficiently. ## Practical guidance - Start from an Unsloth [notebook](../summaries/model-catalog-and-notebooks.md)'s defaults; change rank/LR/epochs only as needed. - Watch eval loss for overfitting; reduce epochs/rank or add data if it diverges from train loss. - QLoRA first (fits more); move to LoRA/full FT if you have VRAM and need maximum quality. Adapters can be **hot-swapped** at inference (multiple LoRAs on one base — see [docs-catalog](../summaries/docs-catalog.md)). After training, [save/export](saving-and-exporting.md). --- title: "Reinforcement Learning (GRPO)" type: concept tags: [rl, grpo, gspo, reasoning, reward] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-reinforcement-learning-rl-guide.md, raw/llms_txt_doc-rl-reward-hacking.md, raw/llms_txt_doc-advanced-reinforcement-learning-documentation.md] --- # Reinforcement Learning (GRPO) Unsloth supports **reinforcement learning** to train **reasoning models** (DeepSeek-R1-style) efficiently — most notably **GRPO** (Group Relative Policy Optimization), plus newer methods (GSPO, FP8 RL). ## What RL adds over SFT SFT ([lora-and-hyperparameters](lora-and-hyperparameters.md)) imitates examples; **RL optimizes against a reward function**, letting the model discover better reasoning/behaviors than the demonstrations alone — the technique behind reasoning models that "think" before answering. ## GRPO GRPO samples multiple completions per prompt, scores them with a **reward function**, and pushes the policy toward higher-reward outputs relative to the group — no separate value model needed (cheaper than PPO). Unsloth makes GRPO run with far less VRAM and **much longer context** (its "7x longer context" GRPO), and supports **vision (VLM) RL**. ## Reward functions You define rewards encoding what "good" means: correctness (e.g. matches a verifier/answer), format adherence, length, etc. Reward design is the crux — see **reward hacking**: models exploit poorly-specified rewards (gaming the metric without the intended behavior), so rewards must be robust and ideally verifiable. ## Practical - Start from a **GRPO notebook** ([model-catalog](../summaries/model-catalog-and-notebooks.md)) for a model that fits your GPU. - Combine with QLoRA for VRAM efficiency. - Newer options: **GSPO**, **FP8 RL**, and long-context GRPO for harder reasoning tasks (mapped in [docs-catalog](../summaries/docs-catalog.md)). - Watch for reward hacking; use held-out evals, not just training reward. --- title: "Saving & Exporting Models" type: concept tags: [saving, gguf, ollama, export, adapters, merging] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-saving-to-gguf.md, raw/llms_txt_doc-saving-models-to-ollama.md, raw/llms_txt_doc-unsloth-dynamic-2-0-ggufs.md] --- # Saving & Exporting Models After training, export the model into the format your runtime needs. ## What you can save - **LoRA adapters** — small (few MB); load on top of the base model later. Best for iteration and swapping ([lora-and-hyperparameters](lora-and-hyperparameters.md)). - **Merged weights** — merge the adapter into the base to get a standalone model: **16-bit** (full quality) or **4-bit** (smaller). Use merged 16-bit for further serving/quantization. - **GGUF** — for `llama.cpp`/Ollama/LM Studio CPU+GPU inference. - **Push to Hugging Face Hub** — upload adapters or merged models. ## GGUF & quantization `save_pretrained_gguf` / `push_to_hub_gguf` convert to **GGUF** and quantize in one step (q4_k_m, q5_k_m, q8_0, etc.). Unsloth also ships **Dynamic 2.0 GGUFs** — a smarter, per-layer quantization that preserves quality better than naive uniform quantization at the same size (used for its model uploads). Pick the quant by your quality/size/VRAM trade-off. ## Ollama `save to Ollama` produces a GGUF plus a **Modelfile** so you can `ollama run` your fine-tune immediately — the fastest path to a usable local chatbot from a fresh fine-tune (the classic "fine-tune Llama, run in Ollama" tutorial). ## Choosing a target - Iterating / multiple variants → **adapters** (hot-swappable). - Local chat / laptop → **GGUF** (Ollama / [LM Studio](inference-and-deployment.md)). - Server / throughput → merged 16-bit for **[vLLM](inference-and-deployment.md)**. - Sharing → push to the Hub. --- title: "Unsloth Studio" type: concept tags: [unsloth-studio, web-ui, no-code] updated: 2026-06-19 confidence: medium sources: [raw/llms_txt_doc-introducing-unsloth-studio.md, raw/llms_txt_doc-get-started-with-unsloth-studio.md, raw/llms_txt_doc-how-to-run-models-with-unsloth-studio.md] --- # Unsloth Studio **Unsloth Studio** is a **web UI** for training and running open models locally — a graphical front-end over the Unsloth framework, lowering the barrier from writing notebook code to clicking through a UI. (Confidence medium — Studio is a newer addition; verify specifics against the live docs.) ## What it offers - **Run models** — load and chat with open models (Gemma, Qwen, DeepSeek, gpt-oss, etc.) locally through the UI ([how-to-run-models-with-unsloth-studio]). - **Train / fine-tune** — drive [fine-tuning](lora-and-hyperparameters.md) and [RL](reinforcement-learning.md) without hand-writing the training script. - **Export** — produce GGUF/other formats from the UI ([saving-and-exporting](saving-and-exporting.md)) for use in [Ollama/LM Studio/vLLM](inference-and-deployment.md). ## When to use Studio vs notebooks/code - **Studio** — you want a no-/low-code, local GUI to run and fine-tune models. - **Notebooks / Python** — you want full control, reproducibility, custom datasets/reward functions, or to embed training in a pipeline ([model-catalog-and-notebooks](../summaries/model-catalog-and-notebooks.md)). Both sit on the same Unsloth engine, so the underlying concepts ([fine-tuning-basics](fine-tuning-basics.md), [datasets](datasets.md), [hyperparameters](lora-and-hyperparameters.md)) carry over. Installation and getting-started for Studio are in [docs-catalog](../summaries/docs-catalog.md). --- title: "What is Unsloth" type: concept tags: [unsloth, fine-tuning, overview, lora] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-fine-tuning-for-beginners.md, raw/llms_txt_doc-fine-tuning-llms-guide.md, raw/llms_txt_doc-inference-deployment.md, raw/llms_txt_doc-reinforcement-learning-rl-guide.md, raw/llms_txt_doc-introducing-unsloth-studio.md, raw/llms_txt_doc-unsloth-model-catalog.md, raw/llms_txt_doc-export-models-with-unsloth-studio.md] --- # What is Unsloth **Unsloth** is an open-source framework for **fast, memory-efficient fine-tuning and running of LLMs**. It makes training accessible on a single consumer GPU (and free Colab/Kaggle), with large speedups and lower VRAM versus standard Hugging Face training — achieved via custom Triton kernels and an optimized backprop path, with **no loss of accuracy**. ## What it does - **Fine-tuning** — LoRA / QLoRA adapters on top of base models, and full fine-tuning ([lora-and-hyperparameters](lora-and-hyperparameters.md)). - **Reinforcement learning** — GRPO and related RL to train reasoning models ([reinforcement-learning](reinforcement-learning.md)). - **Running & exporting** — run models for inference and export to GGUF/Ollama/vLLM/LM Studio ([saving-and-exporting](saving-and-exporting.md), [inference-and-deployment](inference-and-deployment.md)). - **Broad model support** — Llama, Mistral, Gemma, Qwen, DeepSeek, gpt-oss, Phi, and more ([model-catalog](../summaries/model-catalog-and-notebooks.md)). ## Why people use it - **2x+ faster training, far less VRAM** — fine-tune models that otherwise wouldn't fit. - **Beginner-friendly** — ready-to-run [notebooks](../summaries/model-catalog-and-notebooks.md) for Colab/Kaggle; you change the dataset and run. - **Drop-in** — `FastLanguageModel`/`FastModel` wrap Hugging Face + TRL, so existing training code mostly works. ## How a project flows Decide if fine-tuning fits ([fine-tuning-basics](fine-tuning-basics.md)) → install ([installation](installation.md)) → prepare a [dataset](datasets.md) → set [LoRA hyperparameters](lora-and-hyperparameters.md) → train (SFT or [RL](reinforcement-learning.md)) → [save/export](saving-and-exporting.md) → [run/deploy](inference-and-deployment.md). A newer **[Unsloth Studio](unsloth-studio.md)** web UI wraps this. Full map: [docs-catalog](../summaries/docs-catalog.md). --- title: "The unsloth Library (FastLanguageModel)" type: entity tags: [library, api, fastlanguagemodel, trl] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-fine-tuning-llms-guide.md, raw/llms_txt_doc-lora-fine-tuning-hyperparameters-guide.md] --- # The unsloth Library (FastLanguageModel) The `unsloth` Python package is the core API — a drop-in acceleration layer over Hugging Face Transformers + TRL. ## The typical training script 1. **Load** — `FastLanguageModel.from_pretrained(model_name, max_seq_length, load_in_4bit=...)` (or `FastModel` for vision/multimodal). This returns the model + tokenizer with Unsloth's optimizations applied. 2. **Add LoRA** — `FastLanguageModel.get_peft_model(model, r=..., lora_alpha=..., target_modules=...)` to attach [LoRA adapters](../concepts/lora-and-hyperparameters.md). 3. **Format data** — apply the [chat template](../concepts/datasets.md) to your [dataset](../concepts/datasets.md). 4. **Train** — hand the model to TRL's `SFTTrainer` (or `GRPOTrainer` for [RL](../concepts/reinforcement-learning.md)) with `TrainingArguments`; Unsloth patches make it 2x+ faster and lower-VRAM. 5. **Save/export** — `save_pretrained` (adapters), `save_pretrained_merged`, `save_pretrained_gguf`, or push to the Hub ([saving-and-exporting](../concepts/saving-and-exporting.md)). 6. **Infer** — `FastLanguageModel.for_inference(model)` for fast generation ([inference-and-deployment](../concepts/inference-and-deployment.md)). ## Why drop-in Because it wraps standard Transformers/TRL objects, existing training code mostly works — you swap the model-loading lines and keep your `SFTTrainer`/dataset code. That compatibility (plus the kernels) is Unsloth's core value. Idiomatic usage lives in the [notebooks](../summaries/model-catalog-and-notebooks.md); exact current signatures are in the source docs ([docs-catalog](../summaries/docs-catalog.md)). --- title: "Activity Log" type: log --- # Activity Log Append-only record of all wiki changes. ## Format Each entry follows this format: ``` ### YYYY-MM-DD HH:MM — [Action Type] - **Source/Trigger**: what initiated the action - **Pages created**: list of new pages - **Pages updated**: list of updated pages - **Notes**: any contradictions flagged, decisions made ``` --- ### 2026-04-08 00:00 — Setup - **Source/Trigger**: Repository initialized - **Pages created**: index.md, log.md, dashboard.md, analytics.md, flashcards.md - **Pages updated**: none - **Notes**: Empty knowledge base ready for first source ingestion ### 2026-06-19 00:00 — Initial curation (medium rung) - **Source/Trigger**: 131 docs.unsloth.ai pages (llms.txt) - **Pages created**: 9 concepts, 1 entity (unsloth-library), 2 summaries (model-catalog-and-notebooks, docs-catalog), 1 synthesis (end-to-end-fine-tune) - **Pages updated**: index.md - **Notes**: Unsloth = fast/memory-efficient LLM fine-tuning (LoRA/QLoRA, GRPO RL, GGUF/Ollama/vLLM export). Many per-model run/fine-tune guides catalogued not ingested. unsloth-studio confidence:medium (newer feature). Category: inference. --- title: "Docs Catalog" type: summary tags: [catalog, map, reference] updated: 2026-06-19 confidence: high sources: [raw/llms_txt-llms-txt-index.md, raw/llms_txt_doc-fine-tuning-llms-guide.md] --- # Docs Catalog Map of the Unsloth docs (docs.unsloth.ai llms.txt; ~130 pages mirrored in `raw/` as `llms_txt_doc-.md`). | Area | Raw slugs (selection) | Wiki coverage | |---|---|---| | Get started | fine-tuning-for-beginners, faq-is-fine-tuning-right-for-me, what-model-should-i-use-for-fine-tuning, unsloth-requirements, unsloth-notebooks, unsloth-model-catalog | [what-is-unsloth](../concepts/what-is-unsloth.md), [fine-tuning-basics](../concepts/fine-tuning-basics.md) | | Install | install-unsloth-via-pip-and-uv, install-unsloth-on-macos, install-unsloth-via-docker, windows/amd/intel/conda/google-colab/vs-code | [installation](../concepts/installation.md) | | Fine-tuning | fine-tuning-llms-guide, datasets-guide, lora-fine-tuning-hyperparameters-guide, lora-hot-swapping-guide | [datasets](../concepts/datasets.md), [lora-and-hyperparameters](../concepts/lora-and-hyperparameters.md) | | RL | reinforcement-learning-rl-guide, grpo-long-context, vision-rl, gspo, fp8-rl, rl-reward-hacking, advanced-reinforcement-learning | [reinforcement-learning](../concepts/reinforcement-learning.md) | | Saving/export | saving-to-gguf, saving-models-to-ollama, unsloth-dynamic-2-0-ggufs | [saving-and-exporting](../concepts/saving-and-exporting.md) | | Inference/deploy | inference-deployment, vllm-deployment-inference-guide, vllm-engine-arguments, deploying-models-to-lm-studio, run-with-docker/claude-code/codex/mcp, ios-android | [inference-and-deployment](../concepts/inference-and-deployment.md) | | Studio | introducing-unsloth-studio, get-started-with-unsloth-studio, unsloth-studio-installation, how-to-run/export-with-unsloth-studio | [unsloth-studio](../concepts/unsloth-studio.md) | | Per-model guides | `-how-to-run-locally`, `-fine-tune` (Llama, Qwen, Gemma, DeepSeek, GLM, Kimi, gpt-oss, Granite, Devstral, …) | [model-catalog-and-notebooks](model-catalog-and-notebooks.md) | | Troubleshooting | troubleshooting-inference | [what-is-unsloth](../concepts/what-is-unsloth.md) | ## Notes - **Per-model guides not ingested individually** — model-specific VRAM/quant/template details change per release; find the exact `*-how-to-run`/`*-fine-tune` page in `raw/`. - The docs are a GitBook with an **`ask` query** capability (HTTP GET with `?ask=`) for live Q&A. - Unsloth ships very frequently to support new models — verify version/model specifics against docs.unsloth.ai. --- title: "Model Catalog & Notebooks" type: summary tags: [models, notebooks, catalog, colab] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-unsloth-model-catalog.md, raw/llms_txt_doc-unsloth-notebooks.md, raw/llms_txt_doc-reinforcement-learning-rl-guide.md, raw/llms_txt_doc-unsloth-dynamic-2-0-ggufs.md] --- # Model Catalog & Notebooks Unsloth provides ready-to-run **notebooks** (the recommended starting point) and supports a large, frequently-updated **model catalog**. This page maps the space; per-model run/fine-tune guides live in `raw/` as `llms_txt_doc--how-to-run...`. ## Notebooks Open a Colab/Kaggle notebook with everything preinstalled, change the dataset, and run ([installation](../concepts/installation.md)). Categories: - **SFT fine-tuning** — per model family (Llama, Qwen, Gemma, Mistral, Phi, …). - **GRPO / RL** — reasoning training (Qwen, gpt-oss GSPO, Llama, Phi-4, DeepSeek-R1-distill, vision GRPO) ([reinforcement-learning](../concepts/reinforcement-learning.md)). - **Vision / multimodal**, **continued pretraining**, **embedding model** fine-tuning, **TTS**, and more. ## Model families covered (run + fine-tune guides) Llama, **Qwen / Qwen-Image**, **Gemma 3 / 3n / 4 (incl. QAT)**, **DeepSeek (R1, V3, OCR)**, **GLM (4.x–5.x)**, **Kimi K2**, **gpt-oss**, **IBM Granite 4**, **Devstral**, **Grok-2**, Phi, Mistral, and others. Each has a "how to run locally" and often a "fine-tune" guide. ## Dynamic GGUF uploads Unsloth publishes its own quantized models as **Dynamic 2.0 GGUFs** (better quality per byte — [saving-and-exporting](../concepts/saving-and-exporting.md)), including hard-to-quantize large models (e.g. dynamic 1.58-bit DeepSeek-R1). ## Using this catalog To fine-tune or run a specific model, find its `*-how-to-run` / `*-fine-tune` page in `raw/` for model-specific VRAM, quant, and template notes — those specifics change per model and aren't reproduced here. General workflow is in the concept pages ([what-is-unsloth](../concepts/what-is-unsloth.md)). --- title: "End-to-End: Fine-tune to Deployment" type: synthesis tags: [workflow, end-to-end, synthesis] updated: 2026-06-19 confidence: high sources: [raw/llms_txt_doc-fine-tuning-llms-guide.md, raw/llms_txt_doc-datasets-guide.md, raw/llms_txt_doc-lora-fine-tuning-hyperparameters-guide.md, raw/llms_txt_doc-saving-to-gguf.md] --- # End-to-End: Fine-tune to Deployment The complete Unsloth workflow, tying the concept pages together. ## 1. Decide & pick a model Confirm fine-tuning fits (vs RAG/prompting — [fine-tuning-basics](../concepts/fine-tuning-basics.md)). Choose a base model sized to your GPU and task. ## 2. Set up [Install](../concepts/installation.md) locally, or just open the matching [notebook](../summaries/model-catalog-and-notebooks.md) (zero setup). ## 3. Prepare data Build a clean [dataset](../concepts/datasets.md) in chat/instruction format; apply the model's **chat template**; hold out an eval slice. Quality beats quantity. ## 4. Configure Load with `FastLanguageModel` ([unsloth-library](../entities/unsloth-library.md)), add **QLoRA** (4-bit) adapters, set [hyperparameters](../concepts/lora-and-hyperparameters.md): rank/alpha, target_modules, LR ~2e-4, 1–3 epochs, effective batch size via grad-accum. ## 5. Train Run `SFTTrainer` for supervised fine-tuning, or [GRPO](../concepts/reinforcement-learning.md) for reasoning/RL (with a robust reward function — watch for reward hacking). Monitor train vs eval loss for overfitting. ## 6. Evaluate Test the fine-tune with in-framework [inference](../concepts/inference-and-deployment.md) on held-out prompts before exporting. ## 7. Save & export [Save](../concepts/saving-and-exporting.md) adapters (iterate), or merge to 16-bit/4-bit, or export **GGUF** (Dynamic 2.0 quant) / Ollama Modelfile / push to the Hub. ## 8. Deploy [Run/serve](../concepts/inference-and-deployment.md): Ollama or LM Studio (local), **vLLM** (production throughput), llama.cpp (edge/CPU). Or use the [Studio](../concepts/unsloth-studio.md) UI for the whole loop. ## Common pitfalls Wrong/missing [chat template](../concepts/datasets.md) (garbled outputs), too many epochs (overfit), LR too high (instability/loss spikes), OOM (drop to QLoRA, lower batch/seq-len, raise grad-accum), and RL [reward hacking](../concepts/reinforcement-learning.md). Model-specific gotchas: the per-model guides ([model-catalog-and-notebooks](../summaries/model-catalog-and-notebooks.md)).