# Stable Diffusion — full corpus

# LLM Wiki

An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions.

## How It Works

The system has three layers:

```
raw/          Sources you collect (articles, transcripts, notes, PDFs)
wiki/         LLM-written & maintained pages (summaries, concepts, entities, syntheses)
CLAUDE.md     Schema that tells the LLM how to structure everything
```

Three operations drive the workflow:

| Operation | Trigger | What happens |
|-----------|---------|--------------|
| **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, updates the index and log |
| **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, optionally creates a synthesis page for novel insights |
| **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest |

## Quick Start

1. **Clone this repo**

   ```bash
   git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base
   cd my-knowledge-base
   ```

2. **Customize CLAUDE.md** for your domain
   - Update the Purpose section with your topic
   - Replace the placeholder tagging taxonomy with your own categories
   - Adjust confidence level descriptions if needed
   - Everything else (workflows, page formats, linking rules) works as-is

3. **Drop sources into `raw/`**
   - Text files, transcripts, articles, notes — any plain text
   - These are immutable once added; the LLM never modifies them

4. **Tell the LLM to ingest**

   ```
   ingest raw/my-first-source.txt
   ```

   The LLM will create summary pages, concept pages, entity pages, cross-links, and update the index.

5. **Ask questions**

   ```
   What are the key differences between X and Y?
   ```

   The LLM answers from the wiki, citing specific pages.

6. **Run health checks**

   ```
   lint
   ```

   The LLM audits the wiki and fixes issues.

## Directory Structure

```
.
├── CLAUDE.md              # Schema — the LLM's instructions
├── raw/                   # Your source documents (immutable)
└── wiki/
    ├── index.md           # Master catalog of all pages
    ├── log.md             # Append-only activity log
    ├── dashboard.md       # Dataview dashboard (Obsidian)
    ├── analytics.md       # Charts View analytics (Obsidian)
    ├── flashcards.md      # Spaced repetition cards
    ├── summaries/         # One page per source document
    ├── concepts/          # Concept and framework pages
    ├── entities/          # People, tools, organizations, etc.
    ├── syntheses/         # Cross-cutting analyses and comparisons
    ├── journal/           # Research/session journal entries
    │   └── template.md    # Journal entry template
    └── presentations/     # Marp slide decks
```

## Enhancements

This template includes several extras beyond the core wiki pattern:

### Dataview Dashboard (`wiki/dashboard.md`)

Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin.

### Charts View Analytics (`wiki/analytics.md`)

Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin.

### Mermaid Diagrams

Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Native support in Obsidian and GitHub.

### Marp Slides (`wiki/presentations/`)

Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory.

### Research Journal (`wiki/journal/`)

Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries.

### Spaced Repetition (`wiki/flashcards.md`)

Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page.

### MCP Server

This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically.

## Customizing for Your Domain

The schema in `CLAUDE.md` is domain-agnostic. To adapt it:

1. **Purpose** — Describe your knowledge domain in one paragraph
2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`)
3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards
4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.)
5. **Journal template** — Customize `wiki/journal/template.md` for your workflow

Everything else — page format, linking conventions, workflows, rules — is universal and works across domains.

## Example Domains

This template works for any knowledge-intensive topic:

- **Research notes** — papers, experiments, methodologies
- **Book analysis** — themes, characters, author techniques
- **Competitive analysis** — companies, products, market trends
- **Course notes** — lectures, readings, key concepts
- **Personal development** — frameworks, habits, book summaries
- **Technical documentation** — APIs, architectures, design patterns
- **Hobby deep-dives** — any subject you want to master

## License

MIT

---
title: "Stable Diffusion KB — Master Index"
type: index
updated: 2026-06-23
diffusers_version: "0.38.0"
---

# Stable Diffusion KB — Master Index

**Domain:** Stable Diffusion — open-weight latent text-to-image diffusion models (SD 1.5 / SDXL / SD3.x) and how to run, prompt, optimize, and fine-tune them.

**Corpus:** 106 provenance-stamped sources in `raw/` — the Hugging Face Diffusers docs (llms.txt-curated, the de-facto SD toolkit), the AUTOMATIC1111 web UI wiki, and Stability AI / Hugging Face model cards.

**Pages:** 16 (11 concepts · 2 entities · 1 summary · 2 syntheses) — the user ring plus the operator/developer ring.

## Concepts (core ideas + operational how-tos)

- [[concepts/what-is-stable-diffusion]] — latent diffusion explained; the model families (SD 1.4/1.5, SD 2.x, SDXL, SD3/SD3.5) and how they differ
- [[concepts/installation-and-setup]] — install `diffusers`/PyTorch, load a pipeline, first generation, device selection (CUDA/MPS)
- [[concepts/text-to-image]] — `DiffusionPipeline`/`AutoPipelineForText2Image`, `guidance_scale`, `num_inference_steps`, seeds and reproducibility
- [[concepts/image-to-image-and-inpainting]] — img2img, inpainting (and outpainting/depth2img), the `strength` parameter
- [[concepts/prompting]] — prompt construction, negative prompts, emphasis/weighting syntax
- [[concepts/sdxl]] — SDXL base+refiner two-stage, micro-conditioning, SDXL-Turbo (few-step)
- [[concepts/controlnet-and-adapters]] — ControlNet, T2I-Adapter, IP-Adapter, InstructPix2Pix
- [[concepts/schedulers-and-samplers]] — swapping schedulers, Karras sigmas, the step/quality tradeoff, LCM
- [[concepts/loras-for-inference]] — loading and blending LoRA adapters at inference (`load_lora_weights`, `set_adapters`, `fuse_lora`)
- [[concepts/optimization-and-memory]] — memory (offload/slicing/tiling) and speed (xFormers/attention backends, `torch.compile`, fp16/bf16); A1111 `--medvram`/`--xformers`
- [[concepts/fine-tuning]] — LoRA vs DreamBooth vs Textual Inversion vs Custom Diffusion: what to train and how

## Entities

- [[entities/diffusers-library]] — the Hugging Face `diffusers` library: `DiffusionPipeline`, models + schedulers, `from_pretrained`, AutoPipeline
- [[entities/automatic1111-webui]] — AUTOMATIC1111 stable-diffusion-webui: features and key launch flags

## Summaries

- [[summaries/model-and-feature-catalog]] — map of SD model versions (with Hub IDs) plus the larger Diffusers reference space this wiki maps rather than pages (other pipelines, optimization backends, quantization)

## Syntheses (decisions & casebooks)

- [[syntheses/choosing-model-and-pipeline]] — which SD model (quality vs speed vs license vs VRAM) and which task pipeline
- [[syntheses/troubleshooting-and-quality]] — symptom → cause → fix: CUDA OOM, black/NaN images, poor quality, non-reproducible seeds, slow generation

## Statistics

- **Total pages**: 16
- **Concepts**: 11 · **Entities**: 2 · **Summaries**: 1 · **Syntheses**: 2
- **Sources ingested**: 106 (raw/, immutable)
- **High confidence**: 14 · **Medium confidence**: 2 · **Low confidence**: 0

## Coverage notes

Strong: running SD with Diffusers (txt2img/img2img/inpaint/ControlNet/LoRA), schedulers, prompting, optimization, and the three main fine-tuning methods; SDXL and SD3.5 facts from model cards; A1111 web UI orientation. Spine is Diffusers `v0.38.0`; SD evolves by model release, so freshness = source fetch date (2026-06-23) and model claims are cited to their model cards.

Mapped, not paged (see [[summaries/model-and-feature-catalog]]): the full Diffusers per-pipeline/per-API reference, non-SD pipelines (Kandinsky, Würstchen, video — SVD/CogVideoX), optimization backends (ONNX/OpenVINO/Core ML/MPS), and quantization (bitsandbytes/torchao/GGUF/quanto).

For live model availability, licenses, and post-date releases, use the Hugging Face Hub and web search.

---
title: "ControlNet and Adapters (T2I-Adapter, IP-Adapter, InstructPix2Pix)"
type: concept
tags: [controlnet, t2i-adapter, ip-adapter, instructpix2pix, conditioning]
updated: 2026-06-23
confidence: high
sources: [raw/llms_txt_doc-controlnet-2.md, raw/llms_txt_doc-controlnet.md, raw/llms_txt_doc-t2i-adapter.md, raw/llms_txt_doc-ip-adapter.md, raw/llms_txt_doc-instructpix2pix.md]
---

# ControlNet and Adapters

Adapters add controllable conditioning on top of a frozen base model. Use ControlNet for structural control (edges, depth, pose), T2I-Adapter as a lighter alternative, IP-Adapter for image-prompt guidance, and InstructPix2Pix for instruction-based editing. See [[concepts/text-to-image]] and [[concepts/image-to-image-and-inpainting]].
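
The selection guidance above can be sketched as a small lookup from editing goal to technique. This is only an illustration: the goal labels (`"structure"`, `"image-prompt"`, and so on) and the `pick_adapter` helper are hypothetical, while the entry-point names are the ones this page uses in its own sections.

```python
# Hypothetical goal labels mapped to (technique, Diffusers entry point named on this page).
ADAPTER_BY_GOAL = {
    "structure": ("ControlNet", "ControlNetModel"),
    "structure-lightweight": ("T2I-Adapter", "T2IAdapter"),
    "image-prompt": ("IP-Adapter", "load_ip_adapter"),
    "instruction-edit": ("InstructPix2Pix", "StableDiffusionInstructPix2PixPipeline"),
}


def pick_adapter(goal: str) -> tuple[str, str]:
    """Return (technique, entry point) for a goal label, or raise ValueError."""
    try:
        return ADAPTER_BY_GOAL[goal]
    except KeyError:
        options = ", ".join(sorted(ADAPTER_BY_GOAL))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {options}") from None


if __name__ == "__main__":
    technique, entry_point = pick_adapter("structure")
    print(f"{technique}: start from {entry_point}")
```

A table like this is only a first cut; the techniques also combine (for example, IP-Adapter for appearance together with a ControlNet for structure).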

## ControlNet

A ControlNet adds "zero convolution" layers conditioned on a structural control (canny edge, depth map, human pose, etc.). Load a `ControlNetModel`, pass it to the pipeline, and weight it with `controlnet_conditioning_scale`.

```py
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# `prompt` (str) and `control_image` (e.g. a canny edge map) are assumed to be defined already.
controlnet = ControlNetModel.from_pretrained("path/to/controlnet", torch_dtype=torch.float16)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "path/to/base/model", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
image = pipeline(prompt, num_inference_steps=20, image=control_image).images[0]
```

For SDXL use `StableDiffusionXLControlNetPipeline` (and `...Img2ImgPipeline` / `...InpaintPipeline`) with a control such as `diffusers/controlnet-canny-sdxl-1.0`. The text-to-image pipeline takes the structural image as `image`; the img2img and inpaint variants take it as `control_image`.

**Multi-ControlNet:** pass lists of `ControlNetModel`s and scales: `StableDiffusionXLControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, ...)`, then `pipeline(prompt, image=images, controlnet_conditioning_scale=[0.5, 0.5], strength=0.7)`.

**guess_mode:** `guess_mode=True` generates from the control input alone, with no prompt. The ControlNet residuals are scaled by block depth, from `0.1` at the earliest `DownBlock` up to `1.0` at the `MidBlock`.

## T2I-Adapter

A lightweight adapter (~77M params, ~300MB) that inserts weights into the UNet instead of copying it, which makes it smaller than a ControlNet. Load with `T2IAdapter`, use `StableDiffusionXLAdapterPipeline`, weight with `adapter_conditioning_scale`.

```py
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

adapter = T2IAdapter.from_pretrained("path/to/adapter", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", adapter=adapter, torch_dtype=torch.float16
)
```

## IP-Adapter

A lightweight adapter (~100MB) integrating **image**-based guidance via an image encoder and new cross-attention layers.
Load a base model, then call `load_ip_adapter(...)` and pass `ip_adapter_image` together with the text prompt.

```py
# `pipeline` is an already loaded SDXL pipeline; `image` is the reference (image-prompt) picture.
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipeline.set_ip_adapter_scale(0.8)
pipeline(prompt="a polar bear...", ip_adapter_image=image).images[0]
```

`set_ip_adapter_scale()`: `1.0` conditions only on the image prompt, `0.5` balances text and image. Variants: **Plus** (patch embeddings, ViT-H encoder) and **FaceID** (InsightFace embeddings).

Call `enable_model_cpu_offload()` **after** loading the IP-Adapter; if it is called first, the image encoder is offloaded to the CPU and running the pipeline raises an error. To use multiple IP-Adapters, pass lists of weight names and scales. Combine with a ControlNet for structure or LCM for speed.

## InstructPix2Pix

A Stable Diffusion model trained to edit images from instructions (e.g. "turn the clouds rainy"), conditioned on the instruction and the input image. Use `StableDiffusionInstructPix2PixPipeline`, tuning `image_guidance_scale` and `guidance_scale`:

```py
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

# `prompt` is the edit instruction; `image` is the input image to edit.
pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "your_cool_model", torch_dtype=torch.float16
).to("cuda")
edited_image = pipeline(
    prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=10
).images[0]
```

Related: [[concepts/loras-for-inference]], [[summaries/model-and-feature-catalog]], [[syntheses/choosing-model-and-pipeline]].
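
The IP-Adapter ordering rule on this page (load the adapter first, enable CPU offload last) can be sketched as a small helper. `attach_ip_adapter_then_offload` is a hypothetical name; the three methods it calls are the ones quoted above, and the helper assumes it receives an already loaded SDXL pipeline.

```python
def attach_ip_adapter_then_offload(pipeline, scale=0.8):
    """Attach the IP-Adapter, set its scale, and only then enable CPU offload.

    `pipeline` is assumed to be a loaded Diffusers SDXL pipeline; this helper
    is an illustrative sketch, not part of the library.
    """
    # 1. Load the adapter weights (repo, subfolder, and file name as quoted on this page).
    pipeline.load_ip_adapter(
        "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
    )
    # 2. Balance image prompt against text prompt (1.0 = image only, 0.5 = balanced).
    pipeline.set_ip_adapter_scale(scale)
    # 3. Offload last, so the adapter's image encoder is already in place.
    pipeline.enable_model_cpu_offload()
    return pipeline
```

With a real pipeline this might be called as `attach_ip_adapter_then_offload(AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16))`, which needs `torch`, `diffusers`, and the model weights available locally or from the Hub.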

---
title: "Fine-Tuning Stable Diffusion: LoRA, DreamBooth, Textual Inversion, Custom Diffusion"
type: concept
tags: [fine-tuning, lora, dreambooth, textual-inversion, custom-diffusion, training]
updated: 2026-06-23
confidence: high
sources: [raw/llms_txt_doc-dreambooth.md, raw/llms_txt_doc-dreambooth-2.md, raw/llms_txt_doc-textual-inversion.md, raw/llms_txt_doc-textual-inversion-2.md, raw/llms_txt_doc-custom-diffusion.md, raw/llms_txt_doc-lora.md, raw/llms_txt_doc-train-a-diffusion-model.md, raw/llms_txt_doc-overview.md, raw/llms_txt_doc-stable-diffusion-xl.md]
---

# Fine-Tuning Stable Diffusion: LoRA, DreamBooth, Textual Inversion, Custom Diffusion

The Diffusers training scripts live in [`diffusers/examples`](https://github.com/huggingface/diffusers/tree/main/examples) — each self-contained and single-purpose, exposing the data-preprocessing code and training loop. Install from source first:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
cd examples/
pip install -r requirements.txt
```

Launch with `accelerate launch