# LM Studio — full corpus


<!-- ===== lmstudio/README.md ===== -->

# LLM Wiki

An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions.

## How It Works

The system has three layers:

```
raw/              Sources you collect (articles, transcripts, notes, PDFs)
wiki/             LLM-written & maintained pages (summaries, concepts, entities, syntheses)
CLAUDE.md         Schema that tells the LLM how to structure everything
```

Three operations drive the workflow:

| Operation | Trigger | What happens |
|-----------|---------|--------------|
| **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, updates the index and log |
| **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, optionally creates a synthesis page for novel insights |
| **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest |

## Quick Start

1. **Clone this repo**
   ```bash
   git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base
   cd my-knowledge-base
   ```

2. **Customize CLAUDE.md** for your domain
   - Update the Purpose section with your topic
   - Replace the placeholder tagging taxonomy with your own categories
   - Adjust confidence level descriptions if needed
   - Everything else (workflows, page formats, linking rules) works as-is

3. **Drop sources into `raw/`**
   - Text files, transcripts, articles, notes — any plain text
   - These are immutable once added; the LLM never modifies them

4. **Tell the LLM to ingest**
   ```
   ingest raw/my-first-source.txt
   ```
   The LLM will create summary pages, concept pages, entity pages, cross-links, and update the index.

5. **Ask questions**
   ```
   What are the key differences between X and Y?
   ```
   The LLM answers from the wiki, citing specific pages.

6. **Run health checks**
   ```
   lint
   ```
   The LLM audits the wiki and fixes issues.

## Directory Structure

```
.
├── CLAUDE.md                      # Schema — the LLM's instructions
├── raw/                           # Your source documents (immutable)
└── wiki/
    ├── index.md                   # Master catalog of all pages
    ├── log.md                     # Append-only activity log
    ├── dashboard.md               # Dataview dashboard (Obsidian)
    ├── analytics.md               # Charts View analytics (Obsidian)
    ├── flashcards.md              # Spaced repetition cards
    ├── summaries/                 # One page per source document
    ├── concepts/                  # Concept and framework pages
    ├── entities/                  # People, tools, organizations, etc.
    ├── syntheses/                 # Cross-cutting analyses and comparisons
    ├── journal/                   # Research/session journal entries
    │   └── template.md            # Journal entry template
    └── presentations/             # Marp slide decks
```

## Enhancements

This template includes several extras beyond the core wiki pattern:

### Dataview Dashboard (`wiki/dashboard.md`)
Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin.

### Charts View Analytics (`wiki/analytics.md`)
Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin.

### Mermaid Diagrams
Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Native support in Obsidian and GitHub.

### Marp Slides (`wiki/presentations/`)
Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory.

### Research Journal (`wiki/journal/`)
Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries.

### Spaced Repetition (`wiki/flashcards.md`)
Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page.

### MCP Server
This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically.

## Customizing for Your Domain

The schema in `CLAUDE.md` is domain-agnostic. To adapt it:

1. **Purpose** — Describe your knowledge domain in one paragraph
2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`)
3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards
4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.)
5. **Journal template** — Customize `wiki/journal/template.md` for your workflow

Everything else — page format, linking conventions, workflows, rules — is universal and works across domains.

## Example Domains

This template works for any knowledge-intensive topic:

- **Research notes** — papers, experiments, methodologies
- **Book analysis** — themes, characters, author techniques
- **Competitive analysis** — companies, products, market trends
- **Course notes** — lectures, readings, key concepts
- **Personal development** — frameworks, habits, book summaries
- **Technical documentation** — APIs, architectures, design patterns
- **Hobby deep-dives** — any subject you want to master

## License

MIT


<!-- ===== lmstudio/wiki/index.md ===== -->

---
title: "LM Studio KB — Master Index"
type: index
updated: 2026-06-09
lmstudio_version: "0.4.x (llmster era)"
---

# LM Studio KB — Master Index

**Domain:** LM Studio — desktop app + headless daemon for running LLMs locally (GGUF via llama.cpp, MLX on Apple Silicon), with an OpenAI-compatible local server, `lms` CLI, and SDKs.
**Corpus:** 205 provenance-stamped sources in `raw/` (llms.txt-curated docs, docs-site crawl, repo READMEs, 58 solved bug-tracker issues).
**Pages:** 20 (11 concepts · 3 entities · 2 summaries · 4 syntheses) — core guides + the operator/developer ring

## Concepts (how-to / feature areas)

- [[concepts/install-and-setup]] — install, system requirements (Mac/Win/Linux), offline operation
- [[concepts/models-download-and-import]] — Discover tab, quantization choices (Q4+ guidance), models directory, `lms import`, deleting models
- [[concepts/local-api-server]] — Developer tab / `lms server start`, port 1234, OpenAI-compatible endpoints, native REST `/api/v0`, LAN serving
- [[concepts/headless-and-service-mode]] — llmster daemon, headless app mode, JIT loading, Idle TTL (60 min default), Auto-Evict
- [[concepts/lms-cli]] — the full command set, verbatim
- [[concepts/structured-output]] — JSON-schema-enforced responses via `/v1/chat/completions`
- [[concepts/tool-use-and-mcp]] — function calling, `.act()` agents, MCP host (`mcp.json`, ≥0.3.17), MCP security warning
- [[concepts/chat-and-documents]] — chat UI, document RAG (.docx/.pdf/.txt), presets, per-model defaults

- [[concepts/performance-and-serving]] — speculative decoding, parallel requests / continuous batching (llama.cpp v2.0.0+)
- [[concepts/presets-and-model-defaults]] — presets (0.3.15 import from file/URL), per-model load defaults (applies to `lms load` too)
- [[concepts/sdk-deep-dive]] — `.act()` execution rounds, schema-guaranteed `.respond()`, spec-decode API (SDK ≥1.2.0)

## Summaries

- [[summaries/version-history]] — which version shipped what (0.2.22 → 0.4.1, incl. Anthropic-compat /v1/messages)
- [[summaries/docs-catalog]] — map of all 205 sources / official doc areas

## Entities

- [[entities/llmster]] — the headless daemon (install one-liners, `lms daemon up`)
- [[entities/lmstudio-python]] — Python SDK (`pip install lmstudio`)
- [[entities/lmstudio-js]] — TypeScript SDK (`npm install @lmstudio/sdk`)

## Syntheses (decisions & cross-source)

- [[syntheses/deployment-modes-compared]] — GUI vs headless app vs llmster: pick by environment
- [[syntheses/api-surfaces-compared]] — OpenAI-compat vs Anthropic-compat vs native REST vs SDKs
- [[syntheses/model-runtime-casebook]] — reported patterns: Gemma-3 MLX pad/unused32, low-bit quant failures, Vulkan VRAM regression, think-block API behavior (low-confidence by design)
- [[syntheses/troubleshooting-playbook]] — solved problems from the bug tracker, symptom → cause → fix

## Statistics

- **Total pages**: 20
- **Concepts**: 11 · **Entities**: 3 · **Summaries**: 2 · **Syntheses**: 4
- **Sources ingested**: 205 (raw/, immutable)
- **Confidence**: 17 high · 2 medium · 1 low

## Coverage notes

Strong: setup, serving, CLI, headless ops, APIs, tools/MCP, common failures. Thinner (sources exist in `raw/`, pages not yet written): speculative decoding, prompt templates/tokenization, LM Link, model.yaml authoring/publishing, per-endpoint API reference. Recency boundary: sources fetched 2026-06-09.


<!-- ===== lmstudio/wiki/concepts/chat-and-documents.md ===== -->

---
title: "Chat & Documents (RAG)"
type: concept
tags: [chat, rag, documents, presets]
updated: 2026-06-09
confidence: medium
sources: [raw/docs_page-chat-with-a-model-lm-studio.md, raw/docs_page-chat-with-documents-lm-studio.md, raw/docs_page-manage-chats-lm-studio.md, raw/docs_page-config-presets-lm-studio.md, raw/docs_page-per-model-defaults-lm-studio.md]
---

# Chat & Documents (RAG)

## Chatting

Load a model in the **Chat tab** and converse. Conversations are organized as manageable threads (Manage Chats). Model behavior is controlled by load parameters, prompt template, and **Config Presets** — reusable named configurations; **per-model defaults** let each model carry its own settings.

## Chat with documents

Attach `.docx`, `.pdf`, or `.txt` files to a chat session for added context. Terminology the app uses:

- **Retrieval** — identifying the relevant portion of a long document
- **Query** — the input to retrieval
- **RAG** — Retrieval-Augmented Generation
- **Context** — the LLM's working memory, with a maximum size measured in tokens (1 token ≈ ¾ of a word)

If the document fits in context, LM Studio can include it fully; longer documents go through retrieval. All document processing is **local** — files never leave the machine ([[concepts/install-and-setup]] § Offline).

## Related

[[concepts/models-download-and-import]] · [[concepts/local-api-server]]


<!-- ===== lmstudio/wiki/concepts/headless-and-service-mode.md ===== -->

---
title: "Headless & Service Mode (llmster, JIT, TTL)"
type: concept
tags: [headless, llmster, service, jit, ttl]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-run-lm-studio-as-a-service-headless-lm-studio.md, raw/docs_page-idle-ttl-and-auto-evict-lm-studio.md, raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-setup-llmster-as-a-startup-task-on-linux-lm-studio.md]
---

# Headless & Service Mode

Two ways to run LM Studio without the GUI:

## Option 1 — llmster (recommended)

**llmster** is the core of LM Studio packaged as a standalone, server-native daemon (introduced with LM Studio 0.4.0). No GUI required at all — for Linux servers, cloud instances, GPU rigs, CI/CD.

```bash
# install — Linux / Mac
curl -fsSL https://lmstudio.ai/install.sh | bash
# install — Windows
irm https://lmstudio.ai/install.ps1 | iex

# start the daemon
lms daemon up
```

For boot-time startup on Linux, set llmster up as a startup task (systemd; see the Linux Startup Task doc).

## Option 2 — desktop app in headless mode

On machines that *have* a GUI: App Settings (`⌘/Ctrl + ,`) → check **run the LLM server on login**. Exiting the app then minimizes it to the system tray with the server still running. The last server state is restored on launch; `lms server start` achieves it programmatically.

## Just-In-Time (JIT) model loading — default: enabled

With JIT on, API calls to a model that isn't loaded **load it on demand** — clients don't need a pre-loaded model. Applies to both options.

## Idle TTL and Auto-Evict

JIT's flip side is models lingering in memory. Two controls:

- **Idle TTL** — how long a model may stay loaded with no requests before auto-unload. **Default: 60 minutes**; settable per request via a `ttl` field in the payload.
- **Auto-Evict** — unloads previously-JIT-loaded models before loading new ones, so client apps can switch models without manual unloads. **Default: enabled**; toggle in Developer tab → Server Settings.

## Related

[[concepts/local-api-server]] · [[entities/llmster]] · [[syntheses/deployment-modes-compared]] · [[syntheses/troubleshooting-playbook]] (headless Linux pitfalls)


<!-- ===== lmstudio/wiki/concepts/install-and-setup.md ===== -->

---
title: "Install & Setup"
type: concept
tags: [setup, install, requirements]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-get-started-with-lm-studio-lm-studio.md, raw/docs_page-system-requirements-lm-studio.md, raw/docs_page-offline-operation-lm-studio.md, raw/github_issue-installing-lm-studio-on-a-linux-environment-without-desktop-.md]
---

# Install & Setup

LM Studio is a desktop app for running LLMs locally. Install → download a model → load it → chat. Installers for macOS, Windows, and Linux are at lmstudio.ai/download.

## System requirements

| OS | Requirements |
|---|---|
| **macOS** | Apple Silicon only (M1/M2/M3/M4); macOS 14.0+; 16GB+ RAM recommended (8GB possible with small models + modest context). Intel Macs not supported. |
| **Windows** | x64 **or** ARM (Snapdragon X Elite). x64 CPUs require **AVX2**. 16GB+ RAM recommended; 4GB+ dedicated VRAM recommended. |
| **Linux** | x64 or ARM64 (aarch64), distributed as an **AppImage**; Ubuntu 20.04+ (versions >22 less tested). x64 builds ship with AVX2 by default. |

## First run

1. Verify your machine meets the requirements above.
2. Install the latest LM Studio.
3. **Discover tab** (`⌘/Ctrl + 2`) → download your first model (see [[concepts/models-download-and-import]]).
4. **Chat tab** → open the model loader, select the downloaded model (optionally adjust load parameters), and chat.

"Loading" a model means allocating memory (RAM/VRAM) for the model's weights.

## Headless / server installs (no GUI)

For Linux servers without a desktop, use **llmster** (the standalone daemon) instead of the AppImage — see [[concepts/headless-and-service-mode]]. The AppImage route on a GUI-less box requires extra steps (`chmod +x` the AppImage, `apt install libfuse2`) and was only community-supported.

## Offline operation

Once models are downloaded, **no internet is required**: chatting, chatting with documents (RAG), and the local server all run fully offline. Chat input and dropped documents never leave the device. Internet is only needed to search/download models and check updates.

## Related

[[concepts/models-download-and-import]] · [[concepts/local-api-server]] · [[syntheses/deployment-modes-compared]]


<!-- ===== lmstudio/wiki/concepts/lms-cli.md ===== -->

---
title: "lms — the CLI"
type: concept
tags: [cli, lms, commands]
updated: 2026-06-09
confidence: high
sources: [raw/github_doc-readme-md.md, raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-lms-get-lm-studio.md, raw/docs_page-import-models-lm-studio.md]
---

# lms — the CLI

`lms` is LM Studio's command-line interface. It drives **both** the desktop app and the llmster daemon, and ships automatically with either (LM Studio ≥0.2.22). If the command isn't on PATH, bootstrap it:

```bash
npx lmstudio install-cli
```

Then verify in a **new** terminal window with `lms`.

## Core commands (verbatim)

```bash
lms status                    # status of LM Studio
lms server start              # start the local API server
lms server stop               # stop it
lms ls                        # list downloaded models   (--json for machine-readable)
lms ps                        # list LOADED models        (--json available)
lms load <model>              # load a model
lms load <model path> -y      # load with max GPU acceleration, no confirmation
lms unload <model identifier> # unload one model
lms unload --all              # unload everything
lms get <model>               # download a model
lms import <path/to.gguf>     # import an externally-downloaded GGUF (experimental)
lms log stream                # stream logs from LM Studio
lms create                    # scaffold a new LM Studio SDK project
lms daemon up                 # start the llmster daemon (headless)
```

`lms --help` lists all subcommands; `lms <subcommand> --help` for details.

## Notes

- `lms` is built with `lmstudio.js` and lives in the [[entities/lmstudio-js]] monorepo (can't be built standalone).
- No delete-model command — remove model files from the models directory manually ([[concepts/models-download-and-import]]).

## Related

[[concepts/local-api-server]] · [[concepts/headless-and-service-mode]]


<!-- ===== lmstudio/wiki/concepts/local-api-server.md ===== -->

---
title: "Local API Server"
type: concept
tags: [server, api, openai-compatible]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-lm-studio-as-a-local-llm-api-server-lm-studio.md, raw/docs_page-openai-compatibility-endpoints-lm-studio.md, raw/docs_page-rest-api-v0-lm-studio.md, raw/github_issue-how-to-start-a-serve-on-a-local-network-with-cli.md]
---

# Local API Server

LM Studio serves local models over HTTP — on `localhost` or your network — from the **Developer tab** ("Start server" toggle) or the CLI:

```
lms server start
```

**Default port: 1234.** Base URL: `http://localhost:1234/v1` (OpenAI-compatible).

## API surfaces (see [[syntheses/api-surfaces-compared]])

- **OpenAI-compatible endpoints** — drop-in for existing OpenAI clients.
- **Anthropic-compatible endpoints** — for Anthropic-format clients.
- **LM Studio REST API** (native, `/api/v0`, v1 recommended for new projects) — richer stats.
- **SDKs** — [[entities/lmstudio-python]] and [[entities/lmstudio-js]].

## OpenAI-compatible endpoints

| Endpoint | Method |
|---|---|
| `/v1/models` | GET |
| `/v1/responses` | POST |
| `/v1/chat/completions` | POST |
| `/v1/embeddings` | POST |
| `/v1/completions` | POST |

Reuse any OpenAI client by switching its base URL:

```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:1234/v1")
```

Set `model` to the model identifier shown in LM Studio. OpenAI **Codex** works against LM Studio because `/v1/responses` is implemented.

## Native REST API (enhanced stats)

`/api/v0` (LM Studio ≥0.3.6; a **v1 REST API now exists and is recommended** for new projects): `GET /api/v0/models`, `GET /api/v0/models/{model}`, `POST /api/v0/chat/completions`, `POST /api/v0/completions`, `POST /api/v0/embeddings`. Adds tokens/sec, time-to-first-token (TTFT), loaded/unloaded state, max context, and quantization info.

## Serving on the local network

Enable "Serve on Local Network" in the GUI server settings — after that, `lms server start` serves network-wide. Without the GUI, edit `~/.lmstudio/.internal/http-server-config.json` (fallback `~/.cache/lm-studio/.internal/http-server-config.json`) and set `"networkInterface": "0.0.0.0"` — noting the maintainers say config file formats are **not a stable interface**.

## Related

[[concepts/headless-and-service-mode]] (JIT loading, TTL) · [[concepts/structured-output]] · [[concepts/tool-use-and-mcp]]


<!-- ===== lmstudio/wiki/concepts/models-download-and-import.md ===== -->

---
title: "Models — Download, Quantization & Import"
type: concept
tags: [models, gguf, quantization, huggingface]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-download-an-llm-lm-studio.md, raw/docs_page-import-models-lm-studio.md, raw/github_issue-how-do-i-remove-a-model-using-the-cli.md]
---

# Models — Download, Quantization & Import

## Downloading (Discover tab)

LM Studio's built-in downloader pulls any supported model from **Hugging Face**. Search by keyword (`llama`, `gemma`), by `user/model` string, or paste a full Hugging Face URL. Jump to Discover from anywhere with `⌘ + 2` (Mac) / `Ctrl + 2` (Win/Linux).

## Choosing a quantization

Download options like `Q3_K_S`, `Q_8` are the **same model at different fidelity**. `Q` = quantization: compressing the model, trading some quality for size. Official guidance: **choose a 4-bit option or higher** if your machine can run it.

## Models directory & layout

Default location: `~/.lmstudio/models/`, preserving the Hugging Face structure:

```
~/.lmstudio/models/
└── publisher/
    └── model/
        └── model-file.gguf
```

Change the directory from **My Models**. 

## Importing models downloaded elsewhere

- `lms import <path/to/model.gguf>` (experimental) — interactive import of a GGUF file.
- Or place files manually in the directory structure above (publisher/model/file).

## Removing a model

There is no dedicated CLI delete command — go to the models directory and **delete the model's files** like any file (per maintainer guidance). `lms ls` lists what's downloaded.

## Related

[[concepts/install-and-setup]] · [[concepts/lms-cli]] · [[syntheses/troubleshooting-playbook]] (load failures, checksum errors)


<!-- ===== lmstudio/wiki/concepts/performance-and-serving.md ===== -->

---
title: "Performance & Serving Features — Speculative Decoding, Parallel Requests"
type: concept
tags: [performance, serving, batching, speculative-decoding]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-speculative-decoding-lm-studio.md, raw/docs_page-speculative-decoding-lm-studio-2.md, raw/docs_page-parallel-requests-lm-studio.md]
---

# Performance & Serving Features

The throughput/latency levers for serving from LM Studio.

## Speculative decoding

A **draft model** speeds up a large model's generation **without reducing response quality**. Available in the app's model config and programmatically (Python SDK requires **≥1.2.0** for the draft-model API).

## Parallel requests (continuous batching)

Set **Max Concurrent Predictions** when loading a model to process requests **in parallel instead of queued** — the server dynamically combines requests into a single batch for higher throughput. Requirements (official): supported on the **llama.cpp engine** (MLX "coming soon"); GGUF runtime must be **llama.cpp v2.0.0+**.

## Related

[[concepts/local-api-server]] · [[concepts/headless-and-service-mode]] (JIT/TTL interplay)


<!-- ===== lmstudio/wiki/concepts/presets-and-model-defaults.md ===== -->

---
title: "Config Presets & Per-Model Defaults"
type: concept
tags: [presets, configuration, defaults]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-config-presets-lm-studio.md, raw/docs_page-per-model-defaults-lm-studio.md, raw/docs_page-configuring-the-model-lm-studio.md]
---

# Config Presets & Per-Model Defaults

The two configuration surfaces that make model behavior reproducible.

## Presets

Bundle a **system prompt + parameters** into a named, reusable configuration applied across chats. Since **0.3.15**: import Presets **from file or URL** (and share yours — see the publish-your-presets doc in raw/).

## Per-model defaults

Set **default load settings per model** — applied whenever that model loads anywhere in the app, **including `lms load`** from the CLI. This is the right place for per-model context length, GPU offload, and template choices instead of re-configuring per session.

## Related

[[concepts/chat-and-documents]] · [[concepts/lms-cli]]


<!-- ===== lmstudio/wiki/concepts/sdk-deep-dive.md ===== -->

---
title: "SDK Deep Dive — .act() Agents, Structured Response, Spec-Decode API"
type: concept
tags: [sdk, agents, structured-output, python, typescript]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-the-act-call-lm-studio.md, raw/docs_page-structured-response-lm-studio.md, raw/docs_page-speculative-decoding-lm-studio-2.md, raw/docs_page-using-lmstudio-python-in-repl-lm-studio.md]
---

# SDK Deep Dive

Operational detail behind [[entities/lmstudio-python]] / [[entities/lmstudio-js]].

## `.act()` — autonomous agents in execution rounds

The SDKs model agentic tool use as **execution rounds**: *run a tool → feed its output to the LLM → the LLM decides the next step* — repeated until the task completes. `.act()` runs that loop automatically against your locally-defined tools ([[concepts/tool-use-and-mcp]]).

## Structured response — `.respond()` with a schema

Pass a JSON schema (or a **Pydantic model** in Python / zod in TS) to `.respond()` — the output is **guaranteed to conform** to the schema. The SDK-level counterpart of the server's structured output ([[concepts/structured-output]]).

## Speculative decoding API

Attach a draft model programmatically — Python SDK **≥1.2.0** ([[concepts/performance-and-serving]]).

## Interactive use

lmstudio-python has a documented REPL workflow (convenience API creates the default `Client` implicitly).


<!-- ===== lmstudio/wiki/concepts/structured-output.md ===== -->

---
title: "Structured Output (JSON Schema)"
type: concept
tags: [structured-output, json-schema, api]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-structured-output-lm-studio.md, raw/docs_page-structured-response-lm-studio.md, raw/github_issue-structured-output-broken-in-0-3-5.md]
---

# Structured Output (JSON Schema)

Enforce a response format by providing a **JSON schema** to `/v1/chat/completions` — the LLM then responds in valid JSON conforming to the schema. Works through LM Studio's server via **any OpenAI client** (it follows the same format as OpenAI's Structured Output API).

## Setup

```bash
lms server start          # or Developer tab → Start server
```

Then send a `response_format` with your schema (OpenAI format) to `http://localhost:1234/v1/chat/completions` — via curl or the OpenAI SDKs.

## In the SDKs

Both SDKs accept schema objects natively: **pydantic** models in [[entities/lmstudio-python]], **zod** schemas in [[entities/lmstudio-js]] (see each SDK's structured-response docs in raw/).

## Caveats

- Schema adherence depends on the model — small/heavily-quantized models may struggle with complex schemas.
- Regression history exists (structured output broke in 0.3.5 and was fixed) — if schemas stop being honored after an update, check the release notes/bug tracker first.

## Related

[[concepts/local-api-server]] · [[concepts/tool-use-and-mcp]]


<!-- ===== lmstudio/wiki/concepts/tool-use-and-mcp.md ===== -->

---
title: "Tool Use & MCP"
type: concept
tags: [tools, function-calling, mcp]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-tool-use-lm-studio.md, raw/docs_page-tool-definition-lm-studio.md, raw/docs_page-use-mcp-servers-lm-studio.md, raw/docs_page-using-mcp-via-api-lm-studio.md, raw/docs_page-the-act-call-lm-studio.md]
---

# Tool Use & MCP

## Tool use (function calling) via the API

LLMs can request calls to external functions through `/v1/chat/completions` and `/v1/responses` (OpenAI tool format — works with any OpenAI client). Start the server (`lms server start`), define tools in the request, and handle the model's tool-call responses in your code. The SDKs go further: lmstudio-python/js's **`.act()` call** runs an agentic loop that executes your tools for multiple rounds until the task completes.

## MCP servers in the app (≥0.3.17)

LM Studio is an **MCP Host**: connect local *and* remote MCP servers (≥0.3.17 b10) and their tools become available to your local models.

- **Install/edit:** Program tab (right sidebar) → `Install > Edit mcp.json` — LM Studio follows **Cursor's `mcp.json` notation**. Remote example:

```json
{
  "mcpServers": {
    "hf-mcp-server": { "url": "https://huggingface.co/mcp" }
  }
}
```

- Or use the **"Add to LM Studio" button** where sites provide it.

**Security (official warning):** some MCP servers can run arbitrary code, access local files, and use your network. **Never install MCPs from untrusted sources.**

## MCP via the API

MCP tools can also be exercised through the server API (see `raw/docs_page-using-mcp-via-api-lm-studio.md`) — so programmatic clients get the same tool ecosystem the app does.

## Related

[[concepts/structured-output]] · [[concepts/local-api-server]] · [[entities/lmstudio-python]] · [[entities/lmstudio-js]]


<!-- ===== lmstudio/wiki/entities/llmster.md ===== -->

---
title: "llmster"
type: entity
tags: [llmster, daemon, headless]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-run-lm-studio-as-a-service-headless-lm-studio.md, raw/docs_page-setup-llmster-as-a-startup-task-on-linux-lm-studio.md]
---

# llmster

LM Studio's **headless daemon** — the core of the desktop app packaged as a standalone background service with no GUI dependency (introduced with LM Studio 0.4.0). Full model-serving capability on Linux servers, cloud instances, GPU rigs, CI/CD, or as a boot service — without installing the desktop app at all.

## Install & run (verbatim)

```bash
curl -fsSL https://lmstudio.ai/install.sh | bash    # Linux / Mac
irm https://lmstudio.ai/install.ps1 | iex            # Windows
lms daemon up                                        # start the daemon
```

Boot-time startup on Linux: configure as a startup task (systemd; Linux Startup Task doc).

## Relationship to the other tools

- **LM Studio (desktop app)** — full GUI: model discovery, chat, RAG, server, MCP.
- **llmster** — the same serving core, GUI-less; recommended path for service/headless use.
- **`lms` CLI** — drives *both*; ships with either.

See [[syntheses/deployment-modes-compared]] for when to use which, and [[concepts/headless-and-service-mode]] for JIT/TTL behavior that applies to daemon serving.


<!-- ===== lmstudio/wiki/entities/lmstudio-js.md ===== -->

---
title: "lmstudio-js (TypeScript SDK)"
type: entity
tags: [sdk, typescript, javascript]
updated: 2026-06-09
confidence: high
sources: [raw/github_doc-readme-md-3.md, raw/docs_page-lmstudio-js-typescript-sdk-lm-studio.md, raw/github_doc-readme-md.md]
---

# lmstudio-js (TypeScript SDK)

LM Studio's official JavaScript/TypeScript client SDK.

```bash
npm install @lmstudio/sdk --save
```

```ts
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
```

## What it does (per the README)

- Chat responses and text completions from local LLMs
- **Define functions as tools and turn LLMs into autonomous agents** (`.act()`) running fully locally
- Load, configure, and unload models from memory
- Generate text embeddings
- Works in **browser and any Node-compatible environment**

## Notes

- The `lms` CLI is built with lmstudio-js and lives in the same monorepo (the CLI cannot be built standalone) — [[concepts/lms-cli]].
- Structured responses use **zod** schemas ([[concepts/structured-output]]).

Repo: github.com/lmstudio-ai/lmstudio-js. Python counterpart: [[entities/lmstudio-python]].


<!-- ===== lmstudio/wiki/entities/lmstudio-python.md ===== -->

---
title: "lmstudio-python (Python SDK)"
type: entity
tags: [sdk, python]
updated: 2026-06-09
confidence: high
sources: [raw/github_doc-readme-md-2.md, raw/docs_page-lmstudio-python-python-sdk-lm-studio.md, raw/docs_page-using-lmstudio-python-in-repl-lm-studio.md]
---

# lmstudio-python (Python SDK)

LM Studio's official Python SDK.

```console
$ pip install lmstudio
```

## Shape of the API

- The base component is the synchronous **`Client`** — create once; it manages the underlying **websocket** connections to the LM Studio instance.
- A **top-level convenience API** exists for interactive/REPL use (implicitly creates a default `Client`).

## What it does

Chat/completions against local models, model load/unload and configuration, embeddings, structured responses (pydantic schemas), tool definition, and the agentic **`.act()`** loop ([[concepts/tool-use-and-mcp]]).

Repo: github.com/lmstudio-ai/lmstudio-python. TypeScript counterpart: [[entities/lmstudio-js]].


<!-- ===== lmstudio/wiki/log.md ===== -->

---
title: "Activity Log"
type: log
---

# Activity Log

Append-only record of all wiki changes.

## Format

Each entry follows this format:
```
### YYYY-MM-DD HH:MM — [Action Type]
- **Source/Trigger**: what initiated the action
- **Pages created**: list of new pages
- **Pages updated**: list of updated pages
- **Notes**: any contradictions flagged, decisions made
```

---

### 2026-04-08 00:00 — Setup

- **Source/Trigger**: Repository initialized
- **Pages created**: index.md, log.md, dashboard.md, analytics.md, flashcards.md
- **Pages updated**: none
- **Notes**: Empty knowledge base ready for first source ingestion

---

## 2026-06-10 — removed Obsidian scaffolding from the served wiki

Deleted `analytics.md`, `dashboard.md`, `flashcards.md` (Obsidian plugin pages — Dataview/Charts View/Spaced Repetition markup, unusable when served as plain Markdown to agents) and the `journal/` scaffold (template only). `CLAUDE.md` directory layout updated: production/planning material lives at repo root, never under `wiki/` (everything under `wiki/` is served publicly).


<!-- ===== lmstudio/wiki/summaries/docs-catalog.md ===== -->

---
title: "Docs Catalog — What the Official Documentation Covers"
type: summary
tags: [catalog, docs, map]
updated: 2026-06-09
confidence: high
sources: [raw/llms_txt-llms-txt-index.md, raw/docs_page-welcome-to-lm-studio-docs-lm-studio.md, raw/docs_page-lm-studio-developer-docs-lm-studio.md]
---

# Docs Catalog — Map of the Official Documentation

LM Studio publishes a maintainer-curated **llms.txt** index; this KB's `raw/` holds **205 sources** from it plus the docs crawl and trackers. The map of what exists (so "does a doc for X exist?" is answerable even where this wiki has no page yet):

| Area | What's documented (in raw/) |
|---|---|
| **App basics** | install, system requirements, offline mode, chat, RAG/documents, presets, per-model defaults, color themes, localization |
| **Model management** | download, import, directory layout, JIT/TTL/auto-evict, memory management |
| **Server & APIs** | OpenAI-compat (incl. `/v1/responses`), Anthropic-compat, native REST v0/v1, authentication, streaming events, parallel requests, idle TTL |
| **API operations** | per-endpoint pages: chat/text completions, embeddings, models list/info, load/unload, tokenization, cancel predictions, get context length/load config/download status |
| **SDKs** | lmstudio-python + lmstudio-js: chats (stateful), structured response, tool definition, `.act()`, image input, embedding, REPL |
| **CLI** | `lms` command set, `lms get` |
| **Ops** | headless/llmster, Linux startup task, LM Link (remote access), proxy |
| **Ecosystem** | integrations page, "Add to LM Studio" button, model.yaml authoring/publishing, Hub push/pull |

Thinner in this wiki (sources exist; pages pending): model.yaml authoring, LM Link, Hub publishing, per-endpoint API reference detail.


<!-- ===== lmstudio/wiki/summaries/version-history.md ===== -->

---
title: "Version History — Feature Timeline"
type: summary
tags: [versions, changelog, releases]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-api-changelog-lm-studio.md, raw/llms_txt_doc-0-3-5.md, raw/github_doc-readme-md.md, raw/docs_page-use-mcp-servers-lm-studio.md, raw/docs_page-config-presets-lm-studio.md, raw/docs_page-rest-api-v0-lm-studio.md]
---

# Version History — Feature Timeline

Which version introduced what (compiled from the official API changelog, blog posts, and per-feature docs). Useful for "why doesn't my install have X".

| Version | Shipped |
|---|---|
| 0.2.22 | `lms` CLI ships with the app |
| 0.3.5 | **headless mode + on-demand (JIT) model loading**; mlx-engine Pixtral support |
| 0.3.6 | native REST API `/api/v0` |
| 0.3.15 | Preset import from file/URL |
| 0.3.17 (b10) | **MCP Host** — local + remote MCP servers |
| 0.4.0 | **llmster daemon**; **native v1 REST API (`/api/v1/*`)**; MCP via API; stateful chats; authentication |
| 0.4.1 | **Anthropic-compatible `POST /v1/messages`** — use Claude-Code-style clients against LM Studio |

Pattern: API surfaces accrete fast (v0 → v1 REST, OpenAI-compat → Anthropic-compat) — when an endpoint 404s, check this table against your app version first ([[concepts/local-api-server]]).


<!-- ===== lmstudio/wiki/syntheses/api-surfaces-compared.md ===== -->

---
title: "API Surfaces Compared — Which Interface Should You Use?"
type: synthesis
tags: [api, openai-compatible, sdk, decision]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-lm-studio-as-a-local-llm-api-server-lm-studio.md, raw/docs_page-openai-compatibility-endpoints-lm-studio.md, raw/docs_page-rest-api-v0-lm-studio.md, raw/docs_page-anthropic-compatibility-endpoints-lm-studio.md, raw/github_doc-readme-md-2.md, raw/github_doc-readme-md-3.md]
---

# API Surfaces Compared

One server (`lms server start`, port **1234**), five ways in. **Decision rule: existing OpenAI/Anthropic code → compatibility endpoints; new app code → SDK; ops/monitoring → native REST.**

| Surface | Base | Use when |
|---|---|---|
| **OpenAI-compatible** | `http://localhost:1234/v1` | You have existing OpenAI-client code/tools — change only the base URL. Covers `models`, `responses`, `chat/completions`, `embeddings`, `completions`. Codex works (via `/v1/responses`). |
| **Anthropic-compatible** | (Anthropic-format endpoints) | Your client speaks Anthropic's API format. |
| **Native REST** | `/api/v0` (v1 recommended for new projects) | You want what compat APIs lack: tokens/sec, TTFT, loaded/unloaded state, max context, quantization metadata. |
| **[[entities/lmstudio-python]]** | websocket via `Client` | Python apps: model management + prediction + `.act()` agents in-process. |
| **[[entities/lmstudio-js]]** | `@lmstudio/sdk` | TS/JS (Node *or browser*): same, plus the CLI is built on it. |

## Rules of thumb

1. **Migrating an existing integration** → compatibility endpoint, one-line base-URL change.
2. **Building new** → SDK: model load/unload control, structured responses (pydantic/zod), tools/`.act()` — things raw HTTP makes you hand-roll.
3. **Observability/ops** → native REST stats endpoints.
4. All surfaces serve the same models and honor JIT/TTL/Auto-Evict ([[concepts/headless-and-service-mode]]).

## Related

[[concepts/local-api-server]] · [[concepts/structured-output]] · [[concepts/tool-use-and-mcp]]


<!-- ===== lmstudio/wiki/syntheses/deployment-modes-compared.md ===== -->

---
title: "Deployment Modes Compared — GUI vs Headless App vs llmster"
type: synthesis
tags: [deployment, headless, llmster, decision]
updated: 2026-06-09
confidence: high
sources: [raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-run-lm-studio-as-a-service-headless-lm-studio.md, raw/github_issue-installing-lm-studio-on-a-linux-environment-without-desktop-.md, raw/github_issue-macos-server-not-working-without-a-user-session-windowserver.md]
---

# Deployment Modes Compared

Three ways to run LM Studio. **Decision rule: GUI for humans, llmster for servers, headless-app mode only when the GUI machine doubles as the server.**

| | Desktop app (GUI) | Desktop app, headless mode | llmster daemon |
|---|---|---|---|
| Needs a display/GUI | yes | yes (app installed, UI hidden) | **no** |
| Best for | interactive use: discover models, chat, RAG, MCP | "my workstation is also the LLM server" | Linux servers, cloud, GPU rigs, CI/CD, boot services |
| Start | app | App Settings → run server on login (tray) | `lms daemon up` |
| Install | installer / AppImage | same | `curl -fsSL https://lmstudio.ai/install.sh \| bash` |

## Pick the desktop app if…
You want the full experience: Discover-tab downloads, chat UI, document RAG, presets, MCP host. Easiest start.

## Pick headless app mode if…
The machine *has* a GUI and you want the server surviving app exit: enable run-on-login; the app minimizes to tray with the server running. Server state restores on launch.

## Pick llmster if…
No display exists or you want a clean service: it's the app's serving core, server-native. Don't fight the AppImage on GUI-less Linux (community workaround needed `libfuse2` + manual steps) — llmster is the supported path. On macOS, running the *app's* server without a user session hits WindowServer problems (bug tracker) — another reason the daemon exists.

## Applies to all server modes
JIT on-demand loading, Idle TTL (default 60 min), Auto-Evict — [[concepts/headless-and-service-mode]]. The `lms` CLI drives all three — [[concepts/lms-cli]].


<!-- ===== lmstudio/wiki/syntheses/model-runtime-casebook.md ===== -->

---
title: "Model & Runtime Casebook — Reported Patterns from the Bug Tracker"
type: synthesis
tags: [troubleshooting, models, mlx, vulkan, compatibility]
updated: 2026-06-09
confidence: low
sources: [raw/github_issue-gemma-3-with-mlx-only-generates-pad-pad-pad.md, raw/github_issue-gemma3-only-responding-with-unused32.md, raw/github_issue-failed-run-qwq-32b-3bit-and-4bit-model.md, raw/github_issue-regression-vulkan-v1-52-0-does-not-use-gpu-vram-on-gfx1151-r.md, raw/github_issue-strange-characters-in-output-after-update-0-3-6-in-lm-studio.md, raw/github_issue-think-block-missing-in-api-response-lm-studio-0-3-22-qwen3-t.md, raw/github_issue-qwen3vl-4b-hallucinates-when-running-on-lmstudio.md, raw/github_issue-just-in-time-loading-not-working.md, raw/github_issue-lm-studio-couldn-t-detect-gpu.md]
---

# Model & Runtime Casebook — Reported Patterns

**Confidence: these are *reported patterns* from the tracker, not verified fixes** — value is knowing you're not alone and which thread to check. (Solved cases with clear fixes live in [[syntheses/troubleshooting-playbook]].)

## Model-specific output corruption (reported)

- **Gemma 3 on MLX emitting only `<pad>` tokens**, and **Gemma 3 replying `<unused32>`** (intermittent; reported on 0.3.13/llama.cpp v1.20.1) — model+runtime version combos matter; check the thread for the version where it cleared.
- **QwQ-32B failing at 3-bit/4-bit quants** — low-bit quant failures are model-specific; try a higher-bit quant.
- **Qwen3-VL hallucinating on low-res/blurry input images** — maintainers' first ask was image quality; feed higher-resolution images before suspecting the model.
- **Strange characters in output after the 0.3.6 update** — update-correlated; maintainers triage via `Ctrl+Shift+R` (runtime manager) screenshots: runtime/app version mismatch territory.

## API behavior (reported)

- **`think` block missing from API responses** (Qwen3 thinking models, 0.3.22) — reasoning content separation has version-specific behavior; see the API changelog's `reasoningContent` change ([[summaries/version-history]]).

## Runtime/GPU (reported)

- **Vulkan v1.52.0 regression: GPU VRAM unused on gfx1151 (Ryzen AI)** — runtime-version-specific; pin/downgrade the runtime in the runtime manager.
- **GPU not detected at all** — see the dedicated issue; first checks are driver + runtime selection (`Ctrl+Shift+R`).
- **JIT loading "not working"** — needs **beta 4+** of the relevant release (per the thread's resolution); JIT is also tied to server settings ([[concepts/headless-and-service-mode]]).


<!-- ===== lmstudio/wiki/syntheses/troubleshooting-playbook.md ===== -->

---
title: "Troubleshooting Playbook — Solved Problems from the Bug Tracker"
type: synthesis
tags: [troubleshooting, errors, gpu, loading]
updated: 2026-06-09
confidence: medium
sources: [raw/github_issue-no-lm-runtime-found-for-model-format-gguf.md, raw/github_issue-linux-exit-code-133-error-when-loading-large-llm-models.md, raw/github_issue-how-to-start-a-serve-on-a-local-network-with-cli.md, raw/github_issue-installing-lm-studio-on-a-linux-environment-without-desktop-.md, raw/github_issue-how-do-i-remove-a-model-using-the-cli.md, raw/github_issue-fixed-in-0-3-30b2-search-crash-oops-sorry-an-unexpected-erro.md]
---

# Troubleshooting Playbook

Symptom → cause → fix, distilled from *solved* issues in the official bug tracker. (Each entry cites its issue in `sources`; ~58 solved issues live in `raw/github_issue-*` for deeper digging.)

## "No LM Runtime found for model format 'gguf'!"
**Cause:** the LM Runtime (llama.cpp engine) isn't installed/loaded — seen on fresh Linux installs (reported on 0.3.6/Arch).
**Fix:** open the runtimes manager — `Ctrl + Shift + R` (Win/Linux) / `⌘ + Shift + R` (Mac) — and install/update the GGUF (llama.cpp) runtime; re-download a runtime rather than the model.

## Large model fails to load on Linux with "Exit code: 133" (RAM/VRAM are sufficient)
**Cause (investigated in the issue):** Electron's PartitionAlloc rejects the huge `posix_memalign` allocation — same model loads fine in plain llama.cpp; upstream Electron issue.
**Fix/workaround:** track the upstream fix; reduce context size / use a smaller quant to shrink the allocation, or serve via a non-Electron path.

## Server only reachable on localhost; need LAN access from the CLI
**Fix:** enable "Serve on Local Network" once in the GUI, after which `lms server start` serves the network. GUI-less: set `"networkInterface": "0.0.0.0"` in `~/.lmstudio/.internal/http-server-config.json` (fallback: `~/.cache/lm-studio/.internal/http-server-config.json`). Maintainers warn config-file formats aren't a stable interface.

## Installing on GUI-less Linux
**Fix:** prefer **llmster** (`curl -fsSL https://lmstudio.ai/install.sh | bash`, then `lms daemon up`). The community AppImage route needs `chmod +x` + `apt install libfuse2` and more steps — [[syntheses/deployment-modes-compared]].

## "How do I delete a model from the CLI?"
**Answer (maintainer):** there's no CLI delete — remove the model's files from the models directory (`~/.lmstudio/models/publisher/model/`) like any file.

## Search crashes / "Oops! an unexpected error has occurred"
**Pattern:** version-specific regressions (this one fixed in 0.3.30b2). **Rule:** on weird breakage right after an update, check the bug tracker for your exact version before debugging locally.

## General diagnostics

```bash
lms status        # is LM Studio/llmster up?
lms ps            # what's actually loaded?
lms log stream    # live logs while reproducing the problem
```