# LM Studio — full corpus # LLM Wiki An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions. ## How It Works The system has three layers: ``` raw/ Sources you collect (articles, transcripts, notes, PDFs) wiki/ LLM-written & maintained pages (summaries, concepts, entities, syntheses) CLAUDE.md Schema that tells the LLM how to structure everything ``` Three operations drive the workflow: | Operation | Trigger | What happens | |-----------|---------|--------------| | **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, updates the index and log | | **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, optionally creates a synthesis page for novel insights | | **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest | ## Quick Start 1. **Clone this repo** ```bash git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base cd my-knowledge-base ``` 2. **Customize CLAUDE.md** for your domain - Update the Purpose section with your topic - Replace the placeholder tagging taxonomy with your own categories - Adjust confidence level descriptions if needed - Everything else (workflows, page formats, linking rules) works as-is 3. **Drop sources into `raw/`** - Text files, transcripts, articles, notes — any plain text - These are immutable once added; the LLM never modifies them 4. **Tell the LLM to ingest** ``` ingest raw/my-first-source.txt ``` The LLM will create summary pages, concept pages, entity pages, cross-links, and update the index. 5. **Ask questions** ``` What are the key differences between X and Y? ``` The LLM answers from the wiki, citing specific pages. 6. **Run health checks** ``` lint ``` The LLM audits the wiki and fixes issues. ## Directory Structure ``` . ├── CLAUDE.md # Schema — the LLM's instructions ├── raw/ # Your source documents (immutable) └── wiki/ ├── index.md # Master catalog of all pages ├── log.md # Append-only activity log ├── dashboard.md # Dataview dashboard (Obsidian) ├── analytics.md # Charts View analytics (Obsidian) ├── flashcards.md # Spaced repetition cards ├── summaries/ # One page per source document ├── concepts/ # Concept and framework pages ├── entities/ # People, tools, organizations, etc. ├── syntheses/ # Cross-cutting analyses and comparisons ├── journal/ # Research/session journal entries │ └── template.md # Journal entry template └── presentations/ # Marp slide decks ``` ## Enhancements This template includes several extras beyond the core wiki pattern: ### Dataview Dashboard (`wiki/dashboard.md`) Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin. ### Charts View Analytics (`wiki/analytics.md`) Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin. ### Mermaid Diagrams Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Native support in Obsidian and GitHub. ### Marp Slides (`wiki/presentations/`) Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory. ### Research Journal (`wiki/journal/`) Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries. ### Spaced Repetition (`wiki/flashcards.md`) Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page. ### MCP Server This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically. ## Customizing for Your Domain The schema in `CLAUDE.md` is domain-agnostic. To adapt it: 1. **Purpose** — Describe your knowledge domain in one paragraph 2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`) 3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards 4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.) 5. **Journal template** — Customize `wiki/journal/template.md` for your workflow Everything else — page format, linking conventions, workflows, rules — is universal and works across domains. ## Example Domains This template works for any knowledge-intensive topic: - **Research notes** — papers, experiments, methodologies - **Book analysis** — themes, characters, author techniques - **Competitive analysis** — companies, products, market trends - **Course notes** — lectures, readings, key concepts - **Personal development** — frameworks, habits, book summaries - **Technical documentation** — APIs, architectures, design patterns - **Hobby deep-dives** — any subject you want to master ## License MIT --- title: "LM Studio KB — Master Index" type: index updated: 2026-06-09 lmstudio_version: "0.4.x (llmster era)" --- # LM Studio KB — Master Index **Domain:** LM Studio — desktop app + headless daemon for running LLMs locally (GGUF via llama.cpp, MLX on Apple Silicon), with an OpenAI-compatible local server, `lms` CLI, and SDKs. **Corpus:** 205 provenance-stamped sources in `raw/` (llms.txt-curated docs, docs-site crawl, repo READMEs, 58 solved bug-tracker issues). **Pages:** 20 (11 concepts · 3 entities · 2 summaries · 4 syntheses) — core guides + the operator/developer ring ## Concepts (how-to / feature areas) - [[concepts/install-and-setup]] — install, system requirements (Mac/Win/Linux), offline operation - [[concepts/models-download-and-import]] — Discover tab, quantization choices (Q4+ guidance), models directory, `lms import`, deleting models - [[concepts/local-api-server]] — Developer tab / `lms server start`, port 1234, OpenAI-compatible endpoints, native REST `/api/v0`, LAN serving - [[concepts/headless-and-service-mode]] — llmster daemon, headless app mode, JIT loading, Idle TTL (60 min default), Auto-Evict - [[concepts/lms-cli]] — the full command set, verbatim - [[concepts/structured-output]] — JSON-schema-enforced responses via `/v1/chat/completions` - [[concepts/tool-use-and-mcp]] — function calling, `.act()` agents, MCP host (`mcp.json`, ≥0.3.17), MCP security warning - [[concepts/chat-and-documents]] — chat UI, document RAG (.docx/.pdf/.txt), presets, per-model defaults - [[concepts/performance-and-serving]] — speculative decoding, parallel requests / continuous batching (llama.cpp v2.0.0+) - [[concepts/presets-and-model-defaults]] — presets (0.3.15 import from file/URL), per-model load defaults (applies to `lms load` too) - [[concepts/sdk-deep-dive]] — `.act()` execution rounds, schema-guaranteed `.respond()`, spec-decode API (SDK ≥1.2.0) ## Summaries - [[summaries/version-history]] — which version shipped what (0.2.22 → 0.4.1, incl. Anthropic-compat /v1/messages) - [[summaries/docs-catalog]] — map of all 205 sources / official doc areas ## Entities - [[entities/llmster]] — the headless daemon (install one-liners, `lms daemon up`) - [[entities/lmstudio-python]] — Python SDK (`pip install lmstudio`) - [[entities/lmstudio-js]] — TypeScript SDK (`npm install @lmstudio/sdk`) ## Syntheses (decisions & cross-source) - [[syntheses/deployment-modes-compared]] — GUI vs headless app vs llmster: pick by environment - [[syntheses/api-surfaces-compared]] — OpenAI-compat vs Anthropic-compat vs native REST vs SDKs - [[syntheses/model-runtime-casebook]] — reported patterns: Gemma-3 MLX pad/unused32, low-bit quant failures, Vulkan VRAM regression, think-block API behavior (low-confidence by design) - [[syntheses/troubleshooting-playbook]] — solved problems from the bug tracker, symptom → cause → fix ## Statistics - **Total pages**: 20 - **Concepts**: 11 · **Entities**: 3 · **Summaries**: 2 · **Syntheses**: 4 - **Sources ingested**: 205 (raw/, immutable) - **Confidence**: 17 high · 2 medium · 1 low ## Coverage notes Strong: setup, serving, CLI, headless ops, APIs, tools/MCP, common failures. Thinner (sources exist in `raw/`, pages not yet written): speculative decoding, prompt templates/tokenization, LM Link, model.yaml authoring/publishing, per-endpoint API reference. Recency boundary: sources fetched 2026-06-09. --- title: "Chat & Documents (RAG)" type: concept tags: [chat, rag, documents, presets] updated: 2026-06-09 confidence: medium sources: [raw/docs_page-chat-with-a-model-lm-studio.md, raw/docs_page-chat-with-documents-lm-studio.md, raw/docs_page-manage-chats-lm-studio.md, raw/docs_page-config-presets-lm-studio.md, raw/docs_page-per-model-defaults-lm-studio.md] --- # Chat & Documents (RAG) ## Chatting Load a model in the **Chat tab** and converse. Conversations are organized as manageable threads (Manage Chats). Model behavior is controlled by load parameters, prompt template, and **Config Presets** — reusable named configurations; **per-model defaults** let each model carry its own settings. ## Chat with documents Attach `.docx`, `.pdf`, or `.txt` files to a chat session for added context. Terminology the app uses: - **Retrieval** — identifying the relevant portion of a long document - **Query** — the input to retrieval - **RAG** — Retrieval-Augmented Generation - **Context** — the LLM's working memory, with a maximum size measured in tokens (1 token ≈ ¾ of a word) If the document fits in context, LM Studio can include it fully; longer documents go through retrieval. All document processing is **local** — files never leave the machine ([[concepts/install-and-setup]] § Offline). ## Related [[concepts/models-download-and-import]] · [[concepts/local-api-server]] --- title: "Headless & Service Mode (llmster, JIT, TTL)" type: concept tags: [headless, llmster, service, jit, ttl] updated: 2026-06-09 confidence: high sources: [raw/docs_page-run-lm-studio-as-a-service-headless-lm-studio.md, raw/docs_page-idle-ttl-and-auto-evict-lm-studio.md, raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-setup-llmster-as-a-startup-task-on-linux-lm-studio.md] --- # Headless & Service Mode Two ways to run LM Studio without the GUI: ## Option 1 — llmster (recommended) **llmster** is the core of LM Studio packaged as a standalone, server-native daemon (introduced with LM Studio 0.4.0). No GUI required at all — for Linux servers, cloud instances, GPU rigs, CI/CD. ```bash # install — Linux / Mac curl -fsSL https://lmstudio.ai/install.sh | bash # install — Windows irm https://lmstudio.ai/install.ps1 | iex # start the daemon lms daemon up ``` For boot-time startup on Linux, set llmster up as a startup task (systemd; see the Linux Startup Task doc). ## Option 2 — desktop app in headless mode On machines that *have* a GUI: App Settings (`⌘/Ctrl + ,`) → check **run the LLM server on login**. Exiting the app then minimizes it to the system tray with the server still running. The last server state is restored on launch; `lms server start` achieves it programmatically. ## Just-In-Time (JIT) model loading — default: enabled With JIT on, API calls to a model that isn't loaded **load it on demand** — clients don't need a pre-loaded model. Applies to both options. ## Idle TTL and Auto-Evict JIT's flip side is models lingering in memory. Two controls: - **Idle TTL** — how long a model may stay loaded with no requests before auto-unload. **Default: 60 minutes**; settable per request via a `ttl` field in the payload. - **Auto-Evict** — unloads previously-JIT-loaded models before loading new ones, so client apps can switch models without manual unloads. **Default: enabled**; toggle in Developer tab → Server Settings. ## Related [[concepts/local-api-server]] · [[entities/llmster]] · [[syntheses/deployment-modes-compared]] · [[syntheses/troubleshooting-playbook]] (headless Linux pitfalls) --- title: "Install & Setup" type: concept tags: [setup, install, requirements] updated: 2026-06-09 confidence: high sources: [raw/docs_page-get-started-with-lm-studio-lm-studio.md, raw/docs_page-system-requirements-lm-studio.md, raw/docs_page-offline-operation-lm-studio.md, raw/github_issue-installing-lm-studio-on-a-linux-environment-without-desktop-.md] --- # Install & Setup LM Studio is a desktop app for running LLMs locally. Install → download a model → load it → chat. Installers for macOS, Windows, and Linux are at lmstudio.ai/download. ## System requirements | OS | Requirements | |---|---| | **macOS** | Apple Silicon only (M1/M2/M3/M4); macOS 14.0+; 16GB+ RAM recommended (8GB possible with small models + modest context). Intel Macs not supported. | | **Windows** | x64 **or** ARM (Snapdragon X Elite). x64 CPUs require **AVX2**. 16GB+ RAM recommended; 4GB+ dedicated VRAM recommended. | | **Linux** | x64 or ARM64 (aarch64), distributed as an **AppImage**; Ubuntu 20.04+ (versions >22 less tested). x64 builds ship with AVX2 by default. | ## First run 1. Verify your machine meets the requirements above. 2. Install the latest LM Studio. 3. **Discover tab** (`⌘/Ctrl + 2`) → download your first model (see [[concepts/models-download-and-import]]). 4. **Chat tab** → open the model loader, select the downloaded model (optionally adjust load parameters), and chat. "Loading" a model means allocating memory (RAM/VRAM) for the model's weights. ## Headless / server installs (no GUI) For Linux servers without a desktop, use **llmster** (the standalone daemon) instead of the AppImage — see [[concepts/headless-and-service-mode]]. The AppImage route on a GUI-less box requires extra steps (`chmod +x` the AppImage, `apt install libfuse2`) and was only community-supported. ## Offline operation Once models are downloaded, **no internet is required**: chatting, chatting with documents (RAG), and the local server all run fully offline. Chat input and dropped documents never leave the device. Internet is only needed to search/download models and check updates. ## Related [[concepts/models-download-and-import]] · [[concepts/local-api-server]] · [[syntheses/deployment-modes-compared]] --- title: "lms — the CLI" type: concept tags: [cli, lms, commands] updated: 2026-06-09 confidence: high sources: [raw/github_doc-readme-md.md, raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-lms-get-lm-studio.md, raw/docs_page-import-models-lm-studio.md] --- # lms — the CLI `lms` is LM Studio's command-line interface. It drives **both** the desktop app and the llmster daemon, and ships automatically with either (LM Studio ≥0.2.22). If the command isn't on PATH, bootstrap it: ```bash npx lmstudio install-cli ``` Then verify in a **new** terminal window with `lms`. ## Core commands (verbatim) ```bash lms status # status of LM Studio lms server start # start the local API server lms server stop # stop it lms ls # list downloaded models (--json for machine-readable) lms ps # list LOADED models (--json available) lms load # load a model lms load -y # load with max GPU acceleration, no confirmation lms unload # unload one model lms unload --all # unload everything lms get # download a model lms import # import an externally-downloaded GGUF (experimental) lms log stream # stream logs from LM Studio lms create # scaffold a new LM Studio SDK project lms daemon up # start the llmster daemon (headless) ``` `lms --help` lists all subcommands; `lms --help` for details. ## Notes - `lms` is built with `lmstudio.js` and lives in the [[entities/lmstudio-js]] monorepo (can't be built standalone). - No delete-model command — remove model files from the models directory manually ([[concepts/models-download-and-import]]). ## Related [[concepts/local-api-server]] · [[concepts/headless-and-service-mode]] --- title: "Local API Server" type: concept tags: [server, api, openai-compatible] updated: 2026-06-09 confidence: high sources: [raw/docs_page-lm-studio-as-a-local-llm-api-server-lm-studio.md, raw/docs_page-openai-compatibility-endpoints-lm-studio.md, raw/docs_page-rest-api-v0-lm-studio.md, raw/github_issue-how-to-start-a-serve-on-a-local-network-with-cli.md] --- # Local API Server LM Studio serves local models over HTTP — on `localhost` or your network — from the **Developer tab** ("Start server" toggle) or the CLI: ``` lms server start ``` **Default port: 1234.** Base URL: `http://localhost:1234/v1` (OpenAI-compatible). ## API surfaces (see [[syntheses/api-surfaces-compared]]) - **OpenAI-compatible endpoints** — drop-in for existing OpenAI clients. - **Anthropic-compatible endpoints** — for Anthropic-format clients. - **LM Studio REST API** (native, `/api/v0`, v1 recommended for new projects) — richer stats. - **SDKs** — [[entities/lmstudio-python]] and [[entities/lmstudio-js]]. ## OpenAI-compatible endpoints | Endpoint | Method | |---|---| | `/v1/models` | GET | | `/v1/responses` | POST | | `/v1/chat/completions` | POST | | `/v1/embeddings` | POST | | `/v1/completions` | POST | Reuse any OpenAI client by switching its base URL: ```python from openai import OpenAI client = OpenAI(base_url="http://localhost:1234/v1") ``` Set `model` to the model identifier shown in LM Studio. OpenAI **Codex** works against LM Studio because `/v1/responses` is implemented. ## Native REST API (enhanced stats) `/api/v0` (LM Studio ≥0.3.6; a **v1 REST API now exists and is recommended** for new projects): `GET /api/v0/models`, `GET /api/v0/models/{model}`, `POST /api/v0/chat/completions`, `POST /api/v0/completions`, `POST /api/v0/embeddings`. Adds tokens/sec, time-to-first-token (TTFT), loaded/unloaded state, max context, and quantization info. ## Serving on the local network Enable "Serve on Local Network" in the GUI server settings — after that, `lms server start` serves network-wide. Without the GUI, edit `~/.lmstudio/.internal/http-server-config.json` (fallback `~/.cache/lm-studio/.internal/http-server-config.json`) and set `"networkInterface": "0.0.0.0"` — noting the maintainers say config file formats are **not a stable interface**. ## Related [[concepts/headless-and-service-mode]] (JIT loading, TTL) · [[concepts/structured-output]] · [[concepts/tool-use-and-mcp]] --- title: "Models — Download, Quantization & Import" type: concept tags: [models, gguf, quantization, huggingface] updated: 2026-06-09 confidence: high sources: [raw/docs_page-download-an-llm-lm-studio.md, raw/docs_page-import-models-lm-studio.md, raw/github_issue-how-do-i-remove-a-model-using-the-cli.md] --- # Models — Download, Quantization & Import ## Downloading (Discover tab) LM Studio's built-in downloader pulls any supported model from **Hugging Face**. Search by keyword (`llama`, `gemma`), by `user/model` string, or paste a full Hugging Face URL. Jump to Discover from anywhere with `⌘ + 2` (Mac) / `Ctrl + 2` (Win/Linux). ## Choosing a quantization Download options like `Q3_K_S`, `Q_8` are the **same model at different fidelity**. `Q` = quantization: compressing the model, trading some quality for size. Official guidance: **choose a 4-bit option or higher** if your machine can run it. ## Models directory & layout Default location: `~/.lmstudio/models/`, preserving the Hugging Face structure: ``` ~/.lmstudio/models/ └── publisher/ └── model/ └── model-file.gguf ``` Change the directory from **My Models**. ## Importing models downloaded elsewhere - `lms import ` (experimental) — interactive import of a GGUF file. - Or place files manually in the directory structure above (publisher/model/file). ## Removing a model There is no dedicated CLI delete command — go to the models directory and **delete the model's files** like any file (per maintainer guidance). `lms ls` lists what's downloaded. ## Related [[concepts/install-and-setup]] · [[concepts/lms-cli]] · [[syntheses/troubleshooting-playbook]] (load failures, checksum errors) --- title: "Performance & Serving Features — Speculative Decoding, Parallel Requests" type: concept tags: [performance, serving, batching, speculative-decoding] updated: 2026-06-09 confidence: high sources: [raw/docs_page-speculative-decoding-lm-studio.md, raw/docs_page-speculative-decoding-lm-studio-2.md, raw/docs_page-parallel-requests-lm-studio.md] --- # Performance & Serving Features The throughput/latency levers for serving from LM Studio. ## Speculative decoding A **draft model** speeds up a large model's generation **without reducing response quality**. Available in the app's model config and programmatically (Python SDK requires **≥1.2.0** for the draft-model API). ## Parallel requests (continuous batching) Set **Max Concurrent Predictions** when loading a model to process requests **in parallel instead of queued** — the server dynamically combines requests into a single batch for higher throughput. Requirements (official): supported on the **llama.cpp engine** (MLX "coming soon"); GGUF runtime must be **llama.cpp v2.0.0+**. ## Related [[concepts/local-api-server]] · [[concepts/headless-and-service-mode]] (JIT/TTL interplay) --- title: "Config Presets & Per-Model Defaults" type: concept tags: [presets, configuration, defaults] updated: 2026-06-09 confidence: high sources: [raw/docs_page-config-presets-lm-studio.md, raw/docs_page-per-model-defaults-lm-studio.md, raw/docs_page-configuring-the-model-lm-studio.md] --- # Config Presets & Per-Model Defaults The two configuration surfaces that make model behavior reproducible. ## Presets Bundle a **system prompt + parameters** into a named, reusable configuration applied across chats. Since **0.3.15**: import Presets **from file or URL** (and share yours — see the publish-your-presets doc in raw/). ## Per-model defaults Set **default load settings per model** — applied whenever that model loads anywhere in the app, **including `lms load`** from the CLI. This is the right place for per-model context length, GPU offload, and template choices instead of re-configuring per session. ## Related [[concepts/chat-and-documents]] · [[concepts/lms-cli]] --- title: "SDK Deep Dive — .act() Agents, Structured Response, Spec-Decode API" type: concept tags: [sdk, agents, structured-output, python, typescript] updated: 2026-06-09 confidence: high sources: [raw/docs_page-the-act-call-lm-studio.md, raw/docs_page-structured-response-lm-studio.md, raw/docs_page-speculative-decoding-lm-studio-2.md, raw/docs_page-using-lmstudio-python-in-repl-lm-studio.md] --- # SDK Deep Dive Operational detail behind [[entities/lmstudio-python]] / [[entities/lmstudio-js]]. ## `.act()` — autonomous agents in execution rounds The SDKs model agentic tool use as **execution rounds**: *run a tool → feed its output to the LLM → the LLM decides the next step* — repeated until the task completes. `.act()` runs that loop automatically against your locally-defined tools ([[concepts/tool-use-and-mcp]]). ## Structured response — `.respond()` with a schema Pass a JSON schema (or a **Pydantic model** in Python / zod in TS) to `.respond()` — the output is **guaranteed to conform** to the schema. The SDK-level counterpart of the server's structured output ([[concepts/structured-output]]). ## Speculative decoding API Attach a draft model programmatically — Python SDK **≥1.2.0** ([[concepts/performance-and-serving]]). ## Interactive use lmstudio-python has a documented REPL workflow (convenience API creates the default `Client` implicitly). --- title: "Structured Output (JSON Schema)" type: concept tags: [structured-output, json-schema, api] updated: 2026-06-09 confidence: high sources: [raw/docs_page-structured-output-lm-studio.md, raw/docs_page-structured-response-lm-studio.md, raw/github_issue-structured-output-broken-in-0-3-5.md] --- # Structured Output (JSON Schema) Enforce a response format by providing a **JSON schema** to `/v1/chat/completions` — the LLM then responds in valid JSON conforming to the schema. Works through LM Studio's server via **any OpenAI client** (it follows the same format as OpenAI's Structured Output API). ## Setup ```bash lms server start # or Developer tab → Start server ``` Then send a `response_format` with your schema (OpenAI format) to `http://localhost:1234/v1/chat/completions` — via curl or the OpenAI SDKs. ## In the SDKs Both SDKs accept schema objects natively: **pydantic** models in [[entities/lmstudio-python]], **zod** schemas in [[entities/lmstudio-js]] (see each SDK's structured-response docs in raw/). ## Caveats - Schema adherence depends on the model — small/heavily-quantized models may struggle with complex schemas. - Regression history exists (structured output broke in 0.3.5 and was fixed) — if schemas stop being honored after an update, check the release notes/bug tracker first. ## Related [[concepts/local-api-server]] · [[concepts/tool-use-and-mcp]] --- title: "Tool Use & MCP" type: concept tags: [tools, function-calling, mcp] updated: 2026-06-09 confidence: high sources: [raw/docs_page-tool-use-lm-studio.md, raw/docs_page-tool-definition-lm-studio.md, raw/docs_page-use-mcp-servers-lm-studio.md, raw/docs_page-using-mcp-via-api-lm-studio.md, raw/docs_page-the-act-call-lm-studio.md] --- # Tool Use & MCP ## Tool use (function calling) via the API LLMs can request calls to external functions through `/v1/chat/completions` and `/v1/responses` (OpenAI tool format — works with any OpenAI client). Start the server (`lms server start`), define tools in the request, and handle the model's tool-call responses in your code. The SDKs go further: lmstudio-python/js's **`.act()` call** runs an agentic loop that executes your tools for multiple rounds until the task completes. ## MCP servers in the app (≥0.3.17) LM Studio is an **MCP Host**: connect local *and* remote MCP servers (≥0.3.17 b10) and their tools become available to your local models. - **Install/edit:** Program tab (right sidebar) → `Install > Edit mcp.json` — LM Studio follows **Cursor's `mcp.json` notation**. Remote example: ```json { "mcpServers": { "hf-mcp-server": { "url": "https://huggingface.co/mcp" } } } ``` - Or use the **"Add to LM Studio" button** where sites provide it. **Security (official warning):** some MCP servers can run arbitrary code, access local files, and use your network. **Never install MCPs from untrusted sources.** ## MCP via the API MCP tools can also be exercised through the server API (see `raw/docs_page-using-mcp-via-api-lm-studio.md`) — so programmatic clients get the same tool ecosystem the app does. ## Related [[concepts/structured-output]] · [[concepts/local-api-server]] · [[entities/lmstudio-python]] · [[entities/lmstudio-js]] --- title: "llmster" type: entity tags: [llmster, daemon, headless] updated: 2026-06-09 confidence: high sources: [raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-run-lm-studio-as-a-service-headless-lm-studio.md, raw/docs_page-setup-llmster-as-a-startup-task-on-linux-lm-studio.md] --- # llmster LM Studio's **headless daemon** — the core of the desktop app packaged as a standalone background service with no GUI dependency (introduced with LM Studio 0.4.0). Full model-serving capability on Linux servers, cloud instances, GPU rigs, CI/CD, or as a boot service — without installing the desktop app at all. ## Install & run (verbatim) ```bash curl -fsSL https://lmstudio.ai/install.sh | bash # Linux / Mac irm https://lmstudio.ai/install.ps1 | iex # Windows lms daemon up # start the daemon ``` Boot-time startup on Linux: configure as a startup task (systemd; Linux Startup Task doc). ## Relationship to the other tools - **LM Studio (desktop app)** — full GUI: model discovery, chat, RAG, server, MCP. - **llmster** — the same serving core, GUI-less; recommended path for service/headless use. - **`lms` CLI** — drives *both*; ships with either. See [[syntheses/deployment-modes-compared]] for when to use which, and [[concepts/headless-and-service-mode]] for JIT/TTL behavior that applies to daemon serving. --- title: "lmstudio-js (TypeScript SDK)" type: entity tags: [sdk, typescript, javascript] updated: 2026-06-09 confidence: high sources: [raw/github_doc-readme-md-3.md, raw/docs_page-lmstudio-js-typescript-sdk-lm-studio.md, raw/github_doc-readme-md.md] --- # lmstudio-js (TypeScript SDK) LM Studio's official JavaScript/TypeScript client SDK. ```bash npm install @lmstudio/sdk --save ``` ```ts import { LMStudioClient } from "@lmstudio/sdk"; const client = new LMStudioClient(); ``` ## What it does (per the README) - Chat responses and text completions from local LLMs - **Define functions as tools and turn LLMs into autonomous agents** (`.act()`) running fully locally - Load, configure, and unload models from memory - Generate text embeddings - Works in **browser and any Node-compatible environment** ## Notes - The `lms` CLI is built with lmstudio-js and lives in the same monorepo (the CLI cannot be built standalone) — [[concepts/lms-cli]]. - Structured responses use **zod** schemas ([[concepts/structured-output]]). Repo: github.com/lmstudio-ai/lmstudio-js. Python counterpart: [[entities/lmstudio-python]]. --- title: "lmstudio-python (Python SDK)" type: entity tags: [sdk, python] updated: 2026-06-09 confidence: high sources: [raw/github_doc-readme-md-2.md, raw/docs_page-lmstudio-python-python-sdk-lm-studio.md, raw/docs_page-using-lmstudio-python-in-repl-lm-studio.md] --- # lmstudio-python (Python SDK) LM Studio's official Python SDK. ```console $ pip install lmstudio ``` ## Shape of the API - The base component is the synchronous **`Client`** — create once; it manages the underlying **websocket** connections to the LM Studio instance. - A **top-level convenience API** exists for interactive/REPL use (implicitly creates a default `Client`). ## What it does Chat/completions against local models, model load/unload and configuration, embeddings, structured responses (pydantic schemas), tool definition, and the agentic **`.act()`** loop ([[concepts/tool-use-and-mcp]]). Repo: github.com/lmstudio-ai/lmstudio-python. TypeScript counterpart: [[entities/lmstudio-js]]. --- title: "Activity Log" type: log --- # Activity Log Append-only record of all wiki changes. ## Format Each entry follows this format: ``` ### YYYY-MM-DD HH:MM — [Action Type] - **Source/Trigger**: what initiated the action - **Pages created**: list of new pages - **Pages updated**: list of updated pages - **Notes**: any contradictions flagged, decisions made ``` --- ### 2026-04-08 00:00 — Setup - **Source/Trigger**: Repository initialized - **Pages created**: index.md, log.md, dashboard.md, analytics.md, flashcards.md - **Pages updated**: none - **Notes**: Empty knowledge base ready for first source ingestion --- ## 2026-06-10 — removed Obsidian scaffolding from the served wiki Deleted `analytics.md`, `dashboard.md`, `flashcards.md` (Obsidian plugin pages — Dataview/Charts View/Spaced Repetition markup, unusable when served as plain Markdown to agents) and the `journal/` scaffold (template only). `CLAUDE.md` directory layout updated: production/planning material lives at repo root, never under `wiki/` (everything under `wiki/` is served publicly). --- title: "Docs Catalog — What the Official Documentation Covers" type: summary tags: [catalog, docs, map] updated: 2026-06-09 confidence: high sources: [raw/llms_txt-llms-txt-index.md, raw/docs_page-welcome-to-lm-studio-docs-lm-studio.md, raw/docs_page-lm-studio-developer-docs-lm-studio.md] --- # Docs Catalog — Map of the Official Documentation LM Studio publishes a maintainer-curated **llms.txt** index; this KB's `raw/` holds **205 sources** from it plus the docs crawl and trackers. The map of what exists (so "does a doc for X exist?" is answerable even where this wiki has no page yet): | Area | What's documented (in raw/) | |---|---| | **App basics** | install, system requirements, offline mode, chat, RAG/documents, presets, per-model defaults, color themes, localization | | **Model management** | download, import, directory layout, JIT/TTL/auto-evict, memory management | | **Server & APIs** | OpenAI-compat (incl. `/v1/responses`), Anthropic-compat, native REST v0/v1, authentication, streaming events, parallel requests, idle TTL | | **API operations** | per-endpoint pages: chat/text completions, embeddings, models list/info, load/unload, tokenization, cancel predictions, get context length/load config/download status | | **SDKs** | lmstudio-python + lmstudio-js: chats (stateful), structured response, tool definition, `.act()`, image input, embedding, REPL | | **CLI** | `lms` command set, `lms get` | | **Ops** | headless/llmster, Linux startup task, LM Link (remote access), proxy | | **Ecosystem** | integrations page, "Add to LM Studio" button, model.yaml authoring/publishing, Hub push/pull | Thinner in this wiki (sources exist; pages pending): model.yaml authoring, LM Link, Hub publishing, per-endpoint API reference detail. --- title: "Version History — Feature Timeline" type: summary tags: [versions, changelog, releases] updated: 2026-06-09 confidence: high sources: [raw/docs_page-api-changelog-lm-studio.md, raw/llms_txt_doc-0-3-5.md, raw/github_doc-readme-md.md, raw/docs_page-use-mcp-servers-lm-studio.md, raw/docs_page-config-presets-lm-studio.md, raw/docs_page-rest-api-v0-lm-studio.md] --- # Version History — Feature Timeline Which version introduced what (compiled from the official API changelog, blog posts, and per-feature docs). Useful for "why doesn't my install have X". | Version | Shipped | |---|---| | 0.2.22 | `lms` CLI ships with the app | | 0.3.5 | **headless mode + on-demand (JIT) model loading**; mlx-engine Pixtral support | | 0.3.6 | native REST API `/api/v0` | | 0.3.15 | Preset import from file/URL | | 0.3.17 (b10) | **MCP Host** — local + remote MCP servers | | 0.4.0 | **llmster daemon**; **native v1 REST API (`/api/v1/*`)**; MCP via API; stateful chats; authentication | | 0.4.1 | **Anthropic-compatible `POST /v1/messages`** — use Claude-Code-style clients against LM Studio | Pattern: API surfaces accrete fast (v0 → v1 REST, OpenAI-compat → Anthropic-compat) — when an endpoint 404s, check this table against your app version first ([[concepts/local-api-server]]). --- title: "API Surfaces Compared — Which Interface Should You Use?" type: synthesis tags: [api, openai-compatible, sdk, decision] updated: 2026-06-09 confidence: high sources: [raw/docs_page-lm-studio-as-a-local-llm-api-server-lm-studio.md, raw/docs_page-openai-compatibility-endpoints-lm-studio.md, raw/docs_page-rest-api-v0-lm-studio.md, raw/docs_page-anthropic-compatibility-endpoints-lm-studio.md, raw/github_doc-readme-md-2.md, raw/github_doc-readme-md-3.md] --- # API Surfaces Compared One server (`lms server start`, port **1234**), five ways in. **Decision rule: existing OpenAI/Anthropic code → compatibility endpoints; new app code → SDK; ops/monitoring → native REST.** | Surface | Base | Use when | |---|---|---| | **OpenAI-compatible** | `http://localhost:1234/v1` | You have existing OpenAI-client code/tools — change only the base URL. Covers `models`, `responses`, `chat/completions`, `embeddings`, `completions`. Codex works (via `/v1/responses`). | | **Anthropic-compatible** | (Anthropic-format endpoints) | Your client speaks Anthropic's API format. | | **Native REST** | `/api/v0` (v1 recommended for new projects) | You want what compat APIs lack: tokens/sec, TTFT, loaded/unloaded state, max context, quantization metadata. | | **[[entities/lmstudio-python]]** | websocket via `Client` | Python apps: model management + prediction + `.act()` agents in-process. | | **[[entities/lmstudio-js]]** | `@lmstudio/sdk` | TS/JS (Node *or browser*): same, plus the CLI is built on it. | ## Rules of thumb 1. **Migrating an existing integration** → compatibility endpoint, one-line base-URL change. 2. **Building new** → SDK: model load/unload control, structured responses (pydantic/zod), tools/`.act()` — things raw HTTP makes you hand-roll. 3. **Observability/ops** → native REST stats endpoints. 4. All surfaces serve the same models and honor JIT/TTL/Auto-Evict ([[concepts/headless-and-service-mode]]). ## Related [[concepts/local-api-server]] · [[concepts/structured-output]] · [[concepts/tool-use-and-mcp]] --- title: "Deployment Modes Compared — GUI vs Headless App vs llmster" type: synthesis tags: [deployment, headless, llmster, decision] updated: 2026-06-09 confidence: high sources: [raw/docs_page-lm-studio-llmster-and-lms-lm-studio.md, raw/docs_page-run-lm-studio-as-a-service-headless-lm-studio.md, raw/github_issue-installing-lm-studio-on-a-linux-environment-without-desktop-.md, raw/github_issue-macos-server-not-working-without-a-user-session-windowserver.md] --- # Deployment Modes Compared Three ways to run LM Studio. **Decision rule: GUI for humans, llmster for servers, headless-app mode only when the GUI machine doubles as the server.** | | Desktop app (GUI) | Desktop app, headless mode | llmster daemon | |---|---|---|---| | Needs a display/GUI | yes | yes (app installed, UI hidden) | **no** | | Best for | interactive use: discover models, chat, RAG, MCP | "my workstation is also the LLM server" | Linux servers, cloud, GPU rigs, CI/CD, boot services | | Start | app | App Settings → run server on login (tray) | `lms daemon up` | | Install | installer / AppImage | same | `curl -fsSL https://lmstudio.ai/install.sh \| bash` | ## Pick the desktop app if… You want the full experience: Discover-tab downloads, chat UI, document RAG, presets, MCP host. Easiest start. ## Pick headless app mode if… The machine *has* a GUI and you want the server surviving app exit: enable run-on-login; the app minimizes to tray with the server running. Server state restores on launch. ## Pick llmster if… No display exists or you want a clean service: it's the app's serving core, server-native. Don't fight the AppImage on GUI-less Linux (community workaround needed `libfuse2` + manual steps) — llmster is the supported path. On macOS, running the *app's* server without a user session hits WindowServer problems (bug tracker) — another reason the daemon exists. ## Applies to all server modes JIT on-demand loading, Idle TTL (default 60 min), Auto-Evict — [[concepts/headless-and-service-mode]]. The `lms` CLI drives all three — [[concepts/lms-cli]]. --- title: "Model & Runtime Casebook — Reported Patterns from the Bug Tracker" type: synthesis tags: [troubleshooting, models, mlx, vulkan, compatibility] updated: 2026-06-09 confidence: low sources: [raw/github_issue-gemma-3-with-mlx-only-generates-pad-pad-pad.md, raw/github_issue-gemma3-only-responding-with-unused32.md, raw/github_issue-failed-run-qwq-32b-3bit-and-4bit-model.md, raw/github_issue-regression-vulkan-v1-52-0-does-not-use-gpu-vram-on-gfx1151-r.md, raw/github_issue-strange-characters-in-output-after-update-0-3-6-in-lm-studio.md, raw/github_issue-think-block-missing-in-api-response-lm-studio-0-3-22-qwen3-t.md, raw/github_issue-qwen3vl-4b-hallucinates-when-running-on-lmstudio.md, raw/github_issue-just-in-time-loading-not-working.md, raw/github_issue-lm-studio-couldn-t-detect-gpu.md] --- # Model & Runtime Casebook — Reported Patterns **Confidence: these are *reported patterns* from the tracker, not verified fixes** — value is knowing you're not alone and which thread to check. (Solved cases with clear fixes live in [[syntheses/troubleshooting-playbook]].) ## Model-specific output corruption (reported) - **Gemma 3 on MLX emitting only `` tokens**, and **Gemma 3 replying ``** (intermittent; reported on 0.3.13/llama.cpp v1.20.1) — model+runtime version combos matter; check the thread for the version where it cleared. - **QwQ-32B failing at 3-bit/4-bit quants** — low-bit quant failures are model-specific; try a higher-bit quant. - **Qwen3-VL hallucinating on low-res/blurry input images** — maintainers' first ask was image quality; feed higher-resolution images before suspecting the model. - **Strange characters in output after the 0.3.6 update** — update-correlated; maintainers triage via `Ctrl+Shift+R` (runtime manager) screenshots: runtime/app version mismatch territory. ## API behavior (reported) - **`think` block missing from API responses** (Qwen3 thinking models, 0.3.22) — reasoning content separation has version-specific behavior; see the API changelog's `reasoningContent` change ([[summaries/version-history]]). ## Runtime/GPU (reported) - **Vulkan v1.52.0 regression: GPU VRAM unused on gfx1151 (Ryzen AI)** — runtime-version-specific; pin/downgrade the runtime in the runtime manager. - **GPU not detected at all** — see the dedicated issue; first checks are driver + runtime selection (`Ctrl+Shift+R`). - **JIT loading "not working"** — needs **beta 4+** of the relevant release (per the thread's resolution); JIT is also tied to server settings ([[concepts/headless-and-service-mode]]). --- title: "Troubleshooting Playbook — Solved Problems from the Bug Tracker" type: synthesis tags: [troubleshooting, errors, gpu, loading] updated: 2026-06-09 confidence: medium sources: [raw/github_issue-no-lm-runtime-found-for-model-format-gguf.md, raw/github_issue-linux-exit-code-133-error-when-loading-large-llm-models.md, raw/github_issue-how-to-start-a-serve-on-a-local-network-with-cli.md, raw/github_issue-installing-lm-studio-on-a-linux-environment-without-desktop-.md, raw/github_issue-how-do-i-remove-a-model-using-the-cli.md, raw/github_issue-fixed-in-0-3-30b2-search-crash-oops-sorry-an-unexpected-erro.md] --- # Troubleshooting Playbook Symptom → cause → fix, distilled from *solved* issues in the official bug tracker. (Each entry cites its issue in `sources`; ~58 solved issues live in `raw/github_issue-*` for deeper digging.) ## "No LM Runtime found for model format 'gguf'!" **Cause:** the LM Runtime (llama.cpp engine) isn't installed/loaded — seen on fresh Linux installs (reported on 0.3.6/Arch). **Fix:** open the runtimes manager — `Ctrl + Shift + R` (Win/Linux) / `⌘ + Shift + R` (Mac) — and install/update the GGUF (llama.cpp) runtime; re-download a runtime rather than the model. ## Large model fails to load on Linux with "Exit code: 133" (RAM/VRAM are sufficient) **Cause (investigated in the issue):** Electron's PartitionAlloc rejects the huge `posix_memalign` allocation — same model loads fine in plain llama.cpp; upstream Electron issue. **Fix/workaround:** track the upstream fix; reduce context size / use a smaller quant to shrink the allocation, or serve via a non-Electron path. ## Server only reachable on localhost; need LAN access from the CLI **Fix:** enable "Serve on Local Network" once in the GUI, after which `lms server start` serves the network. GUI-less: set `"networkInterface": "0.0.0.0"` in `~/.lmstudio/.internal/http-server-config.json` (fallback: `~/.cache/lm-studio/.internal/http-server-config.json`). Maintainers warn config-file formats aren't a stable interface. ## Installing on GUI-less Linux **Fix:** prefer **llmster** (`curl -fsSL https://lmstudio.ai/install.sh | bash`, then `lms daemon up`). The community AppImage route needs `chmod +x` + `apt install libfuse2` and more steps — [[syntheses/deployment-modes-compared]]. ## "How do I delete a model from the CLI?" **Answer (maintainer):** there's no CLI delete — remove the model's files from the models directory (`~/.lmstudio/models/publisher/model/`) like any file. ## Search crashes / "Oops! an unexpected error has occurred" **Pattern:** version-specific regressions (this one fixed in 0.3.30b2). **Rule:** on weird breakage right after an update, check the bug tracker for your exact version before debugging locally. ## General diagnostics ```bash lms status # is LM Studio/llmster up? lms ps # what's actually loaded? lms log stream # live logs while reproducing the problem ```