
Model & Runtime Casebook — Reported Patterns

type: synthesis | confidence: low | updated: 2026-06-09 | sources: 9

Confidence: these are patterns reported on the tracker, not verified fixes. Their value is in showing that others have hit the same problem and which thread to check. (Solved cases with clear fixes live in the troubleshooting playbook.)

Model-specific output corruption (reported)

Gemma 3 on MLX emitting only <pad> tokens, and Gemma 3 replying <unused32> (intermittent; reported on 0.3.13/llama.cpp v1.20.1) — model+runtime version combos matter; check the thread for the version where it cleared.
QwQ-32B failing at 3-bit/4-bit quants — low-bit quant failures are model-specific; try a higher-bit quant.
Qwen3-VL hallucinating on low-res/blurry input images — maintainers' first ask was image quality; feed higher-resolution images before suspecting the model.
Strange characters in output after the 0.3.6 update — update-correlated; maintainers triage by asking for screenshots of the runtime manager (Ctrl+Shift+R), which suggests a runtime/app version mismatch.
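When checking whether a given model+runtime combination still shows the placeholder-token symptom, a small heuristic can flag outputs automatically. The following is a minimal sketch, not a maintainer-endorsed check: the function names and the 0.5 threshold are hypothetical, and the patterns cover only the two tokens named in the reports above (<pad> and <unusedNN>).

```python
import re

# Placeholder tokens named in the reports above: <pad> and <unusedNN>.
PLACEHOLDER_TOKEN = re.compile(r"<pad>|<unused\d+>")


def placeholder_ratio(text: str) -> float:
    """Return the fraction of characters in `text` covered by placeholder tokens."""
    if not text:
        return 0.0
    covered = sum(len(m.group(0)) for m in PLACEHOLDER_TOKEN.finditer(text))
    return covered / len(text)


def looks_corrupted(text: str, threshold: float = 0.5) -> bool:
    """Heuristic (threshold is an assumption): flag output that is mostly placeholders."""
    return placeholder_ratio(text) >= threshold
```

Running the same prompt before and after a runtime change and comparing `looks_corrupted` on each reply gives a quick signal of whether the version change cleared the symptom.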

API behavior (reported)

<think> block missing from API responses (Qwen3 thinking models, 0.3.22) — reasoning content separation has version-specific behavior; see the API changelog's reasoningContent change (version history).
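Because reasoning may arrive either in a separate field or inline in the message text depending on the version, client code can be written to tolerate both. The sketch below works on an already-parsed message dict and makes several assumptions: the field names in `reasoning_keys` are guesses to be checked against the API changelog, and the inline form is assumed to be wrapped in `<think>...</think>` tags. `split_reasoning` is a hypothetical helper, not part of any LM Studio SDK.

```python
import re

# Assumed inline form: reasoning wrapped in <think>...</think> inside the content.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict,
                    reasoning_keys=("reasoning_content", "reasoningContent")):
    """Return (reasoning, answer) from a chat message dict.

    Checks the candidate reasoning fields first (names are assumptions),
    then falls back to an inline <think> block; reasoning is None if absent.
    """
    content = message.get("content") or ""
    for key in reasoning_keys:
        value = message.get(key)
        if value:
            return value.strip(), content.strip()
    match = THINK_BLOCK.search(content)
    if match:
        reasoning = match.group(1).strip()
        # Remove only the first think block from the visible answer.
        answer = THINK_BLOCK.sub("", content, count=1).strip()
        return reasoning, answer
    return None, content.strip()
```

If `split_reasoning` returns `None` for the reasoning part on a thinking model, that matches the symptom reported here, and the version history is the next place to look.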

Runtime/GPU (reported)

Vulkan v1.52.0 regression: GPU VRAM unused on gfx1151 (Ryzen AI) — runtime-version-specific; pin/downgrade the runtime in the runtime manager.
GPU not detected at all — see the dedicated issue; first checks are driver + runtime selection (Ctrl+Shift+R).
JIT loading "not working" — needs beta 4+ of the relevant release (per the thread's resolution); JIT is also tied to server settings (headless and service mode).