Model & Runtime Casebook: Reported Patterns
Confidence: these are reported patterns from the tracker, not verified fixes. The value is knowing you're not alone and which thread to check. (Solved cases with clear fixes live in the troubleshooting playbook.)
Model-specific output corruption (reported)
- Gemma 3 on MLX emitting only `<pad>` tokens, and Gemma 3 replying `<unused32>` (intermittent; reported on 0.3.13/llama.cpp v1.20.1): model+runtime version combos matter; check the thread for the version where it cleared.
- QwQ-32B failing at 3-bit/4-bit quants: low-bit quant failures are model-specific; try a higher-bit quant.
- Qwen3-VL hallucinating on low-res/blurry input images: maintainers' first ask was image quality; feed higher-resolution images before suspecting the model.
- Strange characters in output after the 0.3.6 update: update-correlated; maintainers triage via `Ctrl+Shift+R` (runtime manager) screenshots, which suggests runtime/app version mismatch territory.
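
When filing or comparing against one of the threads above, it can help to state precisely what leaked into the output. The following is a minimal sketch, not a tool from LM Studio: the function name `leaked_token_report` and the regular expression are assumptions that cover only the two placeholder shapes named in these reports (`<pad>` and `<unusedNN>`), not any tokenizer's full special-token list.

```python
import re

# Placeholder shapes taken from the reports above (assumption: not exhaustive).
LEAKED_TOKEN_RE = re.compile(r"<pad>|<unused\d+>")


def leaked_token_report(text: str) -> dict:
    """Summarize placeholder tokens found in a model reply (hypothetical helper)."""
    tokens = LEAKED_TOKEN_RE.findall(text)
    # Whatever is left after removing the placeholders and surrounding whitespace.
    remainder = LEAKED_TOKEN_RE.sub("", text).strip()
    return {
        "count": len(tokens),
        "distinct": sorted(set(tokens)),
        # True when the reply consists of nothing but placeholder tokens.
        "only_placeholders": bool(tokens) and remainder == "",
    }


if __name__ == "__main__":
    print(leaked_token_report("<pad><pad><pad>"))
    print(leaked_token_report("Hello <unused32> world"))
```

Pasting the resulting summary alongside the app and runtime versions makes it easier to tell whether a reply matches the "only `<pad>`" pattern or the intermittent `<unused32>` one.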
API behavior (reported)
- `think` block missing from API responses (Qwen3 thinking models, 0.3.22): reasoning content separation has version-specific behavior; see the API changelog's `reasoningContent` change (version history).
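
Because the separation behavior varies by version, client code may want to tolerate both shapes: reasoning delivered in its own field, or left inline in the message text. The sketch below is one defensive way to do that under stated assumptions. The helper name `split_reasoning` is hypothetical, the field name `reasoning_content` is a guess (only `reasoningContent` appears in the changelog reference above), and the inline form is assumed to be `<think>...</think>`. Check the thread and changelog for what your version actually returns.

```python
import re

# Assumed inline form of the reasoning block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Candidate field names for separated reasoning (assumptions; adapt as needed).
REASONING_FIELDS = ("reasoning_content", "reasoningContent")


def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat message dict (hypothetical helper)."""
    content = message.get("content") or ""
    # Case 1: the server already separated the reasoning into its own field.
    for field in REASONING_FIELDS:
        if message.get(field):
            return message[field].strip(), content.strip()
    # Case 2: the reasoning is still inline; pull it out of the content.
    parts = THINK_RE.findall(content)
    answer = THINK_RE.sub("", content).strip()
    return "\n".join(p.strip() for p in parts), answer


if __name__ == "__main__":
    print(split_reasoning({"content": "<think>plan</think>Answer"}))
    print(split_reasoning({"content": "Answer", "reasoning_content": "plan"}))
```

An empty first element from both paths is the signal worth reporting: it means the reasoning was neither separated nor inline, which is the symptom described in this entry.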
Runtime/GPU (reported)
- Vulkan v1.52.0 regression, GPU VRAM unused on gfx1151 (Ryzen AI): runtime-version-specific; pin or downgrade the runtime in the runtime manager.
- GPU not detected at all: see the dedicated issue; first checks are driver + runtime selection (`Ctrl+Shift+R`).
- JIT loading "not working": needs beta 4+ of the relevant release (per the thread's resolution); JIT is also tied to server settings (headless and service mode).
