# vLLM

> Wiki for vLLM: installation, the OpenAI-compatible server, configuration, quantization, distributed serving, and per-model realities.
> Covers: vLLM installation (CUDA/ROCm/CPU), offline LLM API and the OpenAI-compatible server, engine args and env vars, quantization methods, parallelism and scaling, CLI, metrics/ops, releases v0.19-v0.22, and tracker-sourced model notes (gpt-oss, Llama, Qwen).
> Not covered: Kubernetes/Docker deployment guides, engine internals/design docs, contributing, benchmarking deep-dives, and releases after the date below (use web search for these).
> Current as of: 2026-06-09 (v0.22.1)

- [LLM Wiki](/raw/vllm/README.md)
- [vLLM KB — Master Index](/raw/vllm/wiki/index.md)
- [CLI Reference — vllm {serve,chat,complete,bench,run-batch}](/raw/vllm/wiki/concepts/cli-reference.md)
- [Configuration — Engine Args, Env Vars, Memory](/raw/vllm/wiki/concepts/configuration.md)
- [Installation (GPU / CPU / Platforms)](/raw/vllm/wiki/concepts/install.md)
- [Integrations — Claude Code, Codex, LangChain, LlamaIndex](/raw/vllm/wiki/concepts/integrations-and-clients.md)
- [Models & Support (incl. Transformers Backend)](/raw/vllm/wiki/concepts/models-and-support.md)
- [Multimodal Inputs, LoRA & Prompt Embeddings](/raw/vllm/wiki/concepts/multimodal-and-lora.md)
- [Observability & Ops — Metrics, Reproducibility, Usage Stats](/raw/vllm/wiki/concepts/observability-and-ops.md)
- [OpenAI-Compatible Server](/raw/vllm/wiki/concepts/openai-compatible-server.md)
- [Parallelism & Scaling (TP / PP / DP / EP / CP)](/raw/vllm/wiki/concepts/parallelism-and-scaling.md)
- [Pooling Models — Embeddings, Classify, Score, Reward](/raw/vllm/wiki/concepts/pooling-models.md)
- [Quantization — Methods & When to Use Which](/raw/vllm/wiki/concepts/quantization.md)
- [Quickstart — Offline Inference & Online Serving](/raw/vllm/wiki/concepts/quickstart-and-serving.md)
- [Activity Log](/raw/vllm/wiki/log.md)
- [Release Digest — v0.19.0 → v0.22.1](/raw/vllm/wiki/summaries/release-digest.md)
- [Model Notes from the Tracker — gpt-oss, Llama, Qwen, Gemma & Friends](/raw/vllm/wiki/syntheses/model-notes-from-the-tracker.md)
- [Serving Decisions — Mode, Memory, Scale](/raw/vllm/wiki/syntheses/serving-decisions.md)
- [Troubleshooting Playbook](/raw/vllm/wiki/syntheses/troubleshooting-playbook.md)