
Community Benchmarks & Quant Guides — Catalog

type: summary · confidence: medium · updated: 2026-06-09 · sources: 15

A map of the 15 provenance-stamped community sources in raw/community/ (each records its source URL and author). Use it to answer "is there data on X?"; the details live in the cited files and the synthesis pages.

Hardware benchmarks

Apple Silicon performance discussion (llama.cpp GitHub discussions)
NVIDIA CUDA benchmarks
DGX Spark KV-cache quantization measurements

Quantization guides & evaluations

bartowski's quant guide (HF model-card conventions — the de-facto community quant naming)
Kaitchup GGUF guide · SteelPhoenix guide · HF GGUF usage docs
Unsloth dynamic GGUFs (dynamic quantization approach)
mradermacher imatrix practices
artefact2 quant comparison table · arXiv quant evaluation (academic eval) · PR #1684 (the original k-quants design)

KV-cache quantization

smcleod KV-quant guide + the DGX Spark measurements above

Engine comparisons

vLLM vs llama.cpp: GitHub issue #15180 thread + Red Hat's comparison (the llamacpp vs vllm page synthesizes these)

Related: quant types compared · cli and tools reference (llama-bench).