Community Benchmarks & Quant Guides: Catalog
Map of the 15 provenance-stamped community sources in raw/community/ (each carries its source URL + author). Use this to answer "is there data on X?"; details live in the cited files and the synthesis pages.
Hardware benchmarks
- Apple Silicon performance discussion (llama.cpp GitHub discussions)
- NVIDIA CUDA benchmarks
- DGX Spark KV-cache quantization measurements
Quantization guides & evaluations
- bartowski's quant guide (HF model-card conventions; the de facto community quant naming)
- Kaitchup GGUF guide · SteelPhoenix guide · HF GGUF usage docs
- Unsloth dynamic GGUFs (dynamic quantization approach)
- mradermacher imatrix practices
- artefact2 quant comparison table · arXiv quant evaluation (academic eval) · PR #1684 (the original k-quants design)
KV-cache quantization
- smcleod KV-quant guide + the DGX Spark measurements above
Engine comparisons
- vLLM vs llama.cpp: GitHub issue #15180 thread + Red Hat's comparison (the llamacpp vs vllm page synthesizes these)
Related: quant types compared · CLI and tools reference (llama-bench).
