
llama.cpp Knowledge Base

updated: 2026-06-10

Covers: llama.cpp build and setup, inference, GGUF quantization, the server, grammars and function-calling, and troubleshooting.

Not covered: changes on master after the date below, and hardware-specific benchmarks. Use web search for these.

Current as of: 2026-05-30 (master, ~2026-05)

🤖 Agent access: /wiki/llama-cpp/llms.txt /wiki/llama-cpp/llms-full.txt /wiki/llama-cpp/index.json

An LLM-maintained research knowledge base on llama.cpp, the C/C++ engine for running LLMs locally. It serves as the research backbone for YouTube videos (tutorials, benchmarks, deep dives).

Structure

raw/ — immutable source documents (doc mirrors, transcripts, discussion/PR dumps, benchmark logs)
wiki/ — synthesized knowledge (summaries, concepts, entities, syntheses)

Schema and maintenance rules: see CLAUDE.md.

Usage

Add new sources: drop them into raw/ and ask the LLM to "ingest" them
Ask questions: the LLM reads the wiki to synthesize answers with links
Draft video modules: ask "draft module on <topic>" to produce slide + script outlines sourced from the wiki

Version tracking

llama.cpp ships rolling builds (b####) rather than semver releases. Each wiki page records the latest build tag it was verified against in its llama_build frontmatter field.

Latest verified llama.cpp build: (none yet; the wiki is still a scaffold)
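Because every page carries its own llama_build tag, the wiki-wide "latest verified build" can be derived by scanning the pages. The sketch below is hypothetical: it assumes each page begins with a YAML-style frontmatter block delimited by `---` lines containing a line such as `llama_build: b5432`. The actual schema lives in CLAUDE.md and may differ. The helper names `page_build`, `latest_verified_build`, and `scan_wiki` are illustrative, not part of any existing tooling.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: frontmatter is a '---'-delimited block at the very top of the page,
# and the build is stored as "llama_build: b<digits>" (optionally quoted).
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
BUILD_RE = re.compile(r"^llama_build:[ \t]*['\"]?b(\d+)['\"]?[ \t]*$", re.MULTILINE)


def page_build(text: str) -> int | None:
    """Return the numeric part of a page's llama_build tag, or None if absent."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        return None
    build = BUILD_RE.search(match.group(1))
    return int(build.group(1)) if build else None


def latest_verified_build(pages: list[str]) -> str | None:
    """Return the highest b#### tag across all pages, or None if no page has one."""
    builds = [b for b in (page_build(p) for p in pages) if b is not None]
    return f"b{max(builds)}" if builds else None


def scan_wiki(root: str = "wiki") -> str | None:
    """Read every Markdown file under root and report the latest verified build."""
    texts = [p.read_text(encoding="utf-8") for p in Path(root).rglob("*.md")]
    return latest_verified_build(texts)
```

Build numbers are compared as integers rather than strings, since a string comparison would rank "b999" above "b1000". Pages without the field are skipped instead of failing the scan, which suits a scaffold where many pages have not been verified yet.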

Based on the llm-wiki template / Karpathy's "LLM Wiki" pattern.