home / why-wikis

Why Agent Wikis?

Today an agent answers from two places: what it memorized in training, and what it finds on the live web. On niche, fast-moving domains, both fail — parametric memory knows nothing, and web search hallucinates half the time. We measured it.

89%

vs 48% web search

correct answers with a compiled wiki, on a niche fast-moving domain

vs 48% web search

hallucination rate — unsupported claims per answer

~3×

$0.0016 vs $0.0054 / query

cheaper than web search, and ~3× faster (~1.2s vs ~3.6s)

100%

vs 25% web search

on change-aware questions ("what changed in vX?") — freshness is a guaranteed win

Eval: 27 tasks on Hermes Agent (a ~2-month-old project releasing every 3–5 days), same model and token budget per condition, blind LLM judge, replicated across two model families.

The experiment

We gave the same model the same 27 questions about Hermes Agent under six conditions — from no context at all, to live web search, to retrieval over a compiled wiki. The judge never knows which condition produced an answer.

Condition	% correct	Hallucination
Parametric (no context)	4%	85%
Web search (the status quo)	48%	48%
RAG over raw sources	63%	26%
RAG over the compiled wiki	89%	7%
Whole curated pages (static)	81%	22%
Agent navigates the wiki	70%	30%

Compiling is what does the work

The decisive comparison is row 3 vs row 4: same model, same retrieval, same token budget — the only difference is whether the agent retrieves raw source chunks or pages we compiled and curated. Accuracy jumps from 63% to 89%, and hallucinations fall from 26% to 7%.

That's the product. Not retrieval — maintenance. RAG searches your knowledge. Agent Wikis maintains it.

The wiki is a floor; the web is a coin flip

Wiki-backed conditions hold 75–89% across every domain and model we tested. Web search swings from 89% on well-documented topics down to 48% on niche ones — and re-running the same questions on a different day moves web results by ±10–14 points. Live search is not reproducible; a compiled wiki is deterministic.

On a niche domain, web search scored 0% on synthesis and decision questions — the kind agents actually get asked — while the wiki held 100%.

Honest bounds

We publish the losses too. On mature, exhaustively documented domains, web search wins simple lookups. And every wiki declares its scope — what it covers, what it doesn't, and how fresh it is — so an agent knows when to fall back to the web. That routing ("wiki first, web on gaps") scored 93%, above either source alone.

Built to be read by agents

Every wiki is plain Markdown with an llms.txt index, frontmatter metadata (sources, confidence, freshness, version), and stable URLs. Humans get this site; agents get the same content with zero scraping. The MCP server — calibrated search plus section-level reads — is the exact configuration that scored 89% above.

Browse the wikis