
Auxiliary Models — Dedicated Providers for 8 Built-in Side Tasks

type: concept | confidence: high | updated: 2026-06-01 | hermes_version: v0.15.0 | sources: 6

Roster history (read first): The set of auxiliary tasks has changed across releases. As of v0.15.0 the eight built-in slots are vision, compression, web_extract, approval, mcp, title_generation, skills_hub, curator. Earlier KB notes listed flush_memories (replaced by curator in v0.12) and session_search (removed in v0.15 — it is now a free, no-LLM FTS5 tool, not an aux-model task). Plugins can register additional custom aux tasks via register_auxiliary_task() (v0.15), so the set is no longer fixed at exactly eight.

Definition

Hermes routes a set of internal "side tasks" through a separate, user-configurable model path — the auxiliary model system. Instead of burning your main agent's (expensive) model on every image analysis, web page summary, or context compression, you can point each task slot at its own provider and model — typically a cheap/fast one. By default Hermes auto-detects the first available provider from a short chain and uses Gemini Flash for everything. There are eight built-in slots (below) plus any custom slots a plugin registers.

How It Works

When an auxiliary task fires (e.g. the agent opens an image, the context compressor kicks in, a skill match is scored), the main agent's runtime doesn't handle it. Control passes to agent/auxiliary_client.py, which looks at the auxiliary: block in ~/.hermes/config.yaml (or env var overrides) to resolve:

Which provider to use (provider: auto walks a chain; a named provider is used directly; base_url short-circuits both)
Which model to request (provider's default if unspecified; Gemini Flash in the auto path)
Which auth to use (api_key, or OPENAI_API_KEY as a fallback for custom base_url)
How long to wait (the timeout field)

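The resulting aux request runs on its own HTTP client, independent of the main conversational model, so it cannot "steal" context from the main chat. Its usage also stays off the main model's rate limits, provided the slot resolves to a different provider or credential; a slot set to provider: "main" shares the main agent's provider and therefore its limits.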
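The resolution order described above can be sketched in Python. This is an illustrative model under stated assumptions, not the code in agent/auxiliary_client.py: `resolve_slot`, `ResolvedAux`, `DEFAULT_TIMEOUT`, and the `available` argument (the providers that currently have credentials) are hypothetical names, and the chain order is the one quoted from the docs in the next section.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence

# Chain order as quoted from the docs; all other names here are hypothetical.
AUTO_CHAIN = ("openrouter", "nous", "codex")
DEFAULT_TIMEOUT = 30  # most slots; compression and web_extract use longer defaults


@dataclass
class ResolvedAux:
    provider: str
    model: str      # "" means "use the provider's default model"
    base_url: str
    api_key: str
    timeout: int


def resolve_slot(
    slot_cfg: Mapping[str, Any],
    available: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> ResolvedAux:
    """Resolve one slot: base_url first, then a named provider, then the auto chain."""
    env = os.environ if env is None else env
    provider = str(slot_cfg.get("provider") or "auto")
    model = str(slot_cfg.get("model") or "")
    base_url = str(slot_cfg.get("base_url") or "")
    api_key = str(slot_cfg.get("api_key") or "")
    timeout = int(slot_cfg.get("timeout") or DEFAULT_TIMEOUT)

    if base_url:
        # A custom endpoint short-circuits provider selection entirely;
        # its key falls back to OPENAI_API_KEY when api_key is empty.
        key = api_key or env.get("OPENAI_API_KEY", "")
        return ResolvedAux("custom", model, base_url, key, timeout)

    if provider != "auto":
        # A named provider is used directly, without walking the chain.
        return ResolvedAux(provider, model, "", api_key, timeout)

    for candidate in AUTO_CHAIN:
        if candidate in available:
            return ResolvedAux(candidate, model, "", api_key, timeout)
    raise LookupError("no auxiliary provider available for provider: auto")
```

For example, `resolve_slot({"provider": "auto"}, available=["nous", "codex"])` picks `nous`, because OpenRouter is not available and Nous comes next in the chain.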

The `auto` resolution chain

Per the docs: "Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash." In practice the chain tries the aggregator first (OpenRouter: cheap, many models available), then Nous Portal, then Codex OAuth. If the user has none of these configured and no base_url override, the aux call fails; as of v0.15 that happens only after the layered fallback described under Key Parameters is exhausted.

Independence from main-agent routing

Per docs-developer-guide-provider-runtime.md: "Auxiliary tasks use their own independent provider auto-detection chain." They share the same underlying transport infrastructure (resolve_provider_client()) but maintain separate credential bundles, timeouts, and retry behavior.

Key Parameters

The 8 built-in auxiliary task slots (v0.15)

Slot	What it does	Default timeout
`vision`	Image analysis (`vision_analyze` tool + browser screenshots)	30s (+30s image download)
`web_extract`	Web page summarization + browser text extraction	360s
`approval`	Dangerous command classifier (powers `approvals.mode: smart`)	30s
`compression`	Summarizes middle turns of long conversations. Model/provider set HERE (`auxiliary.compression.provider/model`); the top-level `compression:` block holds thresholds only	120s
`title_generation`	Auto-names sessions (the v0.15 slot that replaced `session_search`)	30s
`skills_hub`	Skill matching and search	30s
`mcp`	MCP tool dispatch helper	30s
`curator`	Autonomous skill/memory maintenance (v0.12 swap-in for the old `flush_memories`)	30s

No longer aux: session_search was a slot through v0.14 but was rebuilt in v0.15 into a single three-mode (discovery/scroll/browse) FTS5 tool with no LLM, no cost, ~20ms — so it no longer consumes an auxiliary model. Don't bill it as an aux cost item.

Plugin-registered custom tasks (v0.15)

Plugins can add their own auxiliary tasks via PluginContext.register_auxiliary_task(key=..., display_name=..., description=..., defaults={...}). The eight built-in keys above are reserved (registering one raises ValueError). A custom task's defaults accept provider (default "auto"), model (default ""), timeout (default 60), and extra_body. This means the auxiliary set is extensible — "8" is the built-in floor, not a hard ceiling.
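The registration rules above can be modeled with a small stand-in registry. This is a sketch, not Hermes code: `ToyAuxRegistry` is hypothetical, only the method name and keyword arguments mirror the `register_auxiliary_task()` call described above, and the empty `extra_body` default is an assumption.

```python
import copy
from typing import Any, Dict, Optional

# The eight built-in keys from the table above; registering one is an error.
RESERVED_KEYS = frozenset({
    "vision", "compression", "web_extract", "approval",
    "mcp", "title_generation", "skills_hub", "curator",
})

# Documented defaults for custom tasks; the empty extra_body is an assumption.
CUSTOM_TASK_DEFAULTS: Dict[str, Any] = {
    "provider": "auto",
    "model": "",
    "timeout": 60,
    "extra_body": {},
}


class ToyAuxRegistry:
    """Hypothetical stand-in for the plugin-facing registry."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Dict[str, Any]] = {}

    def register_auxiliary_task(
        self,
        *,
        key: str,
        display_name: str,
        description: str,
        defaults: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        if key in RESERVED_KEYS:
            raise ValueError(f"{key!r} is a reserved built-in auxiliary task key")
        # Start from a fresh copy so tasks never share a mutable extra_body.
        merged = copy.deepcopy(CUSTOM_TASK_DEFAULTS)
        merged.update(defaults or {})
        self.tasks[key] = {
            "display_name": display_name,
            "description": description,
            **merged,
        }
        return self.tasks[key]


registry = ToyAuxRegistry()
task = registry.register_auxiliary_task(
    key="changelog_digest",  # hypothetical custom task
    display_name="Changelog digest",
    description="Summarizes release notes",
    defaults={"timeout": 90},
)
```

Here only `timeout` is overridden, so the task keeps `provider: "auto"` and an empty `model`.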

Layered fallback (v0.15)

Auxiliary calls now degrade through a ladder on capacity errors (402 / 429 / connection): primary → resolution chain → main agent → graceful fail. This is resilience, not routing — there is still no per-tool/per-complexity rule engine.
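A minimal sketch of that ladder follows, assuming each rung is a zero-argument callable. `AuxCallError`, `is_capacity_error`, and `call_with_ladder` are hypothetical names, and re-raising non-capacity errors instead of continuing down the ladder is an assumption the source does not confirm.

```python
from typing import Callable, Optional, Sequence, Tuple

CAPACITY_STATUS = {402, 429}


class AuxCallError(Exception):
    """Hypothetical error carrying an HTTP status (None means a connection failure)."""

    def __init__(self, status: Optional[int] = None) -> None:
        super().__init__(f"aux call failed (status={status})")
        self.status = status


def is_capacity_error(err: AuxCallError) -> bool:
    # 402, 429, and connection failures move the call to the next rung.
    return err.status is None or err.status in CAPACITY_STATUS


def call_with_ladder(
    rungs: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each rung in order; return (rung_name, result) or (None, None) on graceful fail."""
    for name, call in rungs:
        try:
            return name, call()
        except AuxCallError as err:
            if not is_capacity_error(err):
                raise  # assumption: other errors are not retried down the ladder
    return None, None
```

A caller would pass the rungs in ladder order, for example `[("primary", ...), ("chain", ...), ("main", ...)]`, and treat `(None, None)` as the graceful-fail case.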

The 5 config knobs per slot

auxiliary:
  vision:
    provider: "auto"       # "auto" | "openrouter" | "nous" | "codex" | "copilot" | "anthropic" | "main" | "zai" | "kimi-coding" | "minimax" | <custom provider>
    model: ""              # e.g. "openai/gpt-4o", "google/gemini-2.5-flash" — empty = provider default
    base_url: ""           # Custom OpenAI-compatible endpoint; overrides provider when set
    api_key: ""            # Key for base_url (falls back to OPENAI_API_KEY)
    timeout: 30            # Seconds; raise for slow local models
    download_timeout: 30   # vision only — HTTP image download

The `"main"` special value

provider: "main" means "use whatever provider my main agent uses." It is valid only inside the auxiliary:, compression:, and fallback_model: blocks, not at the top-level model.provider. It is useful when you've set up a custom endpoint as your main model and want auxiliary tasks to share that endpoint.

Env var overrides (legacy, vision + web_extract only)

AUXILIARY_VISION_PROVIDER
AUXILIARY_VISION_MODEL
AUXILIARY_VISION_BASE_URL
AUXILIARY_VISION_API_KEY
AUXILIARY_WEB_EXTRACT_PROVIDER
AUXILIARY_WEB_EXTRACT_MODEL
AUXILIARY_WEB_EXTRACT_BASE_URL
AUXILIARY_WEB_EXTRACT_API_KEY

The other 6 slots are config.yaml-only.
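The naming pattern of these variables can be sketched as a small helper. `apply_env_overrides` is a hypothetical function, and it assumes that a non-empty environment variable wins over the config.yaml value for the same field.

```python
from typing import Dict, Mapping

# Only these two slots honor the legacy environment variables.
ENV_OVERRIDE_SLOTS = ("vision", "web_extract")
ENV_OVERRIDE_FIELDS = ("provider", "model", "base_url", "api_key")


def apply_env_overrides(
    slot: str,
    slot_cfg: Mapping[str, str],
    env: Mapping[str, str],
) -> Dict[str, str]:
    """Return a copy of slot_cfg with AUXILIARY_<SLOT>_<FIELD> values applied."""
    merged = dict(slot_cfg)
    if slot not in ENV_OVERRIDE_SLOTS:
        return merged  # e.g. AUXILIARY_MCP_MODEL is never read
    for field in ENV_OVERRIDE_FIELDS:
        value = env.get(f"AUXILIARY_{slot.upper()}_{field.upper()}")
        if value:
            # Assumption: a set, non-empty variable overrides config.yaml.
            merged[field] = value
    return merged
```

Called with `slot="mcp"`, the helper returns the config untouched, which matches the rule that the other six slots are config.yaml-only.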

Compression has TWO config blocks (corrected v0.15)

This is the number-one footgun — and the direction is the opposite of what older docs implied:

Top-level compression: — enabled, threshold, target_ratio, protect_last_n, protect_first_n. This block controls behavior/thresholds only, NOT the model.
auxiliary.compression: — provider, model, base_url, timeout (default 120s). This is where you set which model does the summarization — same shape as every other aux task. Defaults to auto → your main chat model.
The legacy keys compression.summary_provider / summary_model / summary_base_url are auto-migrated to auxiliary.compression.* on first load (config version 17). Don't author new configs with them.

Source: cli-config.yaml.example (compression block has no summary_* keys; comment points to auxiliary.compression) + website/docs/user-guide/configuration.md lines 646, 937.
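The legacy-key migration can be pictured as a dictionary rewrite. This is a sketch of the idea only: `migrate_compression_keys` is a hypothetical name, the real migration's handling of the config version is not modeled, and letting an existing auxiliary.compression value win over a legacy one is an assumption.

```python
import copy
from typing import Any, Dict

# Legacy compression.* key -> auxiliary.compression.* key
LEGACY_TO_AUX = {
    "summary_provider": "provider",
    "summary_model": "model",
    "summary_base_url": "base_url",
}


def migrate_compression_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with legacy summary_* keys moved under auxiliary.compression."""
    migrated = copy.deepcopy(config)
    compression = migrated.get("compression") or {}
    auxiliary = migrated.get("auxiliary") or {}
    aux_compression = auxiliary.get("compression") or {}

    for legacy_key, aux_key in LEGACY_TO_AUX.items():
        if legacy_key not in compression:
            continue
        value = compression.pop(legacy_key)
        # Assumption: an explicit auxiliary.compression value is kept as-is.
        if value and not aux_compression.get(aux_key):
            aux_compression[aux_key] = value

    auxiliary["compression"] = aux_compression
    migrated["auxiliary"] = auxiliary
    return migrated
```

After the rewrite, the top-level compression: block holds only behavior keys such as enabled and threshold, and the model choice lives under auxiliary.compression.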

When To Use

Default (auto): routes to your main chat model. Per configuration.md, auxiliary.*.provider: "auto" sends each side task to your main provider/model. Fine if your main is cheap/local; on an expensive main model (Opus, etc.) explicitly point aux tasks at a cheap model (e.g. Gemini Flash on OpenRouter) to avoid inflating cost.

Override when:

Your main model is expensive (Opus, GPT-5) and you're watching auxiliary usage inflate your bill. Route compression + curator + title_generation to Gemini Flash or a cheap Haiku/Kimi variant. (Note: session_search no longer costs anything — it's a free FTS5 tool since v0.15.)
Your main model isn't multimodal but you want vision. Point auxiliary.vision at GPT-4o or a dedicated vision model.
You're running airplane-mode locally and want all aux tasks on a local base_url (Ollama, vLLM, llama.cpp). See local stack playbook.
You want separation of duties — e.g. approval on a dedicated heavier model for better dangerous-command judgment.
Your main model is rate-limited and you want aux tasks on a different credential pool.

Risks & Pitfalls

Vision requires a multimodal model. Setting provider: "main" without checking multimodal capability breaks image analysis silently. Check before swapping.
The two-compression-blocks footgun. Swap the compression model under auxiliary.compression.provider/model — the same shape as every other aux task. The top-level compression: block is thresholds/behavior only; putting a model there (compression.summary_model) hits a legacy path that auto-migrates. (Older KB notes had this reversed.)
Credential gaps in auto. If the chain (OpenRouter → Nous → Codex) finds no credentials, aux calls fail and the feature that triggered them (e.g. vision, compression) fails too. (As of v0.15, layered fallback softens this — primary → chain → main agent → graceful fail.)
Slow local models + default 30s timeout. Vision on a local 8B multimodal may exceed 30s. Raise the timeout.
base_url overrides provider. If both are set, base_url wins and provider is ignored. Easy to forget when debugging.
Aux is not covered by the main fallback_model system. The main-agent fallback chain does not apply to aux calls; each aux task does its own resolution.
Env var overrides only work for vision and web_extract. Don't try AUXILIARY_MCP_MODEL — it won't be read.

Related Concepts

model switching — the main-agent /model system; auxiliary is a parallel track
approval system — the approvals.mode: smart mode is powered by auxiliary.approval
mcp integration — the auxiliary.mcp slot helps dispatch MCP tool calls
memory system — memory writes are handled by the curator slot (the v0.12 successor to the old flush_memories)
skills system — auxiliary.skills_hub scores skill matches
local stack playbook — routing all 8 aux slots to a local base_url is the local-stack completion
version v0.8.0 — early aux behavior (402-fallback, vision auto-detection main-first)
version v0.15.0 — roster change (session_search → title_generation), register_auxiliary_task(), layered fallback

Sources

raw/docs-user-guide-configuration.md — primary reference for the config surface (§ Auxiliary Models)
raw/docs-developer-guide-provider-runtime.md — confirmation that auxiliary tasks use an independent resolution chain
raw/docs-developer-guide-adding-providers.md — _API_KEY_PROVIDER_AUX_MODELS table for per-provider default aux models
raw/docs-developer-guide-architecture.md — locates agent/auxiliary_client.py in the codebase
raw/docs-developer-guide-agent-loop.md — positions aux client in the run-agent call graph