Auxiliary Models: Dedicated Providers for 8 Built-in Side Tasks
Roster history (read first): The set of auxiliary tasks has changed across releases. As of v0.15.0 the eight built-in slots are `vision`, `compression`, `web_extract`, `approval`, `mcp`, `title_generation`, `skills_hub`, and `curator`. Earlier KB notes listed `flush_memories` (replaced by `curator` in v0.12) and `session_search` (removed in v0.15; it is now a free, no-LLM FTS5 tool, not an aux-model task). Plugins can register additional custom aux tasks via `register_auxiliary_task()` (v0.15), so the set is no longer fixed at exactly eight.
Definition
Hermes routes a set of internal "side tasks" through a separate, user-configurable model path: the auxiliary model system. Instead of burning your main agent's (expensive) model on every image analysis, web page summary, or context compression, you can point each task slot at its own provider and model, typically a cheap, fast one. By default Hermes auto-detects the first available provider from a short chain and uses Gemini Flash for everything. There are eight built-in slots (below) plus any custom slots a plugin registers.
How It Works
When an auxiliary task fires (e.g. the agent opens an image, the context compressor kicks in, a skill match is scored), the main agent's runtime doesn't handle it. Control passes to agent/auxiliary_client.py, which looks at the auxiliary: block in ~/.hermes/config.yaml (or env var overrides) to resolve:
- Which provider to use (`provider: auto` walks a chain; a named provider is used directly; `base_url` short-circuits both)
- Which model to request (provider's default if unspecified; Gemini Flash in the `auto` path)
- Which auth to use (`api_key`, or `OPENAI_API_KEY` as a fallback for custom `base_url`)
- How long to wait (the `timeout` field)
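The three provider-resolution modes listed above can be sketched as a config fragment. This is an illustration under assumptions, not a recommended setup: the pairing of slots to modes is arbitrary, and the endpoint URL, API key, and the `approval` model name are placeholders that do not come from the Hermes docs.

```yaml
auxiliary:
  vision:
    provider: "auto"          # walk the auto chain; model left to the auto path
  web_extract:
    provider: "openrouter"    # named provider, used directly
    model: "google/gemini-2.5-flash"
    timeout: 360              # seconds (the documented default for this slot)
  approval:
    base_url: "https://llm.example.internal/v1"  # placeholder endpoint; short-circuits provider
    api_key: "sk-placeholder"                    # placeholder; OPENAI_API_KEY is the fallback
    model: "my-classifier-model"                 # hypothetical model name
    timeout: 30
```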
The resulting aux request runs on its own HTTP client, independent of the main conversational model, so it cannot "steal" context from the main chat, nor does its usage count against the main model's rate limits.
The auto resolution chain
Per the docs: "Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash." In practice the chain prefers aggregator providers first (cheap, many models available), then Nous Portal, then Codex OAuth. If the user has none of these configured and no `base_url` override, Hermes will fail the aux call.
Independence from main-agent routing
Per docs-developer-guide-provider-runtime.md: "Auxiliary tasks use their own independent provider auto-detection chain." They share the same underlying transport infrastructure (resolve_provider_client()) but maintain separate credential bundles, timeouts, and retry behavior.
Key Parameters
The 8 built-in auxiliary task slots (v0.15)
| Slot | What it does | Default timeout |
|---|---|---|
| `vision` | Image analysis (`vision_analyze` tool + browser screenshots) | 30s (+30s image download) |
| `web_extract` | Web page summarization + browser text extraction | 360s |
| `approval` | Dangerous command classifier (powers `approvals.mode: smart`) | 30s |
| `compression` | Summarizes middle turns of long conversations. Model/provider set HERE (`auxiliary.compression.provider`/`model`); the top-level `compression:` block holds thresholds only | 120s |
| `title_generation` | Auto-names sessions (the v0.15 slot that replaced `session_search`) | 30s |
| `skills_hub` | Skill matching and search | 30s |
| `mcp` | MCP tool dispatch helper | 30s |
| `curator` | Autonomous skill/memory maintenance (v0.12 swap-in for the old `flush_memories`) | 30s |
No longer aux: `session_search` was a slot through v0.14 but was rebuilt in v0.15 into a single three-mode (discovery/scroll/browse) FTS5 tool with no LLM, no cost, and ~20ms latency, so it no longer consumes an auxiliary model. Don't bill it as an aux cost item.
Plugin-registered custom tasks (v0.15)
Plugins can add their own auxiliary tasks via `PluginContext.register_auxiliary_task(key=..., display_name=..., description=..., defaults={...})`. The eight built-in keys above are reserved (registering one raises `ValueError`). A custom task's `defaults` accept `provider` (default `"auto"`), `model` (default `""`), `timeout` (default 60), and `extra_body`. This means the auxiliary set is extensible: "8" is the built-in floor, not a hard ceiling.
Layered fallback (v0.15)
Auxiliary calls now degrade through a ladder on capacity errors (402 / 429 / connection): primary → resolution chain → main agent → graceful fail. This is resilience, not routing: there is still no per-tool/per-complexity rule engine.
The 5 config knobs per slot
auxiliary:
  vision:
    provider: "auto"       # "auto" | "openrouter" | "nous" | "codex" | "copilot" | "anthropic" | "main" | "zai" | "kimi-coding" | "minimax" | <custom provider>
    model: ""              # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"; empty = provider default
    base_url: ""           # Custom OpenAI-compatible endpoint; overrides provider when set
    api_key: ""            # Key for base_url (falls back to OPENAI_API_KEY)
    timeout: 30            # Seconds; raise for slow local models
    download_timeout: 30   # vision only: HTTP image download
The "main" special value
provider: "main" means "use whatever provider my main agent uses." It is valid only inside auxiliary:, compression:, and fallback_model: configs. Not valid at the top-level model.provider. Useful when you've set up a custom endpoint as your main model and want auxiliary tasks to share the same endpoint.
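A minimal sketch of the `main` value, assuming the main agent already points at a custom endpoint that an aux slot should reuse. The choice of `web_extract` is illustrative only; any slot under `auxiliary:` takes the same shape.

```yaml
auxiliary:
  web_extract:
    provider: "main"   # reuse whatever provider the main agent uses
    model: ""          # empty = provider default
```

Avoid this shortcut on `vision` unless the main model is known to be multimodal.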
Env var overrides (legacy, vision + web_extract only)
AUXILIARY_VISION_PROVIDER
AUXILIARY_VISION_MODEL
AUXILIARY_VISION_BASE_URL
AUXILIARY_VISION_API_KEY
AUXILIARY_WEB_EXTRACT_PROVIDER
AUXILIARY_WEB_EXTRACT_MODEL
AUXILIARY_WEB_EXTRACT_BASE_URL
AUXILIARY_WEB_EXTRACT_API_KEY
The other 6 slots are config.yaml-only.
Compression has TWO config blocks (corrected v0.15)
This is the number-one footgun, and the direction is the opposite of what older docs implied:
- Top-level `compression:` holds `enabled`, `threshold`, `target_ratio`, `protect_last_n`, `protect_first_n`. This block controls behavior/thresholds only, NOT the model.
- `auxiliary.compression:` holds `provider`, `model`, `base_url`, `timeout` (default 120s). This is where you set which model does the summarization, in the same shape as every other aux task. Defaults to `auto` (your main chat model).
- The legacy keys `compression.summary_provider`/`summary_model`/`summary_base_url` are auto-migrated to `auxiliary.compression.*` on first load (config version 17). Don't author new configs with them.
Source: cli-config.yaml.example (compression block has no summary_* keys; comment points to auxiliary.compression) + website/docs/user-guide/configuration.md lines 646, 937.
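The split between the two blocks might look like the fragment below. The key names come from the text above; the numeric values under `compression:` are placeholders chosen for illustration, not documented defaults, and their units should be checked against `cli-config.yaml.example` before use.

```yaml
# Top-level block: behavior and thresholds only, no model keys
compression:
  enabled: true
  threshold: 0.5        # placeholder value, not a documented default
  target_ratio: 0.2     # placeholder
  protect_last_n: 20    # placeholder
  protect_first_n: 3    # placeholder

# Model selection lives under auxiliary.compression
auxiliary:
  compression:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
    timeout: 120        # seconds (the documented default for this slot)
```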
When To Use
Default (auto): routes to your main chat model. Per configuration.md, auxiliary.*.provider: "auto" sends each side task to your main provider/model. Fine if your main is cheap/local; on an expensive main model (Opus, etc.) explicitly point aux tasks at a cheap model (e.g. Gemini Flash on OpenRouter) to avoid inflating cost.
Override when:
- Your main model is expensive (Opus, GPT-5) and you're watching auxiliary usage inflate your bill. Route `compression` + `curator` + `title_generation` to Gemini Flash or a cheap Haiku/Kimi variant. (Note: `session_search` no longer costs anything; it has been a free FTS5 tool since v0.15.)
- Your main model isn't multimodal but you want vision. Point `auxiliary.vision` at GPT-4o or a dedicated vision model.
- You're running airplane-mode locally and want all aux tasks on a local `base_url` (Ollama, vLLM, llama.cpp). See local stack playbook.
- You want separation of duties, e.g. `approval` on a dedicated heavier model for better dangerous-command judgment.
- Your main model is rate-limited and you want aux tasks on a different credential pool.
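As a sketch of the first two override cases, the fragment below routes the three high-volume text slots to a cheap model and gives `vision` a multimodal one. It assumes OpenRouter credentials are configured; the model IDs are the examples already quoted in the config comments, not a tested recommendation.

```yaml
auxiliary:
  compression:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
  curator:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
  title_generation:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
  vision:
    provider: "openrouter"
    model: "openai/gpt-4o"   # multimodal model for image analysis
```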
Risks & Pitfalls
- Vision requires a multimodal model. Setting `provider: "main"` without checking multimodal capability breaks image analysis silently. Check before swapping.
- The two-compression-blocks footgun. Swap the compression model under `auxiliary.compression.provider`/`model`, the same shape as every other aux task. The top-level `compression:` block is thresholds/behavior only; putting a model there (`compression.summary_model`) hits a legacy path that auto-migrates. (Older KB notes had this reversed.)
- Credential gaps in `auto`. If the chain (OpenRouter → Nous → Codex) finds no credentials, aux calls fail and the feature that triggered them (e.g. vision, compression) fails too. (As of v0.15, layered fallback softens this: primary → chain → main agent → graceful fail.)
- Slow local models + default 30s timeout. Vision on a local 8B multimodal may exceed 30s. Raise the `timeout`.
- `base_url` overrides provider. If both are set, `base_url` wins and `provider` is ignored. Easy to forget when debugging.
- Aux is not covered by the main `fallback_model` system. The main-agent fallback chain does not apply to aux calls. Each aux task does its own resolution.
- Env var overrides only work for vision and web_extract. Don't try `AUXILIARY_MCP_MODEL`; it won't be read.
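The timeout and `base_url` pitfalls tend to show up together on local setups. The fragment below is a hedged sketch assuming an Ollama server on its default port with a multimodal model pulled locally; the model name, the dummy key, and the raised timeout are assumptions to adapt, not documented values.

```yaml
auxiliary:
  vision:
    provider: "auto"                        # ignored while base_url is set
    base_url: "http://localhost:11434/v1"   # Ollama's OpenAI-compatible endpoint (default port)
    api_key: "ollama"                       # placeholder; assumes the local server does not check it
    model: "llava"                          # assumed local multimodal model
    timeout: 120                            # raised from the 30s default for slow local inference
    download_timeout: 30                    # vision only: HTTP image download
```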
Related Concepts
- model switching: the main-agent `/model` system; auxiliary is a parallel track
- approval system: the `approvals.mode: smart` mode is powered by `auxiliary.approval`
- mcp integration: the `auxiliary.mcp` slot helps dispatch MCP tool calls
- memory system: memory writes are handled by the `curator` slot (the v0.12 successor to the old `flush_memories`)
- skills system: `auxiliary.skills_hub` scores skill matches
- local stack playbook: routing all 8 aux slots to a local `base_url` is the local-stack completion
- version v0.8.0: early aux behavior (402-fallback, vision auto-detection main-first)
- version v0.15.0: roster change (`session_search` → `title_generation`), `register_auxiliary_task()`, layered fallback
Sources
- `raw/docs-user-guide-configuration.md`: primary reference for the config surface (§ Auxiliary Models)
- `raw/docs-developer-guide-provider-runtime.md`: confirmation that auxiliary tasks use an independent resolution chain
- `raw/docs-developer-guide-adding-providers.md`: `_API_KEY_PROVIDER_AUX_MODELS` table for per-provider default aux models
- `raw/docs-developer-guide-architecture.md`: locates `agent/auxiliary_client.py` in the codebase
- `raw/docs-developer-guide-agent-loop.md`: positions aux client in the run-agent call graph
