Self-Improvement Loop (GAPA)

type: concept
confidence: medium
updated: 2026-04-12
hermes_version: v0.8.0

Definition

The self-improvement loop is Hermes' headline differentiator: a closed cycle in which the agent observes its own behavior, identifies what failed, packages what worked into reusable skills, and edits its own prompts and memory, all without human intervention. The mechanism is called GAPA ("works like back propagation but for prompts instead of model weights"). It triggers roughly every 15 tool calls, and it is what the Hermes 24/7 Self-Evolving video means by "it gets better the more you use it with no fine-tuning or prompt engineering needed." It is the mechanical instantiation of Nous Research's bet that "an AI agent learns from its interactions and becomes more valuable over time", the architectural choice that makes Hermes "fundamentally different from ChatGPT or Claude or OpenClaw. They reset every time. Hermes it is designed to be getting better."

How It Works

The loop is composed of five stages that fire continuously during normal use. They are not a separate "training mode" — they run in the background of every conversation.

1. Trajectory Capture. Hermes records every API call, every tool decision, every output, in order. This trajectory is saved to disk and queryable. Most agents discard this state at session end; Hermes keeps it ("Hermes will record everything that it had just done... that full record it is what they call a trajectory. So most agent frameworks they just throw that away completely... Hermes keeps it instead"). See ml research pipeline for the trajectory subsystem.

2. GAPA Review (~every 15 tool calls). The agent pauses, reads back the recent trajectory, and asks itself what worked and what failed. The 24/7 Self-Evolving transcript: "with every 15 or so tool calls, it pauses, reviews what happened, figures out what failed, and updates itself. This system is called GAPA. Works like back propagation but for prompts instead of model weights." The cadence is configurable as skills.creation_nudge_interval (default 15) in config.yaml. The output is a set of edits to the agent's own working prompts and memory. The "backprop for prompts" framing is best read as an analogy: feedback from the trajectory is pushed back into prompt and memory text, playing the role that gradient updates play for model weights, while the weights themselves are never touched.

3. Autonomous Skill Creation. When the trajectory analysis concludes the work is reusable, Hermes packages it as a SKILL.md document (see skills system) — frontmatter, body, sometimes scripts — and writes it to ~/.hermes/skills/. The transcript: "if it solves something, fixes an error, or you tell it to remember a task, it turns that into a reusable skill and uses it later when it needs to." Concrete examples from community videos:

A Hacker News morning briefing: "Now, it is capturing everything it learned and turning it into a reusable procedure. So, you can see here, 'Hacker News daily AI briefing skill created.' Now, it's updating the cron job to reference this new skill." Single complex prompt → autonomous skill + cron wired together.
A manim animation skill: "it used a manim skill to turn a complex technical concept into an animated video. Meaning it can now explain things visually, not just through text."
An X-posting workflow that "writes inside its skill MD file all the feedback that you give it, and then it improves it next time, right?"

4. Persistent Pattern-Based Memory. Beyond skills, Hermes maintains a multi-layer memory:

Cross-session FTS5 search — every prior conversation is indexed, queryable, and summarizable on demand. "Six months from now you can ask something as simple as like haven't we solved something like this in the past and it should be able to find that. It'll know, it will actually be able to remember."
Honcho user model — "Hermes does not just remember what you said, it builds a model of who you are, your work style, your preferences, your domain knowledge, and that model deepens over time."
Memory nudges — "It can suggest like you probably want a skill that's going to do X instead of Y. So you can approve it, you can reject it." See memory system.

5. Compounding Reuse + Refinement. Next time you ask for something similar, Hermes runs the existing skill, refines it on new feedback, and ratchets capability upward. "These skills, they are very alive and inside of your code. They're able to learn from each execution... over time, like your Hermes, it becomes a completely different tool than somebody else's Hermes. Like it's shaped by your workflows, your preferences, your patterns."
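Stages 1 and 2 can be illustrated with a minimal Python sketch. It is not Hermes' implementation: the `TrajectoryRecorder` class, the JSONL file layout, the step fields, and the `on_review` callback are all hypothetical names chosen for illustration. Only the default interval of 15 tool calls comes from this page.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


class TrajectoryRecorder:
    """Hypothetical recorder, not Hermes' actual implementation."""

    def __init__(self, path: Path, review_interval: int = 15,
                 on_review: Optional[Callable[[list], None]] = None) -> None:
        self.path = path
        # Mirrors skills.creation_nudge_interval from config.yaml.
        self.review_interval = review_interval
        self.on_review = on_review or (lambda steps: None)
        self._steps: list = []
        self._pending_tool_calls = 0
        self._window_start = 0  # index of the first step not yet reviewed

    def record(self, kind: str, name: str, payload: dict) -> None:
        """Append one step, in order, to memory and to a JSONL file on disk."""
        step = {"seq": len(self._steps), "kind": kind,
                "name": name, "payload": payload}
        self._steps.append(step)
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(step) + "\n")
        if kind == "tool_call":
            self._pending_tool_calls += 1
            if self._pending_tool_calls >= self.review_interval:
                self._review()

    def _review(self) -> None:
        """Hand the unreviewed window to the callback, then reset the counter."""
        window = self._steps[self._window_start:]
        self.on_review(window)
        self._window_start = len(self._steps)
        self._pending_tool_calls = 0


def demo() -> list:
    """Record 32 tool calls and return the size of each review window."""
    window_sizes: list = []
    with tempfile.TemporaryDirectory() as tmp:
        recorder = TrajectoryRecorder(
            Path(tmp) / "trajectory.jsonl",
            on_review=lambda steps: window_sizes.append(len(steps)),
        )
        for i in range(32):
            recorder.record("tool_call", "web_search", {"query": f"q{i}"})
    return window_sizes


if __name__ == "__main__":
    print(demo())
```

With the default interval, 32 recorded tool calls trigger two reviews of 15 steps each, and the last two calls wait for the next window. In a real agent the callback is where the "what worked, what failed" analysis and any prompt or memory edits would happen. Here it only counts steps.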

The v0.8.0 self-diagnosis case study. In April 2026, the Hermes team shipped a feature flagged in the v0.8.0 release notes as "Self-Optimized GPT/Codex Tool-Use Guidance — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking." This is the loop turned on itself: Hermes ran trajectories against GPT/Codex backends, GAPA-style review identified five distinct failure modes in how those models invoked tools, and the agent generated patches to its own provider-specific guidance — without a human in the loop. It is the most concrete public demonstration that GAPA works on the harness itself, not just on user-level skills.

Key Parameters

config.yaml:

skills:
  creation_nudge_interval: 15      # GAPA review cadence (tool calls)
  external_dirs: []
memory:
  memory_enabled: true
  user_profile_enabled: true       # Honcho-style user modeling
  memory_char_limit: 2200
  user_char_limit: 1375
  nudge_interval: 10               # memory-suggestion cadence
  flush_min_turns: 6
  provider: honcho                 # or hindsight, holographic, etc.
agent:
  max_turns: 60

Slash commands:

/skills — browse, install, inspect, create
/memory — view what Hermes thinks it knows about you
/yolo — accept dangerous commands without prompt (so the loop runs uninterrupted)

Files written by the loop:

~/.hermes/skills/custom/<slug>/SKILL.md — autonomously created skills
~/.hermes/sessions/sessions.json + state.db — trajectory + FTS5 index
~/.hermes/MEMORY.md and USER.md — markdown-readable memory
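The cross-session search that state.db backs can be approximated with SQLite's FTS5 extension. The sketch below is only an illustration under stated assumptions: the `messages` table, its columns, and the sample rows are hypothetical and do not reflect the real schema of state.db. It also assumes the local SQLite build ships with FTS5 enabled.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, messages: list) -> None:
    """Create a hypothetical FTS5 table and load (session_id, content) rows."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)", messages
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list:
    """Return (session_id, highlighted snippet) pairs, best match first."""
    return conn.execute(
        "SELECT session_id, snippet(messages, 1, '[', ']', '...', 8) "
        "FROM messages WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


def demo() -> list:
    """Index three made-up sessions and look up an old fix by keyword."""
    conn = sqlite3.connect(":memory:")
    build_index(conn, [
        ("session-001", "Fixed the nginx 504 timeout by raising the proxy read limit"),
        ("session-002", "Drafted the weekly research digest"),
        ("session-003", "Hacker News briefing cron job now runs every morning"),
    ])
    hits = search(conn, "nginx timeout")
    conn.close()
    return [session_id for session_id, _snippet in hits]


if __name__ == "__main__":
    print(demo())
```

A query such as "haven't we solved something like this in the past" would first need to be reduced to keywords. How Hermes does that step is not described in the sources for this page.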

When To Use

You will use the agent more than once. If you need a one-shot answer, the loop is overhead. The value compounds — the entire pitch is "compounding for you because it's getting better. It's going to get more tailored towards you."
You have repeatable workflows. Daily briefings, content pipelines, code reviews, research digests — anything that recurs becomes a skill you didn't write.
You want to differentiate from stateless agents. Compared to ChatGPT, Claude, and OpenClaw, all of which "reset every time", Hermes is presented in the source videos as the only mainstream agent harness with this built in. The "Switching to Hermes" video frames it as the architectural-bet difference.
You are researching agent self-improvement. Trajectories + GAPA review logs are first-class data for study.
You want an "AI second brain" that learns who you are over months, not just within one session.

Risks & Pitfalls

Skills are only created for complex tasks. The Switching transcript warns: "in practice, like it only creates skills for very complex tasks. So, if you give it something simple, it is not going to be learning." Trivial requests bypass the loop entirely.
LLM-generated skills can be wrong. "The skill that it creates, it is LLM generated. So, it's not actually going to be guaranteed to work properly for you. So, you get a skill, you run it, it fails." Always inspect new skills (hermes skills inspect <name>) before relying on them in cron.
First 7 days are rough. "Be prepared for the first 7 days, it won't be what you want. It won't do anything perfectly. But, the point is it gets better every time." Premature judgment kills adoption.
Context bloat from memory. Long sessions accumulate user model + memory snippets that can blow context windows on smaller local models. The Gemma transcript: "it's actually very important to constantly flush that context as you work on a new task." To mitigate, set session_reset.mode to both and tune idle_minutes.
Stuck-loop bug. Multiple transcripts mention agents getting into iteration loops that ignore user input. v0.8.0's inactivity-based timeout (instead of wall-clock) helps, but the issue still recurs ("the V.4.0 change log, it has multiple fixed stuck agent loop entries, but it's one of those things that just keeps coming back in different forms").
The loop runs locally, which has security implications. GAPA edits your prompts and writes new skills to disk autonomously. If you run with terminal: backend: local and /yolo, the agent has full shell access while self-modifying. Use backend: docker for production.
Auto-created skills can conflict. Two related tasks may produce overlapping skill names; the slug normalizer handles collisions, but you can end up with morning-briefing and morning-briefing-1 both half-working. A periodic skill audit (hermes skills audit) is recommended.
Confidence: medium for this page — much of the GAPA narrative comes from community video transcripts (Nick, Julian, the 24/7 Self-Evolving channel) rather than the canonical repo docs. The 15-tool-call interval and "back propagation for prompts" framing are widely repeated but should be verified against the source code in agent/ for academic citation.
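The skill-name collision risk above can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the `slugify` rules, the numeric-suffix scheme (chosen to match the morning-briefing-1 example), and the `name` and `description` frontmatter fields are hypothetical. Hermes' real slug normalizer and SKILL.md schema should be checked against the repo.

```python
import re
import tempfile
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "skill"


def unique_slug(root: Path, title: str) -> str:
    """Append -1, -2, ... until the slug does not clash with a directory."""
    base = slugify(title)
    candidate, n = base, 0
    while (root / candidate).exists():
        n += 1
        candidate = f"{base}-{n}"
    return candidate


def write_skill(root: Path, title: str, description: str, body: str) -> Path:
    """Write <root>/<slug>/SKILL.md with assumed frontmatter fields."""
    slug = unique_slug(root, title)
    skill_dir = root / slug
    skill_dir.mkdir(parents=True)
    text = f"---\nname: {slug}\ndescription: {description}\n---\n\n{body}\n"
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(text, encoding="utf-8")
    return skill_file


def find_suspected_duplicates(root: Path) -> list:
    """List (base, numbered copy) pairs that may be overlapping skills."""
    slugs = {p.name for p in root.iterdir() if p.is_dir()}
    pairs = []
    for slug in sorted(slugs):
        match = re.fullmatch(r"(.+)-(\d+)", slug)
        if match and match.group(1) in slugs:
            pairs.append((match.group(1), slug))
    return pairs


def demo() -> tuple:
    """Create two near-identical skills and report the resulting collision."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "skills" / "custom"
        first = write_skill(root, "Morning Briefing",
                            "Daily AI news digest", "1. Fetch headlines.")
        second = write_skill(root, "Morning briefing!",
                             "Daily AI news digest, v2", "1. Fetch headlines.")
        return (first.parent.name, second.parent.name,
                find_suspected_duplicates(root))


if __name__ == "__main__":
    print(demo())
```

A check along the lines of `find_suspected_duplicates` is the kind of thing a manual audit would look for: numbered copies whose base name also exists, which then need a human to decide which version to keep.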

Related Concepts

skills system — the substrate the loop writes into
memory system — Honcho user modeling, FTS5 session search
ml research pipeline — trajectories that feed GAPA also feed offline RL training
migration from openclaw — the headline "stateless OpenClaw vs. learning Hermes" framing
hermes self evolution
version v0.8.0 — the GPT/Codex self-diagnosis release
hermes vs openclaw — full architectural comparison

Sources

raw/transcript-247-self-evolving.txt — the canonical GAPA / 15-tool-call / "back propagation for prompts" framing, manim skill example
raw/transcript-switching-to-hermes.txt — full mechanical walk-through of the trajectory → skill → reuse loop, "becomes a completely different tool than somebody else's Hermes"
raw/transcript-did-hermes-kill-openclaw.txt — Hacker News briefing skill auto-creation example, transparency of the loop
raw/transcript-hermes-full-course-2hr.txt — practical feedback loop, X-posting skill self-improvement, compounding effect across multiple skills
raw/03-skills-system.md — skills.creation_nudge_interval, autonomous skill writing, frontmatter schema
raw/release-v0.8.0.md — v0.8.0 GPT/Codex self-diagnosis as "the agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking"