# Codex — full corpus

# LLM Wiki

An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions.

## How It Works

The system has three layers:

```
raw/        Sources you collect (articles, transcripts, notes, PDFs)
wiki/       LLM-written & maintained pages (summaries, concepts, entities, syntheses)
CLAUDE.md   Schema that tells the LLM how to structure everything
```

Three operations drive the workflow:

| Operation | Trigger | What happens |
|-----------|---------|--------------|
| **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, and updates the index and log |
| **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, and optionally creates a synthesis page for novel insights |
| **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest |

## Quick Start

1. **Clone this repo**

   ```bash
   git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base
   cd my-knowledge-base
   ```

2. **Customize CLAUDE.md** for your domain
   - Update the Purpose section with your topic
   - Replace the placeholder tagging taxonomy with your own categories
   - Adjust confidence level descriptions if needed
   - Everything else (workflows, page formats, linking rules) works as-is

3. **Drop sources into `raw/`**
   - Text files, transcripts, articles, notes — any plain text
   - These are immutable once added; the LLM never modifies them

4. **Tell the LLM to ingest**

   ```
   ingest raw/my-first-source.txt
   ```

   The LLM creates summary, concept, and entity pages, adds cross-links, and updates the index.

5. **Ask questions**

   ```
   What are the key differences between X and Y?
   ```

   The LLM answers from the wiki, citing specific pages.

6. **Run health checks**

   ```
   lint
   ```

   The LLM audits the wiki, fixes what it can, and reports the rest.

## Directory Structure

```
.
├── CLAUDE.md            # Schema — the LLM's instructions
├── raw/                 # Your source documents (immutable)
└── wiki/
    ├── index.md         # Master catalog of all pages
    ├── log.md           # Append-only activity log
    ├── dashboard.md     # Dataview dashboard (Obsidian)
    ├── analytics.md     # Charts View analytics (Obsidian)
    ├── flashcards.md    # Spaced repetition cards
    ├── summaries/       # One page per source document
    ├── concepts/        # Concept and framework pages
    ├── entities/        # People, tools, organizations, etc.
    ├── syntheses/       # Cross-cutting analyses and comparisons
    ├── journal/         # Research/session journal entries
    │   └── template.md  # Journal entry template
    └── presentations/   # Marp slide decks
```

## Enhancements

This template includes several extras beyond the core wiki pattern:

### Dataview Dashboard (`wiki/dashboard.md`)

Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin.

### Charts View Analytics (`wiki/analytics.md`)

Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin.

### Mermaid Diagrams

Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Obsidian and GitHub render them natively.

### Marp Slides (`wiki/presentations/`)

Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory.
### Research Journal (`wiki/journal/`)

Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries.

### Spaced Repetition (`wiki/flashcards.md`)

Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page.

### MCP Server

This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically.

## Customizing for Your Domain

The schema in `CLAUDE.md` is domain-agnostic. To adapt it:

1. **Purpose** — Describe your knowledge domain in one paragraph
2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`)
3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards
4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.)
5. **Journal template** — Customize `wiki/journal/template.md` for your workflow

Everything else — page format, linking conventions, workflows, rules — is universal and works across domains.

## Example Domains

This template works for any knowledge-intensive topic:

- **Research notes** — papers, experiments, methodologies
- **Book analysis** — themes, characters, author techniques
- **Competitive analysis** — companies, products, market trends
- **Course notes** — lectures, readings, key concepts
- **Personal development** — frameworks, habits, book summaries
- **Technical documentation** — APIs, architectures, design patterns
- **Hobby deep-dives** — any subject you want to master

## License

MIT

---
title: "Codex KB — Master Index"
type: index
updated: 2026-06-11
codex_version: "0.139.0"
---

# Codex KB — Master Index

Master catalog of all wiki pages.
Every page in the wiki must have an entry here.

**Latest verified Codex version:** CLI 0.139.0 stable (2026-06-09); pre-release 0.140.0-alpha.4

**KB pages:** 35 (13 concepts + 9 entities + 6 summaries + 5 syntheses + 2 system)

## Concepts (13)

### Getting started & daily use

- [[concepts/installation-setup]] — install per surface and platform (incl. Windows paths), CODEX_HOME, diagnostics
- [[concepts/authentication]] — Sign in with ChatGPT vs API key, access tokens, headless login
- [[concepts/configuration]] — config.toml layers and precedence, profiles, env vars, feature flags
- [[concepts/agents-md]] — AGENTS.md discovery/merging, overrides, custom prompts, rules + execpolicy
- [[concepts/sandboxing-approvals]] — sandbox modes × approval policies, permission profiles, network controls
- [[concepts/memories-context]] — memories opt-in and storage, Chronicle, compaction and /fork
- [[concepts/non-interactive-exec]] — codex exec for scripts and CI: stdin, JSONL, output schemas

### Scaling & automating

- [[concepts/cloud-tasks]] — hosted cloud tasks, environments, worktrees, remote connections
- [[concepts/automations]] — scheduled/recurring automations and their execution modes
- [[concepts/mcp-integration]] — MCP servers in config.toml, transports, OAuth, tool policies
- [[concepts/subagents]] — built-in and custom subagents, orchestration limits
- [[concepts/skills-plugins]] — skills (progressive disclosure, scopes) + plugins and marketplaces
- [[concepts/enterprise-admin]] — requirements.toml vs managed_config.toml, RBAC, Codex Security, compliance

## Entities (9)

- [[entities/codex-app]] — desktop app: thread modes, shortcuts, settings, app server
- [[entities/codex-cli]] — the CLI: flags, slash commands, fast mode, cloud commands
- [[entities/codex-ide-extension]] — VS Code/Cursor/Windsurf + JetBrains: modes, commands, settings
- [[entities/codex-web]] — Codex cloud on the web + the Sites plugin
- [[entities/browser-integration]] — in-app browser, Chrome extension, computer use, Appshots
- [[entities/github-integrations]] — @codex review, the GitHub Action, auto-review, OSS fund
- [[entities/codex-sdk]] — TypeScript/Python SDKs, codex mcp-server, Agents SDK interop
- [[entities/codex-models]] — gpt-5.5/5.4/5.3-codex family, spark, deprecations, Bedrock option
- [[entities/chat-integrations]] — Codex in Slack and Linear

## Summaries (6)

- [[summaries/release-digest]] — 0.138.0/0.139.0 digest; release cadence and alpha-stub caveat
- [[summaries/casebook-auth-limits]] — solved cases: 401s, Windows sign-in, plan limits, metering anomalies
- [[summaries/casebook-runtime]] — solved cases: stream disconnects, hangs, bwrap approval spam, model routing, worktree handoff regression
- [[summaries/best-practices-prompting]] — prompting skeleton, plan/goal modes, escalation ladder
- [[summaries/community-source-batch-2026-06-11]] — prepared community-source ingest packet for limits, memories, workflows, and Windows sandbox field reports
- [[summaries/field-notes-windows-app]] — field-verified Windows app behavior: handoff missing, conditional worktree carry-over, permissions selector labels, surface-specific /status

## Syntheses (5)

- [[syntheses/surface-picker]] — app vs CLI vs IDE vs web vs cloud: pick by use case
- [[syntheses/sandbox-approval-guide]] — sandbox × approval matrix with verbatim config.toml presets
- [[syntheses/auth-plan-picker]] — ChatGPT plans vs API key: capabilities, limits, pricing
- [[syntheses/workflow-recipes]] — worktrees + handoff, AGENTS.md layering, automation patterns
- [[syntheses/troubleshooting-checklist]] — symptom router + 10-step ordered sequence

## Gaps / TODO

- Codex ships stable minors every 1–2 days — re-verify [[summaries/release-digest]] and bump `codex_version` each ingest; alpha releases carry no notes (don't re-fetch expecting content).
- The 2026 usage-metering anomaly and phone-verification loop were unresolved at fetch time (marked low-confidence in [[summaries/casebook-auth-limits]]) — refresh next ingest.
- Cloud-task pricing/credit consumption and concurrent-task limits are not documented in the fetched sources.
- `raw/llms_txt_doc-faq.md` is the Codex *Security* FAQ, not a general product FAQ — don't cite it for auth/plans.
- Community-source batch prepared on 2026-06-11; the next ingest should process the [[summaries/community-source-batch-2026-06-11]] source clusters before treating community claims as KB facts.
- **Handoff regression watch:** the Hand off control is missing from worktree threads (#14141 closed-unresolved; #15314 open) while official docs still describe the old flow — re-verify on each app release and update [[summaries/casebook-runtime]] E1–E2 / [[summaries/field-notes-windows-app]] when it returns.
- Field-note caveat: the 2026-06-11 Windows observations come from a single machine, and the app build number was not recorded; recapture the version next session.

## Statistics

- **Total pages**: 35
- **Concepts**: 13
- **Entities**: 9
- **Summaries**: 6
- **Syntheses**: 5
- **System**: 2

---
title: "AGENTS.md Custom Instructions"
type: concept
tags: [agents-md, custom-instructions, rules, custom-prompts, project-guidance]
created: 2026-06-10
updated: 2026-06-10
confidence: high
sources: ["raw/llms_txt_doc-custom-instructions-with-agents-md.md", "raw/github_doc-docs-agents-md-md.md", "raw/llms_txt_doc-custom-prompts.md", "raw/llms_txt_doc-rules.md"]
---

# AGENTS.md Custom Instructions

## Definition

`AGENTS.md` files are plain-Markdown instruction files that Codex reads before doing any work. Layering a global file (in the Codex home) with project and subdirectory files gives every task consistent expectations, regardless of which repository you open.
Related customization mechanisms covered here: **custom prompts** (deprecated slash-command Markdown files) and **rules** (experimental allow/prompt/forbidden policies for commands outside the sandbox).

## How It Works

### Discovery order

Codex builds the instruction chain once per run (in the TUI, once per launched session):

1. **Global scope**: in `CODEX_HOME` (default `~/.codex`), Codex reads `AGENTS.override.md` if present, otherwise `AGENTS.md` — only the first non-empty file.
2. **Project scope**: starting at the project root (typically the Git root; configurable via `project_root_markers`), Codex walks down to the current working directory. In each directory it checks `AGENTS.override.md`, then `AGENTS.md`, then any names in `project_doc_fallback_filenames`, loading at most one file per directory. If no project root is found, only the current directory is checked.
3. **Merge order**: files are concatenated root-down, joined by blank lines. Files closer to your current directory appear later in the prompt, so they override broader guidance.

Empty files are skipped, and Codex stops adding files once the combined size hits `project_doc_max_bytes` (32 KiB default). Raise the limit or split instructions across nested directories when you hit the cap. Both knobs live in [[concepts/configuration]]:

```toml
# ~/.codex/config.toml
project_doc_fallback_filenames = ["TEAM_GUIDE.md", ".agents.md"]
project_doc_max_bytes = 65536
```

With that fallback list, the per-directory check order becomes: `AGENTS.override.md`, `AGENTS.md`, `TEAM_GUIDE.md`, `.agents.md`.

The open-source repo also documents a `child_agents_md` feature flag (`[features]` in `config.toml`) that appends extra guidance about AGENTS.md scope/precedence to the user instructions message; this guidance is emitted even when no AGENTS.md exists.

### Format and scoping in practice

Global working agreements go in `~/.codex/AGENTS.md`; use `~/.codex/AGENTS.override.md` for a temporary global override without deleting the base file.
Repository norms go in the repo-root `AGENTS.md`; team- or service-specific overrides go in nested directories, e.g. `services/payments/AGENTS.override.md` (a sibling `AGENTS.md` in that directory is then ignored). Content is ordinary Markdown — short imperative bullets work best:

```md
# AGENTS.md

## Repository expectations

- Run `npm run lint` before opening a pull request.
- Document public utilities in `docs/` when you change behavior.
```

Verify the chain loads as expected:

```bash
codex --ask-for-approval never "Summarize the current instructions."
codex --cd services/payments --ask-for-approval never "List the instruction sources you loaded."
```

To audit what loaded, enable a plaintext log with `codex -c log_dir=./.codex-log` and check `codex-tui.log`. Keeping AGENTS.md small also conserves usage limits — every byte is injected into the first turn ([[summaries/best-practices-prompting]]).

## Key Parameters

- `project_doc_max_bytes` — combined AGENTS.md byte cap (default 32 KiB); Codex stops adding files once it's hit.
- `project_doc_fallback_filenames` — extra per-directory filenames checked after `AGENTS.override.md` and `AGENTS.md`.
- `project_root_markers` — controls where Codex anchors the project root for the discovery walk.
- `CODEX_HOME` — home whose `AGENTS.md` / `AGENTS.override.md` provide global scope (default `~/.codex`).
- `[features] child_agents_md` — flag that appends extra AGENTS.md scope/precedence guidance to the user instructions message.
- `prefix_rule(pattern, decision, justification, match, not_match)` — Starlark rule fields; `decision` is `allow` (default), `prompt`, or `forbidden`.
- `~/.codex/rules/default.rules` — where TUI allow-listing and smart approvals write rules.
- `codex execpolicy check --pretty --rules -- ` — rule test harness.
- `codex -c log_dir=./.codex-log` — plaintext log for auditing which instruction files loaded.
## Custom prompts (deprecated)

Custom prompts turned Markdown files under `~/.codex/prompts/` into slash commands (`/prompts:draftpr`) in the CLI and IDE extension. They are **deprecated — use skills instead** ([[concepts/skills-plugins]]), which can be shared via the repo and invoked implicitly.

Mechanics, for existing users:

- One Markdown file per prompt, directly under `~/.codex/prompts/` (no subdirectories); restart Codex after edits.
- YAML frontmatter: `description:` (shown in the popup) and `argument-hint: KEY=`.
- Placeholders: positional `$1`–`$9`, `$ARGUMENTS` for all, named uppercase like `$FILES` supplied as `KEY=value` (quote values with spaces), `$$` for a literal `$`.
- Invocation: `/prompts:draftpr FILES="src/pages/index.astro" PR_TITLE="Add hero animation"`.

## Rules (command policies)

Rules control which commands Codex may run **outside the sandbox** — they complement, not replace, AGENTS.md guidance. Experimental.

Create `.rules` files (Starlark syntax) under a `rules/` folder next to any active config layer, e.g. `~/.codex/rules/default.rules`:

```python
prefix_rule(
    pattern = ["gh", "pr", "view"],
    decision = "prompt",  # allow (default) | prompt | forbidden
    justification = "Viewing PRs is allowed with approval",
    match = ["gh pr view 7888"],
    not_match = ["gh pr --repo openai/codex view 7888"],
)
```

Key behaviors:

- `pattern` matches an exact argument-list prefix; elements can be literals or unions (`["view", "list"]`). Most restrictive decision wins when rules overlap (`forbidden` > `prompt` > `allow`).
- `match`/`not_match` act as inline unit tests validated at load time.
- Codex safely splits simple `bash -lc` chains (plain words joined by `&&`, `||`, `;`, `|`) and evaluates each command separately — `git add . && rm -rf /` is never auto-allowed by a `git add` rule. Scripts with redirection, substitution, env assignments, wildcards, or control flow are evaluated as a single `["bash", "-lc", "