cheat sheet

Codex Sub-Agents & Task Delegation

How Codex CLI supports delegating to child agents — the experimental /agent system, codex exec as a sub-agent pattern, MCP-mediated agent-to-agent calls, isolation boundaries, and orchestration patterns.

What it is

A "sub-agent" is a child Codex run spawned by a parent Codex session to handle an isolated task: a focused refactor, a parallel search, a sandboxed experiment. Unlike Claude Code's first-class Task tool (a single built-in tool with a well-known schema), Codex offers sub-agents through several composable mechanisms: the experimental /agent slash command and [[agents]] config tables, recursive codex exec invocations from inside the agent's shell tool, and MCP-mediated agent-to-agent calls when Codex itself is exposed as an MCP server. Each approach trades ergonomics against isolation, and you can mix and match them depending on the task. This page maps the territory.


Why delegate

Sub-agents solve four problems that a single long-running session cannot:

  1. Context window pressure. A parent that has loaded 100K tokens of code can spawn a sub-agent with a tightly-scoped 5K-token prompt instead of degrading its own context.
  2. Parallelism. Several independent sub-agents can run concurrently — searching, summarising, refactoring different files.
  3. Isolation of failure. A sub-agent's blow-up (sandbox denial, runaway loop) does not poison the parent's session.
  4. Per-task policy. A read-only sub-agent can be spawned by a write-enabled parent (or vice versa), tightening the blast radius of risky work.

Approach 1 — /agent and [[agents]] (experimental)

Codex's first-class sub-agent surface is the experimental /agent slash command, backed by [[agents]] tables in config.toml. You declare named agent templates (each with its own model, profile, system prompt, and tool whitelist) and dispatch tasks to them from inside a session.

Enable the feature

toml
[features]
codex_agents = true

Output: (none — TOML config)

Declare an agent template

toml
[[agents]]
name           = "reviewer"
model          = "gpt-4o"
sandbox_mode   = "read-only"
approval_policy = "never"
system_prompt  = """
You are a careful code reviewer. Read the file at $1 and return a JSON array
of {file, line, severity, message} objects. Do not edit anything.
"""
enabled_tools  = ["read_file", "list_directory", "search_files"]

Output: (none — TOML config)

Dispatch from the TUI

text
/agent reviewer src/auth.py

Output (inline in TUI):

text
[reviewer:th_01xyz] Spawned. Working on src/auth.py…
[reviewer:th_01xyz] Done. Result:
[
  {"file":"src/auth.py","line":42,"severity":"high","message":"Token compared with == not secrets.compare_digest"},
  {"file":"src/auth.py","line":78,"severity":"medium","message":"SQL query not parameterised"}
]

List active sub-agents

text
/agent list

Output (inline in TUI):

text
th_01xyz  reviewer  running   "src/auth.py"
th_01abc  fixer     running   "ruff issues"

Wait for or interrupt a sub-agent

text
/agent wait th_01xyz
/agent stop th_01abc

Output (inline in TUI):

text
[reviewer:th_01xyz] Completed.
[fixer:th_01abc] Stopped.

Background vs. foreground

By default /agent <name> <task> blocks the parent until the sub-agent finishes. Append --bg to run in the background:

text
/agent reviewer --bg src/auth.py

Output (inline in TUI):

text
[reviewer:th_01xyz] Spawned in background. Use /agent wait to join.

Approach 2 — recursive codex exec

The most portable sub-agent pattern is to invoke codex exec from inside the parent agent's shell tool. You describe the task to the parent ("research X and tell me the answer"), the parent runs codex exec as a normal shell command, and it receives the sub-agent's last message as the shell tool's output. This pattern works on every Codex install without any feature flags.

Inside a session, ask the agent to delegate

text
I want you to research three potential ways to migrate src/db.py to async,
in parallel. Spawn three sub-agents via `codex exec`, each with a different
candidate approach, then summarise the trade-offs.

Output (inline in TUI):

text
[agent] Running: codex exec --output-last-message --sandbox read-only --ask-for-approval never "Research approach 1: asyncpg" &
[agent] Running: codex exec --output-last-message --sandbox read-only --ask-for-approval never "Research approach 2: SQLAlchemy 2.0 asyncio" &
[agent] Running: codex exec --output-last-message --sandbox read-only --ask-for-approval never "Research approach 3: Tortoise ORM" &
[agent] Wait for all three. Summarise.
[agent] Done. Trade-off summary:
- asyncpg: thinnest layer, max perf, no ORM
- SQLAlchemy 2.0: drop-in async, schema reuse, mature
- Tortoise:   pythonic, less mature, fewer integrations

Capture the sub-agent's exit code

bash
result=$(codex exec --output-last-message --timeout 60 \
  --sandbox read-only --ask-for-approval never \
  "Is the README in this repo up to date with the actual API?")
echo "exit=$?  reply=$result"

Output:

text
exit=0  reply=No — three new endpoints in src/api/v2.py are not documented.
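
The capture above can be wrapped in a small helper so a failed sub-agent is never mistaken for an empty reply. This is a minimal sketch, not part of Codex: the function name delegate is hypothetical, and it accepts any command, so it makes no assumptions about which codex exec flags your install supports.

```bash
# delegate CMD [ARGS...]
# Runs CMD (for example a `codex exec ...` invocation), prints its reply on
# success, and returns the child's exit status unchanged on failure.
# NOTE: `delegate` is a hypothetical helper name, not a Codex feature.
delegate() {
  local reply status=0
  # Capture stdout; `|| status=$?` records a non-zero exit without tripping `set -e`.
  reply=$("$@") || status=$?
  if [ "$status" -ne 0 ]; then
    echo "sub-agent failed (exit=$status)" >&2
    return "$status"
  fi
  printf '%s\n' "$reply"
}
```

Usage would look like `reply=$(delegate codex exec ... "<prompt>") || handle_failure`, where handle_failure stands for whatever recovery the parent script needs.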

Parent prompt to ask for delegation

The parent agent needs to know the delegation pattern. Encode it in AGENTS.md:

markdown
## Delegation

For large multi-file investigations, spawn sub-agents with:

  codex exec --output-last-message --sandbox read-only --ask-for-approval never "<prompt>"

Each call is isolated. Use it to keep this session's context small.

Output: (none — Markdown for AGENTS.md)


Approach 3 — MCP-mediated delegation

When Codex is launched as an MCP server (codex mcp serve), any other MCP client — including another Codex instance — can call it as a tool. This is the most structured sub-agent pattern: the call/response is governed by the MCP protocol, with typed inputs and outputs.

Launch the inner agent as an MCP server

bash
codex mcp serve

Output:

text
Codex MCP server listening on stdio
Exposed tools:
  codex_exec(prompt: string, sandbox: string, timeout: int) -> string
  codex_resume(session_id: string) -> string
  codex_apply(task_id: string) -> string

Register it as a tool in the parent config

toml
[mcp_servers.inner-codex]
command                     = "codex"
args                        = ["mcp", "serve"]
default_tools_approval_mode = "manual"   # prompt before each delegation

Output: (none — TOML config)

Parent agent calls the inner agent

From inside the parent's TUI session, just describe the task — Codex will see codex_exec as a tool and call it when delegation makes sense.

text
Delegate to the inner codex MCP: have it audit src/db.py for SQL injection.

Output (inline in TUI):

text
[agent] Calling inner-codex.codex_exec(prompt="Audit src/db.py for SQL injection. Return JSON.", sandbox="read-only", timeout=120)
[agent] Result: [{"file":"src/db.py","line":17,"issue":"unparameterised query"}]

Isolation boundaries

Each sub-agent approach gives different guarantees. Use this matrix to choose.

| Boundary | /agent (experimental) | Recursive codex exec | MCP codex_exec |
| --- | --- | --- | --- |
| Separate session/history | yes | yes | yes |
| Separate context window | yes | yes | yes |
| Separate sandbox policy | yes (per-agent template) | yes (per-invocation flag) | yes (per-call arg) |
| Separate model | yes | yes | yes |
| Separate AGENTS.md scope | inherits | inherits unless --cd | inherits |
| Separate MCP servers | configurable | inherits parent config | sub-agent process has its own |
| Parent sees structured output | yes (JSON) | yes (stdout) | yes (typed) |
| Parent can interrupt | yes (/agent stop) | yes (Ctrl+C → SIGINT) | partial (timeout only) |
| Available without feature flag | no | yes | yes |

Parallelism patterns

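A single Codex session can spawn many sub-agents at once. Three orchestration patterns appear in practice: fan-out and map-reduce run sub-agents concurrently, while a pipeline chains them one after another.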

Fan-out, fan-in

Spawn N sub-agents in parallel, wait for all, summarise.

bash
# Inside the shell tool of a parent agent
( codex exec --output-last-message --sandbox read-only --ask-for-approval never "Audit src/auth.py" > /tmp/a.txt ) &
( codex exec --output-last-message --sandbox read-only --ask-for-approval never "Audit src/db.py"   > /tmp/b.txt ) &
( codex exec --output-last-message --sandbox read-only --ask-for-approval never "Audit src/api.py"  > /tmp/c.txt ) &
wait
cat /tmp/a.txt /tmp/b.txt /tmp/c.txt

Output:

text
[auth.py findings]
[db.py findings]
[api.py findings]
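
A bare wait discards each child's exit status, so a crashed sub-agent just leaves an empty file. The sketch below waits on every PID individually and counts failures. The names fan_out and run_task are hypothetical; run_task is a stand-in for a real sub-agent command so the example runs anywhere.

```bash
# Hypothetical stand-in for one sub-agent call; swap in the real command.
run_task() {
  echo "findings for $1"
}

# fan_out OUTDIR TASK...
# Starts one background job per task, writes each reply to OUTDIR/<n>.txt,
# then waits on every PID individually so failures are counted, not lost.
fan_out() {
  local outdir="$1"
  shift
  local pids="" n=0 failed=0 task pid
  for task in "$@"; do
    ( run_task "$task" > "$outdir/$n.txt" ) &
    pids="$pids $!"
    n=$((n + 1))
  done
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  if [ "$failed" -ne 0 ]; then
    echo "$failed sub-agent(s) failed" >&2
    return 1
  fi
}
```

Writing into a directory from `mktemp -d` rather than fixed /tmp names also keeps concurrent fan-outs from overwriting each other's files.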

Pipeline

Each stage's output feeds the next stage's prompt.

bash
# Inside the shell tool of a parent agent
plan=$(codex exec --output-last-message --sandbox read-only --ask-for-approval never \
  "Outline a refactor of src/auth.py to use JWT cookies. Output: bullets only.")
codex exec --full-auto "Execute this plan: $plan"

Output:

text
[agent edits src/auth.py per the outlined plan]

Map-reduce

Map a prompt over many files, then reduce.

bash
# Inside the shell tool of a parent agent
shopt -s globstar   # bash only: without this, ** does not match nested directories
for f in src/**/*.py; do
  ( codex exec --output-last-message --sandbox read-only --ask-for-approval never \
      "One-sentence summary of $f" > "/tmp/$(basename "$f").sum" ) &
done
wait
cat /tmp/*.sum | codex exec --output-last-message \
  "Group these one-sentence summaries by domain and rank importance."

Output:

text
Auth domain:
  - src/auth.py        — token issuance and validation
  - src/middleware.py  — request authentication
DB domain:
  - src/db.py          — pool config and session helpers
…
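
Naming each summary after the file's basename means two files with the same name in different directories (for example several `__init__.py`) overwrite each other. One way around this, sketched below, is to mirror the source path under a dedicated output directory. The helper name out_path is hypothetical.

```bash
# out_path OUTDIR FILE
# Mirrors FILE's relative path under OUTDIR, creates the parent directories,
# and prints the path the summary should be written to.
# NOTE: `out_path` is a hypothetical helper, not part of Codex.
out_path() {
  local target="$1/$2.sum"
  mkdir -p "$(dirname "$target")" && printf '%s\n' "$target"
}
```

In the map step each job would redirect to `"$(out_path "$outdir" "$f")"`, and the reduce step could gather everything with `find "$outdir" -name '*.sum' -exec cat {} +`.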

Sub-agent prompts that work

A sub-agent does not see the parent's chat history. Its only context is the prompt you give it plus any AGENTS.md it auto-discovers from its cwd. Write the prompt as if you were briefing a new contractor — explicit goal, explicit output format, explicit constraints.

Template

text
Goal: <one sentence>.
Inputs: <files / data the sub-agent should look at>.
Constraints: <sandbox / time / scope>.
Output: <exact format — JSON, bullets, diff, etc.>.

Example — security review sub-agent

text
Goal: Find SQL-injection or auth-bypass vulnerabilities in src/auth.py.
Inputs: src/auth.py and any file it imports.
Constraints: Read-only. Do not edit. Time-box to 60 seconds.
Output: JSON array of {file, line, severity, message}.

Example — fixer sub-agent

text
Goal: Apply the smallest possible patch to make tests/test_auth.py pass.
Inputs: tests/test_auth.py + the failing test output below.
Constraints: workspace-write sandbox; do not modify any test file.
Output: After applying, print the unified diff of changes you made.
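
When prompts are assembled inside scripts, the four-field template can be produced by a small helper so no field is forgotten. This is only a sketch; build_prompt is a hypothetical name, and the field labels are the ones from the template above.

```bash
# build_prompt GOAL INPUTS CONSTRAINTS OUTPUT
# Prints a sub-agent prompt in the four-field template. Refuses to run
# unless all four fields are supplied.
# NOTE: `build_prompt` is a hypothetical helper, not part of Codex.
build_prompt() {
  if [ "$#" -ne 4 ]; then
    echo "usage: build_prompt GOAL INPUTS CONSTRAINTS OUTPUT" >&2
    return 2
  fi
  printf 'Goal: %s\nInputs: %s\nConstraints: %s\nOutput: %s\n' "$1" "$2" "$3" "$4"
}
```

The result can be passed as the prompt argument, for example `codex exec ... "$(build_prompt "$goal" "$inputs" "$constraints" "$format")"`.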

Limits and unsupported behaviour

Sub-agents in Codex cannot do the following today. Many of these limits are deliberate; some are pending feature work.

  • No shared memory. Two sibling sub-agents cannot pass state to each other except via the filesystem.
  • No streaming results into the parent agent's prompt. Sub-agent output is appended to the parent's tool-result stream only after the sub-agent completes. The parent does not see partial output.
  • No nested /agent calls. A sub-agent spawned via /agent cannot itself spawn another via /agent (the feature flag is per-process). Use recursive codex exec for arbitrary depth.
  • /agent cannot target the OpenAI cloud. Sub-agents always run locally. Use codex cloud separately if you need remote execution.
  • Approval prompts in sub-agents cannot be answered in non-interactive contexts. A sub-agent spawned via codex exec with --ask-for-approval on-request will fail closed because there is no TTY to prompt on.
  • /agent stop is best-effort. A sub-agent currently inside a long shell command will not be interrupted until that command returns.
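
Because siblings can only exchange state through the filesystem, a reader may see a half-written file unless the writer publishes atomically. A minimal sketch of the usual write-then-rename approach follows; publish is a hypothetical helper name, and it assumes writer and reader share the same directory.

```bash
# publish DIR NAME   (content is read from stdin)
# Writes to a hidden temp file in DIR, then renames it to DIR/NAME.
# A rename within one directory is atomic, so a sibling reading DIR/NAME
# sees either the previous version or the complete new one.
# NOTE: `publish` is a hypothetical helper, not part of Codex.
publish() {
  local dir="$1" name="$2" tmp
  tmp=$(mktemp "$dir/.$name.XXXXXX") || return 1
  if cat > "$tmp"; then
    mv "$tmp" "$dir/$name"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

One sub-agent's wrapper might run `some_command | publish "$shared" findings.json`, while a sibling polls for `"$shared/findings.json"` to appear.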

Cost and token accounting

Each sub-agent has its own conversation and consumes its own tokens. The parent sees only the sub-agent's final message in its tool-result stream, but the sub-agent's input and output tokens are billed to the same OpenAI account.

Inspect sub-agent token usage:

bash
codex sessions show th_01xyz --json | jq '.tokens'

Output:

text
{"input": 18432, "output": 4210, "cache_hits": 7, "cache_misses": 12}

Aggregate across a fan-out:

bash
for s in th_01xyz th_01abc th_01def; do
  codex sessions show "$s" --json | jq '.tokens'
done | jq -s 'reduce .[] as $t ({}; .input += $t.input | .output += $t.output)'

Output:

text
{"input": 54231, "output": 13412}

Comparison with Claude Code's Task tool

Codex's sub-agent surface is composable; Claude Code's is a single first-class tool. Each has trade-offs.

CapabilityCodexClaude Code
First-class APIpartial (/agent, experimental)yes (Task tool, stable)
Output schemaJSON via prompttyped
Parallel sub-agentsyes (shell & or /agent --bg)yes (single tool can fan out)
Sandbox per sub-agentyesinherits
Recursive sub-agentsyes (via codex exec)no (Task can't call Task)
Cancellationpartialfull (parent receives result-or-cancel)
Available without flagyes (via codex exec)yes (Task is always on)

The trade-off: Claude's Task is more ergonomic for one-off delegation; Codex's codex exec pattern is more flexible for pipelines and fan-out because it composes with any shell command.


Common pitfalls

  1. A sub-agent does NOT inherit the parent's chat history. It only sees its prompt + AGENTS.md. If your prompt references "the bug we discussed earlier," the sub-agent has no idea what you mean.

  2. A sub-agent's sandbox is independent of the parent's. A read-only parent can spawn a workspace-write sub-agent (and vice versa). Make sure you're explicit; the default is whatever config.toml says, not "inherit from parent."

  3. /agent <name> <task> is blocking by default. Long-running agents will freeze your TUI prompt. Use --bg and /agent list / /agent wait.

  4. Recursive codex exec inherits OPENAI_API_KEY from the parent process. Different keys per sub-agent require explicit env-var management (env OPENAI_API_KEY=... codex exec ...).

  5. Sub-agent failures don't bubble up as structured errors. The parent sees the sub-agent's stdout as a plain string. Use --json and have the parent parse the result.

  6. Sub-agents load the same MCP server definitions as the parent, and the two copies can conflict. If both parent and sub-agent declare [mcp_servers.filesystem] and that server binds a fixed resource, the sub-agent's copy will fail to bind. Use distinct MCP server names or run the sub-agent with CODEX_HOME pointed at a separate config.

  7. Background sub-agents do NOT save to history by default. Pass --save to the inner codex exec if you want to resume them later.

  8. /agent templates ignore the parent's profile. They use their own model + sandbox unless explicitly inheriting. Test before relying on overrides.

  9. A sub-agent that needs network in workspace-write sandbox fails silently. It can't pip install or hit external APIs without --allow-network or danger-full-access.

  10. Parent timeouts do NOT propagate to sub-agents. If the parent has a 60s --timeout, a sub-agent it spawns has the sub-agent's own timeout (or none). Set explicitly.
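
For pitfall 10, one option that does not depend on any Codex flag is to enforce the deadline from outside the process. The sketch below assumes the coreutils (or BusyBox) timeout command is installed; with_deadline is a hypothetical helper name.

```bash
# with_deadline SECONDS CMD [ARGS...]
# Runs CMD and terminates it if it is still running after SECONDS.
# Returns CMD's exit status, or 124 when the deadline was hit.
# NOTE: `with_deadline` is a hypothetical helper; it relies on `timeout`.
with_deadline() {
  local secs="$1" status=0
  shift
  # Send TERM after $secs seconds, then KILL 5 seconds later if still alive.
  timeout -k 5 "$secs" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "sub-agent exceeded ${secs}s deadline" >&2
  fi
  return "$status"
}
```

A parent script would call `with_deadline 60 codex exec ... "<prompt>"` for every spawned sub-agent, so each one carries its own explicit limit.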


Real-world recipes

Two-agent review-and-fix loop

A read-only reviewer agent produces JSON findings; a workspace-write fixer agent applies the fixes. Each runs with the minimum permissions it needs.

bash
findings=$(codex exec --output-last-message --sandbox read-only --ask-for-approval never \
  "Find correctness bugs in src/auth.py. Output JSON array of {line, message}.")

echo "$findings"
codex exec --full-auto \
  "Apply minimal fixes for these findings: $findings. Touch only src/auth.py."

Output:

text
[reviewer JSON findings]
[fixer applies the patches]
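
If the reviewer finds nothing, launching the fixer wastes tokens and gives a write-enabled agent an empty brief. A small guard, sketched here without a JSON parser, can skip the second stage. The name has_findings is hypothetical, and the check only recognises an empty reply or an empty array, assuming the reviewer followed the requested output format.

```bash
# has_findings JSON
# Succeeds when the reviewer's reply is something other than an empty
# string or an empty JSON array (ignoring whitespace).
# NOTE: `has_findings` is a hypothetical helper, not part of Codex.
has_findings() {
  local compact
  compact=$(printf '%s' "$1" | tr -d '[:space:]')
  [ -n "$compact" ] && [ "$compact" != "[]" ]
}
```

The loop then becomes `has_findings "$findings" && codex exec --full-auto "..."`, leaving the fixer unspawned on a clean review.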

Triage 50 stale PRs in parallel

bash
gh pr list --limit 50 --json number,title --jq '.[].number' \
  | xargs -I{} -P 8 bash -c '
      verdict=$(gh pr diff "$1" | codex exec --output-last-message --sandbox read-only --ask-for-approval never \
        "Classify this PR as: ready / needs-work / abandon. One line only.")
      echo "#$1: $verdict"   # parallel jobs finish out of order, so label each line
    ' _ {}

Output:

text
[50 one-line classifications, 8 at a time]

Long-running explorer with progress reporting

A background sub-agent investigates a complex topic and writes progress to a log file, which you can inspect while the parent waits.

text
/agent explorer --bg "Map all the auth flows in this repo. Write progress to /tmp/explorer.log."

text
/agent wait th_01xyz

Output (inline in TUI):

text
[explorer:th_01xyz] Done. See /tmp/explorer.log.

Self-hosted MCP delegation pyramid

Layered Codex instances: a coordinator at the top, two specialised reviewers underneath. The coordinator dispatches by topic.

toml
# coordinator's ~/.codex/config.toml
[mcp_servers.security-reviewer]
command = "codex"
args    = ["mcp", "serve"]
env     = { CODEX_HOME = "/srv/codex-security" }

[mcp_servers.perf-reviewer]
command = "codex"
args    = ["mcp", "serve"]
env     = { CODEX_HOME = "/srv/codex-perf" }

Output: (none — TOML config)

The coordinator session sees both reviewers as MCP tools and routes tasks accordingly.

Cap sub-agent cost with --max-turns

A sub-agent that loops forever drains tokens. Always cap.

bash
codex exec --max-turns 6 --timeout 60 --output-last-message --sandbox read-only --ask-for-approval never \
  "Is the README up to date?"

Output:

text
[answer in at most 6 turns / 60 seconds]

Drop a sub-agent into a scratch dir

Isolate a sub-agent into a temporary directory so its filesystem effects are easy to inspect (or throw away).

bash
work=$(mktemp -d)
cp src/auth.py "$work/"
codex exec --cd "$work" --skip-git-repo-check --full-auto \
  "Rewrite auth.py to use JWT and write a CHANGES.md describing the rewrite."
diff -u src/auth.py "$work/auth.py"

Output:

text
[diff between original and sub-agent's rewrite]
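
The scratch-dir recipe can be packaged so the temporary directory is always cleaned up and the original file is never touched. The sketch below is one way to do it; with_scratch is a hypothetical helper name, and it runs whatever command you pass inside the scratch copy.

```bash
# with_scratch FILE CMD [ARGS...]
# Copies FILE into a fresh temp directory, runs CMD there, prints a unified
# diff of the result against the original, then deletes the directory.
# NOTE: `with_scratch` is a hypothetical helper, not part of Codex.
with_scratch() {
  local src="$1" work status=0
  shift
  work=$(mktemp -d) || return 1
  cp "$src" "$work/" || { rm -rf "$work"; return 1; }
  # Run the command inside the scratch copy; the original stays untouched.
  ( cd "$work" && "$@" ) || status=$?
  # diff exits 1 when the files differ, which is the expected case here.
  diff -u "$src" "$work/$(basename "$src")" || true
  rm -rf "$work"
  return "$status"
}
```

For example, `with_scratch src/auth.py codex exec --skip-git-repo-check --full-auto "Rewrite auth.py to use JWT."` would print the proposed changes without applying them.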

Two-pass refactor: planner then executor

bash
plan=$(codex exec --output-last-message --sandbox read-only --ask-for-approval never -p deep \
  "Plan a refactor of src/auth/ to use JWT cookies. Output: numbered steps.")

codex exec --full-auto -p sprint "Execute this plan exactly: $plan"

Output:

text
[plan from deep-thinking model]
[execution by faster model]

Cancellable streaming wrapper

A small bash function that spawns a sub-agent in the background and forwards Ctrl+C to it:

bash
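function cx-bg() {
  local pid status
  codex exec --json "$@" &
  pid=$!
  # Forward Ctrl+C to the sub-agent rather than abandoning it.
  trap "kill -INT $pid 2>/dev/null" INT
  wait "$pid"; status=$?
  # wait returns early (status > 128) when the trap fires; wait again to reap the child.
  [ "$status" -gt 128 ] && wait "$pid" 2>/dev/null
  trap - INT
  return "$status"
}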

Output: (none — defines wrapper)

bash
cx-bg "Long investigation task"

Output: (NDJSON stream; Ctrl+C cancels cleanly)

Use a sub-agent as a Q&A oracle inside a script

bash
function ask() {
  codex exec --output-last-message --sandbox read-only --ask-for-approval never --timeout 30 "$@"
}

echo "Default branch: $(ask 'What is the default branch of this repo?')"
echo "Top language:   $(ask 'What is the most common language in this repo?')"

Output:

text
Default branch: main
Top language:   TypeScript