cheat sheet
Codex Sub-Agents & Task Delegation
How Codex CLI supports delegating to child agents — the experimental /agent system, codex exec as a sub-agent pattern, MCP-mediated agent-to-agent calls, isolation boundaries, and orchestration patterns.
What it is
A "sub-agent" is a child Codex run spawned by a parent Codex session to handle an isolated task — a focused refactor, a parallel search, a sandboxed experiment. Unlike Claude Code's first-class Task tool (which is a single built-in tool with a well-known schema), Codex offers sub-agents through several composable mechanisms: the experimental /agent slash command and [[agents]] config tables, recursive codex exec invocations from inside the agent's shell tool, and MCP-mediated agent-to-agent calls when Codex itself is exposed as an MCP server. Each approach trades off ergonomics for isolation, and you mix-and-match depending on the task. This page maps the territory.
Why delegate
Sub-agents solve four problems that a single long-running session cannot:
- Context window pressure. A parent that has loaded 100K tokens of code can spawn a sub-agent with a tightly-scoped 5K-token prompt instead of degrading its own context.
- Parallelism. Several independent sub-agents can run concurrently — searching, summarising, refactoring different files.
- Isolation of failure. A sub-agent's blow-up (sandbox denial, runaway loop) does not poison the parent's session.
- Per-task policy. A read-only sub-agent can be spawned by a write-enabled parent (or vice versa), tightening the blast radius of risky work.
Approach 1 — /agent and [[agents]] (experimental)
Codex's first-class sub-agent surface is the experimental /agent slash command, backed by [[agents]] tables in config.toml. You declare named agent templates (each with its own model, profile, system prompt, and tool whitelist) and dispatch tasks to them from inside a session.
Enable the feature
[features]
codex_agents = true
Output: (none — TOML config)
Declare an agent template
[[agents]]
name = "reviewer"
model = "gpt-4o"
sandbox_mode = "read-only"
approval_policy = "never"
system_prompt = """
You are a careful code reviewer. Read the file at $1 and return a JSON array
of {file, line, severity, message} objects. Do not edit anything.
"""
enabled_tools = ["read_file", "list_directory", "search_files"]
Output: (none — TOML config)
Dispatch from the TUI
/agent reviewer src/auth.py
Output (inline in TUI):
[reviewer:th_01xyz] Spawned. Working on src/auth.py…
[reviewer:th_01xyz] Done. Result:
[
{"file":"src/auth.py","line":42,"severity":"high","message":"Token compared with == not secrets.compare_digest"},
{"file":"src/auth.py","line":78,"severity":"medium","message":"SQL query not parameterised"}
]
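A parent script may want to gate on the reviewer's findings rather than just display them. The sketch below counts findings of a given severity with plain grep. The helper name count_severity is hypothetical, and it assumes the compact JSON shape shown in the sample output above (no spaces around the colons).

```shell
# count_severity LABEL: read findings JSON on stdin and print how many
# entries carry "severity":"LABEL". Hypothetical helper; assumes compact
# JSON exactly as in the sample output (no whitespace around ':').
count_severity() {
  grep -o "\"severity\":\"$1\"" | wc -l | tr -d ' '
}

# Usage sketch: refuse to continue when any high-severity finding exists.
#   high=$(count_severity high < findings.json)
#   [ "$high" -eq 0 ] || echo "blocking: $high high-severity finding(s)" >&2
```

For anything beyond this fixed shape, a real JSON parser such as jq is the safer choice.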
List active sub-agents
/agent list
Output (inline in TUI):
th_01xyz reviewer running "src/auth.py"
th_01abc fixer done "ruff issues"
Wait for or interrupt a sub-agent
/agent wait th_01xyz
/agent stop th_01abc
Output (inline in TUI):
[reviewer:th_01xyz] Completed.
[fixer:th_01abc] Stopped.
Background vs. foreground
By default /agent <name> <task> blocks the parent until the sub-agent finishes. Append --bg to run in the background:
/agent reviewer --bg src/auth.py
Output (inline in TUI):
[reviewer:th_01xyz] Spawned in background. Use /agent wait to join.
Approach 2 — recursive codex exec
The most portable sub-agent pattern is to invoke codex exec from inside the parent agent's shell tool. The parent describes the task ("research X and tell me the answer"), the agent runs codex exec as a normal shell command, and the parent receives the sub-agent's last message as the shell tool's output. This pattern works on every Codex install without any feature flags.
Inside a session, ask the agent to delegate
I want you to research three potential ways to migrate src/db.py to async,
in parallel. Spawn three sub-agents via `codex exec`, each with a different
candidate approach, then summarise the trade-offs.
Output (inline in TUI):
[agent] Running: codex exec --output-last-message --sandbox read-only --ask-for-approval never "Research approach 1: asyncpg" &
[agent] Running: codex exec --output-last-message --sandbox read-only --ask-for-approval never "Research approach 2: SQLAlchemy 2.0 asyncio" &
[agent] Running: codex exec --output-last-message --sandbox read-only --ask-for-approval never "Research approach 3: Tortoise ORM" &
[agent] Wait for all three. Summarise.
[agent] Done. Trade-off summary:
- asyncpg: thinnest layer, max perf, no ORM
- SQLAlchemy 2.0: drop-in async, schema reuse, mature
- Tortoise: pythonic, less mature, fewer integrations
Capture the sub-agent's exit code
result=$(codex exec --output-last-message --timeout 60 \
--sandbox read-only --ask-for-approval never \
"Is the README in this repo up to date with the actual API?")
echo "exit=$? reply=$result"
Output:
exit=0 reply=No — three new endpoints in src/api/v2.py are not documented.
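Since $? is overwritten by the very next command, it can help to capture the reply and the exit status in one place. This is a minimal sketch: the wrapper name delegate is hypothetical, and it works with any command, not only codex exec.

```shell
# delegate CMD...: run a sub-agent command, print its reply on success,
# or report the exit status on stderr and pass it through on failure.
# Hypothetical wrapper; nothing here is specific to Codex.
delegate() {
  local reply rc
  if reply=$("$@"); then
    printf '%s\n' "$reply"
  else
    rc=$?
    printf 'sub-agent failed with exit %s\n' "$rc" >&2
    return "$rc"
  fi
}

# Usage sketch (flags as used elsewhere on this page):
#   delegate codex exec --output-last-message --sandbox read-only \
#     --ask-for-approval never "Is the README up to date?" || echo "no answer"
```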
Parent prompt to ask for delegation
The parent agent needs to know the delegation pattern. Encode it in AGENTS.md:
## Delegation
For large multi-file investigations, spawn sub-agents with:
codex exec --output-last-message --sandbox read-only --ask-for-approval never "<prompt>"
Each call is isolated. Use it to keep this session's context small.
Output: (none — Markdown for AGENTS.md)
Approach 3 — MCP-mediated delegation
When Codex is launched as an MCP server (codex mcp serve), any other MCP client — including another Codex instance — can call it as a tool. This is the most structured sub-agent pattern: the call/response is governed by the MCP protocol, with typed inputs and outputs.
Launch the inner agent as an MCP server
codex mcp serve
Output:
Codex MCP server listening on stdio
Exposed tools:
codex_exec(prompt: string, sandbox: string, timeout: int) -> string
codex_resume(session_id: string) -> string
codex_apply(task_id: string) -> string
Register it as a tool in the parent config
[mcp_servers.inner-codex]
command = "codex"
args = ["mcp", "serve"]
default_tools_approval_mode = "manual" # prompt before each delegation
Output: (none — TOML config)
Parent agent calls the inner agent
From inside the parent's TUI session, just describe the task — Codex will see codex_exec as a tool and call it when delegation makes sense.
Delegate to the inner codex MCP: have it audit src/db.py for SQL injection.
Output (inline in TUI):
[agent] Calling inner-codex.codex_exec(prompt="Audit src/db.py for SQL injection. Return JSON.", sandbox="read-only", timeout=120)
[agent] Result: [{"file":"src/db.py","line":17,"issue":"unparameterised query"}]
Isolation boundaries
Each sub-agent approach gives different guarantees. Use this matrix to choose.
| Boundary | /agent (experimental) | Recursive codex exec | MCP codex_exec |
|---|---|---|---|
| Separate session/history | yes | yes | yes |
| Separate context window | yes | yes | yes |
| Separate sandbox policy | yes (per-agent template) | yes (per-invocation flag) | yes (per-call arg) |
| Separate model | yes | yes | yes |
| Separate AGENTS.md scope | inherits | inherits unless --cd | inherits |
| Separate MCP servers | configurable | inherits parent config | sub-agent process has its own |
| Parent sees structured output | yes (JSON) | yes (stdout) | yes (typed) |
| Parent can interrupt | yes (/agent stop) | yes (Ctrl+C → SIGINT) | partial (timeout only) |
| Available without feature flag | no | yes | yes |
Parallelism patterns
A single Codex session can spawn many sub-agents at once. Three parallelism patterns appear in practice.
Fan-out, fan-in
Spawn N sub-agents in parallel, wait for all, summarise.
# Inside the shell tool of a parent agent
( codex exec --output-last-message --sandbox read-only --ask-for-approval never "Audit src/auth.py" > /tmp/a.txt ) &
( codex exec --output-last-message --sandbox read-only --ask-for-approval never "Audit src/db.py" > /tmp/b.txt ) &
( codex exec --output-last-message --sandbox read-only --ask-for-approval never "Audit src/api.py" > /tmp/c.txt ) &
wait
cat /tmp/a.txt /tmp/b.txt /tmp/c.txt
Output:
[auth.py findings]
[db.py findings]
[api.py findings]
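A bare wait discards the exit status of every background job, so a failed sub-agent can go unnoticed. The sketch below waits on each PID individually and counts failures. The helper name fan_out is hypothetical, and each argument is assumed to be a complete shell command string.

```shell
# fan_out CMD_STRING...: run every argument as a background shell command,
# wait on each PID separately, print the number of failures, and return
# non-zero if any command failed. Hypothetical helper; the function body
# runs in a subshell so its variables do not leak into the caller.
fan_out() (
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
  [ "$failed" -eq 0 ]
)

# Usage sketch: pass each sub-agent invocation (including its output
# redirect) as one quoted string, for example
#   fan_out 'codex exec ... "Audit src/auth.py" > /tmp/a.txt' \
#           'codex exec ... "Audit src/db.py" > /tmp/b.txt'
```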
Pipeline
Each stage's output feeds the next stage's prompt.
# Inside the shell tool of a parent agent
plan=$(codex exec --output-last-message --sandbox read-only --ask-for-approval never \
"Outline a refactor of src/auth.py to use JWT cookies. Output: bullets only.")
codex exec --full-auto "Execute this plan: $plan"
Output:
[agent edits src/auth.py per the outlined plan]
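In a pipeline, a failed or empty first stage would hand the write-enabled second stage a prompt with no plan in it. A small guard avoids that. The function name plan_is_usable is hypothetical, and the check is deliberately crude: it only rejects plans that are empty or pure whitespace.

```shell
# plan_is_usable TEXT: succeed only if TEXT contains at least one
# non-whitespace character. Hypothetical guard for the planner stage.
plan_is_usable() {
  # Strip all whitespace; an empty remainder means there is nothing to execute.
  [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Usage sketch (flags as used elsewhere on this page):
#   plan_is_usable "$plan" && codex exec --full-auto "Execute this plan: $plan"
```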
Map-reduce
Map a prompt over many files, then reduce.
# Inside the shell tool of a parent agent
shopt -s globstar # bash: let ** match nested directories
for f in src/**/*.py; do
( codex exec --output-last-message --sandbox read-only --ask-for-approval never \
"One-sentence summary of $f" > "/tmp/$(basename "$f").sum" ) &
done
wait
cat /tmp/*.sum | codex exec --output-last-message \
"Group these one-sentence summaries by domain and rank importance."
Output:
Auth domain:
- src/auth.py — token issuance and validation
- src/middleware.py — request authentication
DB domain:
- src/db.py — pool config and session helpers
…
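Naming summary files by basename means two files with the same name in different directories (for example a hypothetical src/a/util.py and src/b/util.py) would overwrite each other's summary. One way around this, sketched below with a hypothetical helper sum_name, is to derive the output name from the whole relative path.

```shell
# sum_name PATH: print a /tmp summary filename that encodes the full
# relative path, so files sharing a basename do not collide.
# Hypothetical helper; the /tmp prefix mirrors the example above.
sum_name() {
  printf '/tmp/%s.sum\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Usage sketch inside the map loop:
#   ... "One-sentence summary of $f" > "$(sum_name "$f")"
```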
Sub-agent prompts that work
A sub-agent does not see the parent's chat history. Its only context is the prompt you give it plus any AGENTS.md it auto-discovers from its cwd. Write the prompt as if you were briefing a new contractor — explicit goal, explicit output format, explicit constraints.
Template
Goal: <one sentence>.
Inputs: <files / data the sub-agent should look at>.
Constraints: <sandbox / time / scope>.
Output: <exact format — JSON, bullets, diff, etc.>.
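When prompts are assembled in scripts, a tiny helper keeps all four fields present and in the same order every time. This is a sketch, and the function name brief is hypothetical.

```shell
# brief GOAL INPUTS CONSTRAINTS OUTPUT: print a sub-agent prompt that
# follows the four-field template. Hypothetical helper.
brief() {
  printf 'Goal: %s.\nInputs: %s.\nConstraints: %s.\nOutput: %s.\n' \
    "$1" "$2" "$3" "$4"
}

# Usage sketch:
#   prompt=$(brief "Find SQL injection in src/auth.py" \
#                  "src/auth.py and any file it imports" \
#                  "Read-only. Do not edit" \
#                  "JSON array of {file, line, severity, message}")
```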
Example — security review sub-agent
Goal: Find SQL-injection or auth-bypass vulnerabilities in src/auth.py.
Inputs: src/auth.py and any file it imports.
Constraints: Read-only. Do not edit. Time-box to 60 seconds.
Output: JSON array of {file, line, severity, message}.
Example — fixer sub-agent
Goal: Apply the smallest possible patch to make tests/test_auth.py pass.
Inputs: tests/test_auth.py + the failing test output below.
Constraints: workspace-write sandbox; do not modify any test file.
Output: After applying, print the unified diff of changes you made.
Limits and unsupported behaviour
Sub-agents in Codex cannot do the following today. Many of these limits are deliberate; some are pending feature work.
- No shared memory. Two sibling sub-agents cannot pass state to each other except via the filesystem.
- No streaming results into the parent agent's prompt. Sub-agent output is appended to the parent's tool-result stream only after the sub-agent completes. The parent does not see partial output.
- No nested /agent calls. A sub-agent spawned via /agent cannot itself spawn another via /agent (the feature flag is per-process). Use recursive codex exec for arbitrary depth.
- /agent cannot target the OpenAI cloud. Sub-agents always run locally. Use codex cloud separately if you need remote execution.
- Approval prompts in sub-agents are silent in non-interactive contexts. A sub-agent spawned in codex exec with --ask-for-approval on-request will fail-closed because there is no tty to prompt.
- /agent stop is best-effort. A sub-agent currently inside a long shell command will not be interrupted until that command returns.
Cost and token accounting
Each sub-agent has its own conversation and consumes its own tokens. The parent sees only the sub-agent's final message in its tool-result stream, but the sub-agent's input and output tokens are billed to the same OpenAI account.
Inspect sub-agent token usage:
codex sessions show th_01xyz --json | jq '.tokens'
Output:
{"input": 18432, "output": 4210, "cache_hits": 7, "cache_misses": 12}
Aggregate across a fan-out:
for s in th_01xyz th_01abc th_01def; do
codex sessions show "$s" --json | jq '.tokens'
done | jq -s 'reduce .[] as $t ({}; .input += $t.input | .output += $t.output)'
Output:
{"input": 54231, "output": 13412}
Comparison with Claude Code's Task tool
Codex's sub-agent surface is composable; Claude Code's is a single first-class tool. Each has trade-offs.
| Capability | Codex | Claude Code |
|---|---|---|
| First-class API | partial (/agent, experimental) | yes (Task tool, stable) |
| Output schema | JSON via prompt | typed |
| Parallel sub-agents | yes (shell & or /agent --bg) | yes (single tool can fan out) |
| Sandbox per sub-agent | yes | inherits |
| Recursive sub-agents | yes (via codex exec) | no (Task can't call Task) |
| Cancellation | partial | full (parent receives result-or-cancel) |
| Available without flag | yes (via codex exec) | yes (Task is always on) |
The trade-off: Claude's Task is more ergonomic for one-off delegation; Codex's codex exec pattern is more flexible for pipelines and fan-out because it composes with any shell command.
Common pitfalls
- A sub-agent does NOT inherit the parent's chat history. It only sees its prompt + AGENTS.md. If your prompt references "the bug we discussed earlier," the sub-agent has no idea what you mean.
- A sub-agent's sandbox is independent of the parent's. A read-only parent can spawn a workspace-write sub-agent (and vice versa). Make sure you're explicit; the default is whatever config.toml says, not "inherit from parent."
- /agent <name> <task> is blocking by default. Long-running agents will freeze your TUI prompt. Use --bg and /agent list / /agent wait.
- Recursive codex exec inherits OPENAI_API_KEY from the parent process. Different keys per sub-agent require explicit env-var management (env OPENAI_API_KEY=... codex exec ...).
- Sub-agent failures don't bubble structured errors. The parent sees the sub-agent's stdout as a string. Use --output-format json and have the parent parse the result.
- Spawned MCP server sub-agents share the parent's MCP server processes, and conflict. If both parent and sub-agent declare [mcp_servers.filesystem], the sub-agent will fail to bind. Use distinct MCP server names or run sub-agent with CODEX_HOME pointed at a separate config.
- Background sub-agents do NOT save to history by default. Pass --save to the inner codex exec if you want to resume them later.
- /agent templates ignore the parent's profile. They use their own model + sandbox unless explicitly inheriting. Test before relying on overrides.
- A sub-agent that needs network in workspace-write sandbox fails silently. It can't pip install or hit external APIs without --allow-network or danger-full-access.
- Parent timeouts do NOT propagate to sub-agents. If the parent has a 60s --timeout, a sub-agent it spawns has the sub-agent's own timeout (or none). Set explicitly.
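The per-sub-agent key pitfall can be handled with a small wrapper around env, which scopes the variable to a single child process and leaves the parent's environment untouched. The wrapper name with_key is hypothetical and the key values shown are placeholders.

```shell
# with_key KEY CMD...: run CMD with OPENAI_API_KEY set to KEY for that one
# process only. Hypothetical wrapper; the parent shell's value is unchanged.
with_key() {
  local key=$1
  shift
  env OPENAI_API_KEY="$key" "$@"
}

# Usage sketch (placeholder key variable):
#   with_key "$REVIEWER_KEY" codex exec --sandbox read-only "Audit src/db.py"
```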
Real-world recipes
Two-agent review-and-fix loop
A read-only reviewer agent produces JSON findings; a workspace-write fixer agent applies the fixes. Each runs with the minimum permissions it needs.
findings=$(codex exec --output-last-message --sandbox read-only --ask-for-approval never \
"Find correctness bugs in src/auth.py. Output JSON array of {line, message}.")
echo "$findings"
codex exec --full-auto \
"Apply minimal fixes for these findings: $findings. Touch only src/auth.py."
Output:
[reviewer JSON findings]
[fixer applies the patches]
Triage 50 stale PRs in parallel
gh pr list --limit 50 --json number,title --jq '.[].number' \
| xargs -I{} -P 8 bash -c '
gh pr diff {} | codex exec --output-last-message --sandbox read-only --ask-for-approval never \
"Classify this PR as: ready / needs-work / abandon. One line only."
'
Output:
[50 one-line classifications, 8 at a time]
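Fifty one-line verdicts are easier to act on as counts. Assuming each sub-agent really does print exactly one of the labels on its own line, a short awk tally summarises the run. The function name tally is hypothetical.

```shell
# tally: read one label per line on stdin and print "label count" pairs,
# sorted by label. Hypothetical helper; it counts whole lines verbatim,
# so stray whitespace or extra words would show up as separate labels.
tally() {
  awk '{ n[$0]++ } END { for (k in n) print k, n[k] }' | sort
}

# Usage sketch: append "| tally" to the xargs pipeline above.
```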
Long-running explorer with progress reporting
A background sub-agent investigates a complex topic; the parent polls and surfaces interim results.
/agent explorer --bg "Map all the auth flows in this repo. Write progress to /tmp/explorer.log."
/agent wait
Output (inline in TUI):
[explorer:th_01xyz] Done. See /tmp/explorer.log.
Self-hosted MCP delegation pyramid
Layered Codex instances: a coordinator at the top, two specialised reviewers underneath. The coordinator dispatches by topic.
# coordinator's ~/.codex/config.toml
[mcp_servers.security-reviewer]
command = "codex"
args = ["mcp", "serve"]
env = { CODEX_HOME = "/srv/codex-security" }
[mcp_servers.perf-reviewer]
command = "codex"
args = ["mcp", "serve"]
env = { CODEX_HOME = "/srv/codex-perf" }
Output: (none — TOML config)
The coordinator session sees both reviewers as MCP tools and routes tasks accordingly.
Cap sub-agent cost with --max-turns
A sub-agent that loops forever drains tokens. Always cap.
codex exec --max-turns 6 --timeout 60 --output-last-message --sandbox read-only --ask-for-approval never \
"Is the README up to date?"
Output:
[answer in at most 6 turns / 60 seconds]
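As an extra guard that does not depend on the CLI's own flags, the wall-clock limit can also be enforced from outside the process. The sketch below assumes GNU coreutils timeout is installed (it is not part of a stock macOS install); the wrapper name capped is hypothetical.

```shell
# capped SECONDS CMD...: run CMD under an external wall-clock limit.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# limit is hit. Hypothetical wrapper.
capped() {
  local secs=$1
  shift
  timeout "$secs" "$@"
}

# Usage sketch:
#   capped 90 codex exec --sandbox read-only "Is the README up to date?"
#   [ $? -eq 124 ] && echo "sub-agent hit the 90s cap" >&2
```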
Drop a sub-agent into a scratch dir
Isolate a sub-agent into a temporary directory so its filesystem effects are easy to inspect (or throw away).
work=$(mktemp -d)
cp src/auth.py "$work/"
codex exec --cd "$work" --skip-git-repo-check --full-auto \
"Rewrite auth.py to use JWT and write a CHANGES.md describing the rewrite."
diff -ru src/auth.py "$work/auth.py"
Output:
[diff between original and sub-agent's rewrite]
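For the throw-away case, the scratch directory can be cleaned up automatically with an EXIT trap. This is a sketch under stated assumptions: the helper name with_scratch is hypothetical, and because the directory is deleted as soon as the command returns, anything worth keeping (such as a diff) has to be produced by the command itself.

```shell
# with_scratch FILE CMD...: copy FILE into a fresh temp dir, run CMD with
# that dir as cwd, then delete the dir even if CMD fails. Hypothetical
# helper; the body runs in a subshell so cd and the trap stay contained.
with_scratch() (
  work=$(mktemp -d) || exit 1
  trap 'rm -rf "$work"' EXIT
  cp "$1" "$work/" || exit 1
  shift
  cd "$work" && "$@"
)

# Usage sketch:
#   with_scratch src/auth.py codex exec --skip-git-repo-check --full-auto \
#     "Rewrite auth.py to use JWT and print the unified diff."
```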
Two-pass refactor: planner then executor
plan=$(codex exec --output-last-message --sandbox read-only --ask-for-approval never -p deep \
"Plan a refactor of src/auth/ to use JWT cookies. Output: numbered steps.")
codex exec --full-auto -p sprint "Execute this plan exactly: $plan"
Output:
[plan from deep-thinking model]
[execution by faster model]
Cancellable streaming wrapper
A small bash function that spawns a sub-agent in the background and forwards Ctrl+C to it:
function cx-bg() {
codex exec --json "$@" &
pid=$!
trap "kill -INT $pid" INT
wait "$pid"
trap - INT
}
Output: (none — defines wrapper)
cx-bg "Long investigation task"
Output: (NDJSON stream; Ctrl+C cancels cleanly)
Use a sub-agent as a Q&A oracle inside a script
function ask() {
codex exec --output-last-message --sandbox read-only --ask-for-approval never --timeout 30 "$@"
}
echo "Default branch: $(ask 'What is the default branch of this repo?')"
echo "Top language: $(ask 'What is the most common language in this repo?')"
Output:
Default branch: main
Top language: TypeScript