multi-agent diff

multi-agent.md +200 −18


With multi-agent workflows you can also define your own set of agents with different model configurations and instructions depending on the agent.

For the concepts and tradeoffs behind multi-agent workflows (including context pollution/context rot and model-selection guidance), see [Multi-agents concepts](https://developers.openai.com/codex/concepts/multi-agents).

## Enable multi-agent

Multi-agent workflows are currently experimental and need to be explicitly enabled.

Codex will automatically decide when to spawn a new agent or you can explicitly ask it to do so.

For long-running commands or polling workflows, Codex can also use the built-in `monitor` role, which is tuned for waiting and repeated status checks.

To see it in action, try the following prompt on your project:

34```38```

- Use `/agent` in the CLI to switch between active agent threads and inspect the ongoing thread.
- Ask Codex directly to steer a running sub-agent, stop it, or close completed agent threads.
- The `wait` tool supports long polling windows for monitoring workflows (up to 1 hour per call).

## Process CSV batches with sub-agents

Use `spawn_agents_on_csv` when you have many similar tasks that can be expressed as one row per work item. Codex reads the CSV, spawns one worker sub-agent per row, waits for the full batch to finish, and exports the combined results to CSV.

This works well for repeated audits such as:

- reviewing one file, package, or service per row
- checking a list of incidents, PRs, or migration targets
- generating structured summaries for many similar inputs

The tool accepts:

- `csv_path` for the source CSV
- `instruction` for the worker prompt template, using `{column_name}` placeholders
- `id_column` when you want stable item ids from a specific column
- `output_schema` when each worker should return a JSON object with a fixed shape
- `output_csv_path`, `max_concurrency`, and `max_runtime_seconds` for job control
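To make the `{column_name}` placeholder idea concrete, here is a minimal Python sketch of how one prompt per row could be rendered from a template. It only illustrates the substitution semantics; it is not how Codex implements `spawn_agents_on_csv`, and the sample rows and the `expand_instructions` helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV content; the column names match the placeholders below.
SAMPLE_CSV = "path,owner\nsrc/Button.tsx,ui-team\nsrc/Modal.tsx,platform\n"

# Worker prompt template using {column_name} placeholders.
INSTRUCTION = "Review {path} owned by {owner}."


def expand_instructions(csv_text: str, template: str, id_column: str) -> dict[str, str]:
    """Return one rendered prompt per row, keyed by the value in id_column."""
    prompts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # str.format_map raises KeyError if the template names a missing column.
        prompts[row[id_column]] = template.format_map(row)
    return prompts


prompts = expand_instructions(SAMPLE_CSV, INSTRUCTION, id_column="path")
for item_id, prompt in prompts.items():
    print(item_id, "->", prompt)
```

Keying the prompts by the id column mirrors the role of `id_column`: each work item keeps a stable identifier that does not depend on row order.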

Each worker must call `report_agent_job_result` exactly once. If a worker exits without reporting a result, that row is marked as failed in the exported CSV.

Example prompt:

```
Create /tmp/components.csv with columns path,owner and one row per frontend component.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/components.csv
- id_column: path
- instruction: "Review {path} owned by {owner}. Return JSON with keys path, risk, summary, and follow_up via report_agent_job_result."
- output_csv_path: /tmp/components-review.csv
- output_schema: an object with required string fields path, risk, summary, and follow_up
```
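The `output_schema` described in the prompt above could look roughly like the following. This is a sketch that assumes a JSON Schema-style object; the exact dialect Codex accepts is not confirmed here, and the `missing_or_invalid` helper and the sample result are hypothetical.

```python
import json

# Assumed JSON Schema-style shape: an object with four required string fields.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "risk": {"type": "string"},
        "summary": {"type": "string"},
        "follow_up": {"type": "string"},
    },
    "required": ["path", "risk", "summary", "follow_up"],
}


def missing_or_invalid(result: dict, schema: dict) -> list[str]:
    """List required fields that are absent or not strings (minimal check only)."""
    problems = []
    for key in schema["required"]:
        if not isinstance(result.get(key), str):
            problems.append(key)
    return problems


# Hypothetical worker result that forgot the follow_up field.
result = {"path": "src/Modal.tsx", "risk": "medium", "summary": "Unsaved state on close."}
print(json.dumps(OUTPUT_SCHEMA, indent=2))
print(missing_or_invalid(result, OUTPUT_SCHEMA))
```

A fixed shape like this keeps every row of the exported results comparable, which matters once the batch grows past a handful of items.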

When you run this through `codex exec`, Codex shows a single-line progress update on `stderr` while the batch is running. The exported CSV includes the original row data plus metadata such as `job_id`, `item_id`, `status`, `last_error`, and `result_json`.
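Once the batch finishes, the exported CSV can be post-processed with ordinary tooling. The sketch below splits rows into parsed results and failures. The column names come from the metadata listed above, but the sample rows, the status values, and the error text are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical excerpt of an exported CSV (original columns plus metadata).
EXPORTED = io.StringIO(
    "path,owner,job_id,item_id,status,last_error,result_json\n"
    'src/Button.tsx,ui-team,job-1,src/Button.tsx,completed,,"{""risk"": ""low""}"\n'
    "src/Modal.tsx,platform,job-1,src/Modal.tsx,failed,worker exited without reporting,\n"
)

results, failures = {}, {}
for row in csv.DictReader(EXPORTED):
    if row["result_json"]:
        # result_json holds the worker's reported JSON object as a string.
        results[row["item_id"]] = json.loads(row["result_json"])
    else:
        # Rows without a result keep their error message for follow-up.
        failures[row["item_id"]] = row["last_error"]

print(results)
print(failures)
```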

Related runtime settings:

- `agents.max_threads` caps how many agent threads can stay open concurrently.
- `agents.job_max_runtime_seconds` sets the default per-worker timeout for CSV fan-out jobs. A per-call `max_runtime_seconds` override takes precedence.
- `sqlite_home` controls where Codex stores the SQLite-backed state used for agent jobs and their exported results.
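The timeout precedence described in this document can be summarized in a few lines of Python. This is only a sketch of the documented order (per-call override, then `agents.job_max_runtime_seconds`, then the 1800-second fallback); the `effective_timeout` function is hypothetical and not part of Codex.

```python
from typing import Optional

# Documented fallback when neither the call nor the config sets a timeout.
DEFAULT_WORKER_TIMEOUT_SECONDS = 1800


def effective_timeout(per_call: Optional[int], configured: Optional[int]) -> int:
    """Resolve the per-worker timeout: per-call override, then config, then default."""
    if per_call is not None:
        return per_call
    if configured is not None:
        return configured
    return DEFAULT_WORKER_TIMEOUT_SECONDS


print(effective_timeout(600, 900))
print(effective_timeout(None, 900))
print(effective_timeout(None, None))
```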

## Approvals and sandbox controls

~~Sub-agents inherit your current sandbox policy, but they run with non-interactive approvals. If a sub-agent attempts an action that would require a new approval, that action fails and the error is surfaced in the parent workflow.~~

Sub-agents inherit your current sandbox policy.

In interactive CLI sessions, approval requests can surface from inactive agent threads even while you are looking at the main thread. The approval overlay shows the source thread label, and you can press `o` to open that thread before you approve, reject, or answer the request.

In non-interactive flows, or whenever a run cannot surface a fresh approval, an action that needs new approval fails and the error is surfaced back to the parent workflow.

Codex also reapplies the parent turn’s live runtime overrides when it spawns a child. That includes sandbox and approval choices you set interactively during the session, such as `/approvals` changes or `--yolo`, even if the selected agent role loads a config file with different defaults.

You can also override the sandbox configuration for individual [agent roles](#agent-roles) such as explicitly marking an agent to work in read-only mode.

Codex ships with built-in roles:

- `default`: general-purpose fallback role.
- `worker`: execution-focused role for implementation and fixes.
- `explorer`: read-heavy codebase exploration role.
- `monitor`: long-running command/task monitoring role (optimized for waiting/polling).

Each agent role can override your default configuration. Common settings to override for an agent role are:

| --- | --- | --- | --- |

**Notes:**

- Unknown fields in `[agents.<name>]` are rejected.
- `agents.max_depth` defaults to `1`, which allows a direct child agent to spawn but prevents deeper nesting.
- `agents.job_max_runtime_seconds` is optional. When you leave it unset, `spawn_agents_on_csv` falls back to its per-call default timeout of 1800 seconds per worker.
- Relative `config_file` paths are resolved relative to the `config.toml` file that defines the role.
- `agents.<name>.config_file` is validated at config load time and must point to an existing file.
- If a role name matches a built-in role (for example, `explorer`), your user-defined role takes precedence.
- If Codex can’t load a role config file, agent spawns can fail until you fix the file.
- Any configuration not set by the agent role will be inherited from the parent session.
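Three of the notes above (role precedence, relative `config_file` resolution, and inheritance from the parent session) can be sketched together in Python. This models the documented rules only; `resolve_role`, the `BUILT_IN_ROLES` table, and the example paths and settings are all hypothetical and do not reflect Codex internals.

```python
from pathlib import PurePosixPath

# Hypothetical built-in role table, shortened for illustration.
BUILT_IN_ROLES = {
    "default": {"description": "general-purpose fallback role"},
    "explorer": {"description": "read-heavy codebase exploration role"},
}


def resolve_role(name, user_roles, config_toml_path, parent_settings, role_settings):
    """Return (role, config_file path, effective settings) under the rules above."""
    # A user-defined role with a built-in name takes precedence.
    role = user_roles.get(name) or BUILT_IN_ROLES[name]
    config_file = None
    if "config_file" in role:
        # Relative paths resolve against the directory holding config.toml.
        config_file = PurePosixPath(config_toml_path).parent / role["config_file"]
    # Anything the role leaves unset is inherited from the parent session.
    effective = {**parent_settings, **role_settings}
    return role, config_file, effective


role, config_file, effective = resolve_role(
    "explorer",
    user_roles={"explorer": {"description": "custom explorer", "config_file": "agents/explorer.toml"}},
    config_toml_path="/repo/.codex/config.toml",
    parent_settings={"model": "gpt-5.3-codex", "sandbox_mode": "workspace-write"},
    role_settings={"sandbox_mode": "read-only"},
)
print(role["description"])
print(config_file)
print(effective)
```

In this sketch the role overrides only `sandbox_mode`, so `model` falls through from the parent session unchanged.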

### Example agent roles

~~Below is an example that overrides the definitions for the built-in `default` and `explorer` agent roles and defines a new `reviewer` role.~~

The best role definitions are narrow and opinionated. Give each role one clear job, a tool surface that matches that job, and instructions that keep it from drifting into adjacent work.

#### Example 1: PR review team

This pattern splits review into three focused roles:

- `explorer` maps the codebase and gathers evidence.
- `reviewer` looks for correctness, security, and test risks.
- `docs_researcher` checks framework or API documentation through a dedicated MCP server.

Project config (`.codex/config.toml`):

```
# Global limits: open agent threads and nesting depth.
[agents]
max_threads = 6
max_depth = 1

# Each role points at its own config file, resolved relative to this config.toml.
[agents.explorer]
description = "Read-only codebase explorer for gathering evidence before changes are proposed."
config_file = "agents/explorer.toml"

[agents.reviewer]
description = "PR reviewer focused on correctness, security, and missing tests."
config_file = "agents/reviewer.toml"

[agents.docs_researcher]
description = "Documentation specialist that uses the docs MCP server to verify APIs and framework behavior."
config_file = "agents/docs-researcher.toml"
```

`agents/explorer.toml`:

```
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
# Read-only sandbox keeps the explorer from modifying files.
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode.
Trace the real execution path, cite files and symbols, and avoid proposing fixes unless the parent agent asks for them.
Prefer fast search and targeted file reads over broad scans.
"""
```

`agents/reviewer.toml`:

```
model = "gpt-5.3-codex"
model_reasoning_effort = "high"
# The reviewer reports findings and does not edit code.
sandbox_mode = "read-only"
developer_instructions = """
Review code like an owner.
Prioritize correctness, security, behavior regressions, and missing test coverage.
Lead with concrete findings, include reproduction steps when possible, and avoid style-only comments unless they hide a real bug.
"""
```

`agents/docs-researcher.toml`:

```
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Use the docs MCP server to confirm APIs, options, and version-specific behavior.
Return concise answers with links or exact references when available.
Do not make code changes.
"""

# MCP server this role uses for documentation lookups.
[mcp_servers.openaiDeveloperDocs]
url = "https://developers.openai.com/mcp"
```

This setup works well for prompts like:

```
Review this branch against main. Have explorer map the affected code paths, reviewer find real risks, and docs_researcher verify the framework APIs that the patch relies on.
```

#### Example 2: frontend integration debugging team

This pattern is useful for UI regressions, flaky browser flows, or integration bugs that cross application code and the running product.

Project config (`.codex/config.toml`):

```
[agents]
max_threads = 6
max_depth = 1

[agents.explorer]
description = "Read-only codebase explorer for locating the relevant frontend and backend code paths."
config_file = "agents/explorer.toml"

[agents.browser_debugger]
description = "UI debugger that uses browser tooling to reproduce issues and capture evidence."
config_file = "agents/browser-debugger.toml"

[agents.worker]
description = "Implementation-focused agent for small, targeted fixes after the issue is understood."
config_file = "agents/worker.toml"
```

`agents/explorer.toml`:

```
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Map the code that owns the failing UI flow.
Identify entry points, state transitions, and likely files before the worker starts editing.
"""
```

`agents/browser-debugger.toml`:

```
model = "gpt-5.3-codex"
model_reasoning_effort = "high"
sandbox_mode = "workspace-write"
developer_instructions = """
Reproduce the issue in the browser, capture exact steps, and report what the UI actually does.
Use browser tooling for screenshots, console output, and network evidence.
Do not edit application code.
"""

# MCP server that exposes browser tooling to this role.
[mcp_servers.chrome_devtools]
url = "http://localhost:3000/mcp"
startup_timeout_sec = 20
```

`agents/worker.toml`:

```
model = "gpt-5.3-codex"
model_reasoning_effort = "medium"
developer_instructions = """
Own the fix once the issue is reproduced.
Make the smallest defensible change, keep unrelated files untouched, and validate only the behavior you changed.
"""

# Turns off one skill for this role.
[[skills.config]]
path = "/Users/me/.agents/skills/docs-editor/SKILL.md"
enabled = false
```

This setup works well for prompts like:

```
Investigate why the settings modal fails to save. Have browser_debugger reproduce it, explorer trace the responsible code path, and worker implement the smallest fix once the failure mode is clear.
```

multi-agent.md Codex Docs, 2026-02-19 20:53 UTC → 2026-03-04 18:18 UTC