# Auto-review

Auto-review replaces manual approval at the sandbox boundary with a separate
reviewer agent. The main Codex agent still runs inside the same sandbox, with
the same approval policy and the same network and filesystem limits. The
difference is who reviews eligible escalation requests.

Auto-review only applies when approvals are interactive. In practice, that
means `approval_policy = "on-request"` or a granular approval policy that
still surfaces the relevant prompt category. With `approval_policy = "never"`,
there is nothing to review.

## How auto-review works

At a high level, the flow is:

1. The main agent works inside `read-only` or `workspace-write`.
2. When it needs to cross the sandbox boundary, it requests approval.
3. If `approvals_reviewer = "auto_review"`, Codex routes that approval request
   to a separate reviewer agent instead of stopping for a person.
4. The reviewer decides whether the action should run and returns a rationale.
5. If the action is approved, execution continues. If it is denied, the main
   agent is instructed to find a materially safer path or stop and ask the
   user.

Auto-review is a reviewer swap, not a permission grant. It does not expand
`writable_roots`, enable network access, or weaken protected paths. It only
changes how Codex handles actions that already need approval.
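The routing described above can be sketched in a few lines of Python. This is an illustrative model, not the Codex implementation: `Decision`, `handle_action`, and the two callback parameters are hypothetical names, and the sketch assumes that any `approvals_reviewer` value other than `"auto_review"` falls back to a human prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    approved: bool
    rationale: str


def handle_action(
    action: str,
    allowed_in_sandbox: bool,
    approvals_reviewer: str,
    auto_reviewer: Callable[[str], Decision],
    ask_user: Callable[[str], Decision],
) -> str:
    """Return what the main agent does next for one proposed action."""
    if allowed_in_sandbox:
        # Routine action: no approval request is raised, so nothing is reviewed.
        return "run"
    # Boundary crossing: only the reviewer changes, never the permissions.
    review = auto_reviewer if approvals_reviewer == "auto_review" else ask_user
    decision = review(action)
    if decision.approved:
        return "run"
    # Denied: the rationale is handed back to the main agent.
    return f"denied: {decision.rationale}"
```

In this model, changing the reviewer only changes which callback receives the request. Whether an action needs approval at all is decided before the reviewer is consulted.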

## When it triggers

Auto-review evaluates approval requests that would otherwise pause for a human.
These include:

- Shell or exec tool calls that request escalated sandbox permissions.
- Network requests blocked by the current sandbox or policy.
- File edits outside the allowed writable roots.
- MCP or app tool calls that require approval based on their tool annotations
  or configured approval mode.
- Browser Use access to a new website or domain.

Auto-review does not run for routine actions already allowed inside the
sandbox. If a command can run under the active `sandbox_mode`, or a tool call
stays within the allowed policy, the main agent continues without review.

Computer Use is a separate case. App approvals for Computer Use still surface
directly to the user, so Auto-review does not replace those app-level prompts.

## What auto-review blocks

At a high level, Auto-review is designed to block actions such as:

- sending private data, secrets, or credentials to untrusted destinations
- probing for credentials, tokens, cookies, or session material
- broad or persistent security weakening
- destructive actions with significant risk of irreversible damage

The exact policy lives in the open-source Codex repository:
[policy_template.md](https://github.com/openai/codex/blob/main/codex-rs/core/src/guardian/policy_template.md)
and
[policy.md](https://github.com/openai/codex/blob/main/codex-rs/core/src/guardian/policy.md).
That policy can be customized per enterprise with `guardian_policy_config` or
per user with local [`[auto_review].policy`](/codex/config-advanced#approval-policies-and-sandbox-modes).

## What the reviewer sees

The reviewer is itself a Codex agent with a narrower job than the main agent:
decide whether a specific boundary-crossing action should run.

The reviewer sees a compact transcript plus the exact approval request. That
typically includes user messages, surfaced assistant updates, relevant tool
calls and tool outputs, and the action now being proposed for approval. It can
also perform read-only checks to gather missing context, but it does so rarely.

Hidden assistant reasoning is not included. Auto-review sees retained
conversation items and tool evidence, not private chain-of-thought.

## Denials and failure behavior

An explicit denial is not treated like an ordinary sandbox error. Codex returns
the review rationale to the main agent and adds a stronger instruction:

- Do not pursue the same outcome via workaround, indirect execution, or policy
  circumvention.
- Continue only with a materially safer alternative.
- Otherwise, stop and ask the user.

Codex also applies a rejection circuit breaker per turn. In the current
open-source implementation, Auto-review interrupts the turn after `3`
consecutive denials or `10` denials within a rolling window of the last `50`
reviews in the same turn.

Any non-denial resets the consecutive-denial counter. When the breaker trips,
Codex emits a warning and aborts the current turn with an interrupt rather than
letting the agent loop on more escalation attempts.
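The breaker logic can be modeled as follows. This is a minimal sketch built only from the thresholds quoted above, not the code in the Codex repository: `DenialBreaker` and its method names are hypothetical, the rolling window is assumed to be a fixed-size queue of the last `50` review outcomes, and the sketch does not model how timeouts are counted. One instance is meant to live for a single turn.

```python
from collections import deque


class DenialBreaker:
    """Hypothetical per-turn breaker: trips on 3 consecutive or 10-of-last-50 denials."""

    def __init__(self, max_consecutive=3, max_in_window=10, window=50):
        self.max_consecutive = max_consecutive
        self.max_in_window = max_in_window
        # True marks a denial; entries older than `window` reviews drop off.
        self.recent = deque(maxlen=window)
        self.consecutive = 0

    def record(self, denied: bool) -> bool:
        """Record one review outcome; return True if the turn should be interrupted."""
        self.recent.append(denied)
        # Any non-denial resets the consecutive-denial counter.
        self.consecutive = self.consecutive + 1 if denied else 0
        return (
            self.consecutive >= self.max_consecutive
            or sum(self.recent) >= self.max_in_window
        )
```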

Timeouts are surfaced separately from explicit denials, and the main agent is
informed that a timeout alone is not proof that the action is unsafe.

There is also an explicit override path for denied actions. In the current
open-source TUI, run `/approve` to open the **Auto-review Denials** picker, then
select one recent denied action to approve for one retry. Codex records up to 10
recent denials per thread. That approval is narrow: it applies to the exact
denied action, not similar future actions; it is recorded for one retry in the
same context; and the retry still goes through Auto-review. Under the hood,
Codex injects a developer-scoped approval marker for that exact action. The
reviewer then sees that explicit user override as context, but it still follows
policy and can deny again if policy says the user cannot override that class of
denial.
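The narrowness of that override can be illustrated with a small model. This is a sketch under stated assumptions, not the TUI's implementation: `DenialLog` and its methods are hypothetical, actions are compared as exact strings, and the 10-entry limit is modeled as a bounded queue.

```python
from collections import deque


class DenialLog:
    """Hypothetical per-thread record of recent Auto-review denials."""

    def __init__(self, capacity=10):
        self.recent = deque(maxlen=capacity)  # oldest denials drop off
        self.pending = set()  # actions the user approved for one retry

    def record_denial(self, action: str) -> None:
        self.recent.append(action)

    def approve_for_retry(self, action: str) -> bool:
        """The user picks one recent denial; only that exact action qualifies."""
        if action not in self.recent:
            return False
        self.pending.add(action)
        return True

    def take_override(self, action: str) -> bool:
        """Consume the approval on retry; True at most once per approval."""
        if action in self.pending:
            self.pending.discard(action)
            return True
        return False
```

In this model a `True` from `take_override` would only be passed to the reviewer as extra context for the retry. It does not skip the review, which matches the point above that the reviewer can still deny.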

## Configuration

For setup details, see
[Managed configuration](https://developers.openai.com/codex/enterprise/managed-configuration#configure-automatic-review-policy).

The default reviewer policy is in the open-source Codex repository:
[core/src/guardian/policy.md](https://github.com/openai/codex/blob/main/codex-rs/core/src/guardian/policy.md).
Enterprises can replace its tenant-specific section with
`guardian_policy_config` in managed requirements. Individual users can also set
a local
[`[auto_review].policy`](/codex/config-advanced#approval-policies-and-sandbox-modes)
in their `config.toml`, but managed requirements take precedence:

```toml
[auto_review]
policy = """
YOUR POLICY GOES HERE
"""
```

To customize the policy, copy the whole default policy wording first, then
iterate based on your individual risk profile.

## Reduce review volume without weakening security

Auto-review works best when the sandbox already covers your common safe
workflows. If too many mundane actions need review, fix the boundary first
instead of teaching the reviewer to approve noisy escalations forever.

In practice, the highest-leverage changes are:

- Add narrow
  [`writable_roots`](https://developers.openai.com/codex/config-advanced#approval-policies-and-sandbox-modes)
  for scratch directories or neighboring repos you intentionally use.
- Add narrowly scoped [prefix rules](https://developers.openai.com/codex/rules). Prefer precise command
  prefixes such as `["cargo", "test"]` or `["pnpm", "run", "lint"]` over broad
  patterns such as `["python"]` or `["curl"]`. Broad rules often erase the very
  boundary Auto-review is meant to guard.
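To see why prefix width matters, the sketch below models a prefix rule as token-wise matching against the start of a command line. This is an assumption made for illustration: `rule_matches` is a hypothetical helper, and the real rules engine may match differently.

```python
def rule_matches(prefix: list[str], argv: list[str]) -> bool:
    """True when argv starts with every token of prefix, in order."""
    return argv[: len(prefix)] == prefix


narrow = ["cargo", "test"]
broad = ["python"]

# The narrow rule covers the test workflow and nothing else from cargo.
print(rule_matches(narrow, ["cargo", "test", "--workspace"]))  # True
print(rule_matches(narrow, ["cargo", "publish"]))              # False

# The broad rule covers any Python invocation, whatever the script does.
print(rule_matches(broad, ["python", "-c", "import os"]))      # True
```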

Auto-review session transcripts are retained under `~/.codex/sessions` by
default, so you can ask Codex to analyze past traffic there before changing
policy or permissions.

## Limits

Auto-review improves the default operating point for long-running agentic work,
but it is not a deterministic security guarantee.

- It only evaluates actions that ask to cross a boundary.
- It can still make mistakes, especially in adversarial or unusual contexts.
- It should complement, not replace, good sandbox design, monitoring, and
  organization-specific policy.

For the research rationale and published evaluation results, see the
[Alignment Research post on Auto-review](https://alignment.openai.com/auto-review/).