concepts/sandboxing/auto-review diff

concepts/sandboxing/auto-review.md +165 −0 added

Details

1# Auto-review

3Auto-review replaces manual approval at the sandbox boundary with a separate

4reviewer agent. The main Codex agent still runs inside the same sandbox, with

5the same approval policy and the same network and filesystem limits. The

6difference is who reviews eligible escalation requests.

8Auto-review only applies when approvals are interactive. In practice, that

9 means `approval_policy = "on-request"` or a granular approval policy that

10 still surfaces the relevant prompt category. With `approval_policy = "never"`,

11 there is nothing to review.

13## How auto-review works

15At a high level, the flow is:

171. The main agent works inside `read-only` or `workspace-write`.

182. When it needs to cross the sandbox boundary, it requests approval.

193. If `approvals_reviewer = "auto_review"`, Codex routes that approval request

20 to a separate reviewer agent instead of stopping for a person.

214. The reviewer decides whether the action should run and returns a rationale.

225. If the action is approved, execution continues. If it is denied, the main

23 agent is instructed to find a materially safer path or stop and ask the

24 user.

26Auto-review is a reviewer swap, not a permission grant. It does not expand

27`writable_roots`, enable network access, or weaken protected paths. It only

28changes how Codex handles actions that already need approval.

30## When it triggers

32Auto-review evaluates approval requests that would otherwise pause for a human.

33These include:

35- Shell or exec tool calls that request escalated sandbox permissions.

36- Network requests blocked by the current sandbox or policy.

37- File edits outside the allowed writable roots.

38- MCP or app tool calls that require approval based on their tool annotations

39 or configured approval mode.

40- Browser Use access to a new website or domain.

42Auto-review does not run for routine actions already allowed inside the

43sandbox. If a command can run under the active `sandbox_mode`, or a tool call

44stays within the allowed policy, the main agent continues without review.

46Computer Use is a separate case. App approvals for Computer Use still surface

47directly to the user, so Auto-review does not replace those app-level prompts.

49## What auto-review blocks

51At a high level, Auto-review is designed to block actions such as:

53- sending private data, secrets, or credentials to untrusted destinations

54- probing for credentials, tokens, cookies, or session material

55- broad or persistent security weakening

56- destructive actions with significant risk of irreversible damage

58The exact policy lives in the open-source Codex repository:

59[policy_template.md](https://github.com/openai/codex/blob/main/codex-rs/core/src/guardian/policy_template.md)

60and

61[policy.md](https://github.com/openai/codex/blob/main/codex-rs/core/src/guardian/policy.md).

62That policy can be customized per enterprise with `guardian_policy_config` or

63per user with local [`[auto_review].policy`](/codex/config-advanced#approval-policies-and-sandbox-modes).

65## What the reviewer sees

67The reviewer is itself a Codex agent with a narrower job than the main agent:

68decide whether a specific boundary-crossing action should run.

70The reviewer sees a compact transcript plus the exact approval request. That

71typically includes user messages, surfaced assistant updates, relevant tool

72calls and tool outputs, and the action now being proposed for approval. It can

73also perform read-only checks to gather missing context, but it does so rarely.

75Hidden assistant reasoning is not included. Auto-review sees retained

76conversation items and tool evidence, not private chain-of-thought.

78## Denials and failure behavior

80An explicit denial is not treated like an ordinary sandbox error. Codex returns

81the review rationale to the main agent and adds a stronger instruction:

83- Do not pursue the same outcome via workaround, indirect execution, or policy

84 circumvention.

85- Continue only with a materially safer alternative.

86- Otherwise, stop and ask the user.

88Codex also applies a rejection circuit breaker per turn. In the current

89open-source implementation, Auto-review interrupts the turn after `3`

90consecutive denials or `10` denials within a rolling window of the last `50`

91reviews in the same turn.

93Any non-denial resets the consecutive-denial counter. When the breaker trips,

94Codex emits a warning and aborts the current turn with an interrupt rather than

95letting the agent loop on more escalation attempts.

97Timeouts are surfaced separately from explicit denials, and the main agent is

98informed that a timeout alone is not proof that the action is unsafe.

100There is also an explicit override path for denied actions. In the current

101open-source TUI, run `/approve` to open the **Auto-review Denials** picker, then

102select one recent denied action to approve for one retry. Codex records up to 10

103recent denials per thread. That approval is narrow: it applies to the exact

104denied action, not similar future actions; it is recorded for one retry in the

105same context; and the retry still goes through Auto-review. Under the hood,

106Codex injects a developer-scoped approval marker for that exact action. The

107reviewer then sees that explicit user override as context, but it still follows

108policy and can deny again if policy says the user cannot overwrite that class of

109denial.

110

111## Configuration

112

113For setup details, see

114[Managed configuration](https://developers.openai.com/codex/enterprise/managed-configuration#configure-automatic-review-policy).

115

116The default reviewer policy is in the open-source Codex repository:

117[core/src/guardian/policy.md](https://github.com/openai/codex/blob/main/codex-rs/core/src/guardian/policy.md).

118Enterprises can replace its tenant-specific section with

119`guardian_policy_config` in managed requirements. Individual users can also set

120a local

121[`[auto_review].policy`](/codex/config-advanced#approval-policies-and-sandbox-modes)

122in their `config.toml`, but managed requirements take precedence:

123

124```toml

125[auto_review]

126policy = """

127YOUR POLICY GOES HERE

128"""

129```

130

131To customize the policy, copy the whole default policy wording first, then

132iterate based on your individual risk profile.

133

134## Reduce review volume without weakening security

135

136Auto-review works best when the sandbox already covers your common safe

137workflows. If too many mundane actions need review, fix the boundary first

138instead of teaching the reviewer to approve noisy escalations forever.

139

140In practice, the highest-leverage changes are:

141

142- Add narrow

143 [`writable_roots`](https://developers.openai.com/codex/config-advanced#approval-policies-and-sandbox-modes)

144 for scratch directories or neighboring repos you intentionally use.

145- Add narrowly scoped [prefix rules](https://developers.openai.com/codex/rules). Prefer precise command

146 prefixes such as `["cargo", "test"]` or `["pnpm", "run", "lint"]` over broad

147 patterns such as `["python"]` or `["curl"]`. Broad rules often erase the very

148 boundary Auto-review is meant to guard.

149

150Auto-review session transcripts are retained under `~/.codex/sessions` by

151default, so you can ask Codex to analyze past traffic there before changing

152policy or permissions.

153

154## Limits

155

156Auto-review improves the default operating point for long-running agentic work,

157but it is not a deterministic security guarantee.

158

159- It only evaluates actions that ask to cross a boundary.

160- It can still make mistakes, especially in adversarial or unusual contexts.

161- It should complement, not replace, good sandbox design, monitoring, and

162 organization-specific policy.

163

164For the research rationale and published evaluation results, see the

165[Alignment Research post on Auto-review](https://alignment.openai.com/auto-review/).

concepts/sandboxing/auto-review.md Codex Docs, 2026-03-07 06:14 UTC → 2026-05-13 00:57 UTC