Documentation — Spybara

guides/agents/define-agents.md +2 −2


41const agent = new Agent({41const agent = new Agent({

42 name: "Weather bot",42 name: "Weather bot",

43 instructions: "You are a helpful weather bot.",43 instructions: "You are a helpful weather bot.",

~~44 model: "gpt-5.4",~~44 model: "gpt-5.5",

45 tools: [getWeather],45 tools: [getWeather],

46});46});

47```47```

59agent = Agent(59agent = Agent(

60 name="Weather bot",60 name="Weather bot",

61 instructions="You are a helpful weather bot.",61 instructions="You are a helpful weather bot.",

~~62 model="gpt-5.4",~~62 model="gpt-5.5",

63 tools=[get_weather],63 tools=[get_weather],

64)64)

65```65```

guides/agents/models.md +3 −3


27});27});

28 28

29const runner = new Runner({29const runner = new Runner({

~~30 model: "gpt-5.4",~~30 model: "gpt-5.5",

31});31});

32 32

33await runner.run(fastAgent, "Summarize ticket 123.");33await runner.run(fastAgent, "Summarize ticket 123.");

62 result = await Runner.run(62 result = await Runner.run(

63 general_agent,63 general_agent,

64 "Investigate the billing issue on account 456.",64 "Investigate the billing issue on account 456.",

~~65 run_config=RunConfig(model="gpt-5.4"),~~65 run_config=RunConfig(model="gpt-5.5"),

66 )66 )

67 print(result.final_output)67 print(result.final_output)

68 68

72```72```

73 73

74 74

75For most new SDK workflows, start with [`gpt-5.4`](https://developers.openai.com/api/docs/models/gpt-5.4) and move to a smaller variant only when latency or cost matters enough to justify it. Use the platform-wide [Using GPT-5.4](https://developers.openai.com/api/docs/guides/latest-model) guide for current model-selection advice.75For most new SDK workflows, start with [`gpt-5.5`](https://developers.openai.com/api/docs/models/gpt-5.5) and move to a smaller variant only when latency or cost matters enough to justify it. Use the platform-wide [Using GPT-5.5](https://developers.openai.com/api/docs/guides/latest-model) guide for current model-selection advice.

76 76

77## Choose the simplest default strategy77## Choose the simplest default strategy

78 78

guides/agents/quickstart.md +2 −2


36 name: "History tutor",36 name: "History tutor",

37 instructions:37 instructions:

38 "You answer history questions clearly and concisely.",38 "You answer history questions clearly and concisely.",

~~39 model: "gpt-5.4",~~39 model: "gpt-5.5",

40});40});

41 41

42const result = await run(agent, "When did the Roman Empire fall?");42const result = await run(agent, "When did the Roman Empire fall?");

51agent = Agent(51agent = Agent(

52 name="History tutor",52 name="History tutor",

53 instructions="You answer history questions clearly and concisely.",53 instructions="You answer history questions clearly and concisely.",

~~54 model="gpt-5.4",~~54 model="gpt-5.5",

55)55)

56 56

57 57

guides/agents/sandboxes.md +1 −1


268 268

269agent = SandboxAgent(269agent = SandboxAgent(

270 name="Renewal Packet Analyst",270 name="Renewal Packet Analyst",

~~271 model="gpt-5.4",~~271 model="gpt-5.5",

272 instructions=(272 instructions=(

273 "Review the workspace before answering. Keep the response concise, "273 "Review the workspace before answering. Keep the response concise, "

274 "business-focused, and cite the file names that support each conclusion."274 "business-focused, and cite the file names that support each conclusion."

guides/background.md +8 −8


17-H "Content-Type: application/json" \\17-H "Content-Type: application/json" \\

18-H "Authorization: Bearer $OPENAI_API_KEY" \\18-H "Authorization: Bearer $OPENAI_API_KEY" \\

19-d '{19-d '{

~~20 "model": "gpt-5.4",~~20 "model": "gpt-5.5",

21 "input": "Write a very long novel about otters in space.",21 "input": "Write a very long novel about otters in space.",

22 "background": true22 "background": true

23}'23}'

28const client = new OpenAI();28const client = new OpenAI();

29 29

30const resp = await client.responses.create({30const resp = await client.responses.create({

~~31 model: "gpt-5.4",~~31 model: "gpt-5.5",

32 input: "Write a very long novel about otters in space.",32 input: "Write a very long novel about otters in space.",

33 background: true,33 background: true,

34});34});

42client = OpenAI()42client = OpenAI()

43 43

44resp = client.responses.create(44resp = client.responses.create(

~~45 model="gpt-5.4",~~45 model="gpt-5.5",

46 input="Write a very long novel about otters in space.",46 input="Write a very long novel about otters in space.",

47 background=True,47 background=True,

48)48)

68const client = new OpenAI();68const client = new OpenAI();

69 69

70let resp = await client.responses.create({70let resp = await client.responses.create({

~~71model: "gpt-5.4",~~71model: "gpt-5.5",

72input: "Write a very long novel about otters in space.",72input: "Write a very long novel about otters in space.",

73background: true,73background: true,

74});74});

89client = OpenAI()89client = OpenAI()

90 90

91resp = client.responses.create(91resp = client.responses.create(

~~92 model="gpt-5.4",~~92 model="gpt-5.5",

93 input="Write a very long novel about otters in space.",93 input="Write a very long novel about otters in space.",

94 background=True,94 background=True,

95)95)

151-H "Content-Type: application/json" \\151-H "Content-Type: application/json" \\

152-H "Authorization: Bearer $OPENAI_API_KEY" \\152-H "Authorization: Bearer $OPENAI_API_KEY" \\

153-d '{153-d '{

~~154 "model": "gpt-5.4",~~154 "model": "gpt-5.5",

155 "input": "Write a very long novel about otters in space.",155 "input": "Write a very long novel about otters in space.",

156 "background": true,156 "background": true,

157 "stream": true157 "stream": true

168const client = new OpenAI();168const client = new OpenAI();

169 169

170const stream = await client.responses.create({170const stream = await client.responses.create({

~~171 model: "gpt-5.4",~~171 model: "gpt-5.5",

172 input: "Write a very long novel about otters in space.",172 input: "Write a very long novel about otters in space.",

173 background: true,173 background: true,

174 stream: true,174 stream: true,

192 192

193# Fire off an async response but also start streaming immediately193# Fire off an async response but also start streaming immediately

194stream = client.responses.create(194stream = client.responses.create(

~~195 model="gpt-5.4",~~195 model="gpt-5.5",

196 input="Write a very long novel about otters in space.",196 input="Write a very long novel about otters in space.",

197 background=True,197 background=True,

198 stream=True,198 stream=True,

guides/code-generation.md +8 −8


11 11

12[**Codex**](https://developers.openai.com/codex/overview) is OpenAI's coding agent for software development. It helps you write, review and debug code. Interact with Codex in a variety of interfaces: in your IDE, through the CLI, on web and mobile sites, or in your CI/CD pipelines with the SDK. Codex is the best way to get agentic software engineering on your projects.12[**Codex**](https://developers.openai.com/codex/overview) is OpenAI's coding agent for software development. It helps you write, review and debug code. Interact with Codex in a variety of interfaces: in your IDE, through the CLI, on web and mobile sites, or in your CI/CD pipelines with the SDK. Codex is the best way to get agentic software engineering on your projects.

13 13

14Codex works best with the latest models from the GPT-5 family, such as [`gpt-5.4`](https://developers.openai.com/api/docs/models/gpt-5.4). We offer a range of models specifically designed to work with coding agents like Codex, such as [`gpt-5.3-codex`](https://developers.openai.com/api/docs/models/gpt-5.3-codex), but starting with `gpt-5.4`, we recommend using the general-purpose model for most code generation tasks.14Codex works best with the latest models from the GPT-5 family, such as [`gpt-5.5`](https://developers.openai.com/api/docs/models/gpt-5.5). We offer a range of models specifically designed to work with coding agents like Codex, such as [`gpt-5.3-codex`](https://developers.openai.com/api/docs/models/gpt-5.3-codex), but we recommend using the latest general-purpose model for most code generation tasks.

15 15

16See the [Codex docs](https://developers.openai.com/codex) for setup guides, reference material, pricing, and more information.16See the [Codex docs](https://developers.openai.com/codex) for setup guides, reference material, pricing, and more information.

17 17

18## Integrate with coding models18## Integrate with coding models

19 19

20For most API-based code generation, start with **`gpt-5.4`**. It handles both general-purpose work and coding, which makes it a strong default when your application needs to write code, reason about requirements, inspect docs, and handle broader workflows in one place.20For most API-based code generation, start with **`gpt-5.5`**. It handles both general-purpose work and coding, which makes it a strong default when your application needs to write code, reason about requirements, inspect docs, and handle broader workflows in one place.

21 21

22This example shows how you can use the [Responses API](https://developers.openai.com/api/docs/api-reference/responses) for a code generation use case:22This example shows how you can use the [Responses API](https://developers.openai.com/api/docs/api-reference/responses) for a code generation use case:

23 23

28const openai = new OpenAI();28const openai = new OpenAI();

29 29

30const result = await openai.responses.create({30const result = await openai.responses.create({

~~31 model: "gpt-5.4",~~31 model: "gpt-5.5",

32 input: "Find the null pointer exception: ...your code here...",32 input: "Find the null pointer exception: ...your code here...",

33 reasoning: { effort: "high" },33 reasoning: { effort: "high" },

34});34});

41client = OpenAI()41client = OpenAI()

42 42

43result = client.responses.create(43result = client.responses.create(

~~44 model="gpt-5.4",~~44 model="gpt-5.5",

45 input="Find the null pointer exception: ...your code here...",45 input="Find the null pointer exception: ...your code here...",

46 reasoning={ "effort": "high" },46 reasoning={ "effort": "high" },

47)47)

54 -H "Content-Type: application/json" \\54 -H "Content-Type: application/json" \\

55 -H "Authorization: Bearer $OPENAI_API_KEY" \\55 -H "Authorization: Bearer $OPENAI_API_KEY" \\

56 -d '{56 -d '{

~~57 "model": "gpt-5.4",~~57 "model": "gpt-5.5",

58 "input": "Find the null pointer exception: ...your code here...",58 "input": "Find the null pointer exception: ...your code here...",

59 "reasoning": { "effort": "high" }59 "reasoning": { "effort": "high" }

60 }'60 }'

70## Next steps70## Next steps

71 71

72- Visit the [Codex docs](https://developers.openai.com/codex) to learn what you can do with Codex, set up Codex in whichever interface you choose, or find more details.72- Visit the [Codex docs](https://developers.openai.com/codex) to learn what you can do with Codex, set up Codex in whichever interface you choose, or find more details.

~~73- Read [Using GPT-5.4](https://developers.openai.com/api/docs/guides/latest-model) for model selection, features, and migration guidance.~~

~~74- See [Prompt guidance for GPT-5.4](https://developers.openai.com/api/docs/guides/prompt-guidance) for prompting patterns that work well on coding and agentic tasks.~~

~~75- Compare [`gpt-5.4`](https://developers.openai.com/api/docs/models/gpt-5.4) and [`gpt-5.3-codex`](https://developers.openai.com/api/docs/models/gpt-5.3-codex) on the model pages.~~

73- Read [Using GPT-5.5](https://developers.openai.com/api/docs/guides/latest-model) for model selection, features, and migration guidance.

74- See [Prompt guidance for GPT-5.5](https://developers.openai.com/api/docs/guides/prompt-guidance) for prompting patterns that work well on coding and agentic tasks.

75- Compare [`gpt-5.5`](https://developers.openai.com/api/docs/models/gpt-5.5) and [`gpt-5.3-codex`](https://developers.openai.com/api/docs/models/gpt-5.3-codex) on the model pages.

guides/compaction.md +2 −2


121 121

122# 1) Compact the current window122# 1) Compact the current window

123compacted = client.responses.compact(123compacted = client.responses.compact(

~~124 model="gpt-5.4",~~124 model="gpt-5.5",

125 input=long_input_items_array,125 input=long_input_items_array,

126)126)

127 127

136]136]

137 137

138next_response = client.responses.create(138next_response = client.responses.create(

~~139 model="gpt-5.4",~~139 model="gpt-5.5",

140 input=next_input,140 input=next_input,

141 store=False, # Keep the flow ZDR-friendly141 store=False, # Keep the flow ZDR-friendly

142)142)

guides/conversation-state.md +1 −1


6 When troubleshooting cases where GPT-5.4 treats an intermediate update as6 When troubleshooting cases where GPT-5.4 treats an intermediate update as

7 the final answer, verify your integration preserves the assistant message7 the final answer, verify your integration preserves the assistant message

8 `phase` field correctly. See [Phase8 `phase` field correctly. See [Phase

~~9 parameter](https://developers.openai.com/api/docs/guides/prompt-guidance#phase-parameter) for details.~~9 parameter](https://developers.openai.com/api/docs/guides/reasoning#phase-parameter) for details.

10 10

11 11

12## Manually manage conversation state12## Manually manage conversation state

guides/deployment-checklist.md +246 −0 created


1# API deployment checklist

3| Contents | Expected impact |

4| ------------------------------------------------------------------------------- | ----------------------------------- |

5| [Use the Responses API](#use-the-responses-api) | Quality, cost, latency, reliability |

6| [Set up `reasoning.effort`](#set-up-reasoningeffort) | Quality, cost, latency |

7| [Set up `text.verbosity`](#set-up-textverbosity) | Quality, cost, latency |

8| [Set up the assistant `phase` parameter](#set-up-the-assistant-phase-parameter) | Quality, cost |

9| [Use `tool_search`](#use-tool_search) | Cost, latency |

10| [Leverage built-in tools](#leverage-built-in-tools) | Quality |

11| [Leverage compaction](#leverage-compaction) | Cost |

12| [Use `prompt_cache_key`](#use-prompt_cache_key) | Latency, cost |

13| [Use `reasoning.encrypted_content`](#use-reasoningencrypted_content) | Quality, latency |

14| [Use `background=True`](#use-backgroundtrue) | Resumability |

15| [Use WebSocket mode](#use-websocket-mode) | Latency |

17## Use the Responses API

19**Always start** with the

20[Responses API](https://developers.openai.com/api/docs/guides/migrate-to-responses). It is OpenAI's flagship

21API and the best place to access the newest model behavior, built-in tools,

22stateful workflows, and agent features.

24## Set up `reasoning.effort`

26Use `reasoning.effort` to decide how much thinking the model should do before it

27answers.

29For `gpt-5.5`, the supported values are `none`, `low`, `medium`, `high`, and

30`xhigh`. The default is `medium`. Lower effort is faster and uses fewer

31reasoning tokens. Higher effort gives the model more time for planning,

32debugging, synthesis, and multi-step tradeoffs. The right value depends on the

33**task**, not just the model.

35Use `low` when the job is mostly extraction, routing, classification, or a

36simple rewrite. Use `medium` or `high` when the model needs to diagnose a

37problem, compare options, write a plan, or reason through code. Reserve `xhigh`

38for cases where your evals show the extra latency is worth it.
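As a rough illustration of matching effort to the task rather than the model, the sketch below maps a few task kinds to a `reasoning.effort` value and builds request arguments. The task labels, `EFFORT_BY_TASK`, `effort_for_task`, and `build_request_args` are hypothetical names invented for this example, and the specific `medium`/`high` split is an assumption to tune against your own evals.

```python
# Hypothetical task labels mapped to reasoning.effort values named in the text above.
EFFORT_BY_TASK = {
    "extraction": "low",
    "routing": "low",
    "classification": "low",
    "rewrite": "low",
    "diagnosis": "medium",
    "comparison": "medium",
    "planning": "high",
    "code_reasoning": "high",
}


def effort_for_task(task_kind: str) -> str:
    """Return an effort level for a task kind, falling back to the documented default."""
    return EFFORT_BY_TASK.get(task_kind, "medium")


def build_request_args(task_kind: str, user_input: str, model: str = "gpt-5.5") -> dict:
    """Build keyword arguments that could be passed to client.responses.create(**args)."""
    return {
        "model": model,
        "input": user_input,
        "reasoning": {"effort": effort_for_task(task_kind)},
    }


args = build_request_args("classification", "Is this ticket about billing or shipping?")
print(args["reasoning"])
```

Keeping the mapping in one table makes it easy to change a single task's effort after an eval run without touching call sites.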

40## Set up `text.verbosity`

42`text.verbosity` is the main lever for balancing brevity against completeness.

43Use lower verbosity when the product needs a quick, compact answer, and higher

44verbosity when the response needs richer explanation, clearer structure, or

45complete context. Lower verbosity means fewer output tokens, so the model

46generates less and returns output faster.

48For coding, `medium` and `high` tend to produce longer, more organized output

49with clearer structure. `low` keeps the answer tighter and more minimal.
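A minimal sketch of setting verbosity per product surface follows. The surface names and `text_options` are hypothetical, the fallback to `medium` is a choice made for this example rather than a documented default, and the `text={"verbosity": ...}` request shape is assumed from the parameter name.

```python
def text_options(surface: str) -> dict:
    """Pick a text.verbosity value for a hypothetical product surface."""
    # Assumed surfaces: compact UI elements get terse output, reports get fuller output.
    verbosity_by_surface = {
        "chat_bubble": "low",
        "code_review": "medium",
        "report": "high",
    }
    return {"verbosity": verbosity_by_surface.get(surface, "medium")}


request_args = {
    "model": "gpt-5.5",
    "input": "Explain what this function does: ...your code here...",
    "text": text_options("chat_bubble"),
}
print(request_args["text"])
```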

51## Set up the assistant `phase` parameter

53`phase` is a label on assistant messages in the conversation history. It

54indicates to the model whether a prior assistant message was intermediate

55working commentary or the final answer. Use `phase: "commentary"` for progress

56updates, pre-tool-call notes, and other in-between messages. Use

57`phase: "final_answer"` for the completed response.

59The assistant might say something like:

61That is not the answer. It is a progress note. Later, the assistant might say:

63This is useful in long-running or tool-heavy workflows where the assistant may

64produce visible progress updates before it finishes. When you send that history

65back to the model, preserve `phase` on assistant messages so the model can tell

66which messages are progress updates and which message is the final result.

68**Preserve and resend `phase`** on assistant messages in follow-up requests for

69`gpt-5.3-codex` and later models. Doing so helps prevent early stopping, so the

70agent keeps running until it reaches the final answer.
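One way to avoid dropping `phase` is to resend prior items whole instead of rebuilding them from `role` and `content`. The sketch below assumes a simplified dict shape for history items and uses made-up message text; `next_turn_input` is a hypothetical helper, and real items would come from the previous response output.

```python
import copy


def next_turn_input(prior_items: list[dict], user_text: str) -> list[dict]:
    """Resend prior items unchanged (including `phase`) and append the new user message."""
    carried = copy.deepcopy(prior_items)  # copy whole items; do not rebuild them field by field
    carried.append({"role": "user", "content": user_text})
    return carried


# Assumed item shape and invented text, for illustration only.
history = [
    {"role": "user", "content": "Why is checkout failing?"},
    {"role": "assistant", "phase": "commentary", "content": "Checking the payment logs first."},
    {"role": "assistant", "phase": "final_answer", "content": "The card token expired."},
]

follow_up = next_turn_input(history, "Can you draft a fix?")
phases = [item.get("phase") for item in follow_up if item["role"] == "assistant"]
print(phases)  # ['commentary', 'final_answer']
```

A helper that copied only `role` and `content` would silently strip the label, which is the failure mode the paragraph above warns about.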

72## Use `tool_search`

74Instead of loading the full tool catalog into every request, add

75`{"type": "tool_search"}` and mark expensive tool definitions with

76`defer_loading: true`. The model can then load the subset it needs at runtime.

77At request start, the model only sees the search tool name and description. If

78the model decides it needs a deferred tool, it runs tool search, and only then

79are the deferred tool definitions loaded into context, where the model can call

80them. This saves tokens and preserves cache performance.

82There are two modes:

84- **Hosted tool search** is the simpler option. Use it when you already know

85 which tools could be available for the request.

86- **Client-executed tool search** is for cases where your app has to decide what

87 tools are available, for example based on the user's tenant, project,

88 permissions, or internal registry.

90**Start with hosted tool search** unless your app really needs to control

91discovery itself.

93Group your tools by user intent. Use namespaces or MCP servers when you can. It

94is easier for the model to choose between a few clear groups than a long flat

95list of functions. We recommend keeping each namespace under about 10 functions

96for optimal token efficiency and model performance.

98Keep namespace descriptions short and discriminative. Put the detailed

99instructions inside the deferred tool definitions. Avoid making one giant

100namespace for everything.
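The setup described above can be sketched as a tools list with the search entry first and deferred function definitions after it. The function names, descriptions, and the helpers `deferred_function`, `build_tools`, and `oversized_namespaces` are hypothetical; the namespace grouping here is a local bookkeeping dict used only to check the "about 10 functions" guideline, not an API structure.

```python
from collections import defaultdict

MAX_FUNCTIONS_PER_NAMESPACE = 10  # approximate guideline from the text above


def deferred_function(name: str, description: str, parameters: dict) -> dict:
    """A function tool definition marked for deferred loading."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
        "defer_loading": True,
    }


def build_tools(functions: list[dict]) -> list[dict]:
    """Put the tool search entry first, followed by the deferred definitions."""
    return [{"type": "tool_search"}, *functions]


def oversized_namespaces(namespace_of: dict[str, str]) -> list[str]:
    """Return namespaces (local grouping keyed by tool name) holding more than the guideline."""
    counts = defaultdict(int)
    for namespace in namespace_of.values():
        counts[namespace] += 1
    return sorted(ns for ns, n in counts.items() if n > MAX_FUNCTIONS_PER_NAMESPACE)


empty_schema = {"type": "object", "properties": {}}
tools = build_tools([
    deferred_function("billing_get_invoice", "Fetch one invoice by ID.", empty_schema),
    deferred_function("billing_refund", "Refund a payment.", empty_schema),
])
print([tool["type"] for tool in tools])
```

Running a check like `oversized_namespaces` in CI is one way to notice when a group has grown into the "one giant namespace" the guide advises against.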

101

102## Leverage built-in tools

103

104[Built-in tools](https://developers.openai.com/api/docs/guides/tools) are the API's native capabilities.

105Instead of building every tool yourself, you can give the model access to tools

106that already work inside the Responses API. The model can then decide when to

107use them.

108

109OpenAI keeps adding more native tools, so start with built-in tools when they

110fit your workflow. Build custom tools when native options do not cover the task.

111Current built-in tools and related tool options include:

112

113- **Web search**: Search the web for up-to-date information

114- **File search**: Search uploaded files or vector stores

115- **Code interpreter**: Run Python for analysis, math, charts, and file

116 processing

117- **Shell**: Run shell commands in a hosted container or your own runtime

118- **Computer use**: Operate a UI through screenshots, clicks, typing, and

119 scrolling

120- **Image generation**: Generate or edit images

121- **MCP/connectors**: Connect the model to external services and tools

122- **Skills**: Attach reusable instruction bundles and workflow files

123- **Apply patch**: Make structured code edits

124

125There is also a model-quality reason to prefer them. Built-in tools are

126in-distribution for our post-training, meaning that the models are trained and

127evaluated around these tool shapes, behaviors, and outputs. With built-in tools,

128OpenAI models tend to select tools more accurately, execute them more cleanly,

129and fail less often than with tools they were not trained on.

130

131## Leverage compaction

132

133[Compaction](https://developers.openai.com/api/docs/guides/compaction) is a context engineering tool: it

134decides what information the model carries forward across many turns. In

135long-running agents, the problem is not just, "Will I hit the context limit?" It

136is that old messages, tool logs, retries, and stale details crowd out the state

137the model needs.

138

139Compaction gives you a controlled way to reduce context size while preserving

140state needed for subsequent turns. After a meaningful milestone, like finishing

141a debugging phase or narrowing a root cause, you can compact the prior window

142and continue from the compacted output. This keeps the model focused because the

143next turn is built around the important state, not every intermediate step,

144failed command, and obsolete branch of reasoning.

145

146There are two ways to leverage compaction:

147

148- **Let the server handle it**: if you use `previous_response_id`, turn on

149 `context_management` with a `compact_threshold`. The server will automatically

150 compact the conversation when it gets too large. You keep sending only the

151 newest user message.

152- **Do it yourself**: if you manage the full input array yourself, call

153 `client.responses.compact()`. It gives back a smaller context window. Use that

154 returned output directly in the next `responses.create()` call.

155

156**Do not edit the compacted output.** It is not a human-readable summary; it is

157machine state that helps the model continue. Pass it forward as-is, then add the

158next user message.
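The do-it-yourself path might look like the sketch below. `build_next_input` and `compact_and_continue` are hypothetical helpers, and reading the compacted items from `compacted.output` is an assumption about the return value; the `client` argument is any object exposing `responses.compact` and `responses.create`.

```python
def build_next_input(compacted_items: list, user_text: str) -> list:
    """Append the next user message after the compacted items, leaving them untouched."""
    return [*compacted_items, {"role": "user", "content": user_text}]


def compact_and_continue(client, model: str, long_input_items: list, user_text: str):
    """Manual path: compact the window, then continue from the compacted output."""
    compacted = client.responses.compact(model=model, input=long_input_items)
    # Assumption: the compacted items are exposed as `compacted.output`.
    next_input = build_next_input(compacted.output, user_text)
    return client.responses.create(model=model, input=next_input, store=False)
```

The helper never inspects or rewrites the compacted items; it only places the new user message after them.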

159

160## Use `prompt_cache_key`

161

162[Prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching) automatically reduces latency

163and cost when requests reuse the same long prefix. For high-volume workflows,

164set

165[`prompt_cache_key`](https://developers.openai.com/api/docs/api-reference/responses/create#responses-create-prompt_cache_key)

166consistently for requests that share the same stable prefix.

167

168The cache key is combined with the prompt prefix hash, so it helps route similar

169requests to the same cache without changing the model input. Keep the key stable

170for genuinely shared prefixes, and choose a granularity that avoids sending too

171much traffic to one prefix-key pair. If one prefix and `prompt_cache_key`

172combination exceeds about 15 requests per minute, requests may overflow to

173additional machines and reduce cache effectiveness.
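To make the granularity advice concrete, the sketch below estimates how many distinct keys a shared prefix needs at a given request rate and derives a stable key per user. The helper names, the `support-bot-v3` prefix label, and the idea of sharding by a hash of the user ID are assumptions for illustration; only the roughly 15 requests per minute figure comes from the text above.

```python
import hashlib
import math

PER_KEY_RPM = 15  # approximate per prefix-and-key threshold from the text above


def shards_needed(expected_rpm: float, per_key_rpm: int = PER_KEY_RPM) -> int:
    """Number of distinct keys needed to keep each key at or under the threshold."""
    return max(1, math.ceil(expected_rpm / per_key_rpm))


def prompt_cache_key(prefix_name: str, user_id: str, shards: int) -> str:
    """Derive a stable key: the same user always maps to the same shard."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shards
    return f"{prefix_name}:{shard}"


shards = shards_needed(expected_rpm=60)  # 60 / 15 = 4 shards
key = prompt_cache_key("support-bot-v3", "user-1234", shards)
print(shards, key)
```

Hashing with `hashlib` rather than the built-in `hash()` keeps the shard assignment identical across processes and restarts, which matters because the key only helps when it is stable.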

174

175## Use `reasoning.encrypted_content`

176

177Always round-trip reasoning items. This helps the model by allowing it to work

178from its prior reasoning. If your [Zero Data Retention

179(ZDR)](https://developers.openai.com/api/docs/guides/your-data#zero-data-retention) requirements do not allow

180storing response data, this is where `reasoning.encrypted_content` is important.

181`reasoning.encrypted_content` gives you a stateless handoff.

182

183Add `reasoning.encrypted_content` to `include`, and reasoning items in the

184response output will include encrypted reasoning content that can be passed back

185into the next request. Your app does not need to understand that value. It just

186keeps the reasoning item exactly as returned and sends it back during the next

187turn, so the model can use it to continue the workflow.
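A stateless round trip could be sketched as follows. Items are shown as plain dicts for simplicity, and `stateless_request_args` and `carry_forward` are hypothetical helpers; the point is only that reasoning items travel back exactly as returned.

```python
def stateless_request_args(model: str, input_items: list) -> dict:
    """Request arguments for a turn that does not rely on stored response data."""
    return {
        "model": model,
        "input": input_items,
        "store": False,
        "include": ["reasoning.encrypted_content"],
    }


def carry_forward(prior_input: list, output_items: list, user_text: str) -> list:
    """Next-turn input: prior input, every output item as returned, then the new message."""
    return [*prior_input, *output_items, {"role": "user", "content": user_text}]
```

Because the app treats the encrypted value as opaque, `carry_forward` never reads or modifies the items it forwards.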

188

189## Use `background=True`

190

191Use [`background=True`](https://developers.openai.com/api/docs/guides/background) for requests that may take

192a long time. Instead of keeping the client connection open, the API starts a job

193and returns an ID. Your app can poll that job until it finishes, fails, or is

194canceled. Use it for large analyses, long tool runs, or work that needs status

195and retry behavior.

196

197`background=True` **requires `store=True`**.

198

199You can combine it with `stream=True` for progress events, but the first event

200may take longer than a normal request.

201

202From a UI perspective, background mode lets you tell the user, "This is running;

203here is the status; the result will appear here when it's ready."

204

205Note: `background=True` is not compatible with [Zero Data

206Retention](https://developers.openai.com/api/docs/guides/your-data#zero-data-retention).
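The polling loop mentioned above can be sketched with an injected `fetch` function, which keeps the example runnable without network access. `poll_until_done` is a hypothetical helper, and the non-terminal status names `queued` and `in_progress` are assumptions; check them against the statuses your responses actually report.

```python
import time
from typing import Callable

PENDING_STATUSES = {"queued", "in_progress"}  # assumed non-terminal status names


def poll_until_done(
    fetch: Callable[[str], object],
    response_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fetch(response_id) until the status is no longer pending, or time out."""
    waited = 0.0
    response = fetch(response_id)
    while response.status in PENDING_STATUSES:
        if waited >= timeout_s:
            raise TimeoutError(f"{response_id} still {response.status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        response = fetch(response_id)
    return response
```

With the Python SDK, `fetch` would typically be `client.responses.retrieve`, called with the ID returned by the initial background request. The caller still has to inspect the final status, since the loop stops on failure and cancellation as well as success.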

207

208## Use WebSocket mode

209

210[WebSocket mode](https://developers.openai.com/api/docs/guides/websocket-mode) is built for long-running,

211tool-call-heavy workflows where you keep a persistent connection open and

212continue by sending only new input items plus `previous_response_id`. For

213rollouts with 20 or more tool calls, this approach is roughly 40% faster

214end-to-end.

215

216**How this works**: The first message will look like a normal Responses request:

217model, instructions, tools, and user input. The server streams events back. If

218the model asks for a tool, your app runs the tool. Then, instead of sending a new

219HTTP request, you send another `response.create` event on the same socket with

220the prior `previous_response_id` and the new item. That is where the latency win

221comes from. In plain HTTP, every follow-up is a fresh request. In WebSocket mode,

222the connection stays open and the most recent response state stays warm in

223memory on that connection. When the next turn continues from that response, the

224backend has to do less setup work.

225

226If your workflow is one request, one answer, then **keep HTTP**. If your

227workflow behaves like a long-running agent, try WebSocket mode.

228

229A single WebSocket connection handles one in-flight response at a time, so

230parallel work needs multiple connections. Connections currently top out at 60

231minutes. Continuation uses the same `previous_response_id` semantics as HTTP

232mode, with a connection-local cache for the most recent response.

233

234Note: WebSocket mode works with ZDR because your data is held only in memory and

235is not stored to disk.

236

237The default Python sample uses `websocket-client` (`pip install

238websocket-client`). The JavaScript sample uses `ws` (`npm install ws`).
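The message flow described in this section can be sketched as two JSON payloads built without opening a socket. The event envelope shown here (request fields placed at the top level next to `type`) is an assumption, as are the helper names and the choice to repeat `model` on the follow-up; consult the WebSocket mode guide for the exact schema before relying on it.

```python
import json


def first_event(model: str, instructions: str, tools: list, user_text: str) -> str:
    """First message on the socket: the same fields as a normal Responses request."""
    return json.dumps({
        "type": "response.create",
        "model": model,
        "instructions": instructions,
        "tools": tools,
        "input": [{"role": "user", "content": user_text}],
    })


def follow_up_event(model: str, previous_response_id: str, call_id: str, tool_output: str) -> str:
    """Later message: only the new item plus previous_response_id."""
    return json.dumps({
        "type": "response.create",
        "model": model,
        "previous_response_id": previous_response_id,
        "input": [
            {"type": "function_call_output", "call_id": call_id, "output": tool_output},
        ],
    })


opening = first_event("gpt-5.5", "You are a release assistant.", [], "Check the build status.")
follow_up = follow_up_event("gpt-5.5", "resp_123", "call_abc", '{"status": "green"}')
print(len(json.loads(follow_up)["input"]))  # 1
```

The follow-up carries one new item instead of the whole conversation, which mirrors the "send only new input items" pattern the section credits for the latency gain.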

239

240## Final takeaway

241

242The Responses API is the foundation for building smarter, more capable OpenAI

243applications. The real advantage is that it lets developers move from one-off

244prompts to durable, tool-using, context-aware workflows that can adapt to the

245complexity of the task. Follow this guide to see higher performance in real

246deployments.

guides/file-inputs.md +8 −8


58 -H "Content-Type: application/json" \\58 -H "Content-Type: application/json" \\

59 -H "Authorization: Bearer $OPENAI_API_KEY" \\59 -H "Authorization: Bearer $OPENAI_API_KEY" \\

60 -d '{60 -d '{

~~61 "model": "gpt-5",~~61 "model": "gpt-5.5",

62 "input": [62 "input": [

63 {63 {

64 "role": "user",64 "role": "user",

82const client = new OpenAI();82const client = new OpenAI();

83 83

84const response = await client.responses.create({84const response = await client.responses.create({

~~85 model: "gpt-5",~~85 model: "gpt-5.5",

86 input: [86 input: [

87 {87 {

88 role: "user",88 role: "user",

108client = OpenAI()108client = OpenAI()

109 109

110response = client.responses.create(110response = client.responses.create(

~~111 model="gpt-5",~~111 model="gpt-5.5",

112 input=[112 input=[

113 {113 {

114 "role": "user",114 "role": "user",

134using OpenAI.Responses;134using OpenAI.Responses;

135 135

136string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;136string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

~~137OpenAIResponseClient client = new(model: "gpt-5", apiKey: key);~~137OpenAIResponseClient client = new(model: "gpt-5.5", apiKey: key);

138 138

139using HttpClient http = new();139using HttpClient http = new();

140using Stream stream = await http.GetStreamAsync("https://www.berkshirehathaway.com/letters/2024ltr.pdf");140using Stream stream = await http.GetStreamAsync("https://www.berkshirehathaway.com/letters/2024ltr.pdf");

174 -H "Content-Type: application/json" \\174 -H "Content-Type: application/json" \\

175 -H "Authorization: Bearer $OPENAI_API_KEY" \\175 -H "Authorization: Bearer $OPENAI_API_KEY" \\

176 -d '{176 -d '{

~~177 "model": "gpt-5",~~177 "model": "gpt-5.5",

178 "input": [178 "input": [

179 {179 {

180 "role": "user",180 "role": "user",

204});204});

205 205

206const response = await client.responses.create({206const response = await client.responses.create({

~~207 model: "gpt-5",~~207 model: "gpt-5.5",

208 input: [208 input: [

209 {209 {

210 role: "user",210 role: "user",

235)235)

236 236

237response = client.responses.create(237response = client.responses.create(

~~238 model="gpt-5",~~238 model="gpt-5.5",

239 input=[239 input=[

240 {240 {

241 "role": "user",241 "role": "user",

261using OpenAI.Responses;261using OpenAI.Responses;

262 262

263string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;263string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

~~264OpenAIResponseClient client = new(model: "gpt-5", apiKey: key);~~264OpenAIResponseClient client = new(model: "gpt-5.5", apiKey: key);

265 265

266OpenAIFileClient files = new(key);266OpenAIFileClient files = new(key);

267OpenAIFile file = files.UploadFile("draconomicon.pdf", FileUploadPurpose.UserData);267OpenAIFile file = files.UploadFile("draconomicon.pdf", FileUploadPurpose.UserData);

guides/flex-processing.md +3 −3


21});21});

22 22

23const response = await client.responses.create({23const response = await client.responses.create({

~~24 model: "gpt-5.4",~~24 model: "gpt-5.5",

25 instructions: "List and describe all the metaphors used in this book.",25 instructions: "List and describe all the metaphors used in this book.",

26 input: "<very long text of book here>",26 input: "<very long text of book here>",

27 service_tier: "flex",27 service_tier: "flex",

39 39

40# you can override the max timeout per request as well40# you can override the max timeout per request as well

41response = client.with_options(timeout=900.0).responses.create(41response = client.with_options(timeout=900.0).responses.create(

~~42 model="gpt-5.4",~~42 model="gpt-5.5",

43 instructions="List and describe all the metaphors used in this book.",43 instructions="List and describe all the metaphors used in this book.",

44 input="<very long text of book here>",44 input="<very long text of book here>",

45 service_tier="flex",45 service_tier="flex",

53 -H "Authorization: Bearer $OPENAI_API_KEY" \\53 -H "Authorization: Bearer $OPENAI_API_KEY" \\

54 -H "Content-Type: application/json" \\54 -H "Content-Type: application/json" \\

55 -d '{55 -d '{

~~56 "model": "gpt-5.4",~~56 "model": "gpt-5.5",

57 "instructions": "List and describe all the metaphors used in this book.",57 "instructions": "List and describe all the metaphors used in this book.",

58 "input": "<very long text of book here>",58 "input": "<very long text of book here>",

59 "service_tier": "flex"59 "service_tier": "flex"

guides/frontend-prompt.md +38 −0 created


1# Frontend prompt instructions

3These instructions target GPT-5.5, but many of the patterns apply to other model versions as well.

5```

6## Frontend guidance

7You follow these instructions when building applications with a frontend experience:

9### Build with empathy

10- If working with an existing design or given a design framework in context, you pay careful attention to existing conventions and ensure that what you build is consistent with the frameworks used and design of the existing application.

11- You think deeply about the audience of what you are building and use that to decide what features to build and when designing layout, components, visual style, on-screen text, and interaction patterns. Using your application should feel rich and sophisticated.

12- You make sure that the frontend design is tailored for the domain and subject matter of the application. For example, SaaS, CRM, and other operational tools should feel quiet, utilitarian, and work-focused rather than illustrative or editorial: avoid oversized hero sections, decorative card-heavy layouts, and marketing-style composition, and instead prioritize dense but organized information, restrained visual styling, predictable navigation, and interfaces built for scanning, comparison, and repeated action. A game can be more illustrative, expressive, animated, and playful.

13- You make sure that common workflows within the app are ergonomic, efficient, and comprehensive, so the user of your application can seamlessly navigate in and out of different views and pages in the application.

15### Design instructions

16- You make sure to use icons in buttons for tools, swatches for color, segmented controls for modes, toggles/checkboxes for binary settings, sliders/steppers/inputs for numeric values, menus for option sets, tabs for views, and text or icon+text buttons only for clear commands (unless otherwise specified). Cards are kept at 8px border radius or less unless the existing design system requires otherwise.

17- You do not use rounded rectangular UI elements with text inside if you could use a familiar symbol or icon instead (examples include arrow icons for undo/redo, B/I icons for bold/italics, save/download/zoom icons). You build tooltips that name or describe unfamiliar icons when the user hovers over them.

18- You use lucide icons inside buttons whenever one exists instead of manually-drawn SVG icons. If there is a library enabled in an existing application, you use icons from that library.

19- You build feature-complete controls, states, and views that a target user would naturally expect from the application.

20- You do not use visible, in-app text to describe the application's features, functionality, keyboard shortcuts, styling, visual elements, or how to use the application.

21- You should not make a landing page unless absolutely required; when asked for a site, app, game, or tool, build the actual usable experience as the first screen, not marketing or explanatory content.

22- When making a hero page, you use a relevant image, generated bitmap image, or immersive full-bleed interactive scene as the background with text over it that is not in a card; never use a split text/media layout where a card is one side and text is on another side, never put hero text or the primary experience in a card, never use a gradient/SVG hero page, and do not create an SVG hero illustration when a real or generated image can carry the subject.

23- On branded, product, venue, portfolio, or object-focused pages, the brand/product/place/object must be a first-viewport signal, not only tiny nav text or an eyebrow. Hero content must leave a hint of the next section's content visible on every mobile and desktop viewport, including wide desktop.

24- For landing-page heroes, make the H1 the brand/product/place/person name or a literal offer/category; put descriptive value props in supporting copy, not the headline.

25- Websites and games must use visual assets. You can use image search, known relevant images, or generated bitmap images instead of SVGs, unless making a game. Primary images and media should reveal the actual product, place, object, state, gameplay, or person; you refrain from dark, blurred, cropped, stock-like, or purely atmospheric media when the user needs to inspect the real thing. For highly specific game assets you use custom SVG/Three.js/etc.

26- For games or interactive tools with well-established rules, physics, parsing, or AI engines, you use a proven existing library for the core domain logic instead of hand-rolling it, unless the user explicitly asks for a from-scratch implementation.

27- You use Three.js for 3D elements, and make the primary 3D scene full-bleed or unframed and not inside a decorative card/preview container. Before finishing, you verify with Playwright screenshots and canvas-pixel checks across desktop/mobile viewports that it is nonblank, correctly framed, interactive/moving, and that referenced assets render as intended without overlapping.

28- You do not put UI cards inside other cards. Do not style page sections as floating cards. Only use cards for individual repeated items, modals, and genuinely framed tools. Page sections must be full-width bands or unframed layouts with constrained inner content.

29- You do not add discrete orbs, gradient orbs, or bokeh blobs as decoration or backgrounds.

30- You make sure that text fits within its parent UI element on all mobile and desktop viewports. Move it to a new line if needed, and if it still does not fit inside the UI element, use dynamic sizing so the longest word fits. Text must also not occlude preceding or subsequent content. Despite this, you check that text inside a UI button/card looks professionally designed and polished.

31- Match display text to its container: reserve hero-scale type for true heroes, and use smaller, tighter headings inside compact panels, cards, sidebars, dashboards, and tool surfaces.

32- You define stable dimensions with responsive constraints (such as aspect-ratio, grid tracks, min/max, or container-relative sizing) for fixed-format UI elements like boards, grids, toolbars, icon buttons, counters, or tiles, so hover states, labels, icons, pieces, loading text, or dynamic content cannot resize or shift the layout.

33- You do not scale font size with viewport width. Letter spacing must be 0, not negative.

34- You do not make one-note palettes: avoid UIs dominated by variations of a single hue family, and limit dominant purple/purple-blue gradients, beige/cream/sand/tan, dark blue/slate, and brown/orange/espresso palettes; scan CSS colors before finalizing and revise if the page reads as one of these themes.

35- You make sure that UI elements and on-screen text do not overlap with each other in an incoherent manner. This is extremely important because overlap can lead to a jarring user experience.

37When building a site or app that needs a dev server to run properly, you start the local dev server after implementation and give the user the URL so they can try it. If there's already a server on that port, you use another one. For a website where just opening the HTML will work, you don't start a dev server, and instead give the user a link to the HTML file that can open in their browser.

38```

guides/image-generation.md +17 −17


70const openai = new OpenAI();70const openai = new OpenAI();

71 71

72const response = await openai.responses.create({72const response = await openai.responses.create({

~~73 model: "gpt-5.4",~~73 model: "gpt-5.5",

74 input: "Generate an image of gray tabby cat hugging an otter with an orange scarf",74 input: "Generate an image of gray tabby cat hugging an otter with an orange scarf",

75 tools: [{type: "image_generation"}],75 tools: [{type: "image_generation"}],

76});76});

94client = OpenAI() 94client = OpenAI()

95 95

96response = client.responses.create(96response = client.responses.create(

~~97 model="gpt-5.4",~~97 model="gpt-5.5",

98 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",98 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

99 tools=[{"type": "image_generation"}],99 tools=[{"type": "image_generation"}],

100)100)

189const openai = new OpenAI();189const openai = new OpenAI();

190 190

191const response = await openai.responses.create({191const response = await openai.responses.create({

192 model: "gpt-5.4",192 model: "gpt-5.5",

193 input: "Generate an image of gray tabby cat hugging an otter with an orange scarf",193 input: "Generate an image of gray tabby cat hugging an otter with an orange scarf",

194 tools: [{type: "image_generation", action: "generate"}],194 tools: [{type: "image_generation", action: "generate"}],

195});195});

213client = OpenAI() 213client = OpenAI()

214 214

215response = client.responses.create(215response = client.responses.create(

216 model="gpt-5.4",216 model="gpt-5.5",

217 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",217 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

218 tools=[{"type": "image_generation", "action": "generate"}],218 tools=[{"type": "image_generation", "action": "generate"}],

219)219)

245const openai = new OpenAI();245const openai = new OpenAI();

246 246

247const response = await openai.responses.create({247const response = await openai.responses.create({

248 model: "gpt-5.4",248 model: "gpt-5.5",

249 input:249 input:

250 "Generate an image of gray tabby cat hugging an otter with an orange scarf",250 "Generate an image of gray tabby cat hugging an otter with an orange scarf",

251 tools: [{ type: "image_generation" }],251 tools: [{ type: "image_generation" }],

264// Follow up264// Follow up

265 265

266const response_fwup = await openai.responses.create({266const response_fwup = await openai.responses.create({

267 model: "gpt-5.4",267 model: "gpt-5.5",

268 previous_response_id: response.id,268 previous_response_id: response.id,

269 input: "Now make it look realistic",269 input: "Now make it look realistic",

270 tools: [{ type: "image_generation" }],270 tools: [{ type: "image_generation" }],

291client = OpenAI()291client = OpenAI()

292 292

293response = client.responses.create(293response = client.responses.create(

294 model="gpt-5.4",294 model="gpt-5.5",

295 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",295 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

296 tools=[{"type": "image_generation"}],296 tools=[{"type": "image_generation"}],

297)297)

312# Follow up312# Follow up

313 313

314response_fwup = client.responses.create(314response_fwup = client.responses.create(

315 model="gpt-5.4",315 model="gpt-5.5",

316 previous_response_id=response.id,316 previous_response_id=response.id,

317 input="Now make it look realistic",317 input="Now make it look realistic",

318 tools=[{"type": "image_generation"}],318 tools=[{"type": "image_generation"}],

340const openai = new OpenAI();340const openai = new OpenAI();

341 341

342const response = await openai.responses.create({342const response = await openai.responses.create({

343 model: "gpt-5.4",343 model: "gpt-5.5",

344 input:344 input:

345 "Generate an image of gray tabby cat hugging an otter with an orange scarf",345 "Generate an image of gray tabby cat hugging an otter with an orange scarf",

346 tools: [{ type: "image_generation" }],346 tools: [{ type: "image_generation" }],

361// Follow up361// Follow up

362 362

363const response_fwup = await openai.responses.create({363const response_fwup = await openai.responses.create({

364 model: "gpt-5.4",364 model: "gpt-5.5",

365 input: [365 input: [

366 {366 {

367 role: "user",367 role: "user",

394import base64394import base64

395 395

396response = openai.responses.create(396response = openai.responses.create(

397 model="gpt-5.4",397 model="gpt-5.5",

398 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",398 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

399 tools=[{"type": "image_generation"}],399 tools=[{"type": "image_generation"}],

400)400)

417# Follow up417# Follow up

418 418

419response_fwup = openai.responses.create(419response_fwup = openai.responses.create(

420 model="gpt-5.4",420 model="gpt-5.5",

421 input=[421 input=[

422 {422 {

423 "role": "user",423 "role": "user",

506const openai = new OpenAI();506const openai = new OpenAI();

507 507

508const stream = await openai.responses.create({508const stream = await openai.responses.create({

509 model: "gpt-5.4",509 model: "gpt-5.5",

510 input:510 input:

511 "Draw a gorgeous image of a river made of white owl feathers, snaking its way through a serene winter landscape",511 "Draw a gorgeous image of a river made of white owl feathers, snaking its way through a serene winter landscape",

512 stream: true,512 stream: true,

530client = OpenAI()530client = OpenAI()

531 531

532stream = client.responses.create(532stream = client.responses.create(

533 model="gpt-5.4",533 model="gpt-5.5",

534 input="Draw a gorgeous image of a river made of white owl feathers, snaking its way through a serene winter landscape",534 input="Draw a gorgeous image of a river made of white owl feathers, snaking its way through a serene winter landscape",

535 stream=True,535 stream=True,

536 tools=[{"type": "image_generation", "partial_images": 2}],536 tools=[{"type": "image_generation", "partial_images": 2}],

618 618

619### Revised prompt619### Revised prompt

620 620

621When using the image generation tool in the Responses API, the mainline model (for example, `gpt-5.4`) will automatically revise your prompt for improved performance.621When using the image generation tool in the Responses API, the mainline model (for example, `gpt-5.5`) will automatically revise your prompt for improved performance.

622 622

623You can access the revised prompt in the `revised_prompt` field of the image generation call:623You can access the revised prompt in the `revised_prompt` field of the image generation call:

624 624

767maskId = create_file("mask.png")767maskId = create_file("mask.png")

768 768

769response = client.responses.create(769response = client.responses.create(

770 model="gpt-5.4",770 model="gpt-5.5",

771 input=[771 input=[

772 {772 {

773 "role": "user",773 "role": "user",

814const maskId = await createFile("mask.png");814const maskId = await createFile("mask.png");

815 815

816const response = await openai.responses.create({816const response = await openai.responses.create({

817 model: "gpt-5.4",817 model: "gpt-5.5",

818 input: [818 input: [

819 {819 {

820 role: "user",820 role: "user",

guides/images-vision.md +28 −7


442 442

443### Choose an image detail level443### Choose an image detail level

444 444

445The `detail` parameter tells the model what level of detail to use when processing and understanding the image (`low`, `high`, `original`, or `auto` to let the model decide). If you skip the parameter, the model will use `auto`. This behavior is the same in both the Responses API and the Chat Completions API.445The `detail` parameter tells the model what level of detail to use when processing and understanding the image (`low`, `high`, `original`, or `auto`). If you skip the parameter, the model will use `auto`. This behavior is the same in both the Responses API and the Chat Completions API. On `gpt-5.5`, `auto` and the default omitted behavior are equivalent to `original`.

446 446

447 447

448 448

451 451

452| Detail level | Best for |452| Detail level | Best for |

453| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |453| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |

454| `"low"` | Fast, low-cost understanding when fine visual detail is not important. The model receives a low-resolution 512px x 512px version of the image. |454| `low` | Fast, low-cost understanding when fine visual detail is not important. The model receives a low-resolution 512px x 512px version of the image. |

458 458

459For computer use, localization, and click-accuracy use cases on `gpt-5.4` and future models, we recommend `"detail": "original"`. See the [Computer use guide](https://developers.openai.com/api/docs/guides/tools-computer-use) for more detail.459For computer use, localization, and click-accuracy use cases on `gpt-5.4` and future models, we recommend `"detail": "original"`. See the [Computer use guide](https://developers.openai.com/api/docs/guides/tools-computer-use) for more detail.
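To make the `detail` options concrete, here is a minimal Python sketch of attaching an explicit detail level to an image input. The helper name `image_input`, the example URL, and the prompt text are hypothetical and exist only for illustration; the item shape (`type`, `image_url`, `detail`) is assumed to match the Responses API image input format, so check it against the API reference before relying on it.

```python
# Detail levels named in the guide text above.
VALID_DETAIL_LEVELS = {"low", "high", "original", "auto"}


def image_input(image_url, detail=None):
    """Build an image content item (hypothetical helper, not an SDK function).

    When `detail` is None the key is omitted, which the guide describes as
    equivalent to "auto".
    """
    if detail is not None and detail not in VALID_DETAIL_LEVELS:
        raise ValueError(f"unsupported detail level: {detail!r}")
    item = {"type": "input_image", "image_url": image_url}
    if detail is not None:
        item["detail"] = detail
    return item


# A user message for a click-accuracy task, requesting "original" detail
# as recommended above for computer use and localization.
message = {
    "role": "user",
    "content": [
        {"type": "input_text", "text": "Which button saves the document?"},
        image_input("https://example.com/screenshot.png", detail="original"),
    ],
}
```

A list containing `message` could then be passed as `input` to a `responses.create` call. Omitting `detail` keeps the request on the model's default behavior, which the text above says differs between `gpt-5.4` and `gpt-5.5`.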

460 460

474 </tr>474 </tr>

475 <tr>475 <tr>

476 <td>476 <td>

477 <code>gpt-5.4</code> and future models477 <code>gpt-5.5</code>

478 </td>478 </td>

479 <td>479 <td>

480 <code>low</code>, <code>high</code>, <code>original</code>,480 <code>low</code>, <code>high</code>, <code>original</code>,

485 dimension. <code>original</code> allows up to 10,000 patches or a485 dimension. <code>original</code> allows up to 10,000 patches or a

486 6000-pixel maximum dimension. If either limit is exceeded, we resize the486 6000-pixel maximum dimension. If either limit is exceeded, we resize the

487 image while preserving aspect ratio to fit within the lesser of those two487 image while preserving aspect ratio to fit within the lesser of those two

488 constraints for the selected detail level. [Full resizing details488 constraints for the selected detail level. <code>auto</code> and omitted

489 <code>detail</code> use the same sizing behavior as

490 <code>original</code>. [Full resizing details

491 below.](#patch-based-image-tokenization)

492 </td>

493 </tr>

494 <tr>

495 <td>

496 <code>gpt-5.4</code>

497 </td>

498 <td>

499 <code>low</code>, <code>high</code>, <code>original</code>,

500 <code>auto</code>

501 </td>

502 <td>

503 <code>high</code> allows up to 2,500 patches or a 2048-pixel maximum

504 dimension. <code>original</code> allows up to 10,000 patches or a

505 6000-pixel maximum dimension. If either limit is exceeded, we resize the

506 image while preserving aspect ratio to fit within the lesser of those two

507 constraints for the selected detail level. <code>auto</code> and omitted

508 <code>detail</code> use the same sizing behavior as

509 <code>high</code>. [Full resizing details

489 below.](#patch-based-image-tokenization)510 below.](#patch-based-image-tokenization)

490 </td>511 </td>

491 </tr>512 </tr>

guides/latest-model.md +55 −615


1---1---

2latestModelInfo:2latestModelInfo:

~~3 model: gpt-5.4~~3 model: gpt-5.5

~~4 migrationGuide: /api/docs/guides/upgrading-to-gpt-5p4.md~~4 migrationGuide: /api/docs/guides/upgrading-to-gpt-5p5.md

5 promptingGuide: /api/docs/guides/prompt-guidance.md5 promptingGuide: /api/docs/guides/prompt-guidance.md

6---6---

7 7

~~8# Using GPT-5.4~~8# Using GPT-5.5

9 9

~~10import {~~10## Introduction

~~11 Bolt,~~

~~12 Code,~~

~~13 Sparkles,~~

~~14} from "@components/react/oai/platform/ui/Icon.react";~~

15 11

12GPT-5.5 raises the baseline for complex production workflows. It’s a strong fit for coding use cases, tool-heavy agents, grounded assistants, long-context retrieval, product-spec-to-plan workflows, and customer-facing workflows where execution quality and response polish are critical.

16 13

14To get the most out of GPT-5.5, treat it as a new model family to tune for, not a drop-in replacement for `gpt-5.2` or `gpt-5.4`. Begin migration with a fresh baseline instead of carrying over every instruction from an older prompt stack. Start with the smallest prompt that preserves the product contract, then tune reasoning effort, verbosity, tool descriptions, and output format against representative examples.

17 15

16GPT-5.5 supports all API features that were already available with GPT-5.4, including [prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching), [hosted tools](https://developers.openai.com/api/docs/guides/tools#available-tools), [tool search](https://developers.openai.com/api/docs/guides/tools-tool-search), [compaction](https://developers.openai.com/api/docs/guides/compaction), and `phase` handling for manually replayed assistant items.

18 17

~~19export const fastResponses = <>~~18See the [GPT-5.5 Prompting Guide](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5) for examples of successful prompting patterns.

20 19

~~21GPT-5.4 has a new reasoning mode: `none` for low-latency interactions. By default, GPT-5.4 reasoning is set to `none`.~~20## What's new

22 21

~~23 ~~22- **More efficient reasoning:** GPT-5.5 reaches strong results with fewer reasoning tokens than prior models, even at the same reasoning effort. This is especially useful in complex, tool-heavy, or multi-step workflows where token savings compound.

~~24 ~~23- **Stronger task execution with outcome-first prompts:** GPT-5.5 is better at working from a clear goal, preserving constraints, and turning product intent into concrete next steps. Describe the expected outcome, success criteria, allowed side effects, evidence rules, and output shape. Avoid step-by-step process guidance unless the exact path matters.

24- **Stronger and more precise tool use:** GPT-5.5 is especially useful on large tool surfaces, multi-step service workflows, and long-running agent tasks. It tends to be more precise in tool selection and argument use.

25- **Tone is often more polished, but can be more direct:** GPT-5.5 often produces warmer, more readable answers with less prompt scaffolding.

25 26

~~26This behavior will more closely (but not exactly!) match non-reasoning models like ~~27## Behavioral changes

27 28

~~28[GPT-4.1](https://developers.openai.com/api/docs/models/gpt-4.1). We expect GPT-5.4 to produce~~291. **Reasoning effort now defaults to `medium`:** GPT-5.5 defaults to `medium` reasoning effort. Treat `medium` as the recommended balanced starting point for quality, reliability, latency, and cost. For latency-sensitive workflows, evaluate `low` before `none` when tool use, planning, search, or multi-step decision making still matters. Reserve `none` for latency-critical tasks that don't need reasoning or multi-chained tool calls, such as lightweight voice turns, fast information retrieval, and classification. Increase to `high` or `xhigh` only when evals show a measurable quality gain that justifies the extra latency and cost. See the [Reasoning models documentation](https://developers.openai.com/api/docs/guides/reasoning) for more details on recommended settings.

~~29more intelligent responses than GPT-4.1, but when speed and maximum context~~

~~30length are paramount, you might consider using GPT-4.1 instead.~~

31 30

~~32Fast, low latency response options~~31 Higher reasoning effort isn't automatically better. If the task has conflicting instructions, weak stopping criteria, or open-ended tool access, higher effort can lead to overthinking, unnecessary searching, or output quality regressions. Increase effort only when evals show a measurable quality gain.

33 32

~~34```javascript~~332. **Image inputs preserve more visual detail by default:** GPT-5.5 updates the default handling for image inputs to preserve more visual detail and improve computer use performance. When `image_detail` is unset or set to `auto`, the model now uses `original` behavior, preserving images without resizing up to 10,240,000 pixels or a 6,000-pixel dimension limit. For `high`, specify the value directly; it preserves images without resizing up to 2,500,000 pixels or a 2,048-pixel dimension limit. `low` now focuses on context efficiency and resizes images above a 512-pixel dimension limit more aggressively than previous models. See the [Images and vision documentation](https://developers.openai.com/api/docs/guides/images-vision).

~~35import OpenAI from "openai";~~

~~36const openai = new OpenAI();~~

37 34

~~38const result = await openai.responses.create({~~353. **Improved instruction following:** GPT-5.5 interprets prompts in a literal and thorough manner, enabling specific, descriptive instructions when the product requires them. Define success criteria and stopping rules, especially for long-running, tool-heavy, or evidence-gathering workflows. See [Write outcome-first prompts](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5#outcome-first-prompts-and-stopping-conditions) and [Keep the right specificity](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5#formatting).

~~39 model: "gpt-5.4",~~

~~40 input: "Write a haiku about code.",~~

~~41 reasoning: { effort: "low" },~~

~~42 text: { verbosity: "low" },~~

~~43});~~

44 36

~~45console.log(result.output_text);~~374. **Default style is more concise and direct:** GPT-5.5 tends to be efficient, direct, and task-oriented by default. This is useful for many production workflows, but customer-facing or conversational experiences may need explicit personality, warmth, rationale, and formatting guidance. Use `text.verbosity` intentionally: `medium` is the default, and `low` is often a better starting point for concise responses. See the [GPT-5.5 prompting guide](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5).
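The reasoning-effort and verbosity guidance in the list above can be condensed into a small starting-point helper. This is a minimal sketch under stated assumptions: `request_settings` and its flags are hypothetical names, the mapping is one reading of the guidance rather than an official rule, and the chosen values should still be validated against your own evals.

```python
def request_settings(latency_critical=False, needs_tools=True, concise=False):
    """Pick starting values for reasoning effort and verbosity (hypothetical helper).

    - Default to "medium" effort, the balanced starting point.
    - For latency-sensitive work that still involves tools or planning, try "low".
    - Use "none" only for latency-critical tasks that need no reasoning or
      chained tool calls.
    """
    if not latency_critical:
        effort = "medium"
    elif needs_tools:
        effort = "low"
    else:
        effort = "none"
    # "medium" is the default verbosity; "low" is a starting point for concise output.
    verbosity = "low" if concise else "medium"
    return {"reasoning": {"effort": effort}, "text": {"verbosity": verbosity}}


# Example: a lightweight classification turn with no tool calls.
settings = request_settings(latency_critical=True, needs_tools=False, concise=True)

# The result is shaped to be spread into a Responses API call, for example:
# client.responses.create(model="gpt-5.5", input="...", **settings)
```

Setting both parameters explicitly means a later change in model defaults does not silently alter latency or output length.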

~~46```~~

~~48```python~~

~~49from openai import OpenAI~~

~~50client = OpenAI()~~

~~52result = client.responses.create(~~

~~53 model="gpt-5.4",~~

~~54 input="Write a haiku about code.",~~

~~55 reasoning={ "effort": "low" },~~

~~56 text={ "verbosity": "low" },~~

~~57)~~

~~59print(result.output_text)~~

~~60```~~

~~62```bash~~

~~63curl https://api.openai.com/v1/responses \\~~

~~64 -H "Content-Type: application/json" \\~~

~~65 -H "Authorization: Bearer $OPENAI_API_KEY" \\~~

~~66 -d '{~~

~~67 "model": "gpt-5.4",~~

~~68 "input": "Write a haiku about code.",~~

~~69 "reasoning": { "effort": "low" }~~

~~70 }'~~

~~71```~~

~~74</>;~~

~~76export const goodResponses = <>~~

~~78GPT-5.4 is great at reasoning through complex tasks. For complex tasks like coding and multi-step planning,~~

~~79use high reasoning effort.~~

~~81 ~~

~~82 ~~

~~84Use these configurations when replacing tasks you might have used o3 to tackle.~~

~~85We expect GPT-5.4 to produce better results than o3 and o4-mini under most circumstances.~~

~~87Slower, high reasoning tasks~~

~~89```javascript~~

~~90import OpenAI from "openai";~~

~~91const openai = new OpenAI();~~

~~93const result = await openai.responses.create({~~

~~94 model: "gpt-5.4",~~

~~95 input: "Find the null pointer exception: ...your code here...",~~

~~96 reasoning: { effort: "high" },~~

~~97});~~

~~99console.log(result.output_text);~~

100```

~~101~~

102```python

103from openai import OpenAI

104client = OpenAI()

~~105~~

106result = client.responses.create(

107 model="gpt-5.4",

108 input="Find the null pointer exception: ...your code here...",

109 reasoning={ "effort": "high" },

110)

~~111~~

112print(result.output_text)

113```

~~114~~

115```bash

116curl https://api.openai.com/v1/responses \\

117 -H "Content-Type: application/json" \\

118 -H "Authorization: Bearer $OPENAI_API_KEY" \\

119 -d '{

120 "model": "gpt-5.4",

121 "input": "Find the null pointer exception: ...your code here...",

122 "reasoning": { "effort": "high" }

123 }'

124```

~~125~~

~~126~~

127</>;

~~128~~

129[GPT-5.4](https://developers.openai.com/api/docs/models/gpt-5.4) is our most capable frontier model yet, delivering higher-quality outputs with fewer iterations across ChatGPT, the API, and Codex. It helps people and teams analyze complex information, build production software, and automate multi-step workflows.

~~130~~

131GPT-5.5 is currently available in ChatGPT and Codex, with API availability

132 coming soon.

~~133~~

134In practice, `gpt-5.4` is the default model for both broad general-purpose work and most coding tasks. Start there when you want one model that can move between software engineering, reasoning, writing, and tool use in the same workflow.

~~135~~

136This guide covers key features of the GPT-5 model family and how to get the most out of GPT-5.4.

~~137~~

138## Key improvements

~~139~~

140Compared with the previous GPT-5.2 model, GPT-5.4 shows improvements in:

~~141~~

142- Coding, document understanding, tool use, and instruction following

143- Image perception and multimodal tasks

144- Long-running task execution and multi-step agent workflows

145- Token efficiency and end-to-end performance on tool-heavy workloads

146- Agentic web search and multi-source synthesis, especially for hard-to-locate information

147- Document-heavy and spreadsheet-heavy business workflows in customer service, analytics, and finance

~~148~~

149GPT-5.4 brings the coding capabilities of GPT-5.3-Codex to our flagship frontier model. Developers can generate production-quality code, build polished front-end UI, follow repo-specific patterns, and handle multi-file changes with fewer retries. It also has a strong out-of-the-box coding personality, so teams spend less time on prompt tuning.

~~150~~

151For agentic workloads, GPT-5.4 reduces end-to-end time across multi-step trajectories and often completes tasks with fewer tokens and tool calls. This makes agents more responsive and lowers the cost of operating complex workflows at scale in the API and Codex.

~~152~~

153### New features in GPT-5.4

~~154~~

155Like earlier GPT-5 models, GPT-5.4 supports custom tools, parameters to control verbosity and reasoning, and an allowed tools list. GPT-5.4 also introduces several capabilities that make it easier to build powerful agent systems, operate over larger bodies of information, and run more reliable automated workflows:

~~156~~

157- **`tool_search` in the API:** GPT-5.4 improves tool search for larger tool ecosystems by using deferred tool loading. This makes tools searchable, loads only the relevant definitions, reduces token usage, and improves tool selection accuracy in real deployments. Learn more in the [tool search guide](https://developers.openai.com/api/docs/guides/tools-tool-search).

158- **1M token context window:** GPT-5.4 supports up to a 1M token context window, making it easier to analyze entire codebases, long document collections, or extended agent trajectories in a single request. Read more in the [1M context window](#1m-context-window) section.

159- **Built-in computer use:** GPT-5.4 is the first mainline model with built-in computer-use capabilities, enabling agents to interact directly with software to complete, verify, and fix tasks in a build-run-verify-fix loop. Learn more in the [computer use guide](https://developers.openai.com/api/docs/guides/tools-computer-use).

160- **Native compaction support:** GPT-5.4 is the first mainline model trained to support compaction, enabling longer agent trajectories while preserving key context.

~~161~~

162## Meet the models

~~163~~

164In general, `gpt-5.4` is the default model for your most important work across both general-purpose tasks and coding. It replaces the previous `gpt-5.2` model in the API, and `gpt-5.3-codex` in Codex. The model powering ChatGPT is `gpt-5-chat-latest`. For more difficult problems, `gpt-5.4-pro` uses more compute to think longer and provide consistently better answers.

~~165~~

166For smaller, faster variants, start with `gpt-5.4-mini` or `gpt-5.4-nano`.

~~167~~

168To help you pick the model that best fits your use case, consider these tradeoffs:

~~169~~

170| Variant | Best for |

171| ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |

172| [`gpt-5.4`](https://developers.openai.com/api/docs/models/gpt-5.4) | General-purpose work, including complex reasoning, broad world knowledge, and code-heavy or multi-step agentic tasks |

173| [`gpt-5.4-pro`](https://developers.openai.com/api/docs/models/gpt-5.4-pro) | Tough problems that may take longer to solve and need deeper reasoning |

174| [`gpt-5.4-mini`](https://developers.openai.com/api/docs/models/gpt-5.4-mini) | High-volume coding, computer use, and agent workflows that still need strong reasoning |

175| [`gpt-5.4-nano`](https://developers.openai.com/api/docs/models/gpt-5.4-nano) | Simple high-throughput tasks where speed and cost matter most |

~~176~~

177### Lower reasoning effort

~~178~~

179The `reasoning.effort` parameter controls how many reasoning tokens the model generates before producing a response. Earlier reasoning models like o3 supported only `low`, `medium`, and `high`: `low` favored speed and fewer tokens, while `high` favored more thorough reasoning.

~~180~~

181Starting with GPT-5.2, the lowest setting is `none`, which provides lower-latency interactions. This is the default setting in GPT-5.2 and newer models. If you need more thinking, raise the effort gradually toward `medium` and compare the results.

~~182~~

183With reasoning effort set to `none`, prompting is important. To improve the model's reasoning quality, even with the default settings, encourage it to “think” or outline its steps before answering.

~~184~~

185Reasoning effort set to none

~~186~~

187```bash

188curl --request POST \

189 --url https://api.openai.com/v1/responses \

190 --header "Authorization: Bearer $OPENAI_API_KEY" \

191 --header 'Content-type: application/json' \

192 --data '{

193 "model": "gpt-5.4",

194 "input": "How much gold would it take to coat the Statue of Liberty in a 1mm layer?",

195 "reasoning": {

196 "effort": "none"

197 }

198}'

199```

~~200~~

201```javascript

202import OpenAI from "openai";

203const openai = new OpenAI();

~~204~~

205const response = await openai.responses.create({

206 model: "gpt-5.4",

207 input: "How much gold would it take to coat the Statue of Liberty in a 1mm layer?",

208 reasoning: {

209 effort: "none"

210 }

211});

~~212~~

213console.log(response);

214```

~~215~~

216```python

217from openai import OpenAI

218client = OpenAI()

~~219~~

220response = client.responses.create(

221 model="gpt-5.4",

222 input="How much gold would it take to coat the Statue of Liberty in a 1mm layer?",

223 reasoning={

224 "effort": "none"

225 }

226)

~~227~~

228print(response)

229```

~~230~~

~~231~~

232### Verbosity

~~233~~

234Verbosity determines how many output tokens are generated. Lowering the number of tokens reduces overall latency. While the model's reasoning approach stays mostly the same, the model finds ways to answer more concisely—which can either improve or diminish answer quality, depending on your use case. Here are some scenarios for both ends of the verbosity spectrum:

~~235~~

236- **High verbosity:** Use when you need the model to provide thorough explanations of documents or perform extensive code refactoring.

237- **Low verbosity:** Best for situations where you want concise answers or simple code generation, such as SQL queries.

~~238~~

239GPT-5 made this option configurable as one of `high`, `medium`, or `low`. With GPT-5.4, verbosity remains configurable and defaults to `medium`.

~~240~~

241When generating code with GPT-5.4, `medium` and `high` verbosity levels yield longer, more structured code with inline explanations, while `low` verbosity produces shorter, more concise code with minimal commentary.

~~242~~

243Control verbosity

~~244~~

245```bash

246curl --request POST \

247 --url https://api.openai.com/v1/responses \

248 --header "Authorization: Bearer $OPENAI_API_KEY" \

249 --header 'Content-type: application/json' \

250 --data '{

251 "model": "gpt-5.4",

252 "input": "What is the answer to the ultimate question of life, the universe, and everything?",

253 "text": {

254 "verbosity": "low"

255 }

256}'

257```

~~258~~

259```javascript

260import OpenAI from "openai";

261const openai = new OpenAI();

~~262~~

263const response = await openai.responses.create({

264 model: "gpt-5.4",

265 input: "What is the answer to the ultimate question of life, the universe, and everything?",

266 text: {

267 verbosity: "low"

268 }

269});

~~270~~

271console.log(response);

272```

~~273~~

274```python

275from openai import OpenAI

276client = OpenAI()

~~277~~

278response = client.responses.create(

279 model="gpt-5.4",

280 input="What is the answer to the ultimate question of life, the universe, and everything?",

281 text={

282 "verbosity": "low"

283 }

284)

~~285~~

286print(response)

287```

~~288~~

~~289~~

290You can still steer verbosity through prompting after setting it to `low` in the API. The verbosity parameter defines a general token range at the system prompt level, but within that range the actual output still adapts to both developer and user prompts.

~~291~~

292#### 1M context window

~~293~~

294GPT-5.4 introduced a 1M-token context window, making it easier to analyze entire codebases, long document collections, or extended agent trajectories in a single request.

~~295~~

296We have separate standard pricing for requests under 272K and over 272K tokens, available in the [pricing docs](https://developers.openai.com/api/docs/pricing). If you use [priority processing](https://developers.openai.com/api/docs/guides/priority-processing), any prompt above 272K tokens is automatically processed at standard rates.

~~297~~

298Long context pricing stacks with other pricing modifiers such as data residency and batch.

~~299~~

300Rate limits also differ for requests under and over 272K tokens; see the [GPT-5.4 model page](https://developers.openai.com/api/docs/models/gpt-5.4).
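The 272K threshold above can be checked before a request is sent. The following is a minimal sketch, not an official helper: it assumes "272K" means 272,000 tokens, treats a request of exactly that size as the shorter band (the text only says "under" and "over"), and uses hypothetical names (`pricing_band`, `effective_processing`).

```python
# Assumption: "272K" is read as 272,000 tokens; the docs do not spell out the exact figure.
LONG_CONTEXT_THRESHOLD = 272_000


def pricing_band(input_tokens: int) -> str:
    """Return which pricing and rate-limit band a request falls into."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # Assumption: exactly 272K counts as the shorter band.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        return "long_context"
    return "standard_context"


def effective_processing(requested: str, input_tokens: int) -> str:
    """Priority requests above the threshold are processed at standard rates."""
    if requested == "priority" and input_tokens > LONG_CONTEXT_THRESHOLD:
        return "standard"
    return requested


print(pricing_band(300_000), effective_processing("priority", 300_000))
```

A check like this is mostly useful for cost estimation and routing; the pricing docs and model page remain the source of truth for the actual rates and limits.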

~~301~~

302## Using tools with GPT-5.4

~~303~~

304GPT-5.4 has been post-trained on specific tools. See the [tools docs](https://developers.openai.com/api/docs/guides/tools) for more specific guidance.

~~305~~

306### Computer use tool

~~307~~

308Computer use lets GPT-5.4 operate software through the user interface by inspecting screenshots and returning structured actions for your harness to execute. It is a good fit for browser or desktop workflows where a person could complete the task through the UI, such as navigating a site, filling out forms, or validating that a change actually worked.

~~309~~

310Use it in an isolated browser or VM, and keep a human in the loop for high-impact actions. The full guide covers the built-in Responses API loop, custom harness patterns, and code-execution-based setups.

~~311~~

312[

~~313~~

314

315 

316 Learn how to run the built-in computer tool safely and integrate it with

317 your own harness.

~~318~~

319](https://developers.openai.com/api/docs/guides/tools-computer-use)

~~320~~

321### Tool search tool

~~322~~

323Tool search lets GPT-5.4 defer large tool surfaces until runtime so the model loads only the definitions it needs. This is most useful when you have many functions, namespaces, or MCP tools and want to reduce token usage, preserve cache performance, and improve latency without exposing every schema up front.

~~324~~

325Use hosted tool search when the candidate tools are already known at request time, or client-executed tool search when your application needs to decide what to load dynamically. The full guide also covers best practices for namespaces, MCP servers, and deferred loading.

~~326~~

327[

~~328~~

329

330 

331 Learn how to defer tool definitions and load the right subset at runtime.

~~332~~

333](https://developers.openai.com/api/docs/guides/tools-tool-search)

~~334~~

335### Custom tools

~~336~~

337When the GPT-5 model family launched, we introduced custom tools, which let models send any raw text as tool call input while still letting you constrain outputs if desired. GPT-5.4 keeps this behavior.

338 38

339[395. **Coding workflows need stronger orchestration:** GPT-5.5 is better suited to complex coding tasks that require planning, tool use, codebase navigation, verification, and multi-step execution. For coding agents, be explicit about reuse, subagent delegation, test expectations, acceptance criteria, and when to continue versus ask for help.

340 40

34141## Migration quickstart

342 

343 Learn about custom tools in the function calling guide.

344 42

345](https://developers.openai.com/api/docs/guides/function-calling)43### Automated migration with Codex

346 44

347#### Freeform inputs45Codex can apply the recommended changes in this guide with the [OpenAI Docs Skill](https://github.com/openai/skills/tree/main/skills/.curated/openai-docs).

~~348~~

349Define your tool with `type: custom` to enable models to send plaintext inputs directly to your tools, rather than being limited to structured JSON. The model can send any raw text—code, SQL queries, shell commands, configuration files, or long-form prose—directly to your tool.

~~350~~

351```bash

352{

353 "type": "custom",

354 "name": "code_exec",

355 "description": "Executes arbitrary Python code"

356}

357 46

47```text

48$openai-docs migrate this project to gpt-5.5

358```49```

359 50

360#### Constraining outputs51To use this skill in other coding agents, download it from the [OpenAI skills repository](https://github.com/openai/skills/tree/main/skills/.curated/openai-docs).

~~361~~

362GPT-5.4 supports context-free grammars (CFGs) for custom tools, letting you provide a Lark grammar to constrain outputs to a specific syntax or DSL. Attaching a CFG (e.g., a SQL or DSL grammar) ensures the assistant's text matches your grammar.

~~363~~

364This enables precise, constrained tool calls or structured responses and lets you enforce strict syntactic or domain-specific formats directly in GPT-5.4's function calling, improving control and reliability for complex or constrained domains.
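As a rough illustration of attaching a grammar to a custom tool, the sketch below builds a tool definition in Python. The `format` object with `type`, `syntax`, and `definition` keys is an assumption based on the function calling guide and should be verified there; the tool name, description, and the tiny arithmetic grammar are hypothetical.

```python
import json

# Hypothetical Lark grammar: sums and differences of non-negative integers.
ARITHMETIC_GRAMMAR = r"""
start: expr
expr: term (("+" | "-") term)*
term: NUMBER
NUMBER: /[0-9]+/
"""


def grammar_tool(name: str, description: str, lark_definition: str) -> dict:
    """Build a custom tool whose input is constrained by a Lark grammar."""
    return {
        "type": "custom",
        "name": name,
        "description": description,
        # Assumption: this is the shape of the grammar constraint field.
        "format": {
            "type": "grammar",
            "syntax": "lark",
            "definition": lark_definition.strip(),
        },
    }


math_tool = grammar_tool(
    "math_expr",
    "Evaluates an integer expression that uses only + and -.",
    ARITHMETIC_GRAMMAR,
)
print(json.dumps(math_tool, indent=2))
```

The resulting dictionary would be passed in the `tools` list of a request. Even with a grammar attached, validate the tool input on the server side before executing it.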

~~365~~

366#### Best practices for custom tools

~~367~~

368- **Write concise, explicit tool descriptions**. The model chooses what to send based on your description; state clearly if you want it to always call the tool.

369- **Validate outputs on the server side**. Freeform strings are powerful but require safeguards against injection or unsafe commands.

~~370~~

371### Allowed tools

~~372~~

373The `allowed_tools` parameter under `tool_choice` lets you pass N tool definitions but restrict the model to only M (< N) of them. List your full toolkit in `tools`, and then use an `allowed_tools` block to name the subset and specify a mode—either `auto` (the model may pick any of those) or `required` (the model must invoke one).

~~374~~

375[

~~376~~

377

378 

379 Learn about the allowed tools option in the function calling guide.

~~380~~

381](https://developers.openai.com/api/docs/guides/function-calling)

~~382~~

383By separating all possible tools from the subset that can be used _now_, you gain greater safety, predictability, and improved prompt caching. You also avoid brittle prompt engineering, such as hard-coded call order. GPT-5.4 dynamically invokes or requires specific functions mid-conversation while reducing the risk of unintended tool usage over long contexts.

~~384~~

385| | **Standard Tools** | **Allowed Tools** |

386| ---------------- | ----------------------------------------- | ------------------------------------------------------------- |

387| Model's universe | All tools listed under **`"tools": […]`** | Only the subset under **`"tools": […]`** in **`tool_choice`** |

388| Tool invocation | Model may or may not call any tool | Model restricted to (or required to call) chosen tools |

389| Purpose | Declare available capabilities | Constrain which capabilities are actually used |

~~390~~

391```bash

392 "tool_choice": {

393 "type": "allowed_tools",

394 "mode": "auto",

395 "tools": [

396 { "type": "function", "name": "get_weather" },

397 { "type": "function", "name": "search_docs" }

398 ]

399 }

400}'

401```
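The fragment above shows only the `tool_choice` block. A fuller sketch, assuming a hypothetical three-tool toolkit and a hypothetical helper named `allowed_tools_choice`, might build the whole request body like this, checking that every allowed name is actually declared in `tools`:

```python
def allowed_tools_choice(tools: list, allowed_names: list, mode: str = "auto") -> dict:
    """Build a tool_choice block that restricts the model to a subset of tools."""
    if mode not in ("auto", "required"):
        raise ValueError("mode must be 'auto' or 'required'")
    known = {t["name"] for t in tools if t.get("type") == "function"}
    missing = [name for name in allowed_names if name not in known]
    if missing:
        raise ValueError(f"not declared in tools: {missing}")
    return {
        "type": "allowed_tools",
        "mode": mode,
        "tools": [{"type": "function", "name": name} for name in allowed_names],
    }


def function_tool(name: str, description: str) -> dict:
    """Hypothetical helper: a function tool that takes no arguments."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": {}},
    }


# The full toolkit stays stable across requests (N = 3 here).
TOOLS = [
    function_tool("get_weather", "Look up current weather."),
    function_tool("search_docs", "Search internal documentation."),
    function_tool("delete_record", "Delete a record by ID."),
]

request_body = {
    "model": "gpt-5.4",
    "input": "What's the weather in Paris?",
    "tools": TOOLS,
    # Only M = 2 of the tools may be called on this turn.
    "tool_choice": allowed_tools_choice(TOOLS, ["get_weather", "search_docs"]),
}
```

Keeping `tools` identical between requests and varying only `tool_choice` is what keeps the cached prompt prefix stable.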

~~402~~

403For a more detailed overview of all of these new features, see the [prompt guidance for GPT-5.4](https://developers.openai.com/api/docs/guides/prompt-guidance).

~~404~~

405### Preambles

~~406~~

407Preambles are brief, user-visible explanations that GPT-5.4 generates before invoking any tool or function, outlining its intent or plan (e.g., “why I'm calling this tool”). They appear after the chain-of-thought and before the actual tool call, providing transparency into the model's reasoning and enhancing debuggability, user confidence, and fine-grained steerability.

~~408~~

409By letting GPT-5.4 “think out loud” before each tool call, preambles boost tool-calling accuracy (and overall task success) without bloating reasoning overhead. To enable preambles, add a system or developer instruction—for example: “Before you call a tool, explain why you are calling it.” GPT-5.4 prepends a concise rationale to each specified tool call. The model may also output multiple messages between tool calls, which can enhance the interaction experience—particularly for minimal reasoning or latency-sensitive use cases.

~~410~~

411For more on using preambles, see the [GPT-5 prompting cookbook](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide#tool-preambles).

~~412~~

413## Migration guidance

~~414~~

415GPT-5.4 is our best model yet, and it works best with the Responses API, which supports passing chain of thought (CoT) between turns to improve performance. Read below to migrate from your current model or API.

~~416~~

417### Migrating from other models to GPT-5.4

~~418~~

419Use the [OpenAI Docs

420 skill](https://github.com/openai/skills/tree/main/skills/.system/openai-docs)

421 when migrating existing prompts or workflows to GPT-5.4. It's available in our

422 public skills repository and the Codex desktop app.

~~423~~

424While the model should be close to a drop-in replacement for GPT-5.2, there are a few key changes to call out. See [Prompt guidance for GPT-5.4](https://developers.openai.com/api/docs/guides/prompt-guidance) for specific updates to make in your prompts.

~~425~~

426Using GPT-5 models with the Responses API provides improved intelligence because of the API's design. The Responses API can pass the previous turn's CoT to the model. This leads to fewer generated reasoning tokens, higher cache hit rates, and less latency. To learn more, see an [in-depth guide](https://developers.openai.com/cookbook/examples/responses_api/reasoning_items) on the benefits of the Responses API.

~~427~~

428When migrating to GPT-5.4 from an older OpenAI model, start by experimenting with reasoning levels and prompting strategies. Based on our testing, we recommend using our [prompt optimizer](https://platform.openai.com/chat/edit?models=gpt-5.4&optimize=true)—which automatically updates your prompts for GPT-5.4 based on our best practices—and following this model-specific guidance:

~~429~~

430- **gpt-5.2**: `gpt-5.4` with default settings is meant to be a drop-in replacement.

431- **o3**: `gpt-5.4` with `medium` or `high` reasoning. Start with `medium` reasoning with prompt tuning, then increase to `high` if you aren't getting the results you want.

432- **gpt-4.1**: `gpt-5.4` with `none` reasoning. Start with `none` and tune your prompts; increase if you need better performance.

433- **o4-mini or gpt-4.1-mini**: `gpt-5.4-mini` with prompt tuning is a great replacement.

434- **gpt-4.1-nano**: `gpt-5.4-nano` with prompt tuning is a great replacement.
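The mapping above can be encoded as a small lookup table for a first migration pass. This is a sketch under stated assumptions: `MIGRATION_TARGETS` and `migrate_request` are hypothetical names, a starting effort of `None` means "leave the model default in place", and prompt tuning still has to happen separately.

```python
# Old model -> (replacement model, suggested starting reasoning effort).
MIGRATION_TARGETS = {
    "gpt-5.2": ("gpt-5.4", None),
    "o3": ("gpt-5.4", "medium"),
    "gpt-4.1": ("gpt-5.4", "none"),
    "o4-mini": ("gpt-5.4-mini", None),
    "gpt-4.1-mini": ("gpt-5.4-mini", None),
    "gpt-4.1-nano": ("gpt-5.4-nano", None),
}


def migrate_request(request: dict) -> dict:
    """Return a copy of a request dict that targets the suggested replacement."""
    target = MIGRATION_TARGETS.get(request.get("model"))
    if target is None:
        return dict(request)  # Unknown model: leave the request untouched.
    new_model, effort = target
    migrated = {**request, "model": new_model}
    # Only set a starting effort when the caller has not chosen one already.
    if effort is not None and "reasoning" not in migrated:
        migrated["reasoning"] = {"effort": effort}
    return migrated


print(migrate_request({"model": "o3", "input": "Summarize ticket 123."}))
```

Treat the suggested effort as a starting point only; the guidance above recommends raising it if results fall short.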

~~435~~

436### New `phase` parameter

~~437~~

438For long-running or tool-heavy GPT-5.4 flows in the Responses API, use the assistant message `phase` field to avoid early stopping and other misbehavior.

~~439~~

440`phase` is optional at the API level, but we highly recommend using it. Use `phase: "commentary"` for intermediate assistant updates (such as preambles before tool calls) and `phase: "final_answer"` for the completed answer. Do not add `phase` to user messages.

~~441~~

442If you use `previous_response_id`, that is usually the simplest path because

443 prior assistant state is preserved. If you replay assistant history manually,

444 preserve each original `phase` value.

~~445~~

446Missing or dropped `phase` can cause preambles to be treated as final answers

447in those workflows. For additional guidance and examples, see the [GPT-5.4

448prompting guide](https://developers.openai.com/api/docs/guides/prompt-guidance#phase-parameter).

~~449~~

450Round-trip assistant phase values

~~451~~

452```javascript

453import OpenAI from "openai";

454const client = new OpenAI();

~~455~~

456const response = await client.responses.create({

457 model: "gpt-5.4",

458 input: [

459 {

460 role: "assistant",

461 phase: "commentary",

462 content:

463 "I’ll inspect the logs and then summarize root cause and remediation.",

464 },

465 {

466 role: "assistant",

467 phase: "final_answer",

468 content: "Root cause: cache invalidation race.",

469 },

470 {

471 role: "user",

472 content: "Great—now give me a rollout-safe fix plan.",

473 },

474 ],

475});

~~476~~

477console.log(response.output_text);

478```

~~479~~

480```python

481from openai import OpenAI

~~482~~

483client = OpenAI()

~~484~~

485response = client.responses.create(

486 model="gpt-5.4",

487 input=[

488 {

489 "role": "assistant",

490 "phase": "commentary",

491 "content": "I’ll inspect the logs and then summarize root cause and remediation.",

492 },

493 {

494 "role": "assistant",

495 "phase": "final_answer",

496 "content": "Root cause: cache invalidation race.",

497 },

498 {

499 "role": "user",

500 "content": "Great—now give me a rollout-safe fix plan.",

501 },

502 ],

503)

~~504~~

505print(response.output_text)

506```
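When you replay history manually instead of using `previous_response_id`, the main risk is dropping `phase` while copying messages. The helper below is a minimal sketch that assumes each stored message is a plain dict with `role`, `content`, and an optional `phase` key, as in the examples above; `replay_items` is a hypothetical name, and real output items may carry additional fields.

```python
def replay_items(history: list) -> list:
    """Copy stored messages into a new input list, preserving assistant phase."""
    replayed = []
    for item in history:
        entry = {"role": item["role"], "content": item["content"]}
        # Keep the original phase on assistant messages; never add it to user messages.
        if item["role"] == "assistant" and "phase" in item:
            entry["phase"] = item["phase"]
        replayed.append(entry)
    return replayed


stored = [
    {"role": "assistant", "phase": "commentary", "content": "I'll inspect the logs first."},
    {"role": "assistant", "phase": "final_answer", "content": "Root cause: cache invalidation race."},
    {"role": "user", "content": "Now give me a rollout-safe fix plan."},
]
next_input = replay_items(stored)
print(next_input)
```

The returned list can be passed as `input` on the next request.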

~~507~~

~~508~~

509### GPT-5.4 parameter compatibility

~~510~~

511The following parameters are **only supported** when using GPT-5.4 with reasoning effort set to `none`:

~~512~~

513- `temperature`

514- `top_p`

515- `logprobs`

~~516~~

517Requests that include these fields will raise an error if they target GPT-5.4 or GPT-5.2 with any other reasoning effort setting, or an older GPT-5 model (e.g., `gpt-5`, `gpt-5-mini`, `gpt-5-nano`).

~~518~~

519To achieve similar results with reasoning effort set higher, or with another GPT-5 family model, try these alternative parameters:

~~520~~

521- **Reasoning depth:** `reasoning: { effort: "none" | "low" | "medium" | "high" | "xhigh" }`

522- **Output verbosity:** `text: { verbosity: "low" | "medium" | "high" }`

523- **Output length:** `max_output_tokens`
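One way to avoid the error described above is to strip the sampling fields client-side whenever reasoning is enabled. The sketch below assumes the target model defaults to `none` when no effort is given (as stated for GPT-5.4) and does not handle older GPT-5 models, which reject these fields regardless; `drop_unsupported_sampling` is a hypothetical helper name.

```python
SAMPLING_FIELDS = ("temperature", "top_p", "logprobs")


def drop_unsupported_sampling(params: dict) -> dict:
    """Remove sampling fields unless reasoning effort is (or defaults to) 'none'."""
    # Assumption: a missing effort means the model default of "none".
    effort = (params.get("reasoning") or {}).get("effort", "none")
    if effort == "none":
        return dict(params)
    return {key: value for key, value in params.items() if key not in SAMPLING_FIELDS}


request = {
    "model": "gpt-5.4",
    "input": "Summarize the incident report.",
    "reasoning": {"effort": "high"},
    "temperature": 0.2,
    "text": {"verbosity": "low"},
}
print(drop_unsupported_sampling(request))
```

Silently dropping a field changes behavior, so in production it may be preferable to log or raise instead of filtering.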

~~524~~

525### Migrating from Chat Completions to Responses API

~~526~~

527The biggest difference, and main reason to migrate from Chat Completions to the Responses API for GPT-5.4, is support for passing chain of thought (CoT) between turns. See a full [comparison of the APIs](https://developers.openai.com/api/docs/guides/responses-vs-chat-completions).

~~528~~

529Passing CoT exists only in the Responses API, and we've seen improved intelligence, fewer generated reasoning tokens, higher cache hit rates, and lower latency as a result of doing so. Most other parameters remain at parity, though the formatting is different. Here's how new parameters are handled differently between Chat Completions and the Responses API:

~~530~~

531**Reasoning effort**

~~532~~

~~533~~

~~534~~

535<div data-content-switcher-pane data-value="responses">

536 <div class="hidden">Responses API</div>

537 </div>

538 <div data-content-switcher-pane data-value="chat" hidden>

539 <div class="hidden">Chat Completions</div>

540 </div>

~~541~~

~~542~~

~~543~~

544**Verbosity**

~~545~~

~~546~~

~~547~~

548<div data-content-switcher-pane data-value="responses">

549 <div class="hidden">Responses API</div>

550 </div>

551 <div data-content-switcher-pane data-value="chat" hidden>

552 <div class="hidden">Chat Completions</div>

553 </div>

~~554~~

~~555~~

~~556~~

557**Custom tools**

~~558~~

~~559~~

~~560~~

561<div data-content-switcher-pane data-value="responses">

562 <div class="hidden">Responses API</div>

563 </div>

564 <div data-content-switcher-pane data-value="chat" hidden>

565 <div class="hidden">Chat Completions</div>

566 </div>

~~567~~

~~568~~

~~569~~

570## Prompting guidance

~~571~~

572We specifically designed GPT-5.4 to excel at coding and agentic tasks. We also recommend iterating on prompts for GPT-5.4 with the prompt optimizer.

~~573~~

574<div className="mt-4 flex flex-col gap-2">

575 [

~~576~~

577

578 

579 Craft the perfect prompt for GPT-5.4 in the dashboard

~~580~~

581](https://platform.openai.com/chat/edit?optimize=true)

~~582~~

583[

~~584~~

585

586 

587 Learn prompt patterns and migration tips for GPT-5.4

~~588~~

589](https://developers.openai.com/api/docs/guides/prompt-guidance)

~~590~~

591 <a href="https://cookbook.openai.com/examples/gpt-5/gpt-5_frontend">

~~592~~

~~593~~

594

595 

596 See prompt samples specific to frontend development for GPT-5 family of

597 models

~~598~~

~~599~~

600 </a>

601</div>

~~602~~

603### GPT-5.4 is a reasoning model

~~604~~

605Reasoning models like GPT-5.4 break problems down step by step, producing an internal chain of thought that encodes their reasoning. To maximize performance, pass these reasoning items back to the model: this avoids re-reasoning and keeps interactions closer to the model's training distribution. In multi-turn conversations, passing a `previous_response_id` automatically makes earlier reasoning items available. This is especially important when using tools—for example, when a function call requires an extra round trip. In these cases, either include them with `previous_response_id` or add them directly to `input`.
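For the function-call round trip mentioned above, a follow-up request can reference the earlier response so its reasoning items stay available. This is a sketch only: `tool_result_followup` is a hypothetical helper, the IDs are placeholders, and it builds the request arguments without sending them.

```python
def tool_result_followup(previous_response_id: str, call_id: str, output: str,
                         model: str = "gpt-5.4") -> dict:
    """Build arguments for the request that returns a function result to the model."""
    return {
        "model": model,
        # Links this turn to the earlier response, including its reasoning items.
        "previous_response_id": previous_response_id,
        "input": [
            {
                "type": "function_call_output",
                "call_id": call_id,
                "output": output,
            }
        ],
    }


kwargs = tool_result_followup("resp_123", "call_abc", '{"temperature_c": 18}')
print(kwargs)
# With the official SDK this would be sent as: client.responses.create(**kwargs)
```

For stateless flows, the alternative described above is to append the earlier output items, reasoning items included, directly to `input`.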

~~606~~

607Learn more about reasoning models and how to get the most out of them in our [reasoning guide](https://developers.openai.com/api/docs/guides/reasoning).

~~608~~

609## Further reading

~~610~~

611[GPT-5.4 prompting guide](https://developers.openai.com/api/docs/guides/prompt-guidance)

~~612~~

613[GPT-5.3-Codex prompting guide](https://developers.openai.com/cookbook/examples/gpt-5/codex_prompting_guide)

~~614~~

615[GPT-5.4 blog post](https://openai.com/index/introducing-gpt-5-4/)

~~616~~

617[GPT-5 frontend guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_frontend)

~~618~~

619[GPT-5 model family: new features guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_new_params_and_tools)

~~620~~

621[Cookbook on reasoning models](https://developers.openai.com/cookbook/examples/responses_api/reasoning_items)

~~622~~

623[Comparison of Responses API vs. Chat Completions](https://developers.openai.com/api/docs/guides/migrate-to-responses)

~~624~~

625## FAQ

~~626~~

6271. **How are these models integrated into ChatGPT?**

~~628~~

629 In ChatGPT, there are three models: GPT‑5 Instant, GPT‑5 Thinking, and GPT‑5 Pro. Based on the user's question, a routing layer selects the best model to use. Users can also invoke reasoning directly through the ChatGPT UI.

630 52

631 All three ChatGPT models (Instant, Thinking, and Pro) have a new knowledge cutoff of August 2025. For users, this means GPT-5.4 starts with a more current understanding of the world, so answers are more accurate and useful, with more relevant examples and context, even before turning to web search.53### API and model parameters

632 54

6331. **Will these models be supported in Codex?**55- Update the model slug to `gpt-5.5`.

56- Use the Responses API for any reasoning, tool-calling, or multi-turn use case.

57- Tune `reasoning.effort`. Use `low` for efficient reasoning, `medium` for a balanced point on the latency/performance curve, `high` for complex agentic tasks that require hard reasoning and where latency matters less, and `xhigh` for the hardest asynchronous agentic tasks or evals that test the bounds of model intelligence. See the [Reasoning models documentation](https://developers.openai.com/api/docs/guides/reasoning).

58- To configure for more concise responses, set `text.verbosity` to `low`. On GPT-5.5, this will result in proportionally more concise responses than `low` verbosity with GPT-5.4.

59- For tool-heavy or long-running workflows, verify that your application handles `phase`, preambles, and assistant-item replay correctly.

60- Benchmark against other models on accuracy, token consumption, and end-to-end latency.

634 61

635 Yes, `gpt-5.4` is the newest model that powers Codex and Codex CLI. You can also use this as a standalone model for building agentic coding applications.62### Prompting

636 63

6371. **How does GPT-5.4 compare to GPT-5.3-Codex?**64- State the expected outcome and success criteria.

65- Reduce or remove detailed step-by-step process guidance. Let GPT-5.5 choose the path unless the product requires that path.

66- Remove output schema definitions from the prompt where possible. Use [Structured Outputs](https://developers.openai.com/api/docs/guides/structured-outputs) instead.

67- Optimize your prompt for caching: [static parts first, dynamic parts last](https://developers.openai.com/api/docs/guides/prompt-caching).

68- Drop the current date. The model is already aware of the current UTC date.

69- Review and optimize your prompts based on the guidance in [Prompting GPT-5.5](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5).

638 70

639 [GPT-5.3-Codex](https://developers.openai.com/api/docs/models/gpt-5.3-codex) is specifically designed for use in coding environments such as Codex. GPT-5.4 is designed for both general-purpose work and coding, making it the better default when your workflow spans software engineering plus planning, writing, or other business tasks. GPT-5.3-Codex is only available in the Responses API and supports `low`, `medium`, `high`, and `xhigh` reasoning effort settings along with function calling, structured outputs, streaming, and prompt caching. It doesn't support all GPT-5.4 parameters or API surfaces.71## Using reasoning models

640 72

6411. **What is the deprecation plan for previous models?**73This guidance applies to GPT-5 series models and is worth revisiting whenever teams move workloads onto reasoning models. GPT-5.5 carries forward many capabilities that first appeared in earlier models, but they're still worth reviewing if you are moving from an earlier GPT-5 model, GPT-4.1, or a reasoning model such as o3.

642 74

643 Any model deprecations will be posted on our [deprecations page](https://developers.openai.com/api/docs/deprecations#page-top). We'll send advance notice of any model deprecations.75Teams can overlook these features because they sit partly in API configuration and orchestration rather than in the prompt itself. Used together, the Responses API, reasoning controls, verbosity, structured outputs, prompt caching, tool design, hosted tools, and state management help reasoning models deliver their best intelligence, reliability, latency, and cost profile.

644 76

6451. **What are the reasoning efforts supported?**

646 - GPT-5 supports `minimal`, `low`, `medium` (default), and `high`.

647 - GPT-5.2 supports `none` (default), `low`, `medium`, and `high`.

648 - GPT-5.4 supports `none` (default), `low`, `medium`, `high`, and `xhigh`.

77- **Responses API:** GPT-5.5 works best in the [Responses API](https://developers.openai.com/api/docs/guides/migrate-to-responses). Use `previous_response_id` for multi-turn state handling. For stateless or Zero Data Retention flows, pass back the relevant returned output items each turn. See [Passing context from the previous response](https://developers.openai.com/api/docs/guides/conversation-state#passing-context-from-the-previous-response) for details.

78- **Reasoning effort:** Use `reasoning.effort` to choose between `low`, `medium`, `high`, or `xhigh`. The default is `medium`, but many workloads will perform well with `low`. Reserve `none` for use cases where low latency is more important than intelligence. See [Reasoning Models](https://developers.openai.com/api/docs/guides/reasoning) for detailed recommendations.

79- **Verbosity:** Use `text.verbosity` to control output length. Treat final answer length as separate from reasoning quality; specify word budgets, section counts, table widths, or JSON-only output where needed.

80- **Structured Outputs:** Avoid describing the expected output schema in the prompt. Use [Structured Outputs](https://developers.openai.com/api/docs/guides/structured-outputs) for automatic validation and increased accuracy.

81- **Prompt caching:** [Prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching) works automatically for eligible long prompts and can reduce latency and input-token cost. To maximize cache hits, keep stable content at the beginning of the request. Put dynamic user-specific context near the end. For repeated traffic with common prefixes, use `prompt_cache_key` consistently and track `usage.prompt_tokens_details.cached_tokens`.

82- **Tool calling:** GPT-5.5 supports the same tool-calling patterns as GPT-5.4, including function tools and tool-heavy agent workflows. Put most tool-specific guidance in the tool descriptions themselves: what the tool does, when to use it, required inputs, side effects, retry safety, and common error modes. Add tool-specific context to system instructions only when it applies across tools or materially changes the agent's operating policy.

83- **Hosted tools and tool search:** Prefer [OpenAI-hosted tools](https://developers.openai.com/api/docs/guides/tools) where they fit the workflow, such as web search, file search, code interpreter, image generation, and computer use. Hosted tools reduce custom orchestration burden and keep common tool patterns aligned with the Responses API and Agents SDK. Use custom function tools when you need to call your own systems, enforce domain-specific side effects, or expose internal business workflows. For large tool catalogs, consider using [tool search](https://developers.openai.com/api/docs/guides/tools-tool-search) to defer tool definitions and load only the relevant subset.

84- **Tool preambles:** Preambles can improve chat UX because the user sees an initial, useful status update before the model generates the final response. They also make tool use easier to follow: the model can state what it's about to check or do, then continue from that same assistant state after tool results arrive.

85- **`phase` handling:** If your application manually manages Responses state by passing output items back each turn instead of using `previous_response_id`, preserve the `phase` parameter on returned assistant output items and pass it back unchanged. This is especially important when using reasoning effort, preambles, or repeated tool calls. See [Phase parameter](https://developers.openai.com/api/docs/guides/reasoning#phase-parameter).

86- **Compaction:** For long-running agents, use [conversation/state compaction](https://developers.openai.com/api/docs/guides/compaction) intentionally. Preserve completed actions, active assumptions, IDs, tool outcomes, unresolved blockers, and the next concrete goal.

87- **Agents SDK:** For new agentic systems, use the latest [Agents SDK](https://developers.openai.com/api/docs/guides/agents) patterns for tool orchestration, tracing, handoffs, and state management rather than rebuilding orchestration from scratch.

88- **Current date:** GPT-5.5 is aware of the current date in UTC. You don't need to add the current date to system instructions. Add explicit date or timezone context only when the application needs a business-specific timezone, policy-effective date, user-local date, or other non-UTC reference point.
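The prompt caching advice in the list above (stable content first, dynamic content last, a consistent `prompt_cache_key`) can be sketched as a request builder. The instruction text, cache key value, and `build_request` helper are hypothetical, and the request is only constructed here, not sent.

```python
# Stable content shared by every request; any edit here changes the cached prefix.
STATIC_INSTRUCTIONS = (
    "You are a support assistant for an internal billing system. "
    "Answer concisely and cite the relevant policy section."
)


def build_request(user_context: str, question: str,
                  cache_key: str = "support-bot-v1") -> dict:
    """Place the static prefix first and user-specific content last."""
    return {
        "model": "gpt-5.5",
        "prompt_cache_key": cache_key,
        "input": [
            # Identical across users, so it can be served from the cache.
            {"role": "developer", "content": STATIC_INSTRUCTIONS},
            # Dynamic, user-specific content goes at the end.
            {"role": "user", "content": f"{user_context}\n\n{question}"},
        ],
    }


first = build_request("Account 456, plan: Pro", "Why was I charged twice?")
second = build_request("Account 789, plan: Free", "How do I upgrade?")
print(first["input"][0] == second["input"][0])
```

Because both requests begin with the same developer message and reuse one cache key, the shared prefix is a candidate for a cache hit once it is long enough to be eligible.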

guides/migrate-to-responses.md +3 −3

Details

857 857

858```javascript858```javascript

859const answer = await client.responses.create({859const answer = await client.responses.create({

860 model: 'gpt-5.4',860 model: 'gpt-5.5',

861 input: 'Who is the current president of France?',861 input: 'Who is the current president of France?',

862 tools: [{ type: 'web_search' }]862 tools: [{ type: 'web_search' }]

863});863});

867 867

868```python868```python

869answer = client.responses.create(869answer = client.responses.create(

870 model="gpt-5.4",870 model="gpt-5.5",

871 input="Who is the current president of France?",871 input="Who is the current president of France?",

872 tools=[{"type": "web_search"}]872 tools=[{"type": "web_search"}]

873)873)

880 -H "Content-Type: application/json" \\880 -H "Content-Type: application/json" \\

881 -H "Authorization: Bearer $OPENAI_API_KEY" \\881 -H "Authorization: Bearer $OPENAI_API_KEY" \\

882 -d '{882 -d '{

883 "model": "gpt-5.4",883 "model": "gpt-5.5",

884 "input": "Who is the current president of France?",884 "input": "Who is the current president of France?",

885 "tools": [{"type": "web_search"}]885 "tools": [{"type": "web_search"}]

886 }'886 }'

guides/production-best-practices.md +1 −1

87#### Model87#### Model

88 88

89Our API offers different models with varying levels of complexity and generality. The most capable models, such as `gpt-5`, can generate more complex and diverse completions, but they also take longer to process your query.89Our API offers different models with varying levels of complexity and generality. The most capable models, such as `gpt-5`, can generate more complex and diverse completions, but they also take longer to process your query.

90Models such as `gpt-5.4-mini` and `gpt-5.4-nano` can generate faster and cheaper Responses, while `gpt-5.4` is a stronger default when you want more headroom on complex tasks. You can choose the model that best suits your use case and the trade-off between speed, cost, and quality.90Models such as `gpt-5.4-mini` and `gpt-5.4-nano` can generate faster and cheaper Responses, while `gpt-5.5` is a stronger default when you want more headroom on complex tasks. You can choose the model that best suits your use case and the trade-off between speed, cost, and quality.

91 91

92#### Number of completion tokens92#### Number of completion tokens

93 93

guides/prompt-caching.md +8 −6

34 34

35### In-memory prompt cache retention35### In-memory prompt cache retention

36 36

~~37In-memory prompt cache retention is available for all models that support Prompt Caching.~~37In-memory prompt cache retention is available for all models that support Prompt Caching, except for `gpt-5.5`, `gpt-5.5-pro`, and all future models.

38 38

39When using the in-memory policy, cached prefixes generally remain active for 5 to 10 minutes of inactivity, up to a maximum of one hour. In-memory cached prefixes are only held within volatile GPU memory.39When using the in-memory policy, cached prefixes generally remain active for 5 to 10 minutes of inactivity, up to a maximum of one hour. In-memory cached prefixes are only held within volatile GPU memory.

40 40

42 42

43Extended prompt cache retention is available for the following models:43Extended prompt cache retention is available for the following models:

44 44

45- gpt-5.5

46- gpt-5.5-pro

45- gpt-5.447- gpt-5.4

46- gpt-5.248- gpt-5.2

~~47- gp5-5.1-codex-max~~49- gpt-5.1-codex-max

48- gpt-5.150- gpt-5.1

49- gpt-5.1-codex51- gpt-5.1-codex

50- gpt-5.1-codex-mini52- gpt-5.1-codex-mini

59 61

60### Configure per request62### Configure per request

61 63

~~62If you don’t specify a retention policy, the default is `in_memory`. Allowed values are `in_memory` and `24h`.~~64If you don’t specify a retention policy, for most models the default is `in_memory`. For `gpt-5.5`, `gpt-5.5-pro`, and all future models, the default is `24h` and `in_memory` is not supported. Allowed values are `in_memory` and `24h`.

63 65

64```json66```json

65{67{

~~66 "model": "gpt-5.1",~~68 "model": "gpt-5.5",

67 "input": "Your prompt goes here...",69 "input": "Your prompt goes here...",

68 "prompt_cache_retention": "24h"70 "prompt_cache_retention": "24h"

69}71}
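The retention rules in this hunk can be sketched as a small client-side check before a request is sent. This is an illustration under stated assumptions: `resolve_retention`, `build_request`, and `EXTENDED_ONLY_MODELS` are hypothetical names. The model set contains only the two models named in the changed text, since "all future models" cannot be enumerated. The sketch does not check whether a model appears on the extended-retention list.

```python
from typing import Optional

# Models the changed text names as defaulting to "24h" with no "in_memory" support.
# Assumption: extend this set by hand as newer models are released.
EXTENDED_ONLY_MODELS = {"gpt-5.5", "gpt-5.5-pro"}
ALLOWED_RETENTION = {"in_memory", "24h"}


def resolve_retention(model: str, requested: Optional[str] = None) -> str:
    """Return the retention policy to send, or raise if the combination is unsupported."""
    if requested is None:
        # Mirror the documented defaults instead of relying on server behavior.
        return "24h" if model in EXTENDED_ONLY_MODELS else "in_memory"
    if requested not in ALLOWED_RETENTION:
        raise ValueError(f"unknown retention policy: {requested!r}")
    if requested == "in_memory" and model in EXTENDED_ONLY_MODELS:
        raise ValueError(f"{model} does not support in_memory retention")
    return requested


def build_request(model: str, prompt: str, retention: Optional[str] = None) -> dict:
    """Assemble a request body with an explicit prompt_cache_retention value."""
    return {
        "model": model,
        "input": prompt,
        "prompt_cache_retention": resolve_retention(model, retention),
    }


print(build_request("gpt-5.5", "Your prompt goes here..."))
```

Setting the value explicitly makes the retention policy visible in request logs, which can help when a model migration silently changes the default.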

109 111

1101. **How is data privacy maintained for caches?**1121. **How is data privacy maintained for caches?**

111 113

112 Prompt caches are not shared between organizations. Only members of the same organization can access caches of identical prompts.114 Prompt caches are not shared between organizations. Only members of the same organization can access caches of identical prompts. When using Extended Prompt Caching, key/value tensors have a maximum retention period of 24 hours.

113 115

1142. **Does Prompt Caching affect output token generation or the final response of the API?**1162. **Does Prompt Caching affect output token generation or the final response of the API?**

115 117

136 138

1377. **Does Prompt Caching work with Data Residency?**1397. **Does Prompt Caching work with Data Residency?**

138 140

139 In-memory Prompt Caching is compatable with all Data Residency regions.141 In-memory Prompt Caching does not store data and so does not impact Data Residency.

140 142

141 Extended caching temporarily stores data on GPU machines and will only be kept in-region when using Regional Inference.143 Extended caching temporarily stores data on GPU machines and will only be kept in-region when using Regional Inference.

guides/prompt-guidance.md +154 −489

~~1# Prompt guidance for GPT-5.4~~1# GPT-5.5 prompting guide

2 2

3GPT-5.4, our newest mainline model, is designed to balance long-running task performance, stronger control over style and behavior, and more disciplined execution across complex workflows. Building on advances from GPT-5 through GPT-5.3-Codex, GPT-5.4 improves token efficiency, sustains multi-step workflows more reliably, and performs well on long-horizon tasks.3Prompt GPT-5.5 with outcome-first goals, concise style controls, retrieval budgets, and validation loops.

4 4

5GPT-5.4 is designed for production-grade assistants and agents that need strong multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts. It is especially effective when prompts clearly specify the output contract, tool-use expectations, and completion criteria. In practice, the biggest gains come from choosing the right reasoning effort for the task, using explicit grounding and citation rules, and giving the model a precise definition of what "done" looks like. This guide focuses on prompt patterns and migration practices that preserve those efficiency wins. For model capabilities, API parameters, and broader migration guidance, see [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model).5## New in GPT-5.5 vs GPT-5.4

6- Shorter, outcome-first prompts usually work better than process-heavy prompt stacks.

7- More efficient reasoning means `low` and `medium` effort should be re-evaluated before escalating.

8- Preambles, `phase` handling, and assistant-item replay remain important for tool-heavy Responses workflows.

9- Explicit personality, retrieval budgets, and validation rules help shape customer-facing and agentic UX.

6 10

~~7When troubleshooting cases where GPT-5.4 treats an intermediate update as the~~11GPT-5.5 works best when prompts define the outcome and leave room for the model to choose an efficient solution path. Compared with earlier models, you can often use shorter, more outcome-oriented prompts: describe what good looks like, what constraints matter, what evidence is available, and what the final answer should contain.

~~8 final answer, verify your integration preserves the assistant message `phase`~~

~~9 field correctly. See [Phase parameter](#phase-parameter) for details.~~

10 12

~~11## Understand GPT-5.4 behavior~~13Avoid carrying over every instruction from an older prompt stack. Legacy prompts often over-specify the process because earlier models needed more help staying on track. With GPT-5.5, that can add noise, narrow the model's search space, or lead to overly mechanical answers.

12 14

~~13### Where GPT-5.4 is strongest~~15For more detail on GPT-5.5 behavior changes, start with the [Using GPT-5.5 guide](https://developers.openai.com/api/docs/guides/latest-model). This guide focuses on prompt changes that follow from those behavior changes.

14 16

~~15GPT-5.4 tends to work especially well in these areas:~~17The patterns here are starting points. Adapt them to your product surface, tools, evals, and user experience goals.

16 18

~~17- Strong personality and tone adherence, with less drift over long answers~~19## Automated migration with Codex

~~18- Agentic workflow robustness, with a stronger tendency to stick with multi-step work, retry, and complete agent loops end to end~~

~~19- Evidence-rich synthesis, especially in long-context or multi-tool workflows~~

~~20- Instruction adherence in modular, skill-based, and block-structured prompts when the contract is explicit~~

~~21- Long-context analysis across large, messy, or multi-document inputs~~

~~22- Batched or parallel tool calling while maintaining tool-call accuracy~~

~~23- Spreadsheet, finance, and Excel workflows that need instruction following, formatting fidelity, and stronger self-verification~~

24 20

~~25### Where explicit prompting still helps~~21Codex can implement the changes from this guide with the [OpenAI Docs Skill](https://github.com/openai/skills/tree/main/skills/.curated/openai-docs).

26 22

~~27Even with those strengths, GPT-5.4 benefits from more explicit guidance in a few recurring patterns:~~23```text

24$openai-docs migrate this project to gpt-5.5

25```

28 26

~~29- Low-context tool routing early in a session, when tool selection can be less reliable~~27To use this skill in other coding agents, download it from the [OpenAI skills repository](https://github.com/openai/skills/tree/main/skills/.curated/openai-docs).

~~30- Dependency-aware workflows that need explicit prerequisite and downstream-step checks~~

~~31- Reasoning effort selection, where higher effort is not always better and the right choice depends on task shape, not intuition~~

~~32- Research tasks that require disciplined source collection and consistent citations~~

~~33- Irreversible or high-impact actions that require verification before execution~~

~~34- Terminal or coding-agent environments where tool boundaries must stay clear~~

35 28

~~36These patterns are observed defaults, not guarantees. Start with the smallest prompt that passes your evals, and add blocks only when they fix a measured failure mode.~~29## Personality and behavior

37 30

~~38## Use core prompt patterns~~31GPT-5.5's default style is efficient, direct, and task-oriented. This is useful for production systems: responses stay focused, behavior is easier to steer, and the model avoids unnecessary conversational padding.

39 32

~~40### Keep outputs compact and structured~~33For customer-facing assistants, support workflows, coaching experiences, and other conversational products, define both personality and collaboration style.

41 34

42To improve token efficiency with GPT-5.4, constrain verbosity and enforce structured output through clear output contracts. In practice, this acts as an additional control layer alongside the `verbosity` parameter in the Responses API, allowing you to guide both how much the model writes and how it structures the output.35- **Personality** controls how the assistant sounds: tone, warmth, directness, formality, humor, empathy, and level of polish.

36- **Collaboration style** controls how the assistant works: when it asks questions, when it makes assumptions, how proactive it should be, how much context it gives, when it checks work, and how it handles uncertainty or risk.

43 37

~~44```xml~~38Keep both short. Personality instructions should shape the user experience. Collaboration instructions should shape task behavior. Neither should replace clear goals, success criteria, tool rules, or stopping conditions.

~~45<output_contract>~~

~~46- Return exactly the sections requested, in the requested order.~~

~~47- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.~~

~~48- Apply length limits only to the section they are intended for.~~

~~49- If a format is required (JSON, Markdown, SQL, XML), output only that format.~~

~~50</output_contract>~~

51 39

~~52<verbosity_controls>~~40Example personality block for a steady task-focused assistant:

~~53- Prefer concise, information-dense writing.~~

~~54- Avoid repeating the user's request.~~

~~55- Keep progress updates brief.~~

~~56- Do not shorten the answer so aggressively that required evidence, reasoning, or completion checks are omitted.~~

~~57</verbosity_controls>~~

~~58```~~

59 41

~~60### Set clear defaults for follow-through~~42```text

43# Personality

44You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness.

61 45

62Users often change the task, format, or tone mid-conversation. To keep the assistant aligned, define clear rules for when to proceed, when to ask, and how newer instructions override earlier defaults.46Prefer making progress over stopping for clarification when the request is already clear enough to attempt. Use context and reasonable assumptions to move forward. Ask for clarification only when the missing information would materially change the answer or create meaningful risk, and keep any question narrow.

63 47

~~64Use a default follow-through policy like this:~~48Stay concise without becoming curt. Give enough context for the user to understand and trust the answer, then stop. Use examples, comparisons, or simple analogies when they make the point easier to grasp. When correcting the user or disagreeing, be candid but constructive. When an error is pointed out, acknowledge it plainly and focus on fixing it.

65 49

~~66```xml~~50Match the user's tone within professional bounds. Avoid emojis and profanity by default, unless the user explicitly asks for that style or has clearly established it as appropriate for the conversation.

~~67<default_follow_through_policy>~~

~~68- If the user’s intent is clear and the next step is reversible and low-risk, proceed without asking.~~

~~69- Ask permission only if the next step is:~~

~~70 (a) irreversible,~~

~~71 (b) has external side effects (for example sending, purchasing, deleting, or writing to production), or~~

~~72 (c) requires missing sensitive information or a choice that would materially change the outcome.~~

~~73- If proceeding, briefly state what you did and what remains optional.~~

~~74</default_follow_through_policy>~~

75```51```

76 52

~~77Make instruction priority explicit:~~53Example personality block for an expressive collaborative assistant:

55```text

56# Personality

57Adopt a vivid conversational presence: intelligent, curious, playful when appropriate, and attentive to the user's thinking. Ask good questions when the problem is blurry, then become decisive once there is enough context.

78 58

~~79```xml~~59Be warm, collaborative, and polished. Conversation should feel easy and alive, but not chatty for its own sake. Offer a real point of view rather than merely mirroring the user, while staying responsive to their goals and constraints.

~~80<instruction_priority>~~60

~~81- User instructions override default style, tone, formatting, and initiative preferences.~~61Be thoughtful and grounded when the task calls for synthesis or advice. State a clear recommendation when you have enough context, explain important tradeoffs, and name uncertainty without becoming evasive.

~~82- Safety, honesty, privacy, and permission constraints do not yield.~~

~~83- If a newer user instruction conflicts with an earlier one, follow the newer instruction.~~

~~84- Preserve earlier instructions that do not conflict.~~

~~85</instruction_priority>~~

86```62```

87 63

~~88Higher-priority developer or system instructions remain binding.~~64For more expressive products, add warmth, curiosity, humor, or point of view explicitly, but keep the block short. Use personality to shape the experience, not to compensate for unclear goals or missing task instructions.

89 65

90**Guidance:** When instructions change mid-conversation, make the update explicit, scoped, and local. State what changed, what still applies, and whether the change affects the next turn or the rest of the conversation.66## Improve time to first visible token with a preamble

91 67

~~92### Handle mid-conversation instruction updates~~68In streaming applications, users notice how long it takes before the first visible response appears. GPT-5.5 may spend time reasoning, planning, or preparing tool calls before emitting visible text.

93 69

~~94For mid-conversation updates, use explicit, scoped steering messages that state:~~70For longer or tool-heavy tasks, prompt the model to start with a short preamble: a brief visible update that acknowledges the request and states the first step. This can improve perceived responsiveness without changing the underlying task.

95 71

~~961. Scope~~72Use this pattern when the task may take more than one step, require tool calls, or involve a long-running agent workflow.

~~972. Override~~

~~983. Carry forward~~

99 73

100```text74```text

101<task_update>75Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.

102For the next response only:

103- Do not complete the task.

104- Only produce a plan.

105- Keep it to 5 bullets.

~~106~~

107All earlier instructions still apply unless they conflict with this update.

108</task_update>

109```76```

110 77

111If the task itself changes, say so directly:78For coding agents that expose separate message phases, you can be more explicit:

112 79

113```text80```text

114<task_update>81You must always start with an intermediary update before any content in the analysis channel if the task will require calling tools. The user update should acknowledge the request and explain your first step.

115The task has changed.

116Previous task: complete the workflow.

117Current task: review the workflow and identify risks only.

~~118~~

119Rules for this turn:

120- Do not execute actions.

121- Do not call destructive tools.

122- Return exactly:

123 1. Main risks

124 2. Missing information

125 3. Recommended next step

126</task_update>

127```82```

128 83

129### Make tool use persistent when correctness depends on it84## Outcome-first prompts and stopping conditions

~~130~~

131Use explicit rules to keep tool use thorough, dependency-aware, and appropriately paced, especially in workflows where later actions rely on earlier retrieval or verification. A common failure mode is skipping prerequisites because the right end state seems obvious.

132 85

133GPT-5.4 can be less reliable at tool routing early in a session, when context is still thin. Prompt for prerequisites, dependency checks, and exact tool intent.86GPT-5.5 is strongest when the prompt defines the target outcome, success criteria, constraints, and available context, then lets the model choose the path.

134 87

135```xml88For many tasks, describe the destination rather than every step. This gives the model room to choose the right search, tool, or reasoning strategy for the task.

136<tool_persistence_rules>

137- Use tools whenever they materially improve correctness, completeness, or grounding.

138- Do not stop early when another tool call is likely to materially improve correctness or completeness.

139- Keep calling tools until:

140 (1) the task is complete, and

141 (2) verification passes (see <verification_loop>).

142- If a tool returns empty or partial results, retry with a different strategy.

143</tool_persistence_rules>

144```

145 89

146This is especially important for workflows where the final action depends on earlier lookup or retrieval steps. One of the most common failure modes is skipping prerequisites because the intended end state seems obvious.90Prefer this:

~~147~~

148```xml

149<dependency_checks>

150- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval steps are required.

151- Do not skip prerequisite steps just because the intended final action seems obvious.

152- If the task depends on the output of a prior step, resolve that dependency first.

153</dependency_checks>

154```

155 91

156Prompt for parallelism when the work is independent and wall-clock matters. Prompt for sequencing when dependencies, ambiguity, or irreversible actions matter more than speed.92```text

93Resolve the customer's issue end to end.

157 94

158```xml95Success means:

159<parallel_tool_calling>96- the eligibility decision is made from the available policy and account data

160- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.97- any allowed action is completed before responding

161- Do not parallelize steps that have prerequisite dependencies or where one result determines the next action.98- the final answer includes completed_actions, customer_message, and blockers

162- After parallel retrieval, pause to synthesize the results before making more calls.99- if evidence is missing, ask for the smallest missing field

163- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.

164</parallel_tool_calling>

165```100```

166 101

167### Force completeness on long-horizon tasks102**Avoid unnecessary absolute rules.** Older prompts often use strict instructions like `ALWAYS`, `NEVER`, `must`, and `only` to control model behavior. Use those words for true invariants, such as safety rules, required output fields, or actions that should never happen. For judgment calls, such as when to search, ask for clarification, use a tool, or keep iterating, prefer decision rules instead.

~~168~~

169For multi-step workflows, a common failure mode is incomplete execution: the model finishes after partial coverage, misses items in a batch, or treats empty or narrow retrieval as final. GPT-5.4 becomes more reliable when the prompt defines explicit completion rules and recovery behavior.

~~170~~

171Coverage can be achieved through sequential or parallel retrieval, but completion rules should remain explicit either way.

172 103

173```xml104Avoid this style of instruction unless every step is truly required:

174<completeness_contract>

175- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].

176- Keep an internal checklist of required deliverables.

177- For lists, batches, or paginated results:

178 - determine expected scope when possible,

179 - track processed items or pages,

180 - confirm coverage before finalizing.

181- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.

182</completeness_contract>

183```

184 105

185For workflows where empty, partial, or noisy retrieval is common:106```text

~~186~~ 107First inspect A, then inspect B, then compare every field, then think through

187```xml108all possible exceptions, then decide which tool to call, then call the tool,

188<empty_result_recovery>109then explain the entire process to the user.

189If a lookup returns empty, partial, or suspiciously narrow results:

190- do not immediately conclude that no results exist,

191- try at least one or two fallback strategies,

192 such as:

193 - alternate query wording,

194 - broader filters,

195 - a prerequisite lookup,

196 - or an alternate source or tool,

197- Only then report that no results were found, along with what you tried.

198</empty_result_recovery>

199```110```

200 111

201### Add a verification loop before high-impact actions112Add explicit stopping conditions:

~~202~~

203Once the workflow appears complete, add a lightweight verification step before returning the answer or taking an irreversible action. This helps catch requirement misses, grounding issues, and format drift before commit.

204 113

205```xml114```text

206<verification_loop>115Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims.

207Before finalizing:

208- Check correctness: does the output satisfy every requirement?

209- Check grounding: are factual claims backed by the provided context or tool outputs?

210- Check formatting: does the output match the requested schema or style?

211- Check safety and irreversibility: if the next step has external side effects, ask permission first.

212</verification_loop>

213```

214 116

215```xml117After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.

216<missing_context_gating>

217- If required context is missing, do NOT guess.

218- Prefer the appropriate lookup tool when the missing context is retrievable; ask a minimal clarifying question only when it is not.

219- If you must proceed, label assumptions explicitly and choose a reversible action.

220</missing_context_gating>

221```118```

222 119

223For agents that actively take actions, add a short execution frame:120Define missing-evidence behavior:

224 121

225```xml122```text

226<action_safety>123Use the minimum evidence sufficient to answer correctly, cite it precisely, then stop.

227- Pre-flight: summarize the intended action and parameters in 1-2 lines.

228- Execute via tool.

229- Post-flight: confirm the outcome and any validation that was performed.

230</action_safety>

231```124```

232 125

233## Handle specialized workflows126## Formatting

~~234~~

235### Choose image detail explicitly for vision and computer use

~~236~~

237If your workflow depends on visual precision, specify the image `detail` level in the prompt or integration instead of relying on `auto`. Use `high` for standard high-fidelity image understanding. Use `original` for large, dense, or spatially sensitive images, especially [computer use, localization, OCR, and click-accuracy tasks](https://developers.openai.com/api/docs/guides/tools-computer-use) on `gpt-5.4` and future models. Use `low` only when speed and cost matter more than fine detail. For more details on image detail levels, see the [Images and Vision guide](https://developers.openai.com/api/docs/guides/images-vision).

238 127

239### Lock research and citations to retrieved evidence128GPT-5.5 is highly steerable on output format and structure. Use that control when it improves comprehension or product fit.

240 129

241When citation quality matters, make both the source boundary and the format requirement explicit. This helps reduce fabricated references, unsupported claims, and citation-format drift.130Set `text.verbosity`, describe the expected output shape, and reserve heavier structure for cases where it improves comprehension or your product UI needs a stable artifact. The API default for `text.verbosity` is `medium`; use `low` when you prefer shorter, more concise responses.
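A minimal sketch of setting `text.verbosity` on a request body, assuming the body is assembled as a plain dictionary before being passed to a client. The helper name `with_verbosity` is hypothetical. `low` and `medium` come from the text above, and `high` is included on the assumption that the API also accepts it.

```python
VERBOSITY_LEVELS = ("low", "medium", "high")  # "high" is an assumption, see lead-in


def with_verbosity(request: dict, level: str = "medium") -> dict:
    """Return a copy of the request body with text.verbosity set.

    Any existing keys under "text" are preserved; the input is not mutated.
    """
    if level not in VERBOSITY_LEVELS:
        raise ValueError(f"unsupported verbosity: {level!r}")
    updated = dict(request)
    updated["text"] = {**request.get("text", {}), "verbosity": level}
    return updated


base = {"model": "gpt-5.5", "input": "Summarize ticket 123."}
concise = with_verbosity(base, "low")
print(concise)
```

The parameter and the prompt work together: the parameter sets the overall length budget, while the prompt describes the shape of the output.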

242 131

243```xml132Plain conversational formatting:

244<citation_rules>

245- Only cite sources retrieved in the current workflow.

246- Never fabricate citations, URLs, IDs, or quote spans.

247- Use exactly the citation format required by the host application.

248- Attach citations to the specific claims they support, not only at the end.

249</citation_rules>

250```

~~251~~

252```xml

253<grounding_rules>

254- Base claims only on provided context or tool outputs.

255- If sources conflict, state the conflict explicitly and attribute each side.

256- If the context is insufficient or irrelevant, narrow the answer or say you cannot support the claim.

257- If a statement is an inference rather than a directly supported fact, label it as an inference.

258</grounding_rules>

259```

260 133

261If your application requires inline citations, require inline citations. If it requires footnotes, require footnotes. The key is to lock the format and prevent the model from improvising unsupported references.134```text

~~262~~ 135Let formatting serve comprehension. Use plain paragraphs as the default format for normal conversation, explanations, reports, documentation, and technical writeups. Keep the presentation clean and readable without making the structure feel heavier than the content.

263### Research mode

264 136

265Push GPT-5.4 into a disciplined research mode. Use this pattern for research, review, and synthesis tasks. Do not force it onto short execution tasks or simple deterministic transforms.137Use headers, bold text, bullets, and numbered lists sparingly. Reach for them when the user requests them, when the answer needs clear comparison or ranking, or when the information would be harder to scan as prose. Otherwise, favor short paragraphs and natural transitions.

266 138

267```xml139Respect formatting preferences from the user. If they ask for a terse answer, minimal formatting, no bullets, no headers, or a specific structure, follow that preference unless there is a strong reason not to.

268<research_mode>

269- Do research in 3 passes:

270 1) Plan: list 3-6 sub-questions to answer.

271 2) Retrieve: search each sub-question and follow 1-2 second-order leads.

272 3) Synthesize: resolve contradictions and write the final answer with citations.

273- Stop only when more searching is unlikely to change the conclusion.

274</research_mode>

275```140```

276 141

277If your host environment uses a specific research tool or requires a submit step, combine this with the host's finalization contract.142Add explicit audience and length guidance:

~~278~~

279### Clamp strict output formats

~~280~~

281For SQL, JSON, or other parse-sensitive outputs, tell GPT-5.4 to emit only the target format and check it before finishing.

282 143

283```text144```text

284<structured_output_contract>145Write for a senior business audience. Keep the answer under 400 words. Use short paragraphs and only include bullets when they improve scannability. Prioritize the conclusion first, then the reasoning, then caveats.

285- Output only the requested format.

286- Do not add prose or markdown fences unless they were requested.

287- Validate that parentheses and brackets are balanced.

288- Do not invent tables or fields.

289- If required schema information is missing, ask for it or return an explicit error object.

290</structured_output_contract>

291```146```

292 147

293If you are extracting document regions or OCR boxes, define the coordinate system and add a drift check:148For editing, rewriting, summaries, or customer-facing messages, tell the model what to preserve before asking it to improve style. This pattern is useful when you want polish without expansion.

294 149

295```text150```text

296<bbox_extraction_spec>151Preserve the requested artifact, length, structure, and genre first. Quietly improve clarity, flow, and correctness. Do not add new claims, extra sections, or a more promotional tone unless explicitly requested.

297- Use the specified coordinate format exactly, such as [x1,y1,x2,y2] normalized to 0..1.

298- For each box, include page, label, text snippet, and confidence.

299- Add a vertical-drift sanity check so boxes stay aligned with the correct line of text.

300- If the layout is dense, process page by page and do a second pass for missed items.

301</bbox_extraction_spec>

302```

~~303~~

304### Keep tool boundaries explicit in coding and terminal agents

~~305~~

306In coding agents, GPT-5.4 works better when the rules for shell access and file editing are unambiguous. This is especially important when you expose tools like [Shell](https://developers.openai.com/api/docs/guides/tools-shell) or [Apply patch](https://developers.openai.com/api/docs/guides/tools-apply-patch).

~~307~~

308### User updates

~~309~~

310GPT-5.4 does well with brief, outcome-based updates. Reuse the user-updates pattern from the 5.2 guide, but pair it with explicit completion and verification requirements.

~~311~~

312Recommended update spec:

~~313~~

314```xml

315<user_updates_spec>

316- Only update the user when starting a new major phase or when something changes the plan.

317- Each update: 1 sentence on outcome + 1 sentence on next step.

318- Do not narrate routine tool calls.

319- Keep the user-facing status short; keep the work exhaustive.

320</user_updates_spec>

321```152```

322 153

323For coding agents, see the Prompting patterns for coding tasks section below for more specific guidance.154## Grounding, citations, and retrieval budgets

324 155

325### Prompting patterns for coding tasks156For grounded answers, citation behavior should be part of the prompt. Define what needs support, what counts as enough evidence, and how the model should behave when evidence is missing. Absence of evidence shouldn't automatically become a factual "no." For more details and examples, see the [citation formatting guide](https://developers.openai.com/api/docs/guides/citation-formatting).

326 157

327**Autonomy and persistence**158### Add an explicit retrieval budget

328 159

329GPT-5.4 is generally more thorough end to end than earlier mainline models on coding and tool-use tasks, so you often need less explicit "verify everything" prompting. Still, for high-stakes changes such as production, migrations, or security work, keep a lightweight verification clause.160Retrieval budgets are stopping rules for search. They tell the model when enough evidence is enough.

~~330~~

331```xml

332<autonomy_and_persistence>

333Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you.

~~334~~

335Unless the user explicitly asks for a plan, asks a question about the code, is brainstorming potential solutions, or some other intent that makes it clear that code should not be written, assume the user wants you to make code changes or run tools to solve the user's problem. In these cases, it's bad to output your proposed solution in a message, you should go ahead and actually implement the change. If you encounter challenges or blockers, you should attempt to resolve them yourself.

336</autonomy_and_persistence>

337```

338 161

339**Intermediary updates**162```text

~~340~~ 163For ordinary Q&A, start with one broad search using short, discriminative keywords. If the top results contain enough citable support for the core request, answer from those results instead of searching again.

341Keep updates sparse and high-signal. In coding tasks, prefer updates at key points.

~~342~~

343```xml

344<user_updates_spec>

345- Intermediary updates go to the `commentary` channel.

346- User updates are short updates while you are working. They are not final answers.

347- Use 1-2 sentence updates to communicate progress and new information while you work.

348- Do not begin responses with conversational interjections or meta commentary. Avoid openers such as acknowledgements ("Done -", "Got it", or "Great question") or similar framing.

349- Before exploring or doing substantial work, send a user update explaining your understanding of the request and your first step. Avoid commenting on the request or starting with phrases such as "Got it" or "Understood."

350- Provide updates roughly every 30 seconds while working.

351- When exploring, explain what context you are gathering and what you learned. Vary sentence structure so the updates do not become repetitive.

352- When working for a while, keep updates informative and varied, but stay concise.

353- When work is substantial, provide a longer plan after you have enough context. This is the only update that may be longer than 2 sentences and may contain formatting.

354- Before file edits, explain what you are about to change.

355- While thinking, keep the user informed of progress without narrating every tool call. Even if you are not taking actions, send frequent progress updates rather than going silent, especially if you are thinking for more than a short stretch.

356- Keep the tone of progress updates consistent with the assistant's overall personality.

357</user_updates_spec>

358```

~~359~~

360**Formatting**

361 164

362GPT-5.4 often defaults to more structured formatting and may overuse bullet lists. If you want a clean final response, explicitly clamp list shape.165Make another retrieval call only when:

166- The top results do not answer the core question.

167- A required fact, parameter, owner, date, ID, or source is missing.

168- The user asked for exhaustive coverage, a comparison, or a comprehensive list.

169- A specific document, URL, email, meeting, record, or code artifact must be read.

170- The answer would otherwise contain an important unsupported factual claim.

363 171

364```xml172Do not search again to improve phrasing, add examples, cite nonessential details, or support wording that can safely be made more generic.

365Never use nested bullets. Keep lists flat (single level). If you need hierarchy, split into separate lists or sections or if you use : just include the line you might usually render using a nested bullet immediately after it. For numbered lists, only use the `1. 2. 3.` style markers (with a period), never `1)`.

366```173```

367 174

368**Frontend tasks**175## Creative drafting guardrails

~~369~~

370Use this only when additional frontend guidance is useful.

~~371~~

372```xml

373<frontend_tasks>

374When doing frontend design tasks, avoid generic, overbuilt layouts.

~~375~~

376Use these hard rules:

377- One composition: The first viewport must read as one composition, not a dashboard, unless it is a dashboard.

378- Brand first: On branded pages, the brand or product name must be a hero-level signal, not just nav text or an eyebrow. No headline should overpower the brand.

379- Brand test: If the first viewport could belong to another brand after removing the nav, the branding is too weak.

380- Full-bleed hero only: On landing pages and promotional surfaces, the hero image should usually be a dominant edge-to-edge visual plane or background. Do not default to inset hero images, side-panel hero images, rounded media cards, tiled collages, or floating image blocks unless the existing design system clearly requires them.

381- Hero budget: The first viewport should usually contain only the brand, one headline, one short supporting sentence, one CTA group, and one dominant image. Do not place stats, schedules, event listings, address blocks, promos, "this week" callouts, metadata rows, or secondary marketing content there.

382- No hero overlays: Do not place detached labels, floating badges, promo stickers, info chips, or callout boxes on top of hero media.

383- Cards: Default to no cards. Never use cards in the hero unless they are the container for a user interaction. If removing a border, shadow, background, or radius does not hurt interaction or understanding, it should not be a card.

384- One job per section: Each section should have one purpose, one headline, and usually one short supporting sentence.

385- Real visual anchor: Imagery should show the product, place, atmosphere, or context.

386- Reduce clutter: Avoid pill clusters, stat strips, icon rows, boxed promos, schedule snippets, and competing text blocks.

387- Use motion to create presence and hierarchy, not noise. Ship 2-3 intentional motions for visually led work, and prefer Framer Motion when it is available.

~~388~~

389Exception: If working within an existing website or design system, preserve the established patterns, structure, and visual language.

390</frontend_tasks>

391```

392 176

393```xml177For drafting tasks, tell the model which claims must come from sources and which parts may be creatively written. This is especially important for slides, launch copy, customer summaries, talk tracks, leadership blurbs, and narrative framing.

394<terminal_tool_hygiene>

395- Only run shell commands via the terminal tool.

396- Never "run" tool names as shell commands.

397- If a patch or edit tool exists, use it directly; do not attempt it in bash.

398- After changes, run a lightweight verification step such as ls, tests, or a build before declaring the task done.

399</terminal_tool_hygiene>

400```

401 178

402### Document localization and OCR boxes179```text

~~403~~ 180For creative or generative requests such as slides, leadership blurbs, outbound copy, summaries for sharing, talk tracks, or narrative framing, distinguish source-backed facts from creative wording.

404For bbox tasks, be explicit about coordinate conventions and add drift tests.

405 181

406```xml182- Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims.

407<bbox_extraction_spec>183- Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger.

408- Use the specified coordinate format exactly (for example [x1,y1,x2,y2] normalized 0..1).184- If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics.

409- For each bbox, include: page, label, text snippet, confidence.

410- Add a vertical-drift sanity check:

411 - ensure bboxes align with the line of text (not shifted up or down).

412- If dense layout, process page by page and do a second pass for missed items.

413</bbox_extraction_spec>

414```185```

415 186

416### Use runtime and API integration notes187## Frontend engineering and visual taste

~~417~~

418For long-running or tool-heavy agents, the runtime contract matters as much as the prompt contract.

~~419~~

420#### Phase parameter

~~421~~

422For GPT-5.4, `gpt-5.3-codex`, and later Responses models, the `phase` field can

423help in the small number of long-running or tool-heavy flows where preambles or

424other intermediate assistant updates are mistaken for the final answer.

~~425~~

426- `phase` is optional at the API level, but it is highly recommended. Best-effort inference may exist server-side, but explicit round-tripping of `phase` is strictly better.

427- Use `phase` for long-running or tool-heavy agents that may emit commentary before tool calls or before a final answer.

428- Preserve `phase` when replaying prior assistant items so the model can distinguish working commentary from the completed answer. This matters most in multi-step flows with preambles, tool-related updates, or multiple assistant messages in the same turn.

429- Do not add `phase` to user messages.

430- If you use `previous_response_id`, that is usually the simplest path, since OpenAI can often recover prior state without manually replaying assistant items.

431- If you replay assistant history yourself, preserve the original `phase` values.

432- Missing or dropped `phase` can cause preambles to be interpreted as final answers and degrade behavior on those multi-step tasks.

~~433~~

434### Preserve behavior in long sessions

~~435~~

436Compaction unlocks significantly longer effective context windows, where user conversations can persist for many turns without hitting context limits or long-context performance degradation, and agents can perform very long trajectories that exceed a typical context window for long-running, complex tasks.

~~437~~

438If you are using [Compaction](https://developers.openai.com/api/docs/guides/compaction) in the Responses API, compact after major milestones, treat compacted items as opaque state, and keep prompts functionally identical after compaction. The endpoint is ZDR compatible and returns an `encrypted_content` item that you can pass into future requests. GPT-5.4 tends to remain more coherent and reliable over longer, multi-turn conversations with fewer breakdowns as sessions grow.

~~439~~

440For more guidance, see the [`/responses/compact` API reference](https://developers.openai.com/api/docs/api-reference/responses/compact).

441 188

442### Control personality for customer-facing workflows189For frontend work, refer to the [example instructions](https://developers.openai.com/api/docs/guides/frontend-prompt) for practical ways to steer UI quality. They cover product and user context, design-system alignment, first-screen usability, familiar controls, expected states, responsive behavior, and common generated-UI defaults to avoid, such as generic heroes, nested cards, decorative gradients, visible instructional text, and broken layouts.

443 190

444GPT-5.4 can be steered more effectively when you separate persistent personality from per-response writing controls. This is especially useful for customer-facing workflows such as emails, support replies, announcements, and blog-style content.191## Prompt the model to check its work

445 192

446- **Personality (persistent):** sets the default tone, verbosity, and decision style across the session.193Give GPT-5.5 access to tools that let it check outputs when validation is possible.

447- **Writing controls (per response):** define the channel, register, formatting, and length for a specific artifact.

448- **Reminder:** personality should not override task-specific output requirements. If the user asks for JSON, return JSON.

449 194

450For natural, high-quality prose, the highest-leverage controls are:195For coding agents, ask for concrete validation commands:

451 196

452- Give the model a clear persona.197```text

453- Specify the channel and emotional register.198After making changes, run the most relevant validation available:

454- Explicitly ban formatting when you want prose.199- targeted unit tests for changed behavior

455- Use hard length limits.200- type checks or lint checks when applicable

201- build checks for affected packages

202- a minimal smoke test when full validation is too expensive

456 203

457```xml204If validation cannot be run, explain why and describe the next best check.

458<personality_and_writing_controls>

459- Persona: <one sentence>

460- Channel: <Slack | email | memo | PRD | blog>

461- Emotional register: <direct/calm/energized/etc.> + "not <overdo this>"

462- Formatting: <ban bullets/headers/markdown if you want prose>

463- Length: <hard limit, e.g. <=150 words or 3-5 sentences>

464- Default follow-through: if the request is clear and low-risk, proceed without asking permission.

465</personality_and_writing_controls>

466```205```

467 206

468For more personality patterns you can lift directly, see the [Prompt Personalities cookbook](https://developers.openai.com/cookbook/examples/gpt-5/prompt_personalities).207For visual artifacts, ask for inspection after rendering:

~~469~~

470**Professional memo mode**

~~471~~

472For memos, reviews, and other professional writing tasks, general writing instructions are often not enough. These workflows benefit from explicit guidance on specificity, domain conventions, synthesis, and calibrated certainty.

473 208

474```xml209```text

475<memo_mode>210Render the artifact before finalizing. Inspect the rendered output for layout, clipping, spacing, missing content, and visual consistency. Revise until the rendered output matches the requirements.

476- Write in a polished, professional memo style.

477- Use exact names, dates, entities, and authorities when supported by the record.

478- Follow domain-specific structure if one is requested.

479- Prefer precise conclusions over generic hedging.

480- When uncertainty is real, tie it to the exact missing fact or conflicting source.

481- Synthesize across documents rather than summarizing each one independently.

482</memo_mode>

483```211```

484 212

485This mode is especially useful for legal, policy, research, and executive-facing writing, where the goal is not just fluency, but disciplined synthesis and clear conclusions.213For engineering and planning tasks, make implementation plans traceable:

~~486~~

487## Tune reasoning and migration

~~488~~

489### Treat reasoning effort as a last-mile knob

~~490~~

491Reasoning effort is not one-size-fits-all. Treat it as a last-mile tuning knob, not the primary way to improve quality. In many cases, stronger prompts, clear output contracts, and lightweight verification loops recover much of the performance teams might otherwise seek through higher reasoning settings.

~~492~~

493Recommended defaults:

~~494~~

495- `none`: Best for fast, cost-sensitive, latency-sensitive tasks where the model does not need to think.

496- `low`: Works well for latency-sensitive tasks where a small amount of thinking can produce a meaningful accuracy gain, especially with complex instructions.

497- `medium` or `high`: Reserve for tasks that truly require stronger reasoning and can absorb the latency and cost tradeoff. Choose between them based on how much performance gain your task gets from additional reasoning.

498- `xhigh`: Avoid as a default unless your evals show clear benefits. It is best suited for long, agentic, reasoning-heavy tasks where maximum intelligence matters more than speed or cost.

~~499~~

500In practice, most teams should default to the `none`, `low`, or `medium` range.

~~501~~

502Start with `none` for execution-heavy workloads such as workflow steps, field extraction, support triage, and short structured transforms.

503 214

504Start with `medium` or higher for research-heavy workloads such as long-context synthesis, multi-document review, conflict resolution, and strategy writing. With `medium` and a well-engineered prompt, you can squeeze out a lot of performance.215```text

~~505~~ 216For implementation plans, include:

506For GPT-5.4 workloads, `none` can already perform well on action-selection and tool-discipline tasks. If your workload depends on nuanced interpretation, such as implicit requirements, ambiguity, or cancelled-tool-call recovery, start with `low` or `medium` instead.217- requirements and where each is addressed

~~507~~ 218- named resources, files, APIs, or systems involved

508Before increasing reasoning effort, first add:219- state transitions or data flow where relevant

~~509~~ 220- validation commands or checks

510- `<completeness_contract>`221- failure behavior

511- `<verification_loop>`222- privacy and security considerations

512- `<tool_persistence_rules>`223- open questions that materially affect implementation

~~513~~

514If the model still feels too literal or stops at the first plausible answer, add an initiative nudge before raising reasoning effort:

~~515~~

516```xml

517<dig_deeper_nudge>

518- Don’t stop at the first plausible answer.

519- Look for second-order issues, edge cases, and missing constraints.

520- If the task is safety or accuracy critical, perform at least one verification step.

521</dig_deeper_nudge>

522```224```

523 225

524### Migrate prompts to GPT-5.4 one change at a time226## Phase parameter

~~525~~

526Use the same one-change-at-a-time discipline as the 5.2 guide: switch model first, pin `reasoning_effort`, run evals, then iterate.

~~527~~

528These starting points work well for many migrations:

~~529~~

530| Current setup | Suggested GPT-5.4 start | Notes |

531| ------------------------- | ---------------------------------- | ------------------------------------------------------------------- |

532| `gpt-5.2` | Match the current reasoning effort | Preserve the existing latency and quality profile first, then tune. |

533| `gpt-5.3-codex` | Match the current reasoning effort | For coding workflows, keep the reasoning effort the same. |

534| `gpt-4.1` or `gpt-4o` | `none` | Keep snappy behavior, and increase only if evals regress. |

535| Research-heavy assistants | `medium` or `high` | Use explicit research multi-pass and citation gating. |

536| Long-horizon agents | `medium` or `high` | Add tool persistence and completeness accounting. |

~~537~~

538### Small-model guidance for `gpt-5.4-mini` and `gpt-5.4-nano`

~~539~~

540`gpt-5.4-mini` and `gpt-5.4-nano` are highly steerable, but they are less likely than larger models to infer missing steps, resolve ambiguity implicitly, or package outputs the way you intended unless you specify that behavior directly. In practice, prompts for smaller models are often a bit longer and more explicit.

541 227

~~542**How `gpt-5.4-mini` differs**~~228Starting with GPT-5.4, long-running or tool-heavy Responses workflows can use assistant-item `phase` values to distinguish intermediate updates from final answers. GPT-5.5 uses the same pattern.

543 229

544- `gpt-5.4-mini` is more literal and makes fewer assumptions.230If you use `previous_response_id`, the API preserves prior assistant state automatically. If your application manually replays assistant output items into the next request, preserve each original `phase` value and pass it back unchanged. This matters most when a response includes preambles, repeated tool calls, or a final answer after intermediate assistant updates.

545- It is strong when the task is clearly structured, but weaker on implicit workflows and ambiguity handling.

546- By default, it may try to keep the conversation going with a follow-up question unless you suppress that behavior explicitly.

547 231

~~548**Prompting `gpt-5.4-mini`**~~232```text

~~549~~ 233If manually replaying assistant items:

550- Put critical rules first.234- Preserve assistant `phase` values exactly.

551- Specify the full execution order when tool use or side effects matter.235- Use `phase: "commentary"` for intermediate user-visible updates.

552- Do not rely on "you MUST" alone. Use structural scaffolding such as numbered steps, decision rules, and explicit action definitions.236- Use `phase: "final_answer"` for the completed answer.

553- Separate "do the action" from "report the action."237- Do not add `phase` to user messages.

554- Show the correct flow, not just the final format.238```

555- Define ambiguity behavior explicitly: when to ask, abstain, or proceed.

556- Specify packaging directly: answer length, whether to ask a follow-up question, citation style, and section order.

557- Be careful with `output nothing else`. Prefer scoped instructions such as `after the final JSON, output nothing further`.

~~558~~

~~559**Prompting `gpt-5.4-nano`**~~

~~560~~

561- Use `gpt-5.4-nano` only for narrow, well-bounded tasks.

562- Prefer closed outputs: labels, enums, short JSON, or fixed templates.

563- Avoid multi-step orchestration unless the flow is extremely constrained.

564- Route ambiguous or planning-heavy tasks to a stronger model instead of over-prompting `gpt-5.4-nano`.

~~565~~

566**Good default pattern**

~~567~~

5681. Task

5692. Critical rule

5703. Exact step order

5714. Edge cases or clarification behavior

5725. Output format

5736. One correct example

574 239
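The replay rule on the new side of this hunk can be sketched in Python. This is a minimal sketch, not an SDK call: the helper name `replay_assistant_items` and the plain-dict item shape (`role`, `content`, optional `phase`) are assumptions made for illustration. Only the `phase` values `commentary` and `final_answer` and the "no `phase` on user messages" rule come from the guide text.

```python
from copy import deepcopy


def replay_assistant_items(prior_items, next_user_text):
    """Build the next request's input list, passing assistant `phase` back unchanged.

    Hypothetical helper: items are assumed to be plain dicts.
    """
    replayed = []
    for item in prior_items:
        copied = deepcopy(item)  # never mutate the stored history
        if copied.get("role") == "user":
            # The guide says not to add `phase` to user messages.
            copied.pop("phase", None)
        # Assistant items are copied as-is, so any `phase` value survives verbatim.
        replayed.append(copied)
    replayed.append({"role": "user", "content": next_user_text})
    return replayed


history = [
    {"role": "user", "content": "Fix the failing test."},
    {"role": "assistant", "phase": "commentary", "content": "Reading the test file first."},
    {"role": "assistant", "phase": "final_answer", "content": "Patched the off-by-one error."},
]
next_input = replay_assistant_items(history, "Now add a regression test.")
print([item.get("phase") for item in next_input])
```

If the application uses `previous_response_id` instead, this manual step is unnecessary, since the guide states that prior assistant state is preserved automatically.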

575**Avoid**240## Suggested prompt structure

576 241

577- Implied next steps242Use this structure as a starting point for complex prompts. Keep each section short. Add detail only where it changes behavior.

578- Unspecified edge cases

579- Schema-only prompts for tool workflows

580- Generic instructions without structure

581 243

582### Web search and deep research244```text

245Role: [1-2 sentences defining the model's function, context, and job]

583 246

584If you are migrating a research agent in particular, make these prompt updates before increasing reasoning effort:247# Personality

248[tone, demeanor, and collaboration style]

585 249

586- Add `<research_mode>`250# Goal

587- Add `<citation_rules>`251[user-visible outcome]

588- Add `<empty_result_recovery>`

589- Increase `reasoning_effort` one notch only after prompt fixes.

590 252

591You can start from the 5.2 research block and then layer in citation gating and finalization contracts as needed.253# Success criteria

254[what must be true before the final answer]

592 255

593GPT-5.4 performs especially well when the task requires multi-step evidence gathering, long-context synthesis, and explicit prompt contracts. In practice, the highest-leverage prompt changes are choosing reasoning effort by task shape, defining exact output and citation formats, adding dependency-aware tool rules, and making completion criteria explicit. The model is often strong out of the box, but it is most reliable when prompts clearly specify how to search, how to verify, and what counts as done.256# Constraints

257[policy, safety, business, evidence, and side-effect limits]

594 258

595## Next steps259# Output

260[sections, length, and tone]

596 261

597- Read [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model) for model capabilities, parameters, and API compatibility details.

598- Read [Prompt engineering](https://developers.openai.com/api/docs/guides/prompt-engineering) for broader prompting strategies that apply across model families.

599- Read [Compaction](https://developers.openai.com/api/docs/guides/compaction) if you are building long-running GPT-5.4 sessions in the Responses API.

262# Stop rules

263[when to retry, fallback, abstain, ask, or stop]

264```

guides/reasoning.md +36 −20


12 12

13 13

14 14

15**Reasoning models** like [GPT-5.4](https://developers.openai.com/api/docs/models/gpt-5.4) allocate internal reasoning tokens before producing a response. They work especially well for complex problem solving, coding, scientific reasoning, and multi-step agentic workflows. They're also the best models for [Codex CLI](https://github.com/openai/codex), our lightweight coding agent.15**Reasoning models** like [GPT-5.5](https://developers.openai.com/api/docs/models/gpt-5.5) use internal reasoning tokens before producing a response. This helps the model plan, use tools effectively, inspect alternatives, recover from ambiguity, and solve harder multi-step tasks. Reasoning models work especially well for complex problem solving, coding, scientific reasoning, and multi-step agentic workflows. They're also the best models for [Codex CLI](https://github.com/openai/codex), our lightweight coding agent.

16 16

17Start with `gpt-5.4` for most reasoning workloads. If you need the highest-intelligence API option for tougher problems that can tolerate more latency, use [`gpt-5.4-pro`](https://developers.openai.com/api/docs/models/gpt-5.4-pro). For lower cost and latency, consider `gpt-5-mini` or `gpt-5-nano`.17Start with `gpt-5.5` for most reasoning workloads. If you need the highest-intelligence API option for more challenging problems that can tolerate more latency, use [`gpt-5.5-pro`](https://developers.openai.com/api/docs/models/gpt-5.5-pro). For lower cost, consider `gpt-5.4` and for lower cost and latency, consider `gpt-5.4-mini`.

18 18

19**Reasoning models work better with the [Responses19**Reasoning models work better with the [Responses

20 API](https://developers.openai.com/api/docs/guides/migrate-to-responses)**. While the Chat Completions API20 API](https://developers.openai.com/api/docs/guides/migrate-to-responses)**. While the Chat Completions API

38\`;38\`;

39 39

40const response = await openai.responses.create({40const response = await openai.responses.create({

~~41 model: "gpt-5.4",~~41 model: "gpt-5.5",

42 reasoning: { effort: "low" },42 reasoning: { effort: "low" },

43 input: [43 input: [

44 {44 {

62"""62"""

63 63

64response = client.responses.create(64response = client.responses.create(

~~65 model="gpt-5.4",~~65 model="gpt-5.5",

66 reasoning={"effort": "low"},66 reasoning={"effort": "low"},

67 input=[67 input=[

68 {68 {

80 -H "Content-Type: application/json" \\80 -H "Content-Type: application/json" \\

81 -H "Authorization: Bearer $OPENAI_API_KEY" \\81 -H "Authorization: Bearer $OPENAI_API_KEY" \\

82 -d '{82 -d '{

~~83 "model": "gpt-5.4",~~83 "model": "gpt-5.5",

84 "reasoning": {"effort": "low"},84 "reasoning": {"effort": "low"},

85 "input": [85 "input": [

86 {86 {

92```92```

93 93

94 94

~~95In the example above, the `reasoning.effort` parameter guides the model on how many reasoning tokens to generate before creating a response to the prompt.~~95## Reasoning effort

96 96

97Supported values are model-dependent and can include `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. Lower effort favors speed and lower token usage, while higher effort favors more complete reasoning. Defaults are also model-dependent rather than universal. For example, `gpt-5.4` defaults to `none`, while older GPT-5 models default to `medium`.97The `reasoning.effort` parameter guides the model on how much to think when performing a task.

98 98

~~99| Effort | Start here when... |~~99Supported values are model-dependent and can include `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. Lower effort favors speed and lower token usage, while at higher effort the model thinks more completely to provide higher quality responses. The models also reason adaptively across reasoning efforts, using fewer tokens for simpler tasks and thinking harder for complex tasks.

100| ------------------ | ------------------------------------------------------------------------------------------------------- |100

101| `none` | You want the lowest latency for execution-heavy tasks such as extraction, routing, or simple transforms |101Defaults are also model-dependent rather than universal. `gpt-5.5` defaults to `medium` reasoning effort. This is the best starting point for `gpt-5.5`’s full balance of quality, reliability and performance.

102| `low` | A small amount of extra thinking can improve reliability without adding much latency |102

103| `medium` or `high` | The task involves planning, coding, synthesis, or harder reasoning |103| Effort | Best for... |

104| `xhigh` | Only when your evals show a clear benefit that justifies the extra latency and cost |104| -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |

105| `none` | Latency-critical tasks that do not benefit from any reasoning or multi-chained tool calls. For latency-sensitive use cases with `gpt-5.5`, we recommend trying `low` to begin with and then moving to `none` if required. Common use cases include voice, fast information retrieval, and classification. |

106| `low` | Efficient reasoning with a modest latency increase. Ideal for use cases requiring tool-use, planning, search, or multi-step decision making, while optimizing for speed and cost. Common use cases include data analysis, drafting, execution-oriented coding, and customer support / chat assistant workflows. |

107| `medium` | When quality and reliability matter, and the task involves planning, complex reasoning, and judgement. Default configuration for most workloads, and a well-balanced point on the Pareto curve of latency, performance and cost. Common use cases include agentic coding, research, working with spreadsheets & slides, and delegating long-horizon work. |

108| `high` | Hard reasoning, complex debugging, deep planning, and high-value tasks where quality and intelligence matter more than latency. Recommended for complex workflows and agentic tasks. Common use cases include agentic coding, long-horizon research, and knowledge work. Depending on the complexity of the task, evaluate both `medium` and `high`. |

109| `xhigh` | Deep research, asynchronous workflows and agentic tasks that require very long rollouts. Only use when your evals show a clear benefit that justifies the extra latency and cost. Common use cases include security and code review, enterprise productivity, deeper research tasks, and challenging coding workflows. |

110

111For faster time to first visible token in latency-sensitive applications, ask the model to generate a short preamble before continuing with deeper reasoning.

105 112

106Some models support only a subset of these values, so check the relevant [model page](https://developers.openai.com/api/docs/models) before choosing a setting.113Some models support only a subset of these values, so check the relevant [model page](https://developers.openai.com/api/docs/models) before choosing a setting.
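One way to apply the effort table on the new side of this hunk is a small lookup that picks a starting effort per workload and builds the request arguments. This is a sketch under stated assumptions: the workload labels, the `STARTING_EFFORT` mapping, and the `build_request_kwargs` helper are hypothetical names chosen here, and the mapping is only a reading of the "common use cases" column, not an official recommendation. It makes no network call.

```python
# Hypothetical mapping derived from the table's "common use cases" column.
STARTING_EFFORT = {
    "classification": "none",
    "customer_support": "low",
    "agentic_coding": "medium",
    "complex_debugging": "high",
    "deep_research": "xhigh",  # only if evals show a clear benefit
}

DEFAULT_EFFORT = "medium"  # the stated default for gpt-5.5


def build_request_kwargs(workload, prompt, model="gpt-5.5"):
    """Return keyword arguments for a Responses API call (assumed dict shape)."""
    effort = STARTING_EFFORT.get(workload, DEFAULT_EFFORT)
    return {
        "model": model,
        "reasoning": {"effort": effort},
        "input": [{"role": "user", "content": prompt}],
    }


kwargs = build_request_kwargs("customer_support", "Summarize ticket 123.")
print(kwargs["reasoning"])
```

The resulting dictionary could be unpacked into `client.responses.create(**kwargs)`, as in the examples earlier in this file. Treat each value as a starting point to confirm with evals, and check the model page for which effort values a given model supports.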

107 114

108## How reasoning works115## How reasoning works

109 116

110Reasoning models introduce **reasoning tokens** in addition to input and output tokens. The models use these reasoning tokens to "think," breaking down the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens and discards the reasoning tokens from its context.117Reasoning models introduce **reasoning tokens** in addition to input and output tokens. The models use these reasoning tokens to "think," breaking down the prompt and considering multiple approaches to generating a response. Our reasoning models like gpt-5.5 and gpt-5.4 support interleaved thinking, where the model is able to generate visible output tokens before and in between thinking, and is able to think in between tool calls.

111 118

112Here is an example of a multi-step conversation between a user and an assistant. Input and output tokens from each step are carried over, while reasoning tokens are discarded.119Here is an example of a multi-step conversation between a user and an assistant. Input and output tokens from each step are carried over, while reasoning tokens are discarded.

113 120

165\`;172\`;

166 173

167const response = await openai.responses.create({174const response = await openai.responses.create({

168 model: "gpt-5.4",175 model: "gpt-5.5",

169 reasoning: { effort: "medium" },176 reasoning: { effort: "medium" },

170 input: [177 input: [

171 {178 {

200"""207"""

201 208

202response = client.responses.create(209response = client.responses.create(

203 model="gpt-5.4",210 model="gpt-5.5",

204 reasoning={"effort": "medium"},211 reasoning={"effort": "medium"},

205 input=[212 input=[

206 {213 {

239 -H "Content-Type: application/json" \246 -H "Content-Type: application/json" \

240 -H "Authorization: Bearer $OPENAI_API_KEY" \247 -H "Authorization: Bearer $OPENAI_API_KEY" \

241 -d '{248 -d '{

242 "model": "o4-mini",249 "model": "gpt-5.5",

243 "reasoning": {"effort": "medium"},250 "reasoning": {"effort": "medium"},

244 "input": "What is the weather like today?",251 "input": "What is the weather like today?",

245 "tools": [ ... function config here ... ],252 "tools": [ ... function config here ... ],

266const openai = new OpenAI();273const openai = new OpenAI();

267 274

268const response = await openai.responses.create({275const response = await openai.responses.create({

269 model: "gpt-5.4",276 model: "gpt-5.5",

270 input: "What is the capital of France?",277 input: "What is the capital of France?",

271 reasoning: {278 reasoning: {

272 effort: "low",279 effort: "low",

282client = OpenAI()289client = OpenAI()

283 290

284response = client.responses.create(291response = client.responses.create(

285 model="gpt-5.4",292 model="gpt-5.5",

286 input="What is the capital of France?",293 input="What is the capital of France?",

287 reasoning={294 reasoning={

288 "effort": "low",295 "effort": "low",

298 -H "Content-Type: application/json" \\305 -H "Content-Type: application/json" \\

299 -H "Authorization: Bearer $OPENAI_API_KEY" \\306 -H "Authorization: Bearer $OPENAI_API_KEY" \\

300 -d '{307 -d '{

301 "model": "gpt-5.4",308 "model": "gpt-5.5",

302 "input": "What is the capital of France?",309 "input": "What is the capital of France?",

303 "reasoning": {310 "reasoning": {

304 "effort": "low",311 "effort": "low",

345 to ensure safe deployment. Get started with verification on the [platform352 to ensure safe deployment. Get started with verification on the [platform

346 settings page](https://platform.openai.com/settings/organization/general).353 settings page](https://platform.openai.com/settings/organization/general).

347 354

355## `phase` parameter

356

357For long-running or tool-heavy flows with GPT-5.5 and GPT-5.4 in the Responses API, use the assistant message `phase` field to avoid early stopping and other misbehavior.

358`phase` is optional at the API level, but OpenAI recommends using it. Use `phase: "commentary"` for intermediate assistant updates, such as preambles before tool calls, and `phase: "final_answer"` for the completed answer. Don't add `phase` to user messages.

359Using `previous_response_id` is usually the simplest path because prior assistant state is preserved. If you replay assistant history manually, preserve each original `phase` value.

360Missing or dropped `phase` can cause preambles to be treated as final answers in those workflows. For model-specific prompt guidance, see [Prompting GPT-5.5](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5#phase-parameter).
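
As a minimal sketch of the manual-replay case described above, the helper below rebuilds an input list from stored conversation items and carries each assistant `phase` value through unchanged. The item shape (plain dicts with `role`, `content`, and an optional `phase`) and the name `replay_history` are assumptions for illustration, not the SDK's own types.

```python
from typing import Any

# Assumed item shape for illustration: plain dicts with "role", "content",
# and an optional "phase" on assistant messages.
Message = dict[str, Any]


def replay_history(prior_items: list[Message]) -> list[Message]:
    """Rebuild an input list from stored items, keeping assistant phase values."""
    replayed: list[Message] = []
    for item in prior_items:
        message: Message = {"role": item["role"], "content": item["content"]}
        # Preserve the original phase on assistant messages only;
        # user messages never receive a phase.
        if item["role"] == "assistant" and "phase" in item:
            message["phase"] = item["phase"]
        replayed.append(message)
    return replayed


history = [
    {"role": "user", "content": "Check the build status."},
    {"role": "assistant", "content": "Let me look that up.", "phase": "commentary"},
    {"role": "assistant", "content": "The build passed.", "phase": "final_answer"},
]
replayed = replay_history(history)
```

The result could then be passed as `input` to a new request. When `previous_response_id` is available, this step is usually unnecessary.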

361

362### Round-trip assistant phase values

363

348## Advice on prompting364## Advice on prompting

349 365

350There are some differences to consider when prompting a reasoning model. Reasoning-capable GPT-5 models usually work best when you give them a clear goal, strong constraints, and an explicit output contract without prescribing every intermediate step.366There are some differences to consider when prompting a reasoning model. Reasoning-capable GPT-5 models usually work best when you give them a clear goal, strong constraints, and an explicit output contract without prescribing every intermediate step.

guides/safety-checks/cybersecurity.md +2 −2


1# Cybersecurity checks1# Cybersecurity checks

2 2

3GPT-5.3-Codex is the first model we are classifying as having High Cybersecurity Capability under our [Preparedness Framework](https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf). As a result, additional automated safeguards apply when this model, and any models released after it, are used via the API. Please note that the safeguards applied in the API differ from those used in Codex. You can learn more about the Codex safeguards [here](https://developers.openai.com/codex/concepts/cyber-safety/).3GPT-5.3-Codex and newer models, including GPT-5.4 and GPT-5.5, are classified as having High Cybersecurity Capability under our [Preparedness Framework](https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf). As a result, additional automated safeguards apply when these models are used via the API. Please note that the safeguards applied in the API differ from those used in Codex. You can learn more about the Codex safeguards [here](https://developers.openai.com/codex/concepts/cyber-safety/).

4 4

5These safeguards monitor for signals of potentially suspicious cybersecurity activity. If certain thresholds are met, access to the model may be temporarily limited while activity is reviewed. Because these systems are still being calibrated, legitimate security research or defensive work may occasionally be flagged. We expect only a small portion of traffic to be impacted, and we’re continuing to refine the overall API experience.5These safeguards monitor for signals of potentially suspicious cybersecurity activity. If certain thresholds are met, access to the model may be temporarily limited while activity is reviewed. Because these systems are still being calibrated, legitimate security research or defensive work may occasionally be flagged. We expect only a small portion of traffic to be impacted, and we’re continuing to refine the overall API experience.

6 6

7## Safeguard actions for non-ZDR Organizations7## Safeguard actions for non-ZDR Organizations

8 8

9If our systems detect potentially suspicious cybersecurity activity within your traffic that exceeds defined thresholds, access to the model may be temporarily revoked. In this case, API requests will return an error with the error code `cyber_policy`.9If our systems detect potentially suspicious cybersecurity activity within your traffic that exceeds defined thresholds, access to these models may be temporarily revoked. In this case, API requests will return an error with the error code `cyber_policy`.

10 10

11If your organization has not implemented a per-user [safety_identifier](https://developers.openai.com/api/docs/guides/safety-best-practices#implement-safety-identifiers), access may be temporarily revoked for the **entire organization**. If your organization provides a unique [safety_identifier](https://developers.openai.com/api/docs/guides/safety-best-practices#implement-safety-identifiers) per end user, access may be temporarily revoked for the **specific affected user** rather than the entire organization (after human review and warnings). Providing safety identifiers helps minimize disruption to other users on your platform.11If your organization has not implemented a per-user [safety_identifier](https://developers.openai.com/api/docs/guides/safety-best-practices#implement-safety-identifiers), access may be temporarily revoked for the **entire organization**. If your organization provides a unique [safety_identifier](https://developers.openai.com/api/docs/guides/safety-best-practices#implement-safety-identifiers) per end user, access may be temporarily revoked for the **specific affected user** rather than the entire organization (after human review and warnings). Providing safety identifiers helps minimize disruption to other users on your platform.
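
The sketch below illustrates the two mechanisms mentioned above: attaching a per-user `safety_identifier` to a request and recognizing the `cyber_policy` error code. The helper names, the choice to hash the internal user ID with SHA-256, and the `{"error": {"code": ...}}` payload shape are assumptions made for illustration; check the linked guide and the API reference before relying on them.

```python
import hashlib


def make_safety_identifier(user_id: str) -> str:
    """Hash an internal user ID so the raw value is not sent with requests."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def is_cyber_policy_error(error_body: dict) -> bool:
    """Return True when an error payload carries the cyber_policy code."""
    # Assumed payload shape: {"error": {"code": "...", ...}}.
    error = error_body.get("error") or {}
    return error.get("code") == "cyber_policy"


# Request parameters with a stable, per-user identifier attached.
request_params = {
    "model": "gpt-5.5",
    "input": "Summarize this incident report.",
    "safety_identifier": make_safety_identifier("user-1234"),
}
```

Because the identifier is derived deterministically from the user ID, the same end user maps to the same value across requests.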

12 12

guides/tools.md +7 −7


32client = OpenAI()32client = OpenAI()

33 33

34response = client.responses.create(34response = client.responses.create(

~~35 model="gpt-4.1",~~35 model="gpt-5.5",

36 input="What is deep research by OpenAI?",36 input="What is deep research by OpenAI?",

37 tools=[{37 tools=[{

38 "type": "file_search",38 "type": "file_search",

47const openai = new OpenAI();47const openai = new OpenAI();

48 48

49const response = await openai.responses.create({49const response = await openai.responses.create({

~~50 model: "gpt-4.1",~~50 model: "gpt-5.5",

51 input: "What is deep research by OpenAI?",51 input: "What is deep research by OpenAI?",

52 tools: [52 tools: [

53 {53 {

63using OpenAI.Responses;63using OpenAI.Responses;

64 64

65string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;65string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

~~66OpenAIResponseClient client = new(model: "gpt-5", apiKey: key);~~66OpenAIResponseClient client = new(model: "gpt-5.5", apiKey: key);

67 67

68ResponseCreationOptions options = new();68ResponseCreationOptions options = new();

69options.Tools.Add(ResponseTool.CreateFileSearchTool(["<vector_store_id>"]));69options.Tools.Add(ResponseTool.CreateFileSearchTool(["<vector_store_id>"]));

93-H "Content-Type: application/json" \\ 93-H "Content-Type: application/json" \\

94-H "Authorization: Bearer $OPENAI_API_KEY" \\ 94-H "Authorization: Bearer $OPENAI_API_KEY" \\

95-d '{95-d '{

~~96 "model": "gpt-5",~~96 "model": "gpt-5.5",

97 "tools": [97 "tools": [

98 {98 {

99 "type": "mcp",99 "type": "mcp",

112const client = new OpenAI();112const client = new OpenAI();

113 113

114const resp = await client.responses.create({114const resp = await client.responses.create({

115 model: "gpt-5",115 model: "gpt-5.5",

116 tools: [116 tools: [

117 {117 {

118 type: "mcp",118 type: "mcp",

134client = OpenAI()134client = OpenAI()

135 135

136resp = client.responses.create(136resp = client.responses.create(

137 model="gpt-5",137 model="gpt-5.5",

138 tools=[138 tools=[

139 {139 {

140 "type": "mcp",140 "type": "mcp",

154using OpenAI.Responses;154using OpenAI.Responses;

155 155

156string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;156string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

157OpenAIResponseClient client = new(model: "gpt-5", apiKey: key);157OpenAIResponseClient client = new(model: "gpt-5.5", apiKey: key);

158 158

159ResponseCreationOptions options = new();159ResponseCreationOptions options = new();

160options.Tools.Add(ResponseTool.CreateMcpTool(160options.Tools.Add(ResponseTool.CreateMcpTool(

guides/tools-apply-patch.md +2 −0


171 </div>171 </div>

172 </td>172 </td>

173 <td style={{ maxWidth: "150px" }}>173 <td style={{ maxWidth: "150px" }}>

174 [GPT-5.5](https://developers.openai.com/api/docs/models/gpt-5.5)

175  

174 [GPT-5.4](https://developers.openai.com/api/docs/models/gpt-5.4)176 [GPT-5.4](https://developers.openai.com/api/docs/models/gpt-5.4)

175  177  

176 [GPT-5.2](https://developers.openai.com/api/docs/models/gpt-5.2)178 [GPT-5.2](https://developers.openai.com/api/docs/models/gpt-5.2)

guides/tools-computer-use.md +1 −1


373It's simple to migrate from the deprecated `computer-use-preview` tool to the new `computer` tool.373It's simple to migrate from the deprecated `computer-use-preview` tool to the new `computer` tool.

374| | Preview integration | GA integration |374| | Preview integration | GA integration |

375| --- | --- | --- |375| --- | --- | --- |

377| **Tool name** | `tools: [{ type: "computer_use_preview" }]` | `tools: [{ type: "computer" }]` |377| **Tool name** | `tools: [{ type: "computer_use_preview" }]` | `tools: [{ type: "computer" }]` |

guides/tools-connectors-mcp.md +4 −4


38-H "Content-Type: application/json" \\ 38-H "Content-Type: application/json" \\

39-H "Authorization: Bearer $OPENAI_API_KEY" \\ 39-H "Authorization: Bearer $OPENAI_API_KEY" \\

40-d '{40-d '{

~~41 "model": "gpt-5",~~41 "model": "gpt-5.5",

42 "tools": [42 "tools": [

43 {43 {

44 "type": "mcp",44 "type": "mcp",

57const client = new OpenAI();57const client = new OpenAI();

58 58

59const resp = await client.responses.create({59const resp = await client.responses.create({

~~60 model: "gpt-5",~~60 model: "gpt-5.5",

61 tools: [61 tools: [

62 {62 {

63 type: "mcp",63 type: "mcp",

79client = OpenAI()79client = OpenAI()

80 80

81resp = client.responses.create(81resp = client.responses.create(

~~82 model="gpt-5",~~82 model="gpt-5.5",

83 tools=[83 tools=[

84 {84 {

85 "type": "mcp",85 "type": "mcp",

99using OpenAI.Responses;99using OpenAI.Responses;

100 100

101string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;101string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

102OpenAIResponseClient client = new(model: "gpt-5", apiKey: key);102OpenAIResponseClient client = new(model: "gpt-5.5", apiKey: key);

103 103

104ResponseCreationOptions options = new();104ResponseCreationOptions options = new();

105options.Tools.Add(ResponseTool.CreateMcpTool(105options.Tools.Add(ResponseTool.CreateMcpTool(

guides/tools-image-generation.md +13 −13


18const openai = new OpenAI();18const openai = new OpenAI();

19 19

20const response = await openai.responses.create({20const response = await openai.responses.create({

~~21 model: "gpt-5.4",~~21 model: "gpt-5.5",

22 input: "Generate an image of gray tabby cat hugging an otter with an orange scarf",22 input: "Generate an image of gray tabby cat hugging an otter with an orange scarf",

23 tools: [{type: "image_generation"}],23 tools: [{type: "image_generation"}],

24});24});

42client = OpenAI() 42client = OpenAI()

43 43

44response = client.responses.create(44response = client.responses.create(

~~45 model="gpt-5.4",~~45 model="gpt-5.5",

46 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",46 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

47 tools=[{"type": "image_generation"}],47 tools=[{"type": "image_generation"}],

48)48)

86 86

87### Revised prompt87### Revised prompt

88 88

~~89When using the image generation tool, the mainline model, for example, `gpt-5.4`, will automatically revise your prompt for improved performance.~~89When using the image generation tool, the mainline model, for example, `gpt-5.5`, will automatically revise your prompt for improved performance.

90 90

91You can access the revised prompt in the `revised_prompt` field of the image generation call:91You can access the revised prompt in the `revised_prompt` field of the image generation call:

92 92

121const openai = new OpenAI();121const openai = new OpenAI();

122 122

123const response = await openai.responses.create({123const response = await openai.responses.create({

124 model: "gpt-5.4",124 model: "gpt-5.5",

125 input:125 input:

126 "Generate an image of gray tabby cat hugging an otter with an orange scarf",126 "Generate an image of gray tabby cat hugging an otter with an orange scarf",

127 tools: [{ type: "image_generation" }],127 tools: [{ type: "image_generation" }],

140// Follow up140// Follow up

141 141

142const response_fwup = await openai.responses.create({142const response_fwup = await openai.responses.create({

143 model: "gpt-5.4",143 model: "gpt-5.5",

144 previous_response_id: response.id,144 previous_response_id: response.id,

145 input: "Now make it look realistic",145 input: "Now make it look realistic",

146 tools: [{ type: "image_generation" }],146 tools: [{ type: "image_generation" }],

167client = OpenAI()167client = OpenAI()

168 168

169response = client.responses.create(169response = client.responses.create(

170 model="gpt-5.4",170 model="gpt-5.5",

171 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",171 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

172 tools=[{"type": "image_generation"}],172 tools=[{"type": "image_generation"}],

173)173)

188# Follow up188# Follow up

189 189

190response_fwup = client.responses.create(190response_fwup = client.responses.create(

191 model="gpt-5.4",191 model="gpt-5.5",

192 previous_response_id=response.id,192 previous_response_id=response.id,

193 input="Now make it look realistic",193 input="Now make it look realistic",

194 tools=[{"type": "image_generation"}],194 tools=[{"type": "image_generation"}],

216const openai = new OpenAI();216const openai = new OpenAI();

217 217

218const response = await openai.responses.create({218const response = await openai.responses.create({

219 model: "gpt-5.4",219 model: "gpt-5.5",

220 input:220 input:

221 "Generate an image of gray tabby cat hugging an otter with an orange scarf",221 "Generate an image of gray tabby cat hugging an otter with an orange scarf",

222 tools: [{ type: "image_generation" }],222 tools: [{ type: "image_generation" }],

237// Follow up237// Follow up

238 238

239const response_fwup = await openai.responses.create({239const response_fwup = await openai.responses.create({

240 model: "gpt-5.4",240 model: "gpt-5.5",

241 input: [241 input: [

242 {242 {

243 role: "user",243 role: "user",

270import base64270import base64

271 271

272response = openai.responses.create(272response = openai.responses.create(

273 model="gpt-5.4",273 model="gpt-5.5",

274 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",274 input="Generate an image of gray tabby cat hugging an otter with an orange scarf",

275 tools=[{"type": "image_generation"}],275 tools=[{"type": "image_generation"}],

276)276)

293# Follow up293# Follow up

294 294

295response_fwup = openai.responses.create(295response_fwup = openai.responses.create(

296 model="gpt-5.4",296 model="gpt-5.5",

297 input=[297 input=[

298 {298 {

299 "role": "user",299 "role": "user",

393- `gpt-5.4-mini`393- `gpt-5.4-mini`

394- `gpt-5.4-nano`394- `gpt-5.4-nano`

395- `gpt-5-nano`395- `gpt-5-nano`

396- `gpt-5.4`396- `gpt-5.5`

397- `gpt-5.2`397- `gpt-5.2`

398 398

399The model used for the image generation process is always a GPT Image model, including `gpt-image-2`, `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`, but these models aren't valid values for the `model` field in the Responses API. Use a text-capable mainline model (for example, `gpt-5.4` or `gpt-5`) with the hosted `image_generation` tool.

399The model used for the image generation process is always a GPT Image model, including `gpt-image-2`, `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`, but these models aren't valid values for the `model` field in the Responses API. Use a text-capable mainline model (for example, `gpt-5.5` or `gpt-5`) with the hosted `image_generation` tool.

guides/tools-shell.md +12 −12


27const client = new OpenAI();27const client = new OpenAI();

28 28

29const response = await client.responses.create({29const response = await client.responses.create({

~~30 model: "gpt-5.4",~~30 model: "gpt-5.5",

31 tools: [{ type: "shell", environment: { type: "container_auto" } }],31 tools: [{ type: "shell", environment: { type: "container_auto" } }],

32 input: [32 input: [

33 {33 {

53client = OpenAI()53client = OpenAI()

54 54

55response = client.responses.create(55response = client.responses.create(

~~56 model="gpt-5.4",~~56 model="gpt-5.5",

57 tools=[{"type": "shell", "environment": {"type": "container_auto"}}],57 tools=[{"type": "shell", "environment": {"type": "container_auto"}}],

58 input=[58 input=[

59 {59 {

139const client = new OpenAI();139const client = new OpenAI();

140 140

141const response = await client.responses.create({141const response = await client.responses.create({

142 model: "gpt-5.4",142 model: "gpt-5.5",

143 tools: [143 tools: [

144 {144 {

145 type: "shell",145 type: "shell",

161client = OpenAI()161client = OpenAI()

162 162

163response = client.responses.create(163response = client.responses.create(

164 model="gpt-5.4",164 model="gpt-5.5",

165 tools=[165 tools=[

166 {166 {

167 "type": "shell",167 "type": "shell",

236const client = new OpenAI();236const client = new OpenAI();

237 237

238const response = await client.responses.create({238const response = await client.responses.create({

239 model: "gpt-5.4",239 model: "gpt-5.5",

240 tool_choice: "required",240 tool_choice: "required",

241 tools: [241 tools: [

242 {242 {

268client = OpenAI()268client = OpenAI()

269 269

270response = client.responses.create(270response = client.responses.create(

271 model="gpt-5.4",271 model="gpt-5.5",

272 tool_choice="required",272 tool_choice="required",

273 tools=[273 tools=[

274 {274 {

349});349});

350 350

351const response = await client.responses.create({351const response = await client.responses.create({

352 model: "gpt-5.4",352 model: "gpt-5.5",

353 tools: [353 tools: [

354 {354 {

355 type: "shell",355 type: "shell",

409)409)

410 410

411response = client.responses.create(411response = client.responses.create(

412 model="gpt-5.4",412 model="gpt-5.5",

413 tools=[413 tools=[

414 {414 {

415 "type": "shell",415 "type": "shell",

496const client = new OpenAI();496const client = new OpenAI();

497 497

498const response = await client.responses.create({498const response = await client.responses.create({

499 model: "gpt-5.4",499 model: "gpt-5.5",

500 input: [500 input: [

501 {501 {

502 role: "user",502 role: "user",

535client = OpenAI()535client = OpenAI()

536 536

537response = client.responses.create(537response = client.responses.create(

538 model="gpt-5.4",538 model="gpt-5.5",

539 input=[539 input=[

540 {540 {

541 "role": "user",541 "role": "user",

580const client = new OpenAI();580const client = new OpenAI();

581 581

582const response = await client.responses.create({582const response = await client.responses.create({

583 model: "gpt-5.4",583 model: "gpt-5.5",

584 previous_response_id: "resp_2a8e5c9174d63b0f18a4c572de9f64a1b3c76d508e12f9ab47",584 previous_response_id: "resp_2a8e5c9174d63b0f18a4c572de9f64a1b3c76d508e12f9ab47",

585 tools: [585 tools: [

586 {586 {

603client = OpenAI()603client = OpenAI()

604 604

605response = client.responses.create(605response = client.responses.create(

606 model="gpt-5.4",606 model="gpt-5.5",

607 previous_response_id="resp_2a8e5c9174d63b0f18a4c572de9f64a1b3c76d508e12f9ab47",607 previous_response_id="resp_2a8e5c9174d63b0f18a4c572de9f64a1b3c76d508e12f9ab47",

608 tools=[608 tools=[

609 {609 {

guides/tools-skills.md +4 −4


36const client = new OpenAI();36const client = new OpenAI();

37 37

38const response = await client.responses.create({38const response = await client.responses.create({

~~39 model: "gpt-5.4",~~39 model: "gpt-5.5",

40 tools: [40 tools: [

41 {41 {

42 type: "shell",42 type: "shell",

61client = OpenAI()61client = OpenAI()

62 62

63response = client.responses.create(63response = client.responses.create(

~~64 model="gpt-5.4",~~64 model="gpt-5.5",

65 tools=[65 tools=[

66 {66 {

67 "type": "shell",67 "type": "shell",

102const client = new OpenAI();102const client = new OpenAI();

103 103

104const response = await client.responses.create({104const response = await client.responses.create({

105 model: "gpt-5.4",105 model: "gpt-5.5",

106 tools: [106 tools: [

107 {107 {

108 type: "shell",108 type: "shell",

130client = OpenAI()130client = OpenAI()

131 131

132response = client.responses.create(132response = client.responses.create(

133 model="gpt-5.4",133 model="gpt-5.5",

134 tools=[134 tools=[

135 {135 {

136 "type": "shell",136 "type": "shell",

guides/upgrading-to-gpt-5p5.md +174 −0 created


1# Upgrading to GPT-5.5


5Use this guide when the user explicitly asks to upgrade an existing integration to GPT-5.5. Pair it with current OpenAI docs lookups. The default target string is `gpt-5.5`.

7## Upgrade posture

9Upgrade with the narrowest safe change set:

11- replace the model string first

12- update only the prompts that are directly tied to that model usage

13- do not automatically upgrade older or ambiguous model usages that may be intentionally pinned, such as historical docs, examples, tests, eval baselines, comparison code, or low-cost fallback/routing paths. Unless the user explicitly asks to upgrade all model usage, leave those sites unchanged and list them as confirmation-needed

14- prefer prompt-only upgrades when possible

15- if the upgrade would require API-surface changes, parameter rewrites, tool rewiring, provider migration, or broader code edits, mark it as blocked instead of stretching the scope

17## Upgrade workflow

191. Inventory current model usage.

20 - Search for model strings, client calls, and prompt-bearing files.

21 - Include inline prompts, prompt templates, YAML or JSON configs, Markdown docs, and saved prompts when they are clearly tied to a model usage site.

222. Pair each model usage with its prompt surface.

23 - Prefer the closest prompt surface first: inline system or developer text, then adjacent prompt files, then shared templates.

24 - If you cannot confidently tie a prompt to the model usage, say so instead of guessing.

253. Classify the source model family.

26 - Common buckets: GPT-5.4, GPT-5.3-Codex or GPT-5.2-Codex, earlier GPT-5.x, GPT-4o or GPT-4.1, reasoning models such as o1, o3, or o4-mini, third-party model, or mixed and unclear.

274. Decide the upgrade class.

28 - `model string only`

29 - `model string + light prompt rewrite`

30 - `blocked without code changes`

315. Run the compatibility gate.

32 - Check whether the current integration can accept `gpt-5.5` without API-surface changes or implementation changes.

33 - Check whether structured outputs, tool schemas, function names, and downstream parsers can remain unchanged.

34 - For long-running Responses or tool-heavy agents, check whether `phase` is already preserved or round-tripped when the host replays assistant items or uses preambles.

35 - If compatibility depends on code changes, return `blocked`.

36 - If compatibility is unclear, return `unknown` rather than improvising.

376. Apply the upgrade when it is in scope.

38 - Default replacement string: `gpt-5.5`.

39 - Keep the intervention small and behavior-preserving.

40 - Start from the current reasoning effort when it is visible unless there is a measured reason to change it.

41 - For in-scope changes, update the model string and directly related prompts.

42 - For blocked or unknown changes, do not edit; report the blocker or uncertainty.

437. Summarize the result.

44 - `Current model usage`

45 - `Model-string updates`

46 - `Reasoning-effort handling`

47 - `Prompt updates`

48 - `Structured output and formatting assessment`

49 - `Tool-use assessment` when the flow uses tools, retrieval, or terminal actions

50 - `Phase assessment` when the flow is long-running, replayed, or tool-heavy

51 - `Compatibility check`

52 - `Validation performed`

54Output rule:

56- For each usage site, state the starting reasoning-effort recommendation.

57- If the repo exposes the current reasoning setting, recommend preserving it first unless current OpenAI docs say otherwise.

58- If the repo does not expose the current setting, recommend not adding one unless current OpenAI docs require it.
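
The inventory step of the workflow above (step 1) could be approximated with a small scanner like the one below. This is a rough sketch, not a complete detector: the regular expression, the file-suffix list, and the function names are all hypothetical, and a real repository may need broader patterns for client calls and prompt-bearing files.

```python
import re
from pathlib import Path

# Hypothetical pattern: quoted model strings such as "gpt-5.4" or "o4-mini".
MODEL_PATTERN = re.compile(r"""["'](gpt-[\w.\-]+|o\d[\w\-]*)["']""")
# Hypothetical set of file types that may carry model strings or prompts.
SCAN_SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml", ".md"}


def find_model_usages(text: str) -> list[tuple[int, str]]:
    """Return (line_number, model_string) pairs found in one file's text."""
    usages = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in MODEL_PATTERN.finditer(line):
            usages.append((line_number, match.group(1)))
    return usages


def inventory(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Map each scanned file path to the model strings it contains."""
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SCAN_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            usages = find_model_usages(text)
            if usages:
                report[str(path)] = usages
    return report
```

The scanner only reports usage sites. Deciding which ones are intentionally pinned (tests, eval baselines, fallback paths) remains a manual review step.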

60## Upgrade outcomes

62### `model string only`

64Choose this when:

66- the source model is GPT-5.4

67- the existing prompts are already short, explicit, and task-bounded

68- the workflow does not rely on strict output formats, tool-call behavior, batch completeness, or long-horizon execution that should be validated after the upgrade

69- there are no obvious compatibility blockers

71Default action:

73- replace the model string with `gpt-5.5`

74- preserve the current reasoning effort

75- keep prompts unchanged

76- validate behavior with existing tests, realistic spot checks, or an existing eval suite when one is already available

78### `model string + light prompt rewrite`

80Choose this when:

82- the task needs stronger completeness, citation discipline, verification, or dependency handling

83- the upgraded model becomes too verbose, too dense, or hard to scan unless formatting is constrained

84- the workflow has strict output shape requirements and lacks an explicit format contract, schema, or parser validation

85- the workflow is research-heavy and needs stronger handling of sparse or empty retrieval results

86- the workflow is coding-oriented, terminal-based, tool-heavy, or multi-agent, but the existing API surface and tool definitions can remain unchanged

88Default action:

90- replace the model string with `gpt-5.5`

91- preserve the current reasoning effort for the first pass

92- make only the smallest prompt edits needed for the observed workflow risk

93- read the [GPT-5.5 prompting guide](https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5) to choose the smallest prompt changes that recover or improve behavior

94- avoid broad prompt cleanup unrelated to the upgrade

95- for research workflows, add citation rules, retrieval budgets, missing-evidence behavior, and validation guidance from the prompting guide

96- for dependency-aware or tool-heavy workflows, add prerequisite checks, missing-context handling, explicit tool budgets, stop conditions, and validation guidance

97- for coding or terminal workflows, add repo-specific constraints, acceptance criteria, and concrete validation commands

98- for multi-agent support or triage workflows, add task ownership, handoff, completeness, and stopping criteria

99- for long-running Responses agents with preambles or multiple assistant messages, explicitly review whether `phase` is already handled; if adding or preserving `phase` would require code edits, mark the path as `blocked`

100- do not classify a coding or tool-using Responses workflow as `blocked` just because the visible snippet is minimal; prefer `model string + light prompt rewrite` unless the repo clearly shows that a safe GPT-5.5 path would require host-side code changes

### `blocked`

Choose this when:

- the upgrade appears to require API-surface changes
- the upgrade appears to require parameter rewrites or reasoning-setting changes that are not exposed outside implementation code
- the upgrade would require changing tool definitions, tool handler wiring, or schema contracts
- the user is asking for a tooling, IDE, plugin, shell, or environment migration rather than a model and prompt migration
- the integration depends on provider-specific APIs that do not map to the current OpenAI API surface without implementation work
- you cannot confidently identify the prompt surface tied to the model usage

Default action:

- do not improvise a broader upgrade
- report the blocker and explain that the fix is out of scope for this guide
- if useful, describe the smallest follow-up implementation task that would unblock the migration

## Compatibility checklist

Before applying or recommending a model-and-prompt-only upgrade, check:

1. Can the current host accept the `gpt-5.5` model string without changing client code or API surface?
2. Are the related prompts identifiable and editable?
3. Does the host depend on behavior that likely needs API-surface changes, parameter rewrites, provider migration, or tool rewiring?
4. Would the likely fix be prompt-only, or would it need implementation changes?
5. Is the prompt surface close enough to the model usage that you can make a targeted change instead of a broad cleanup?
6. Do strict structured outputs, schemas, or downstream parsers still have an explicit contract?
7. For long-running Responses or tool-heavy agents, is `phase` already preserved if the host relies on preambles, replayed assistant items, or multiple assistant messages?
8. Are latency, token, or price assumptions validated by tests, realistic spot checks, or an existing eval suite rather than inferred from general model positioning?

If item 1 is no, item 3 or item 4 points to implementation work, or item 7 is no and the fix needs code changes, return `blocked`.

If item 2 is no, return `unknown` unless the user can point to the prompt location.

Important:

- Existing use of tools, agents, or multiple usage sites is not by itself a blocker.
- If the current host can keep the same API surface and the same tool definitions, prefer `model string + light prompt rewrite` over `blocked`.
- Reserve `blocked` for cases that truly require implementation changes, not cases that only need stronger prompt steering.
- Do not claim token savings without task-level validation.
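
The decision rule in this checklist can be sketched as a small function. This is only an illustration: the `ChecklistAnswers` type, its field names, and `classify` are hypothetical and not part of any SDK. The sketch also assumes that `blocked` takes precedence over `unknown` when both apply, which the checklist does not state, and it does not model any other outcome the guide defines elsewhere.

```python
from dataclasses import dataclass


@dataclass
class ChecklistAnswers:
    """Answers to the checklist items that drive the outcome (hypothetical names)."""

    host_accepts_model_string: bool      # item 1
    prompts_identifiable: bool           # item 2
    needs_surface_or_tool_changes: bool  # item 3
    fix_needs_implementation: bool       # item 4
    phase_preserved: bool                # item 7 (True when `phase` is not relevant)
    phase_fix_needs_code: bool = False   # only matters when item 7 is no


def classify(answers: ChecklistAnswers) -> str:
    """Map checklist answers to an upgrade classification."""
    if not answers.host_accepts_model_string:
        return "blocked"
    if answers.needs_surface_or_tool_changes or answers.fix_needs_implementation:
        return "blocked"
    if not answers.phase_preserved and answers.phase_fix_needs_code:
        return "blocked"
    # Assumption: `unknown` is only reached when nothing above blocks the upgrade.
    if not answers.prompts_identifiable:
        return "unknown"
    return "model string + light prompt rewrite"


print(classify(ChecklistAnswers(True, True, False, False, True)))
```

Items 5, 6, and 8 are left out on purpose: the checklist asks about them but does not tie them to a specific return value.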

## Scope boundaries

This guide may:

- update or recommend updated model strings
- update or recommend updated prompts
- inspect code and prompt files to understand where those changes belong
- inspect whether existing Responses flows already preserve `phase`
- flag compatibility blockers
- propose validation with existing tests, realistic spot checks, or existing eval suites

This guide may not:

- move Chat Completions code to Responses
- move Responses code to another API surface
- migrate SDKs, APIs, IDE configuration, shell hooks, plugins, or provider-specific tooling
- rewrite parameter shapes
- change tool definitions or tool-call handling
- change structured-output wiring
- add or retrofit `phase` handling in implementation code
- edit business logic, orchestration logic, SDK usage, IDE configuration, shell hooks, or plugin integration behavior except for model-string replacements and directly related prompt edits

If a safe GPT-5.5 upgrade requires any of those changes, mark the path as `blocked` and out of scope.

## Validation plan

- Validate each upgraded usage site with existing tests, realistic spot checks, or an existing eval suite when one is already available.
- Compare against the current GPT-5.4 baseline when available.
- Check task success, retry count, tool-call count, total tokens, latency, output shape, and user-visible quality.
- For specialized workflows, validate the contract that matters most instead of judging only general output quality.
- If prompt edits were added, confirm each block is doing real work instead of adding noise.
- If the workflow has downstream impact, add a lightweight verification pass before finalization.
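
One way to make the baseline comparison concrete is to record the numeric metrics from the list above for one task on both models and flag anything that got worse. The sketch below is a minimal illustration: `RunMetrics`, `compare_runs`, and `regressions` are hypothetical names, the sample numbers are made up, and output shape and user-visible quality still need a separate check.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics recorded for one task run (hypothetical structure)."""

    task_success: bool
    retry_count: int
    tool_call_count: int
    total_tokens: int
    latency_s: float


def compare_runs(baseline: RunMetrics, candidate: RunMetrics) -> dict:
    """Return candidate-minus-baseline deltas for each numeric metric."""
    return {
        "retry_count": candidate.retry_count - baseline.retry_count,
        "tool_call_count": candidate.tool_call_count - baseline.tool_call_count,
        "total_tokens": candidate.total_tokens - baseline.total_tokens,
        "latency_s": candidate.latency_s - baseline.latency_s,
    }


def regressions(baseline: RunMetrics, candidate: RunMetrics) -> list:
    """List the metrics where the candidate is worse than the baseline."""
    flagged = []
    if baseline.task_success and not candidate.task_success:
        flagged.append("task_success")
    for name, delta in compare_runs(baseline, candidate).items():
        if delta > 0:  # more retries, tool calls, tokens, or latency
            flagged.append(name)
    return flagged


# Made-up numbers for a single task on the baseline and the upgraded model.
gpt_5_4_run = RunMetrics(True, 1, 4, 1200, 3.5)
gpt_5_5_run = RunMetrics(True, 0, 5, 1100, 3.0)
print(regressions(gpt_5_4_run, gpt_5_5_run))
```

A single run per model is noisy; in practice the same comparison would be repeated over a realistic task set before drawing conclusions.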

guides/voice-agents.md +1 −1


54agent = Agent(54agent = Agent(

55 name="Assistant",55 name="Assistant",

56 instructions="You are a helpful voice assistant.",56 instructions="You are a helpful voice assistant.",

~~57 model="gpt-5.4",~~57 model="gpt-5.5",

58 tools=[get_weather],58 tools=[get_weather],

59)59)

60 60

guides/webhooks.md +3 −3


96-H "Content-Type: application/json" \\96-H "Content-Type: application/json" \\

97-H "Authorization: Bearer $OPENAI_API_KEY" \\97-H "Authorization: Bearer $OPENAI_API_KEY" \\

98-d '{98-d '{

~~99 "model": "gpt-5.4",~~99 "model": "gpt-5.5",

100 "input": "Write a very long novel about otters in space.",100 "input": "Write a very long novel about otters in space.",

101 "background": true101 "background": true

102}'102}'

107const client = new OpenAI();107const client = new OpenAI();

108 108

109const resp = await client.responses.create({109const resp = await client.responses.create({

~~110 model: "gpt-5.4",~~110 model: "gpt-5.5",

111 input: "Write a very long novel about otters in space.",111 input: "Write a very long novel about otters in space.",

112 background: true,112 background: true,

113});113});

121client = OpenAI()121client = OpenAI()

122 122

123resp = client.responses.create(123resp = client.responses.create(

~~124 model="gpt-5.4",~~124 model="gpt-5.5",

125 input="Write a very long novel about otters in space.",125 input="Write a very long novel about otters in space.",

126 background=True,126 background=True,

127)127)

guides/websocket-mode.md +4 −4


30 json.dumps(30 json.dumps(

31 {31 {

32 "type": "response.create",32 "type": "response.create",

~~33 "model": "gpt-5.4",~~33 "model": "gpt-5.5",

34 "store": False,34 "store": False,

35 "input": [35 "input": [

36 {36 {

59 json.dumps(59 json.dumps(

60 {60 {

61 "type": "response.create",61 "type": "response.create",

~~62 "model": "gpt-5.4",~~62 "model": "gpt-5.5",

63 "store": False,63 "store": False,

64 "previous_response_id": "resp_123",64 "previous_response_id": "resp_123",

65 "input": [65 "input": [

110```python110```python

111# Compact your current window (HTTP call)111# Compact your current window (HTTP call)

112compacted = client.responses.compact(112compacted = client.responses.compact(

~~113 model="gpt-5.4",~~113 model="gpt-5.5",

114 input=long_input_items_array,114 input=long_input_items_array,

115)115)

116 116

119 json.dumps(119 json.dumps(

120 {120 {

121 "type": "response.create",121 "type": "response.create",

~~122 "model": "gpt-5.4",~~122 "model": "gpt-5.5",

123 "store": False,123 "store": False,

124 "input": [124 "input": [

125 *compacted.output,125 *compacted.output,

guides/your-data.md +24 −20


8 8

9When using the OpenAI API, data may be stored as:9When using the OpenAI API, data may be stored as:

10 10

11- **Abuse monitoring logs:** Logs generated from your use of the platform, necessary for OpenAI to enforce our [API data usage policies](https://openai.com/policies/api-data-usage-policies) and mitigate harmful uses of AI.11- **Abuse monitoring logs:** Logs generated from your use of the platform, necessary for OpenAI to enforce our [Usage Policies](https://openai.com/policies/usage-policies) and mitigate harmful uses of AI.

12- **Application state:** Data persisted from some API features in order to fulfill the task or request.12- **Application state:** Data persisted from some API features in order to fulfill the task or request.

13 13

14## Data retention controls for abuse monitoring14## Data retention controls for abuse monitoring

15 15

16Abuse monitoring logs may contain certain customer content, such as prompts and responses, as well as metadata derived from that customer content, such as classifier outputs. By default, abuse monitoring logs are generated for all API feature usage and retained for up to 30 days, unless we are legally required to retain the logs for longer.16Abuse monitoring logs may contain certain customer content, such as prompts and responses, as well as metadata derived from that customer content, such as classifier outputs. By default, abuse monitoring logs are generated for all API feature usage and retained for up to 30 days, unless longer retention is required by law, or is reasonably necessary to protect our services or any third party from harm.

17 17

18Eligible customers may have their customer content excluded from these abuse monitoring logs by getting approved for the [Zero Data Retention](#zero-data-retention) or [Modified Abuse Monitoring](#modified-abuse-monitoring) controls. Currently, these controls are subject to prior approval by OpenAI and acceptance of additional requirements. Approved customers may select between Modified Abuse Monitoring or Zero Data Retention for their API Organization or project.18Eligible customers may have their customer content excluded from these abuse monitoring logs, subject to the limitations below, by getting approved for the [Zero Data Retention](#zero-data-retention) or [Modified Abuse Monitoring](#modified-abuse-monitoring) controls. Currently, these controls are subject to prior approval by OpenAI and acceptance of additional requirements. Approved customers may select between Modified Abuse Monitoring or Zero Data Retention for their API Organization or project.

19 19

20Customers who enable Modified Abuse Monitoring or Zero Data Retention are responsible for ensuring their users abide by OpenAI's policies for safe and responsible use of AI and complying with any moderation and reporting requirements under applicable law.20Customers who enable Modified Abuse Monitoring or Zero Data Retention are responsible for ensuring their users abide by OpenAI's policies for safe and responsible use of AI and complying with any moderation and reporting requirements under applicable law.

21 21

23 23

24### Modified Abuse Monitoring24### Modified Abuse Monitoring

25 25

26Modified Abuse Monitoring excludes customer content (other than image and file inputs in rare cases, as described [below](#image-and-file-inputs)) from abuse monitoring logs across all API endpoints, while still allowing the customer to take advantage of the full capabilities of the OpenAI platform.26Modified Abuse Monitoring excludes customer content (other than image and file inputs in rare cases, as described [below](https://developers.openai.com/api/docs/guides/your-data#image-and-file-inputs)) from abuse monitoring logs across all API endpoints, while still allowing the customer to take advantage of the full capabilities of the OpenAI platform.

27 27

28### Zero Data Retention28### Zero Data Retention

29 29

~~30Zero Data Retention excludes customer content from abuse monitoring logs, in the same way as Modified Abuse Monitoring.~~30Zero Data Retention excludes customer content from abuse monitoring logs in the same way as Modified Abuse Monitoring.

31 31

32Additionally, Zero Data Retention changes some endpoint behavior: the `store` parameter for `/v1/responses` and `v1/chat/completions` will always be treated as `false`, even if the request attempts to set the value to `true`.32Additionally, Zero Data Retention changes some endpoint behavior: the `store` parameter for `/v1/responses` and `v1/chat/completions` will always be treated as `false`, even if the request attempts to set the value to `true`.

33 33

34Besides those specific behavior changes, the endpoints and capabilities listed as No for Zero Data Retention Eligible in the table below may still store application state, even if Zero Data Retention is enabled.34Besides those specific behavior changes, the endpoints and capabilities listed as No for Zero Data Retention Eligible in the table below may still store application state, even if Zero Data Retention is enabled.
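
The `store` override described above can be modeled in a few lines. This is a sketch of the documented behavior, not OpenAI's implementation: `effective_store` is a hypothetical helper, and the per-endpoint default is passed in as a parameter because this section does not state it.

```python
from typing import Optional


def effective_store(
    requested_store: Optional[bool],
    zdr_enabled: bool,
    default_store: bool,
) -> bool:
    """Return the `store` value that applies to a request.

    With Zero Data Retention enabled, `store` is treated as false
    regardless of what the request asks for.
    """
    if zdr_enabled:
        return False
    # Without ZDR, an omitted value falls back to the endpoint default.
    return default_store if requested_store is None else requested_store


print(effective_store(True, zdr_enabled=True, default_store=True))   # False
print(effective_store(True, zdr_enabled=False, default_store=True))  # True
```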

35 35

36### Safety Retention

38We reserve the right to make `gpt-5.5`, `gpt-5.5-pro`, and future models ineligible for Zero Data Retention or Modified Abuse Monitoring for specific customers if reasonably necessary to investigate severe risk activity, as notified in advance to the impacted customers in writing. In this instance, we may retain customer content when using these models that our classifiers detect as potentially violating our [Usage Policies](https://openai.com/policies/usage-policies/). Otherwise retention will not be affected.

36### Configuring data retention controls40### Configuring data retention controls

37 41

38Once your organization has been approved for data retention controls, you'll see a **Data Retention** tab within [Settings → Organization → Data controls](https://platform.openai.com/settings/organization/data-controls/data-retention). From that tab, you can configure data retention controls at both the organization and project level.42Once your organization has been approved for data retention controls, you'll see a **Data Retention** tab within [Settings → Organization → Data controls](https://platform.openai.com/settings/organization/data-controls/data-retention). From that tab, you can configure data retention controls at both the organization and project level.

42 46

43### Storage requirements and retention controls per endpoint47### Storage requirements and retention controls per endpoint

44 48

45The table below indicates when application state is stored for each endpoint. Zero Data Retention eligible endpoints will not store any data. Zero Data Retention ineligible endpoints or capabilities may store application state when used, even if you have Zero Data Retention enabled.49The table below indicates when application state is stored for each endpoint. Zero Data Retention eligible endpoints do not retain any customer content for application state, subject to the limitations below. Zero Data Retention ineligible endpoints or capabilities may retain application state when used, even if you have Zero Data Retention enabled.

46 50

48| -------------------------- | :--------------------: | :------------------------: | :----------------------------: | :----------------------------: |52| -------------------------- | :--------------------: | :------------------------: | :----------------------------: | :----------------------------: |

78- Audio outputs application state is stored for 1 hour to enable [multi-turn conversations](https://developers.openai.com/api/docs/guides/audio).82- Audio outputs application state is stored for 1 hour to enable [multi-turn conversations](https://developers.openai.com/api/docs/guides/audio).

79- When Zero Data Retention is enabled for an organization, the `store` parameter will always be treated as `false`, even if the request attempts to set the value to `true`.83- When Zero Data Retention is enabled for an organization, the `store` parameter will always be treated as `false`, even if the request attempts to set the value to `true`.

80- See [image and file inputs](#image-and-file-inputs).84- See [image and file inputs](#image-and-file-inputs).

81- Extended prompt caching requires storing key/value tensors to GPU-local storage as application state. This data is stored on the local GPU machines and is not retained after the 24 hour data expiration. To learn more, see the [prompt caching guide](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention).85- Extended prompt caching requires storing encrypted key/value tensors to GPU-local storage as application state. This data is stored on the local GPU machines and is not retained after the 24 hour data expiration. Requests to gpt-5.5, gpt-5.5-pro, and all future models require extended prompt caching, and setting a prompt_cache_retention value to in_memory will cause a request error. To learn more, see the [prompt caching guide](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention).
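
A client could catch the extended prompt caching constraint before sending a request. The helper below is a hypothetical pre-flight check that covers only the combination named in the updated text (`gpt-5.5` or `gpt-5.5-pro` with `prompt_cache_retention` set to `in_memory`). It matches model names exactly, so dated snapshots and future models would need to be added by hand.

```python
from typing import Optional

# Models the updated text says require extended prompt caching.
EXTENDED_CACHE_ONLY_MODELS = {"gpt-5.5", "gpt-5.5-pro"}


def check_prompt_cache_retention(
    model: str, prompt_cache_retention: Optional[str]
) -> None:
    """Raise ValueError for a combination the docs say causes a request error."""
    if (
        model in EXTENDED_CACHE_ONLY_MODELS
        and prompt_cache_retention == "in_memory"
    ):
        raise ValueError(
            f"{model} requires extended prompt caching; "
            "prompt_cache_retention='in_memory' would be rejected"
        )


# No exception: the parameter is left unset.
check_prompt_cache_retention("gpt-5.5", None)
```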

82 86

83#### `/v1/responses`87#### `/v1/responses`

84 88

85- The Responses API has a 30 day Application State retention period by default, or when the `store` parameter is set to `true`. Response data will be stored for at least 30 days.89- The Responses API has a 30 day Application State retention period by default, or when the `store` parameter is set to `true`. Response data will be stored for at least 30 days.

86- When Zero Data Retention is enabled for an organization, the `store` parameter will always be treated as `false`, even if the request attempts to set the value to `true`.90- When Zero Data Retention is enabled for an organization, the `store` parameter will always be treated as `false`, even if the request attempts to set the value to `true`.

87- Background mode stores response data for roughly 10 minutes to enable polling, so it is not compatible with Zero Data Retention even though `background=true` is still accepted for legacy ZDR keys. Modified Abuse Monitoring (MAM) projects can continue to use background mode.91- Background mode stores response data to disk for roughly 10 minutes to enable polling.

88- Audio outputs application state is stored for 1 hour to enable [multi-turn conversations](https://developers.openai.com/api/docs/guides/audio).92- Audio outputs application state is stored for 1 hour to enable [multi-turn conversations](https://developers.openai.com/api/docs/guides/audio).

89- See [image and file inputs](#image-and-file-inputs).93- See [image and file inputs](#image-and-file-inputs).

90- MCP servers (used with the [remote MCP server tool](https://developers.openai.com/api/docs/guides/tools-remote-mcp)) are third-party services, and data sent to an MCP server is subject to their data retention policies.94- MCP servers (used with the [remote MCP server tool](https://developers.openai.com/api/docs/guides/tools-remote-mcp)) are third-party services, and data sent to an MCP server is subject to their data retention policies.

91- Hosted containers used by [Hosted Shell](https://developers.openai.com/api/docs/guides/tools-shell#hosted-shell-quickstart) and [Code Interpreter](https://developers.openai.com/api/docs/guides/tools-code-interpreter) may write temporary application state to the container filesystem (backed by ephemeral block storage) while the container is active. Container data is deleted when the container expires or is explicitly deleted.95- Hosted containers used by [Hosted Shell](https://developers.openai.com/api/docs/guides/tools-shell#hosted-shell-quickstart) and [Code Interpreter](https://developers.openai.com/api/docs/guides/tools-code-interpreter) may write temporary application state to the container filesystem (backed by ephemeral block storage) while the container is active. Container data is deleted when the container expires or is explicitly deleted.

92- Extended prompt caching requires storing key/value tensors to GPU-local storage as application state. This data is only stored on the local GPU machines and is not retained after the cache expires. To learn more, see the [prompt caching guide](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention).96- Extended prompt caching requires storing encrypted key/value tensors to GPU-local storage as application state. This data is stored on the local GPU machines and is not retained after the 24 hour data expiration. Requests to gpt-5.5, gpt-5.5-pro, and all future models require extended prompt caching, and setting a prompt_cache_retention value to in_memory will cause a request error. To learn more, see the [prompt caching guide](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention).

93- For server-side compaction, no data is retained when `store="false"`.97- For server-side compaction, no data is retained when `store="false"`.

94- We support [Skills](https://developers.openai.com/api/docs/guides/tools-skills) in two form factors, both local execution and hosted container-based execution. Hosted skills follow the same container lifecycle as hosted shell: mounted skills and container files remain available while the container is active and are discarded when the container expires or is deleted.98- We support [Skills](https://developers.openai.com/api/docs/guides/tools-skills) in two form factors, both local execution and hosted container-based execution. Hosted skills follow the same container lifecycle as hosted shell: mounted skills and container files remain available while the container is active and are discarded when the container expires or is deleted.

95- Data transmitted to third-party services over network connections is subject to their data retention policies.99- Data transmitted to third-party services over network connections is subject to their data retention policies.

108 112

109#### `/v1/videos`113#### `/v1/videos`

110 114

111- The `v1/videos` is not compatible with data retention controls. If your organization has data retention controls enabled, configure a project with its retention setting set to **None** as described in [Configuring data retention controls](#configuring-data-retention-controls) to use `/v1/videos` with that project.115- The `v1/videos` API includes a workflow that saves data to disk while processing and retains it for 48 hours to allow the caller to download the produced video and then for 30 days for abuse monitoring. `v1/videos` is currently blocked for MAM or ZDR requests. If your organization has data retention controls enabled, configure a project with its retention setting set to **None** as described in [Configuring data retention controls](#configuring-data-retention-controls) to use `/v1/videos` with that project.

112 116

113#### Image and file inputs117#### Image and file inputs

114 118

116 120

117#### Web Search121#### Web Search

118 122

119Web Search is ZDR eligible. Web Search with live internet access is not HIPAA eligible and is not covered by a BAA. Web Search in offline/cache-only mode (`external_web_access: false`) is HIPAA eligible and covered by a BAA when used with an API key from a ZDR-enabled project within a ZDR organization. This HIPAA/BAA guidance applies only to the Responses API `web_search` tool. Note: Preview variants (`web_search_preview`) ignore this parameter and behave as if `external_web_access` is `true`. We recommend using `web_search`.123Web Search with live internet access is not HIPAA eligible and is not covered by a BAA. Web Search in offline/cache-only mode (`external_web_access: false`) is eligible to be covered by a BAA when used with an API key from a ZDR-enabled project within a ZDR organization. This HIPAA/BAA guidance applies only to the Responses API `web_search` tool. Note: Preview variants (`web_search_preview`) ignore this parameter and behave as if `external_web_access` is `true`. We recommend using `web_search`.

120 124

121## Data residency controls125## Data residency controls

122 126

123Data residency controls are a project configuration option that allow you to configure the location of infrastructure OpenAI uses to provide services.127Data residency controls are a project configuration option that allow you to configure the location of infrastructure OpenAI uses to provide services.

124 128

125Contact our [sales team](https://openai.com/contact-sales) to see if you're eligible for using data residency controls. Data residency endpoints are charged a [10% uplift](https://developers.openai.com/api/docs/pricing) for `gpt-5.4` and `gpt-5.4-pro`.129Contact our [sales team](https://openai.com/contact-sales) to see if you're eligible for using data residency controls. Data residency endpoints are charged a [10% uplift](https://developers.openai.com/api/docs/pricing) for `gpt-5.5`, `gpt-5.5-pro`, `gpt-5.4` and `gpt-5.4-pro`.
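
As a worked example of the uplift arithmetic, the sketch below applies 10% to a base price for the models listed in the updated sentence. The base price is a made-up figure, `price_with_residency` is a hypothetical helper, and treating every other model as having no uplift is an assumption, not something this page states.

```python
from decimal import Decimal

# Models named in the updated sentence as carrying the data residency uplift.
UPLIFT_MODELS = {"gpt-5.5", "gpt-5.5-pro", "gpt-5.4", "gpt-5.4-pro"}
UPLIFT = Decimal("0.10")


def price_with_residency(model: str, base_price: Decimal) -> Decimal:
    """Apply the 10% uplift when the model is in the listed set."""
    if model in UPLIFT_MODELS:
        return base_price * (Decimal("1") + UPLIFT)
    return base_price  # assumption: no uplift for models not listed


# A made-up base price of 2.00 per unit becomes 2.20 with the uplift.
print(price_with_residency("gpt-5.5", Decimal("2.00")))
```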

126 130

127### How does data residency work?131### How does data residency work?

128 132

134 138

135### Limitations139### Limitations

136 140

137Data residency does not apply to: (a) any transmission or storage of Customer Content outside of the selected region caused by the location of an End User or Customer's infrastructure when accessing the services; (b) products, services, or content offered by parties other than OpenAI through the Services; or (c) any data other than Customer Content, such as system data.141Data residency does not apply to: (a) any transmission or storage of Customer Content outside of the selected region caused by the location of an End User or Customer’s infrastructure when accessing the services; (b) products, services, or content offered by parties other than OpenAI through the Services; or (c) any data other than Customer Content, such as system data.

138 142

139If your selected Region does not support regional processing, as identified below, OpenAI may also process and temporarily store Customer Content outside of the Region to deliver the services.143If your selected Region does not support regional processing, as identified below, OpenAI may also process and temporarily store Customer Content outside of the Region to deliver the services.

140 144

176**Table 2: API endpoint and tool support**180**Table 2: API endpoint and tool support**

177 181

179| ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |183| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |

180| /v1/audio/transcriptions /v1/audio/translations /v1/audio/speech | tts-1 whisper-1 gpt-4o-tts gpt-4o-transcribe gpt-4o-mini-transcribe | All |184| /v1/audio/transcriptions /v1/audio/translations /v1/audio/speech | tts-1 whisper-1 gpt-4o-tts gpt-4o-transcribe gpt-4o-mini-transcribe | All |

181| /v1/batches | gpt-5.4-pro-2026-03-05 gpt-5.2-pro-2025-12-11 gpt-5-pro-2025-10-06 gpt-5-2025-08-07 gpt-5.4-2026-03-05 gpt-5.4-mini-2026-03-17 gpt-5.4-nano-2026-03-17 gpt-5.2-2025-12-11 gpt-5.1-2025-11-13 gpt-5-mini-2025-08-07 gpt-5-nano-2025-08-07 gpt-4.1-2025-04-14 gpt-4.1-mini-2025-04-14 gpt-4.1-nano-2025-04-14 o3-2025-04-16 o4-mini-2025-04-16 o1-pro o1-pro-2025-03-19 o3-mini-2025-01-31 o1-2024-12-17 o1-mini-2024-09-12 o1-preview gpt-4o-2024-11-20 gpt-4o-2024-08-06 gpt-4o-mini-2024-07-18 gpt-4-turbo-2024-04-09 gpt-4-0613 gpt-3.5-turbo-0125 | All |185| /v1/batches | gpt-5.5-pro-2026-04-23 gpt-5.4-pro-2026-03-05 gpt-5.2-pro-2025-12-11 gpt-5-pro-2025-10-06 gpt-5.5-2026-04-23 gpt-5.4-2026-03-05 gpt-5-2025-08-07 gpt-5.4-mini-2026-03-17 gpt-5.4-nano-2026-03-17 gpt-5.2-2025-12-11 gpt-5.1-2025-11-13 gpt-5-mini-2025-08-07 gpt-5-nano-2025-08-07 gpt-4.1-2025-04-14 gpt-4.1-mini-2025-04-14 gpt-4.1-nano-2025-04-14 o3-2025-04-16 o4-mini-2025-04-16 o1-pro o1-pro-2025-03-19 o3-mini-2025-01-31 o1-2024-12-17 o1-mini-2024-09-12 o1-preview gpt-4o-2024-11-20 gpt-4o-2024-08-06 gpt-4o-mini-2024-07-18 gpt-4-turbo-2024-04-09 gpt-4-0613 gpt-3.5-turbo-0125 | All |

182| /v1/chat/completions | gpt-5-2025-08-07 gpt-5.4-2026-03-05 gpt-5.4-mini-2026-03-17 gpt-5.4-nano-2026-03-17 gpt-5.2-2025-12-11 gpt-5.1-2025-11-13 gpt-5-mini-2025-08-07 gpt-5-nano-2025-08-07 gpt-5-chat-latest-2025-08-07 gpt-4.1-2025-04-14 gpt-4.1-mini-2025-04-14 gpt-4.1-nano-2025-04-14 o3-mini-2025-01-31 o3-2025-04-16 o4-mini-2025-04-16 o1-2024-12-17 o1-mini-2024-09-12 o1-preview gpt-4o-2024-11-20 gpt-4o-2024-08-06 gpt-4o-mini-2024-07-18 gpt-4-turbo-2024-04-09 gpt-4-0613 gpt-3.5-turbo-0125 | All |186| /v1/chat/completions | gpt-5.5-2026-04-23 gpt-5.4-2026-03-05 gpt-5.4-mini-2026-03-17 gpt-5.4-nano-2026-03-17 gpt-5.2-2025-12-11 gpt-5.1-2025-11-13 gpt-5-2025-08-07 gpt-5-mini-2025-08-07 gpt-5-nano-2025-08-07 gpt-5-chat-latest-2025-08-07 gpt-4.1-2025-04-14 gpt-4.1-mini-2025-04-14 gpt-4.1-nano-2025-04-14 o3-mini-2025-01-31 o3-2025-04-16 o4-mini-2025-04-16 o1-2024-12-17 o1-mini-2024-09-12 o1-preview gpt-4o-2024-11-20 gpt-4o-2024-08-06 gpt-4o-mini-2024-07-18 gpt-4-turbo-2024-04-09 gpt-4-0613 gpt-3.5-turbo-0125 | All |

183| /v1/embeddings | text-embedding-3-small text-embedding-3-large text-embedding-ada-002 | All |187| /v1/embeddings | text-embedding-3-small text-embedding-3-large text-embedding-ada-002 | All |

185| /v1/files | | All |189| /v1/files | | All |

189| /v1/moderations | text-moderation-latest\* omni-moderation-latest | All |193| /v1/moderations | text-moderation-latest\* omni-moderation-latest | All |

192| /v1/responses | gpt-5.4-pro-2026-03-05 gpt-5.2-pro-2025-12-11 gpt-5-pro-2025-10-06 gpt-5-2025-08-07 gpt-5.4-2026-03-05 gpt-5.4-mini-2026-03-17 gpt-5.4-nano-2026-03-17 gpt-5.2-2025-12-11 gpt-5.1-2025-11-13 gpt-5-mini-2025-08-07 gpt-5-nano-2025-08-07 gpt-5-chat-latest-2025-08-07 gpt-4.1-2025-04-14 gpt-4.1-mini-2025-04-14 gpt-4.1-nano-2025-04-14 o3-2025-04-16 o4-mini-2025-04-16 o1-pro o1-pro-2025-03-19 computer-use-preview\* o3-mini-2025-01-31 o1-2024-12-17 o1-mini-2024-09-12 o1-preview gpt-4o-2024-11-20 gpt-4o-2024-08-06 gpt-4o-mini-2024-07-18 gpt-4-turbo-2024-04-09 gpt-4-0613 gpt-3.5-turbo-0125 | All |196| /v1/responses | gpt-5.5-pro-2026-04-23 gpt-5.4-pro-2026-03-05 gpt-5.2-pro-2025-12-11 gpt-5-pro-2025-10-06 gpt-5.5-2026-04-23 gpt-5.4-2026-03-05 gpt-5-2025-08-07 gpt-5.4-mini-2026-03-17 gpt-5.4-nano-2026-03-17 gpt-5.2-2025-12-11 gpt-5.1-2025-11-13 gpt-5-mini-2025-08-07 gpt-5-nano-2025-08-07 gpt-5-chat-latest-2025-08-07 gpt-4.1-2025-04-14 gpt-4.1-mini-2025-04-14 gpt-4.1-nano-2025-04-14 o3-2025-04-16 o4-mini-2025-04-16 o1-pro o1-pro-2025-03-19 computer-use-preview\* o3-mini-2025-01-31 o1-2024-12-17 o1-mini-2024-09-12 o1-preview gpt-4o-2024-11-20 gpt-4o-2024-08-06 gpt-4o-mini-2024-07-18 gpt-4-turbo-2024-04-09 gpt-4-0613 gpt-3.5-turbo-0125 | All |

193| /v1/responses File Search | | All |197| /v1/responses File Search | | All |

194| /v1/responses Web Search | | All |198| /v1/responses Web Search | | All |

195| /v1/vector_stores | | All |199| /v1/vector_stores | | All |

206#### /v1/chat/completions210#### /v1/chat/completions

207 211

208- Cannot set store=true in non-US regions.212- Cannot set store=true in non-US regions.

209- [Extended prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention) is only available in regions that support Regional processing.213- [Extended prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention) in regions that do not support Regional processing may require that OpenAI process and temporarily store Customer Content outside of the Region to deliver the services.

210 214

211#### /v1/responses215#### /v1/responses

212 216

213- computer-use-preview snapshots are only supported for US/EU.217- computer-use-preview snapshots are only supported for US/EU.

214- Cannot set background=True in EU region.218- Cannot set background=True in EU region.

215- [Extended prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention) is only available in regions that support Regional processing.219- [Extended prompt caching](https://developers.openai.com/api/docs/guides/prompt-caching#prompt-cache-retention) in regions that do not support Regional processing may require that OpenAI process and temporarily store Customer Content outside of the Region to deliver the services.

216 220

217#### /v1/realtime221#### /v1/realtime

218 222

230 234

231### EKM limitations235### EKM limitations

232 236

233OpenAI supports Bring Your Own Key (BYOK) encryption with external accounts in AWS KMS, Google Cloud (GCP), and Azure Key Vault. If your organization leverages a different key management services, those keys need to be synced to one of the supported Cloud KMSs for use with OpenAI.237OpenAI supports Bring Your Own Key (BYOK) encryption with external accounts in AWS KMS, Google Cloud (GCP), and Azure Key Vault. If your organization leverages a different key management service, those keys need to be synced to one of the supported Cloud KMSs for use with OpenAI.

234 238

235EKM does not support the following products. An attempt to use these endpoints in a project with EKM enabled will return an error.239EKM does not support the following products. An attempt to use these endpoints in a project with EKM enabled will return an error.

236 240

Documentation 2026-04-24 05:58 UTC to 2026-04-25 05:52 UTC