1# Prompt guidance for GPT-5.4
1# GPT-5.5 prompting guide
2 2
3GPT-5.4, our newest mainline model, is designed to balance long-running task performance, stronger control over style and behavior, and more disciplined execution across complex workflows. Building on advances from GPT-5 through GPT-5.3-Codex, GPT-5.4 improves token efficiency, sustains multi-step workflows more reliably, and performs well on long-horizon tasks.
3Prompt GPT-5.5 with outcome-first goals, concise style controls, retrieval budgets, and validation loops.
4 4
5GPT-5.4 is designed for production-grade assistants and agents that need strong multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts. It is especially effective when prompts clearly specify the output contract, tool-use expectations, and completion criteria. In practice, the biggest gains come from choosing the right reasoning effort for the task, using explicit grounding and citation rules, and giving the model a precise definition of what "done" looks like. This guide focuses on prompt patterns and migration practices that preserve those efficiency wins. For model capabilities, API parameters, and broader migration guidance, see [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model).
5## New in GPT-5.5 vs GPT-5.4
6- Shorter, outcome-first prompts usually work better than process-heavy prompt stacks.
7- More efficient reasoning means `low` and `medium` effort should be re-evaluated before escalating.
8- Preambles, `phase` handling, and assistant-item replay remain important for tool-heavy Responses workflows.
9- Explicit personality, retrieval budgets, and validation rules help shape customer-facing and agentic UX.
6 10
7When troubleshooting cases where GPT-5.4 treats an intermediate update as the
8 final answer, verify your integration preserves the assistant message `phase`
9 field correctly. See [Phase parameter](#phase-parameter) for details.
11GPT-5.5 works best when prompts define the outcome and leave room for the model to choose an efficient solution path. Compared with earlier models, you can often use shorter, more outcome-oriented prompts: describe what good looks like, what constraints matter, what evidence is available, and what the final answer should contain.
10 12
11## Understand GPT-5.4 behavior
12
13### Where GPT-5.4 is strongest
14
15GPT-5.4 tends to work especially well in these areas:
16
17- Strong personality and tone adherence, with less drift over long answers
18- Agentic workflow robustness, with a stronger tendency to stick with multi-step work, retry, and complete agent loops end to end
19- Evidence-rich synthesis, especially in long-context or multi-tool workflows
20- Instruction adherence in modular, skill-based, and block-structured prompts when the contract is explicit
21- Long-context analysis across large, messy, or multi-document inputs
22- Batched or parallel tool calling while maintaining tool-call accuracy
23- Spreadsheet, finance, and Excel workflows that need instruction following, formatting fidelity, and stronger self-verification
13Avoid carrying over every instruction from an older prompt stack. Legacy prompts often over-specify the process because earlier models needed more help staying on track. With GPT-5.5, that can add noise, narrow the model's search space, or lead to overly mechanical answers.
14
15For more detail on GPT-5.5 behavior changes, start with the [Using GPT-5.5 guide](https://developers.openai.com/api/docs/guides/latest-model). This guide focuses on prompt changes that follow from those behavior changes.
16
17The patterns here are starting points. Adapt them to your product surface, tools, evals, and user experience goals.
18
19## Automated migration with Codex
24 20
25### Where explicit prompting still helps
26
27Even with those strengths, GPT-5.4 benefits from more explicit guidance in a few recurring patterns:
28
29- Low-context tool routing early in a session, when tool selection can be less reliable
30- Dependency-aware workflows that need explicit prerequisite and downstream-step checks
31- Reasoning effort selection, where higher effort is not always better and the right choice depends on task shape, not intuition
32- Research tasks that require disciplined source collection and consistent citations
33- Irreversible or high-impact actions that require verification before execution
34- Terminal or coding-agent environments where tool boundaries must stay clear
21Codex can implement the changes from this guide with the [OpenAI Docs Skill](https://github.com/openai/skills/tree/main/skills/.curated/openai-docs).
22
23```text
24$openai-docs migrate this project to gpt-5.5
25```
26
27To use this skill in other coding agents, download it from the [OpenAI skills repository](https://github.com/openai/skills/tree/main/skills/.curated/openai-docs).
35 28
36These patterns are observed defaults, not guarantees. Start with the smallest prompt that passes your evals, and add blocks only when they fix a measured failure mode.
29## Personality and behavior
37 30
38## Use core prompt patterns
39
40### Keep outputs compact and structured
41
42To improve token efficiency with GPT-5.4, constrain verbosity and enforce structured output through clear output contracts. In practice, this acts as an additional control layer alongside the `verbosity` parameter in the Responses API, allowing you to guide both how much the model writes and how it structures the output.
31GPT-5.5's default style is efficient, direct, and task-oriented. This is useful for production systems: responses stay focused, behavior is easier to steer, and the model avoids unnecessary conversational padding.
32
33For customer-facing assistants, support workflows, coaching experiences, and other conversational products, define both personality and collaboration style.
34
35- **Personality** controls how the assistant sounds: tone, warmth, directness, formality, humor, empathy, and level of polish.
36- **Collaboration style** controls how the assistant works: when it asks questions, when it makes assumptions, how proactive it should be, how much context it gives, when it checks work, and how it handles uncertainty or risk.
43 37
44```xml38Keep both short. Personality instructions should shape the user experience. Collaboration instructions should shape task behavior. Neither should replace clear goals, success criteria, tool rules, or stopping conditions.
45<output_contract>
46- Return exactly the sections requested, in the requested order.
47- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.
48- Apply length limits only to the section they are intended for.
49- If a format is required (JSON, Markdown, SQL, XML), output only that format.
50</output_contract>
51 39
52<verbosity_controls>40Example personality block for a steady task-focused assistant:
53- Prefer concise, information-dense writing.
54- Avoid repeating the user's request.
55- Keep progress updates brief.
56- Do not shorten the answer so aggressively that required evidence, reasoning, or completion checks are omitted.
57</verbosity_controls>
58```
59 41
60### Set clear defaults for follow-through42```text
43# Personality
44You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness.
61 45
62Users often change the task, format, or tone mid-conversation. To keep the assistant aligned, define clear rules for when to proceed, when to ask, and how newer instructions override earlier defaults.46Prefer making progress over stopping for clarification when the request is already clear enough to attempt. Use context and reasonable assumptions to move forward. Ask for clarification only when the missing information would materially change the answer or create meaningful risk, and keep any question narrow.
63 47
64Use a default follow-through policy like this:48Stay concise without becoming curt. Give enough context for the user to understand and trust the answer, then stop. Use examples, comparisons, or simple analogies when they make the point easier to grasp. When correcting the user or disagreeing, be candid but constructive. When an error is pointed out, acknowledge it plainly and focus on fixing it.
65 49
66```xml50Match the user's tone within professional bounds. Avoid emojis and profanity by default, unless the user explicitly asks for that style or has clearly established it as appropriate for the conversation.
67<default_follow_through_policy>
68- If the user’s intent is clear and the next step is reversible and low-risk, proceed without asking.
69- Ask permission only if the next step is:
70 (a) irreversible,
71 (b) has external side effects (for example sending, purchasing, deleting, or writing to production), or
72 (c) requires missing sensitive information or a choice that would materially change the outcome.
73- If proceeding, briefly state what you did and what remains optional.
74</default_follow_through_policy>
75```51```
76 52
77Make instruction priority explicit:53Example personality block for an expressive collaborative assistant:
54
55```text
56# Personality
57Adopt a vivid conversational presence: intelligent, curious, playful when appropriate, and attentive to the user's thinking. Ask good questions when the problem is blurry, then become decisive once there is enough context.
78 58
79```xml59Be warm, collaborative, and polished. Conversation should feel easy and alive, but not chatty for its own sake. Offer a real point of view rather than merely mirroring the user, while staying responsive to their goals and constraints.
80<instruction_priority>60
81- User instructions override default style, tone, formatting, and initiative preferences.61Be thoughtful and grounded when the task calls for synthesis or advice. State a clear recommendation when you have enough context, explain important tradeoffs, and name uncertainty without becoming evasive.
82- Safety, honesty, privacy, and permission constraints do not yield.
83- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
84- Preserve earlier instructions that do not conflict.
85</instruction_priority>
86```62```
87 63
88Higher-priority developer or system instructions remain binding.64For more expressive products, add warmth, curiosity, humor, or point of view explicitly, but keep the block short. Use personality to shape the experience, not to compensate for unclear goals or missing task instructions.
89 65
90**Guidance:** When instructions change mid-conversation, make the update explicit, scoped, and local. State what changed, what still applies, and whether the change affects the next turn or the rest of the conversation.66## Improve time to first visible token with a preamble
91 67
92### Handle mid-conversation instruction updates68In streaming applications, users notice how long it takes before the first visible response appears. GPT-5.5 may spend time reasoning, planning, or preparing tool calls before emitting visible text.
93 69
94For mid-conversation updates, use explicit, scoped steering messages that state:70For longer or tool-heavy tasks, prompt the model to start with a short preamble: a brief visible update that acknowledges the request and states the first step. This can improve perceived responsiveness without changing the underlying task.
95 71
961. Scope72Use this pattern when the task may take more than one step, require tool calls, or involve a long-running agent workflow.
972. Override
983. Carry forward
99 73
100```text74```text
101<task_update>75Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.
102For the next response only:
103- Do not complete the task.
104- Only produce a plan.
105- Keep it to 5 bullets.
106
107All earlier instructions still apply unless they conflict with this update.
108</task_update>
109```76```
110 77
111If the task itself changes, say so directly:78For coding agents that expose separate message phases, you can be more explicit:
112 79
113```text80```text
114<task_update>81You must always start with an intermediary update before any content in the analysis channel if the task will require calling tools. The user update should acknowledge the request and explain your first step.
115The task has changed.
116Previous task: complete the workflow.
117Current task: review the workflow and identify risks only.
118
119Rules for this turn:
120- Do not execute actions.
121- Do not call destructive tools.
122- Return exactly:
123 1. Main risks
124 2. Missing information
125 3. Recommended next step
126</task_update>
127```82```
128 83
129### Make tool use persistent when correctness depends on it84## Outcome-first prompts and stopping conditions
130
131Use explicit rules to keep tool use thorough, dependency-aware, and appropriately paced, especially in workflows where later actions rely on earlier retrieval or verification. A common failure mode is skipping prerequisites because the right end state seems obvious.
132 85
133GPT-5.4 can be less reliable at tool routing early in a session, when context is still thin. Prompt for prerequisites, dependency checks, and exact tool intent.86GPT-5.5 is strongest when the prompt defines the target outcome, success criteria, constraints, and available context, then lets the model choose the path.
134 87
135```xml88For many tasks, describe the destination rather than every step. This gives the model room to choose the right search, tool, or reasoning strategy for the task.
136<tool_persistence_rules>
137- Use tools whenever they materially improve correctness, completeness, or grounding.
138- Do not stop early when another tool call is likely to materially improve correctness or completeness.
139- Keep calling tools until:
140 (1) the task is complete, and
141 (2) verification passes (see <verification_loop>).
142- If a tool returns empty or partial results, retry with a different strategy.
143</tool_persistence_rules>
144```
145 89
146This is especially important for workflows where the final action depends on earlier lookup or retrieval steps. One of the most common failure modes is skipping prerequisites because the intended end state seems obvious.90Prefer this:
147
148```xml
149<dependency_checks>
150- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval steps are required.
151- Do not skip prerequisite steps just because the intended final action seems obvious.
152- If the task depends on the output of a prior step, resolve that dependency first.
153</dependency_checks>
154```
155 91
156Prompt for parallelism when the work is independent and wall-clock matters. Prompt for sequencing when dependencies, ambiguity, or irreversible actions matter more than speed.92```text
93Resolve the customer's issue end to end.
157 94
158```xml95Success means:
159<parallel_tool_calling>96- the eligibility decision is made from the available policy and account data
160- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.97- any allowed action is completed before responding
161- Do not parallelize steps that have prerequisite dependencies or where one result determines the next action.98- the final answer includes completed_actions, customer_message, and blockers
162- After parallel retrieval, pause to synthesize the results before making more calls.99- if evidence is missing, ask for the smallest missing field
163- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
164</parallel_tool_calling>
165```100```
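The success criteria in the GPT-5.5 example name three fields for the final answer: `completed_actions`, `customer_message`, and `blockers`. If your host application asks the model to return that answer as JSON (an assumption; the guide does not prescribe a wire format), a small check like the following sketch can confirm the contract before the answer reaches the customer. The function name `missing_fields` is hypothetical.

```python
import json

# Field names taken from the success criteria in the example prompt.
REQUIRED_FIELDS = ("completed_actions", "customer_message", "blockers")


def missing_fields(final_answer_json: str) -> list[str]:
    """Return the required fields that are absent from the final answer.

    Assumes the model was asked to return a JSON object; anything else
    is treated as missing every field.
    """
    answer = json.loads(final_answer_json)
    if not isinstance(answer, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in answer]


sample = '{"completed_actions": ["refund_issued"], "customer_message": "Your refund is on its way."}'
print(missing_fields(sample))  # ['blockers']
```

A non-empty result is a signal to re-prompt or route to a fallback rather than to patch the answer by hand.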
166 101
167### Force completeness on long-horizon tasks102**Avoid unnecessary absolute rules.** Older prompts often use strict instructions like `ALWAYS`, `NEVER`, `must`, and `only` to control model behavior. Use those words for true invariants, such as safety rules, required output fields, or actions that should never happen. For judgment calls, such as when to search, ask for clarification, use a tool, or keep iterating, prefer decision rules instead.
168
169For multi-step workflows, a common failure mode is incomplete execution: the model finishes after partial coverage, misses items in a batch, or treats empty or narrow retrieval as final. GPT-5.4 becomes more reliable when the prompt defines explicit completion rules and recovery behavior.
170
171Coverage can be achieved through sequential or parallel retrieval, but completion rules should remain explicit either way.
172 103
173```xml104Avoid this style of instruction unless every step is truly required:
174<completeness_contract>
175- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].
176- Keep an internal checklist of required deliverables.
177- For lists, batches, or paginated results:
178 - determine expected scope when possible,
179 - track processed items or pages,
180 - confirm coverage before finalizing.
181- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
182</completeness_contract>
183```
184 105
185For workflows where empty, partial, or noisy retrieval is common:106```text
186 107First inspect A, then inspect B, then compare every field, then think through
187```xml108all possible exceptions, then decide which tool to call, then call the tool,
188<empty_result_recovery>109then explain the entire process to the user.
189If a lookup returns empty, partial, or suspiciously narrow results:
190- do not immediately conclude that no results exist,
191- try at least one or two fallback strategies,
192 such as:
193 - alternate query wording,
194 - broader filters,
195 - a prerequisite lookup,
196 - or an alternate source or tool,
197- Only then report that no results were found, along with what you tried.
198</empty_result_recovery>
199```110```
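The same recovery rule from `<empty_result_recovery>` can also be enforced on the tool side. The sketch below is one possible shape, not part of the guide: it tries a list of named lookup strategies in order and records what was tried, so an eventual "no results" report can say which fallbacks ran. The helper `lookup_with_fallbacks` and the strategy names are hypothetical, and the sketch only treats an empty result as a miss; detecting partial or suspiciously narrow results is left to the caller.

```python
from typing import Callable, Sequence


def lookup_with_fallbacks(
    strategies: Sequence[tuple[str, Callable[[], list]]],
) -> tuple[list, list[str]]:
    """Run lookup strategies in order until one returns results.

    Returns the results (empty if every strategy came back empty) and
    the names of the strategies that were attempted.
    """
    tried: list[str] = []
    for name, run in strategies:
        tried.append(name)
        results = run()
        if results:
            return results, tried
    return [], tried


# Stand-in lookups: the first two return nothing, the third succeeds.
results, tried = lookup_with_fallbacks([
    ("exact query", lambda: []),
    ("broader filters", lambda: []),
    ("alternate source", lambda: ["doc-17"]),
])
print(results)  # ['doc-17']
print(tried)    # ['exact query', 'broader filters', 'alternate source']
```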
200 111
201### Add a verification loop before high-impact actions112Add explicit stopping conditions:
202
203Once the workflow appears complete, add a lightweight verification step before returning the answer or taking an irreversible action. This helps catch requirement misses, grounding issues, and format drift before commit.
204 113
205```xml114```text
206<verification_loop>115Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims.
207Before finalizing:
208- Check correctness: does the output satisfy every requirement?
209- Check grounding: are factual claims backed by the provided context or tool outputs?
210- Check formatting: does the output match the requested schema or style?
211- Check safety and irreversibility: if the next step has external side effects, ask permission first.
212</verification_loop>
213```
214 116
215```xml117After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.
216<missing_context_gating>
217- If required context is missing, do NOT guess.
218- Prefer the appropriate lookup tool when the missing context is retrievable; ask a minimal clarifying question only when it is not.
219- If you must proceed, label assumptions explicitly and choose a reversible action.
220</missing_context_gating>
221```118```
222 119
223For agents that actively take actions, add a short execution frame:120Define missing-evidence behavior:
224 121
225```xml122```text
226<action_safety>123Use the minimum evidence sufficient to answer correctly, cite it precisely, then stop.
227- Pre-flight: summarize the intended action and parameters in 1-2 lines.
228- Execute via tool.
229- Post-flight: confirm the outcome and any validation that was performed.
230</action_safety>
231```124```
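The pre-flight, execute, and post-flight frame in `<action_safety>` can be mirrored in the code that dispatches tool calls. The following is a minimal sketch under stated assumptions: `run_action`, its parameters, and the returned dictionary shape are all hypothetical, and the `confirm` callback stands in for whatever permission step your product uses.

```python
from typing import Any, Callable


def run_action(
    name: str,
    params: dict[str, Any],
    execute: Callable[[dict[str, Any]], Any],
    has_side_effects: bool,
    confirm: Callable[[str], bool],
) -> dict[str, Any]:
    """Wrap a tool call in a pre-flight summary and post-flight report."""
    # Pre-flight: summarize the intended action and its parameters.
    args = ", ".join(f"{key}={value!r}" for key, value in sorted(params.items()))
    summary = f"{name}({args})"

    # Ask permission before any action with external side effects.
    if has_side_effects and not confirm(summary):
        return {"action": summary, "status": "skipped", "result": None}

    # Execute via the tool.
    result = execute(params)

    # Post-flight: report what ran and what came back.
    return {"action": summary, "status": "done", "result": result}


outcome = run_action(
    "delete_record",
    {"id": 7},
    execute=lambda p: "deleted",
    has_side_effects=True,
    confirm=lambda summary: False,  # the user declined
)
print(outcome["status"])  # skipped
```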
232 125
233## Handle specialized workflows126## Formatting
234
235### Choose image detail explicitly for vision and computer use
236
237If your workflow depends on visual precision, specify the image `detail` level in the prompt or integration instead of relying on `auto`. Use `high` for standard high-fidelity image understanding. Use `original` for large, dense, or spatially sensitive images, especially [computer use, localization, OCR, and click-accuracy tasks](https://developers.openai.com/api/docs/guides/tools-computer-use) on `gpt-5.4` and future models. Use `low` only when speed and cost matter more than fine detail. For more details on image detail levels, see the [Images and Vision guide](https://developers.openai.com/api/docs/guides/images-vision).
238 127
239### Lock research and citations to retrieved evidence128GPT-5.5 is highly steerable on output format and structure. Use that control when it improves comprehension or product fit.
240 129
241When citation quality matters, make both the source boundary and the format requirement explicit. This helps reduce fabricated references, unsupported claims, and citation-format drift.130Set `text.verbosity`, describe the expected output shape, and reserve heavier structure for cases where it improves comprehension or your product UI needs a stable artifact. The API default for `text.verbosity` is `medium`; use `low` when you prefer shorter, more concise responses.
242 131
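As a rough illustration of setting `text.verbosity`, the sketch below builds request arguments without sending them. The helper `build_request` is hypothetical, the model name `gpt-5.5` is taken from the migration command earlier in this guide, and the set of allowed values (`low`, `medium`, `high`) is an assumption you should confirm against the API reference.

```python
# Assumed set of accepted values; the guide itself names only
# `medium` (the default) and `low`.
ALLOWED_VERBOSITY = {"low", "medium", "high"}


def build_request(prompt: str, verbosity: str = "medium", model: str = "gpt-5.5") -> dict:
    """Build keyword arguments for a Responses API call.

    Nothing is sent here; the dictionary only shows where the
    verbosity setting lives in the request.
    """
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"unsupported verbosity: {verbosity!r}")
    return {
        "model": model,
        "input": prompt,
        "text": {"verbosity": verbosity},
    }


request = build_request("Summarize the incident report.", verbosity="low")
print(request["text"])  # {'verbosity': 'low'}
```

With the official Python SDK, these arguments would typically be passed as `client.responses.create(**request)`. Pair the parameter with a prompt-level description of the output shape, since verbosity alone does not control structure.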
243```xml132Plain conversational formatting:
244<citation_rules>
245- Only cite sources retrieved in the current workflow.
246- Never fabricate citations, URLs, IDs, or quote spans.
247- Use exactly the citation format required by the host application.
248- Attach citations to the specific claims they support, not only at the end.
249</citation_rules>
250```
251
252```xml
253<grounding_rules>
254- Base claims only on provided context or tool outputs.
255- If sources conflict, state the conflict explicitly and attribute each side.
256- If the context is insufficient or irrelevant, narrow the answer or say you cannot support the claim.
257- If a statement is an inference rather than a directly supported fact, label it as an inference.
258</grounding_rules>
259```
260 133
261If your application requires inline citations, require inline citations. If it requires footnotes, require footnotes. The key is to lock the format and prevent the model from improvising unsupported references.134```text
262 135Let formatting serve comprehension. Use plain paragraphs as the default format for normal conversation, explanations, reports, documentation, and technical writeups. Keep the presentation clean and readable without making the structure feel heavier than the content.
263### Research mode
264 136
265Push GPT-5.4 into a disciplined research mode. Use this pattern for research, review, and synthesis tasks. Do not force it onto short execution tasks or simple deterministic transforms.137Use headers, bold text, bullets, and numbered lists sparingly. Reach for them when the user requests them, when the answer needs clear comparison or ranking, or when the information would be harder to scan as prose. Otherwise, favor short paragraphs and natural transitions.
266 138
267```xml139Respect formatting preferences from the user. If they ask for a terse answer, minimal formatting, no bullets, no headers, or a specific structure, follow that preference unless there is a strong reason not to.
268<research_mode>
269- Do research in 3 passes:
270 1) Plan: list 3-6 sub-questions to answer.
271 2) Retrieve: search each sub-question and follow 1-2 second-order leads.
272 3) Synthesize: resolve contradictions and write the final answer with citations.
273- Stop only when more searching is unlikely to change the conclusion.
274</research_mode>
275```140```
276 141
277If your host environment uses a specific research tool or requires a submit step, combine this with the host's finalization contract.142Add explicit audience and length guidance:
278
279### Clamp strict output formats
280
281For SQL, JSON, or other parse-sensitive outputs, tell GPT-5.4 to emit only the target format and check it before finishing.
282 143
283```text144```text
284<structured_output_contract>145Write for a senior business audience. Keep the answer under 400 words. Use short paragraphs and only include bullets when they improve scannability. Prioritize the conclusion first, then the reasoning, then caveats.
285- Output only the requested format.
286- Do not add prose or markdown fences unless they were requested.
287- Validate that parentheses and brackets are balanced.
288- Do not invent tables or fields.
289- If required schema information is missing, ask for it or return an explicit error object.
290</structured_output_contract>
291```146```
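The `<structured_output_contract>` asks the model to check that parentheses and brackets are balanced. You can repeat that check in the host application before parsing. This is a minimal sketch with a hypothetical helper name; it deliberately ignores quoting, so a bracket inside a string literal counts like any other bracket.

```python
# Maps each closing bracket to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def brackets_balanced(text: str) -> bool:
    """Return True if every bracket in `text` is closed in the right order."""
    stack: list[str] = []
    for char in text:
        if char in OPENERS:
            stack.append(char)
        elif char in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[char]:
                return False
    # Anything left on the stack was never closed.
    return not stack


print(brackets_balanced('{"rows": [1, 2, (3)]}'))    # True
print(brackets_balanced("SELECT COUNT(* FROM orders"))  # False
```

For JSON specifically, a real parser is the stronger check. The bracket scan is mainly useful for formats such as SQL where no parser is at hand.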
292 147
293If you are extracting document regions or OCR boxes, define the coordinate system and add a drift check:148For editing, rewriting, summaries, or customer-facing messages, tell the model what to preserve before asking it to improve style. This pattern is useful when you want polish without expansion.
294 149
295```text150```text
296<bbox_extraction_spec>151Preserve the requested artifact, length, structure, and genre first. Quietly improve clarity, flow, and correctness. Do not add new claims, extra sections, or a more promotional tone unless explicitly requested.
297- Use the specified coordinate format exactly, such as [x1,y1,x2,y2] normalized to 0..1.
298- For each box, include page, label, text snippet, and confidence.
299- Add a vertical-drift sanity check so boxes stay aligned with the correct line of text.
300- If the layout is dense, process page by page and do a second pass for missed items.
301</bbox_extraction_spec>
302```
303
304### Keep tool boundaries explicit in coding and terminal agents
305
306In coding agents, GPT-5.4 works better when the rules for shell access and file editing are unambiguous. This is especially important when you expose tools like [Shell](https://developers.openai.com/api/docs/guides/tools-shell) or [Apply patch](https://developers.openai.com/api/docs/guides/tools-apply-patch).
307
308### User updates
309
310GPT-5.4 does well with brief, outcome-based updates. Reuse the user-updates pattern from the 5.2 guide, but pair it with explicit completion and verification requirements.
311
312Recommended update spec:
313
314```xml
315<user_updates_spec>
316- Only update the user when starting a new major phase or when something changes the plan.
317- Each update: 1 sentence on outcome + 1 sentence on next step.
318- Do not narrate routine tool calls.
319- Keep the user-facing status short; keep the work exhaustive.
320</user_updates_spec>
321```152```
322 153
323For coding agents, see the Prompting patterns for coding tasks section below for more specific guidance.154## Grounding, citations, and retrieval budgets
324 155
325### Prompting patterns for coding tasks156For grounded answers, citation behavior should be part of the prompt. Define what needs support, what counts as enough evidence, and how the model should behave when evidence is missing. Absence of evidence shouldn't automatically become a factual "no." For more details and examples, see the [citation formatting guide](https://developers.openai.com/api/docs/guides/citation-formatting).
326 157
327**Autonomy and persistence**158### Add an explicit retrieval budget
328 159
329GPT-5.4 is generally more thorough end to end than earlier mainline models on coding and tool-use tasks, so you often need less explicit "verify everything" prompting. Still, for high-stakes changes such as production, migrations, or security work, keep a lightweight verification clause.160Retrieval budgets are stopping rules for search. They tell the model when enough evidence is enough.
330
331```xml
332<autonomy_and_persistence>
333Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you.
334
335Unless the user explicitly asks for a plan, asks a question about the code, is brainstorming potential solutions, or otherwise makes it clear that code should not be written, assume the user wants you to make code changes or run tools to solve their problem. In these cases, do not just output your proposed solution in a message; go ahead and implement the change. If you encounter challenges or blockers, attempt to resolve them yourself.
336</autonomy_and_persistence>
337```
338 161
339**Intermediary updates**162```text
340 163For ordinary Q&A, start with one broad search using short, discriminative keywords. If the top results contain enough citable support for the core request, answer from those results instead of searching again.
341Keep updates sparse and high-signal. In coding tasks, prefer updates at key points.
342
343```xml
344<user_updates_spec>
345- Intermediary updates go to the `commentary` channel.
346- User updates are short updates while you are working. They are not final answers.
347- Use 1-2 sentence updates to communicate progress and new information while you work.
348- Do not begin responses with conversational interjections or meta commentary. Avoid openers such as acknowledgements ("Done -", "Got it", or "Great question") or similar framing.
349- Before exploring or doing substantial work, send a user update explaining your understanding of the request and your first step. Avoid commenting on the request or starting with phrases such as "Got it" or "Understood."
350- Provide updates roughly every 30 seconds while working.
351- When exploring, explain what context you are gathering and what you learned. Vary sentence structure so the updates do not become repetitive.
352- When working for a while, keep updates informative and varied, but stay concise.
353- When work is substantial, provide a longer plan after you have enough context. This is the only update that may be longer than 2 sentences and may contain formatting.
354- Before file edits, explain what you are about to change.
355- While thinking, keep the user informed of progress without narrating every tool call. Even if you are not taking actions, send frequent progress updates rather than going silent, especially if you are thinking for more than a short stretch.
356- Keep the tone of progress updates consistent with the assistant's overall personality.
357</user_updates_spec>
358```
359
360**Formatting**
361 164
362GPT-5.4 often defaults to more structured formatting and may overuse bullet lists. If you want a clean final response, explicitly clamp list shape.
165Make another retrieval call only when:
166- The top results do not answer the core question.
167- A required fact, parameter, owner, date, ID, or source is missing.
168- The user asked for exhaustive coverage, a comparison, or a comprehensive list.
169- A specific document, URL, email, meeting, record, or code artifact must be read.
170- The answer would otherwise contain an important unsupported factual claim.
363 171
364```xml172Do not search again to improve phrasing, add examples, cite nonessential details, or support wording that can safely be made more generic.
365Never use nested bullets. Keep lists flat (single level). If you need hierarchy, split the content into separate lists or sections; if a line ends with a colon, place the item you would normally nest on the line immediately after it. For numbered lists, use only `1. 2. 3.` style markers (with a period), never `1)`.
366```173```
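The retrieval budget above is a decision rule, so it can also be written down as one. The sketch below is an illustration, not an API: `RetrievalState` and `should_search_again` are hypothetical names, and each flag corresponds to one of the "make another retrieval call only when" conditions. The caller is expected to clear a flag once its condition has been satisfied.

```python
from dataclasses import dataclass


@dataclass
class RetrievalState:
    """What is known after the most recent search."""
    core_question_answered: bool
    required_fact_missing: bool = False        # fact, parameter, owner, date, ID, or source
    exhaustive_coverage_pending: bool = False  # user asked for exhaustive or comparative coverage
    specific_artifact_unread: bool = False     # a named document, URL, or record must be read
    unsupported_important_claim: bool = False  # answer would contain an unsupported claim


def should_search_again(state: RetrievalState) -> bool:
    """Return True only when one of the budget's conditions still holds."""
    return (
        not state.core_question_answered
        or state.required_fact_missing
        or state.exhaustive_coverage_pending
        or state.specific_artifact_unread
        or state.unsupported_important_claim
    )


# Enough citable support for the core request: stop and answer.
print(should_search_again(RetrievalState(core_question_answered=True)))  # False
# Core question answered, but a required date is still missing: search once more.
print(should_search_again(RetrievalState(True, required_fact_missing=True)))  # True
```

Nothing in the rule fires for better phrasing, extra examples, or nonessential citations, which matches the last line of the budget.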
367 174
368**Frontend tasks**175## Creative drafting guardrails
369
370Use this only when additional frontend guidance is useful.
371
372```xml
373<frontend_tasks>
374When doing frontend design tasks, avoid generic, overbuilt layouts.
375
376Use these hard rules:
377- One composition: The first viewport must read as one composition, not a dashboard, unless it is a dashboard.
378- Brand first: On branded pages, the brand or product name must be a hero-level signal, not just nav text or an eyebrow. No headline should overpower the brand.
379- Brand test: If the first viewport could belong to another brand after removing the nav, the branding is too weak.
380- Full-bleed hero only: On landing pages and promotional surfaces, the hero image should usually be a dominant edge-to-edge visual plane or background. Do not default to inset hero images, side-panel hero images, rounded media cards, tiled collages, or floating image blocks unless the existing design system clearly requires them.
381- Hero budget: The first viewport should usually contain only the brand, one headline, one short supporting sentence, one CTA group, and one dominant image. Do not place stats, schedules, event listings, address blocks, promos, "this week" callouts, metadata rows, or secondary marketing content there.
382- No hero overlays: Do not place detached labels, floating badges, promo stickers, info chips, or callout boxes on top of hero media.
383- Cards: Default to no cards. Never use cards in the hero unless they are the container for a user interaction. If removing a border, shadow, background, or radius does not hurt interaction or understanding, it should not be a card.
384- One job per section: Each section should have one purpose, one headline, and usually one short supporting sentence.
385- Real visual anchor: Imagery should show the product, place, atmosphere, or context.
386- Reduce clutter: Avoid pill clusters, stat strips, icon rows, boxed promos, schedule snippets, and competing text blocks.
387- Use motion to create presence and hierarchy, not noise. Ship 2-3 intentional motions for visually led work, and prefer Framer Motion when it is available.
388
389Exception: If working within an existing website or design system, preserve the established patterns, structure, and visual language.
390</frontend_tasks>
391```
392 176
393```xml177For drafting tasks, tell the model which claims must come from sources and which parts may be creatively written. This is especially important for slides, launch copy, customer summaries, talk tracks, leadership blurbs, and narrative framing.
394<terminal_tool_hygiene>
395- Only run shell commands via the terminal tool.
396- Never "run" tool names as shell commands.
397- If a patch or edit tool exists, use it directly; do not attempt it in bash.
398- After changes, run a lightweight verification step such as ls, tests, or a build before declaring the task done.
399</terminal_tool_hygiene>
400```
401 178
402### Document localization and OCR boxes179```text
403 180For creative or generative requests such as slides, leadership blurbs, outbound copy, summaries for sharing, talk tracks, or narrative framing, distinguish source-backed facts from creative wording.
404For bbox tasks, be explicit about coordinate conventions and add drift tests.
405 181
406```xml182- Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims.
407<bbox_extraction_spec>183- Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger.
408- Use the specified coordinate format exactly (for example [x1,y1,x2,y2] normalized 0..1).184- If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics.
409- For each bbox, include: page, label, text snippet, confidence.
410- Add a vertical-drift sanity check:
411 - ensure bboxes align with the line of text (not shifted up or down).
412- If dense layout, process page by page and do a second pass for missed items.
413</bbox_extraction_spec>
414```185```
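
The bbox spec can be paired with a small check on the consuming side. The following is a minimal sketch, assuming each extracted item is a dict with `page`, `label`, `text`, `confidence`, and `bbox` keys. Those key names, the `line_top`/`line_bottom` reference band, and the `tolerance` default are all hypothetical choices for illustration, not part of any API.

```python
from typing import Any

# Hypothetical field names; adjust to whatever schema your prompt requests.
REQUIRED_KEYS = ("page", "label", "text", "confidence", "bbox")


def validate_bbox_item(item: dict[str, Any]) -> list[str]:
    """Return the problems found in one extracted item (empty list if none)."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in item]
    bbox = item.get("bbox")
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        problems.append("bbox must be [x1, y1, x2, y2]")
        return problems
    x1, y1, x2, y2 = bbox
    if not all(0.0 <= value <= 1.0 for value in bbox):
        problems.append("bbox values must be normalized to 0..1")
    if x1 >= x2 or y1 >= y2:
        problems.append("bbox must satisfy x1 < x2 and y1 < y2")
    return problems


def has_vertical_drift(bbox, line_top, line_bottom, tolerance=0.5):
    """Flag a box whose vertical center falls outside its expected text line.

    line_top and line_bottom are an assumed reference band in the same
    normalized units; tolerance is a fraction of the line height.
    """
    _, y1, _, y2 = bbox
    center = (y1 + y2) / 2
    slack = (line_bottom - line_top) * tolerance
    return not (line_top - slack <= center <= line_bottom + slack)
```

Running a check like this per page makes it easier to decide when the second pass for missed or shifted items is worth triggering.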
415 186
416### Use runtime and API integration notes187## Frontend engineering and visual taste
417
418For long-running or tool-heavy agents, the runtime contract matters as much as the prompt contract.
419
420#### Phase parameter
421
422For GPT-5.4, `gpt-5.3-codex`, and later Responses models, the `phase` field can
423help in the small number of long-running or tool-heavy flows where preambles or
424other intermediate assistant updates are mistaken for the final answer.
425
426- `phase` is optional at the API level, but it is highly recommended. Best-effort inference may exist server-side, but explicit round-tripping of `phase` is strictly better.
427- Use `phase` for long-running or tool-heavy agents that may emit commentary before tool calls or before a final answer.
428- Preserve `phase` when replaying prior assistant items so the model can distinguish working commentary from the completed answer. This matters most in multi-step flows with preambles, tool-related updates, or multiple assistant messages in the same turn.
429- Do not add `phase` to user messages.
430- If you use `previous_response_id`, that is usually the simplest path, since OpenAI can often recover prior state without manually replaying assistant items.
431- If you replay assistant history yourself, preserve the original `phase` values.
432- Missing or dropped `phase` can cause preambles to be interpreted as final answers and degrade behavior on those multi-step tasks.
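
When history is replayed manually, the rules above amount to passing assistant items through unchanged and keeping `phase` off user messages. The following is a minimal sketch, assuming conversation items are plain dicts with a `role` key and an optional `phase` key. The helper names and the item shape are illustrative assumptions, not the SDK's own types.

```python
import copy


def prepare_replay(items: list[dict]) -> list[dict]:
    """Build input items for the next request from prior conversation items."""
    replayed = []
    for item in items:
        item = copy.deepcopy(item)  # never mutate the caller's history
        if item.get("role") == "user":
            # `phase` does not belong on user messages; drop it if present.
            item.pop("phase", None)
        # Every other item passes through untouched, so `phase` survives as-is.
        replayed.append(item)
    return replayed


def missing_phase(items: list[dict]) -> list[int]:
    """Indexes of assistant items that carry no `phase` value."""
    return [
        index
        for index, item in enumerate(items)
        if item.get("role") == "assistant" and "phase" not in item
    ]
```

A check such as `missing_phase` is only a lint: it can surface places where a serialization layer silently dropped the field before the next request is sent.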
433
434### Preserve behavior in long sessions
435
436Compaction extends the effective context window. User conversations can persist for many turns without hitting context limits or long-context performance degradation, and agents can complete long-running, complex trajectories that would exceed a typical context window.
437
438If you are using [Compaction](https://developers.openai.com/api/docs/guides/compaction) in the Responses API, compact after major milestones, treat compacted items as opaque state, and keep prompts functionally identical after compaction. The endpoint is ZDR compatible and returns an `encrypted_content` item that you can pass into future requests. GPT-5.4 tends to remain more coherent and reliable over longer, multi-turn conversations with fewer breakdowns as sessions grow.
439
440For more guidance, see the [`/responses/compact` API reference](https://developers.openai.com/api/docs/api-reference/responses/compact).
441 188
442### Control personality for customer-facing workflows189For frontend work, refer to the [example instructions](https://developers.openai.com/api/docs/guides/frontend-prompt) for practical ways to steer UI quality. They cover product and user context, design-system alignment, first-screen usability, familiar controls, expected states, responsive behavior, and common generated-UI defaults to avoid, such as generic heroes, nested cards, decorative gradients, visible instructional text, and broken layouts.
443 190
444GPT-5.4 can be steered more effectively when you separate persistent personality from per-response writing controls. This is especially useful for customer-facing workflows such as emails, support replies, announcements, and blog-style content.191## Prompt the model to check its work
445 192
446- **Personality (persistent):** sets the default tone, verbosity, and decision style across the session.193Give GPT-5.5 access to tools that let it check outputs when validation is possible.
447- **Writing controls (per response):** define the channel, register, formatting, and length for a specific artifact.
448- **Reminder:** personality should not override task-specific output requirements. If the user asks for JSON, return JSON.
449 194
450For natural, high-quality prose, the highest-leverage controls are:195For coding agents, ask for concrete validation commands:
451 196
452- Give the model a clear persona.197```text
453- Specify the channel and emotional register.198After making changes, run the most relevant validation available:
454- Explicitly ban formatting when you want prose.199- targeted unit tests for changed behavior
455- Use hard length limits.200- type checks or lint checks when applicable
201- build checks for affected packages
202- a minimal smoke test when full validation is too expensive
456 203
457```xml204If validation cannot be run, explain why and describe the next best check.
458<personality_and_writing_controls>
459- Persona: <one sentence>
460- Channel: <Slack | email | memo | PRD | blog>
461- Emotional register: <direct/calm/energized/etc.> + "not <overdo this>"
462- Formatting: <ban bullets/headers/markdown if you want prose>
463- Length: <hard limit, e.g. <=150 words or 3-5 sentences>
464- Default follow-through: if the request is clear and low-risk, proceed without asking permission.
465</personality_and_writing_controls>
466```205```
467 206
468For more personality patterns you can lift directly, see the [Prompt Personalities cookbook](https://developers.openai.com/cookbook/examples/gpt-5/prompt_personalities).207For visual artifacts, ask for inspection after rendering:
469
470**Professional memo mode**
471
472For memos, reviews, and other professional writing tasks, general writing instructions are often not enough. These workflows benefit from explicit guidance on specificity, domain conventions, synthesis, and calibrated certainty.
473 208
474```xml209```text
475<memo_mode>210Render the artifact before finalizing. Inspect the rendered output for layout, clipping, spacing, missing content, and visual consistency. Revise until the rendered output matches the requirements.
476- Write in a polished, professional memo style.
477- Use exact names, dates, entities, and authorities when supported by the record.
478- Follow domain-specific structure if one is requested.
479- Prefer precise conclusions over generic hedging.
480- When uncertainty is real, tie it to the exact missing fact or conflicting source.
481- Synthesize across documents rather than summarizing each one independently.
482</memo_mode>
483```211```
484 212
485This mode is especially useful for legal, policy, research, and executive-facing writing, where the goal is not just fluency, but disciplined synthesis and clear conclusions.213For engineering and planning tasks, make implementation plans traceable:
486
487## Tune reasoning and migration
488
489### Treat reasoning effort as a last-mile knob
490
491Reasoning effort is not one-size-fits-all. Treat it as a last-mile tuning knob, not the primary way to improve quality. In many cases, stronger prompts, clear output contracts, and lightweight verification loops recover much of the performance teams might otherwise seek through higher reasoning settings.
492
493Recommended defaults:
494
495- `none`: Best for fast, cost-sensitive, latency-sensitive tasks where the model does not need to think.
496- `low`: Works well for latency-sensitive tasks where a small amount of thinking can produce a meaningful accuracy gain, especially with complex instructions.
497- `medium` or `high`: Reserve for tasks that truly require stronger reasoning and can absorb the latency and cost tradeoff. Choose between them based on how much performance gain your task gets from additional reasoning.
498- `xhigh`: Avoid as a default unless your evals show clear benefits. It is best suited for long, agentic, reasoning-heavy tasks where maximum intelligence matters more than speed or cost.
499
500In practice, most teams should default to the `none`, `low`, or `medium` range.
501
502Start with `none` for execution-heavy workloads such as workflow steps, field extraction, support triage, and short structured transforms.
503 214
504Start with `medium` or higher for research-heavy workloads such as long-context synthesis, multi-document review, conflict resolution, and strategy writing. With `medium` and a well-engineered prompt, you can squeeze out a lot of performance.215```text
505 216For implementation plans, include:
506For GPT-5.4 workloads, `none` can already perform well on action-selection and tool-discipline tasks. If your workload depends on nuanced interpretation, such as implicit requirements, ambiguity, or cancelled-tool-call recovery, start with `low` or `medium` instead.217- requirements and where each is addressed
507 218- named resources, files, APIs, or systems involved
508Before increasing reasoning effort, first add:219- state transitions or data flow where relevant
509 220- validation commands or checks
510- `<completeness_contract>`221- failure behavior
511- `<verification_loop>`222- privacy and security considerations
512- `<tool_persistence_rules>`223- open questions that materially affect implementation
513
514If the model still feels too literal or stops at the first plausible answer, add an initiative nudge before raising reasoning effort:
515
516```xml
517<dig_deeper_nudge>
518- Don’t stop at the first plausible answer.
519- Look for second-order issues, edge cases, and missing constraints.
520- If the task is safety or accuracy critical, perform at least one verification step.
521</dig_deeper_nudge>
522```224```
523 225
524### Migrate prompts to GPT-5.4 one change at a time226## Phase parameter
525
526Use the same one-change-at-a-time discipline as the 5.2 guide: switch the model first, pin `reasoning_effort`, run evals, then iterate.
527
528These starting points work well for many migrations:
529
530| Current setup | Suggested GPT-5.4 start | Notes |
531| ------------------------- | ---------------------------------- | ------------------------------------------------------------------- |
532| `gpt-5.2` | Match the current reasoning effort | Preserve the existing latency and quality profile first, then tune. |
533| `gpt-5.3-codex` | Match the current reasoning effort | For coding workflows, keep the reasoning effort the same. |
534| `gpt-4.1` or `gpt-4o` | `none` | Keep snappy behavior, and increase only if evals regress. |
535| Research-heavy assistants | `medium` or `high` | Use explicit research multi-pass and citation gating. |
536| Long-horizon agents | `medium` or `high` | Add tool persistence and completeness accounting. |
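
The model rows of the table can be captured as a small lookup so that a migration script pins a starting effort before evals run. This is a minimal sketch: the function name and the `MATCH_CURRENT` sentinel are hypothetical, and the two workload rows (research-heavy assistants and long-horizon agents) are left out because choosing between `medium` and `high` is an eval-driven judgment call.

```python
from typing import Optional

MATCH_CURRENT = "match-current"  # sentinel: keep whatever effort is in use today

# Transcribed from the model rows of the table above.
SUGGESTED_START = {
    "gpt-5.2": MATCH_CURRENT,
    "gpt-5.3-codex": MATCH_CURRENT,
    "gpt-4.1": "none",
    "gpt-4o": "none",
}


def starting_effort(current_model: str, current_effort: Optional[str] = None) -> str:
    """Return the reasoning effort to pin for the first migration eval run."""
    suggestion = SUGGESTED_START[current_model]  # KeyError for unlisted models
    if suggestion != MATCH_CURRENT:
        return suggestion
    if current_effort is None:
        raise ValueError(f"{current_model} needs its current effort to match")
    return current_effort
```

For example, `starting_effort("gpt-4o")` gives `"none"`, while `starting_effort("gpt-5.2", "low")` keeps `"low"` so the existing latency and quality profile is preserved before tuning.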
537
538### Small-model guidance for `gpt-5.4-mini` and `gpt-5.4-nano`
539
540`gpt-5.4-mini` and `gpt-5.4-nano` are highly steerable, but they are less likely than larger models to infer missing steps, resolve ambiguity implicitly, or package outputs the way you intended unless you specify that behavior directly. In practice, prompts for smaller models are often a bit longer and more explicit.
541 227
542**How `gpt-5.4-mini` differs**228Starting with GPT-5.4, long-running or tool-heavy Responses workflows can use assistant-item `phase` values to distinguish intermediate updates from final answers. GPT-5.5 uses the same pattern.
543 229
544- `gpt-5.4-mini` is more literal and makes fewer assumptions.230If you use `previous_response_id`, the API preserves prior assistant state automatically. If your application manually replays assistant output items into the next request, preserve each original `phase` value and pass it back unchanged. This matters most when a response includes preambles, repeated tool calls, or a final answer after intermediate assistant updates.
545- It is strong when the task is clearly structured, but weaker on implicit workflows and ambiguity handling.
546- By default, it may try to keep the conversation going with a follow-up question unless you suppress that behavior explicitly.
547 231
548**Prompting `gpt-5.4-mini`**232```text
549 233If manually replaying assistant items:
550- Put critical rules first.234- Preserve assistant `phase` values exactly.
551- Specify the full execution order when tool use or side effects matter.235- Use `phase: "commentary"` for intermediate user-visible updates.
552- Do not rely on "you MUST" alone. Use structural scaffolding such as numbered steps, decision rules, and explicit action definitions.236- Use `phase: "final_answer"` for the completed answer.
553- Separate "do the action" from "report the action."237- Do not add `phase` to user messages.
554- Show the correct flow, not just the final format.238```
555- Define ambiguity behavior explicitly: when to ask, abstain, or proceed.
556- Specify packaging directly: answer length, whether to ask a follow-up question, citation style, and section order.
557- Be careful with `output nothing else`. Prefer scoped instructions such as `after the final JSON, output nothing further`.
558
559**Prompting `gpt-5.4-nano`**
560
561- Use `gpt-5.4-nano` only for narrow, well-bounded tasks.
562- Prefer closed outputs: labels, enums, short JSON, or fixed templates.
563- Avoid multi-step orchestration unless the flow is extremely constrained.
564- Route ambiguous or planning-heavy tasks to a stronger model instead of over-prompting `gpt-5.4-nano`.
565
566**Good default pattern**
567
5681. Task
5692. Critical rule
5703. Exact step order
5714. Edge cases or clarification behavior
5725. Output format
5736. One correct example
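
One way to keep small-model prompts in this order is to assemble them from named sections and fail fast when one is missing. The sketch below is illustrative only: the function name and the plain `Name:` section layout are assumptions, not a required format.

```python
# Section names follow the six-part default pattern listed above.
SECTION_ORDER = (
    "Task",
    "Critical rule",
    "Exact step order",
    "Edge cases or clarification behavior",
    "Output format",
    "One correct example",
)


def build_small_model_prompt(sections: dict[str, str]) -> str:
    """Join the six sections in the fixed order, rejecting empty ones."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    return "\n\n".join(
        f"{name}:\n{sections[name].strip()}" for name in SECTION_ORDER
    )
```

Raising on a missing section is a deliberate choice here: with smaller models, an unspecified edge case or output format tends to show up later as inconsistent behavior rather than as an obvious error.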
574 239
575**Avoid**240## Suggested prompt structure
576 241
577- Implied next steps242Use this structure as a starting point for complex prompts. Keep each section short. Add detail only where it changes behavior.
578- Unspecified edge cases
579- Schema-only prompts for tool workflows
580- Generic instructions without structure
581 243
582### Web search and deep research244```text
245Role: [1-2 sentences defining the model's function, context, and job]
583 246
584If you are migrating a research agent in particular, make these prompt updates before increasing reasoning effort:247# Personality
248[tone, demeanor, and collaboration style]
585 249
586- Add `<research_mode>`250# Goal
587- Add `<citation_rules>`251[user-visible outcome]
588- Add `<empty_result_recovery>`
589- Increase `reasoning_effort` one notch only after prompt fixes.
590 252
591You can start from the 5.2 research block and then layer in citation gating and finalization contracts as needed.253# Success criteria
254[what must be true before the final answer]
592 255
593GPT-5.4 performs especially well when the task requires multi-step evidence gathering, long-context synthesis, and explicit prompt contracts. In practice, the highest-leverage prompt changes are choosing reasoning effort by task shape, defining exact output and citation formats, adding dependency-aware tool rules, and making completion criteria explicit. The model is often strong out of the box, but it is most reliable when prompts clearly specify how to search, how to verify, and what counts as done.256# Constraints
257[policy, safety, business, evidence, and side-effect limits]
594 258
595## Next steps259# Output
260[sections, length, and tone]
596 261
597- Read [our latest model guide](https://developers.openai.com/api/docs/guides/latest-model) for model capabilities, parameters, and API compatibility details.
598- Read [Prompt engineering](https://developers.openai.com/api/docs/guides/prompt-engineering) for broader prompting strategies that apply across model families.
599- Read [Compaction](https://developers.openai.com/api/docs/guides/compaction) if you are building long-running GPT-5.4 sessions in the Responses API.
262# Stop rules
263[when to retry, fallback, abstain, ask, or stop]
264```