Reference Changes, 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC
python/resources/realtime/subresources/calls/index.md


0 added, 1685 removed.


This document has no rendered page for this history range.

python/resources/realtime/subresources/calls/index.md +0 −1685 deleted


~~1# Calls~~

~~3## Accept call~~

~~5`realtime.calls.accept(call_id: str, **kwargs: CallAcceptParams)`~~

~~7**post** `/realtime/calls/{call_id}/accept`~~

~~9Accept an incoming SIP call and configure the realtime session that will~~

~~10handle it.~~

~~12### Parameters~~

~~14- `call_id: str`~~

~~16- `type: Literal["realtime"]`~~

~~18 The type of session to create. Always `realtime` for the Realtime API.~~

~~20 - `"realtime"`~~

~~22- `audio: Optional[RealtimeAudioConfigParam]`~~

~~24 Configuration for input and output audio.~~

~~26 - `input: Optional[RealtimeAudioConfigInput]`~~

~~28 - `format: Optional[RealtimeAudioFormats]`~~

~~30 The format of the input audio.~~

~~32 - `class AudioPCM: …`~~

~~34 The PCM audio format. Only a 24kHz sample rate is supported.~~

~~36 - `rate: Optional[Literal[24000]]`~~

~~38 The sample rate of the audio. Always `24000`.~~

~~40 - `24000`~~

~~42 - `type: Optional[Literal["audio/pcm"]]`~~

~~44 The audio format. Always `audio/pcm`.~~

~~46 - `"audio/pcm"`~~

~~48 - `class AudioPCMU: …`~~

~~50 The G.711 μ-law format.~~

~~52 - `type: Optional[Literal["audio/pcmu"]]`~~

~~54 The audio format. Always `audio/pcmu`.~~

~~56 - `"audio/pcmu"`~~

~~58 - `class AudioPCMA: …`~~

~~60 The G.711 A-law format.~~

~~62 - `type: Optional[Literal["audio/pcma"]]`~~

~~64 The audio format. Always `audio/pcma`.~~

~~66 - `"audio/pcma"`~~

~~68 - `noise_reduction: Optional[NoiseReduction]`~~

~~70 Configuration for input audio noise reduction. This can be set to `null` to turn off.~~

~~71 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.~~

~~72 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.~~

~~74 - `type: Optional[NoiseReductionType]`~~

~~76 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.~~

~~78 - `"near_field"`~~

~~80 - `"far_field"`~~

~~82 - `transcription: Optional[AudioTranscription]`~~

84 Configuration for input audio transcription. Defaults to off and can be set to `null` to turn off once on. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

~~86 - `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`~~

~~88 Controls how long the model waits before emitting transcription text.~~

~~89 Higher values can improve transcription accuracy at the cost of latency.~~

~~90 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.~~

~~92 - `"minimal"`~~

~~94 - `"low"`~~

~~96 - `"medium"`~~

~~98 - `"high"`~~

100 - `"xhigh"`

~~101~~

102 - `language: Optional[str]`

~~103~~

104 The language of the input audio. Supplying the input language in

105 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

106 will improve accuracy and latency.

~~107~~

108 - `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

~~109~~

110 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~111~~

112 - `str`

~~113~~

114 - `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

~~115~~

116 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~117~~

118 - `"whisper-1"`

~~119~~

120 - `"gpt-4o-mini-transcribe"`

~~121~~

122 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~123~~

124 - `"gpt-4o-transcribe"`

~~125~~

126 - `"gpt-4o-transcribe-diarize"`

~~127~~

128 - `"gpt-realtime-whisper"`

~~129~~

130 - `prompt: Optional[str]`

~~131~~

132 An optional text to guide the model's style or continue a previous audio

133 segment.

134 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

135 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

136 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~137~~

138 - `turn_detection: Optional[RealtimeAudioInputTurnDetection]`

~~139~~

140 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.

~~141~~

142 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~143~~

144 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~145~~

146 For `gpt-realtime-whisper` transcription sessions, turn detection must be

147 set to `null`; VAD is not supported.

~~148~~

149 - `class ServerVad: …`

~~150~~

151 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~152~~

153 - `type: Literal["server_vad"]`

~~154~~

155 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~156~~

157 - `"server_vad"`

~~158~~

159 - `create_response: Optional[bool]`

~~160~~

161 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~162~~

163 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~164~~

165 - `idle_timeout_ms: Optional[int]`

~~166~~

167 Optional timeout after which a model response will be triggered automatically. This is

168 useful for situations in which a long pause from the user is unexpected, such as a phone

169 call. The model will effectively prompt the user to continue the conversation based

170 on the current context.

~~171~~

172 The timeout value will be applied after the last model response's audio has finished playing,

173 i.e. it's set to the `response.done` time plus audio playback duration.

~~174~~

175 An `input_audio_buffer.timeout_triggered` event (plus events

176 associated with the Response) will be emitted when the timeout is reached.

177 Idle timeout is currently only supported for `server_vad` mode.

~~178~~

179 - `interrupt_response: Optional[bool]`

~~180~~

181 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

182 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~183~~

184 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~185~~

186 - `prefix_padding_ms: Optional[int]`

~~187~~

188 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

189 milliseconds). Defaults to 300ms.

~~190~~

191 - `silence_duration_ms: Optional[int]`

~~192~~

193 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

194 to 500ms. With shorter values the model will respond more quickly,

195 but may jump in on short pauses from the user.

~~196~~

197 - `threshold: Optional[float]`

~~198~~

199 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

200 higher threshold will require louder audio to activate the model, and

201 thus might perform better in noisy environments.

~~202~~

203 - `class SemanticVad: …`

~~204~~

205 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~206~~

207 - `type: Literal["semantic_vad"]`

~~208~~

209 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~210~~

211 - `"semantic_vad"`

~~212~~

213 - `create_response: Optional[bool]`

~~214~~

215 Whether or not to automatically generate a response when a VAD stop event occurs.

~~216~~

217 - `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

~~218~~

219 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~220~~

221 - `"low"`

~~222~~

223 - `"medium"`

~~224~~

225 - `"high"`

~~226~~

227 - `"auto"`

~~228~~

229 - `interrupt_response: Optional[bool]`

~~230~~

231 Whether or not to automatically interrupt any ongoing response with output to the default

232 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~233~~

234 - `output: Optional[RealtimeAudioConfigOutput]`

~~235~~

236 - `format: Optional[RealtimeAudioFormats]`

~~237~~

238 The format of the output audio.

~~239~~

240 - `speed: Optional[float]`

~~241~~

242 The speed of the model's spoken response as a multiple of the original speed.

243 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~244~~

245 This parameter is a post-processing adjustment to the audio after it is generated; it's

246 also possible to prompt the model to speak faster or slower.

~~247~~

248 - `voice: Optional[Voice]`

~~249~~

250 The voice the model uses to respond. Supported built-in voices are

251 `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`,

252 `marin`, and `cedar`. You may also provide a custom voice object with

253 an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed

254 during the session once the model has responded with audio at least once.

255 We recommend `marin` and `cedar` for best quality.

~~256~~

257 - `str`

~~258~~

259 - `Literal["alloy", "ash", "ballad", 7 more]`

~~260~~

261 - `"alloy"`

~~262~~

263 - `"ash"`

~~264~~

265 - `"ballad"`

~~266~~

267 - `"coral"`

~~268~~

269 - `"echo"`

~~270~~

271 - `"sage"`

~~272~~

273 - `"shimmer"`

~~274~~

275 - `"verse"`

~~276~~

277 - `"marin"`

~~278~~

279 - `"cedar"`

~~280~~

281 - `class VoiceID: …`

~~282~~

283 Custom voice reference.

~~284~~

285 - `id: str`

~~286~~

287 The custom voice ID, e.g. `voice_1234`.

~~288~~

289- `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

~~290~~

291 Additional fields to include in server outputs.

~~292~~

293 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~294~~

295 - `"item.input_audio_transcription.logprobs"`

~~296~~

297- `instructions: Optional[str]`

~~298~~

299 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~300~~

301 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~302~~

303- `max_output_tokens: Optional[Union[int, Literal["inf"]]]`

~~304~~

305 Maximum number of output tokens for a single assistant response,

306 inclusive of tool calls. Provide an integer between 1 and 4096 to

307 limit output tokens, or `inf` for the maximum available tokens for a

308 given model. Defaults to `inf`.

~~309~~

310 - `int`

~~311~~

312 - `Literal["inf"]`

~~313~~

314 - `"inf"`

~~315~~

316- `model: Optional[Union[str, Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]]]`

~~317~~

318 The Realtime model used for this session.

~~319~~

320 - `str`

~~321~~

322 - `Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]`

~~323~~

324 The Realtime model used for this session.

~~325~~

326 - `"gpt-realtime"`

~~327~~

328 - `"gpt-realtime-1.5"`

~~329~~

330 - `"gpt-realtime-2"`

~~331~~

332 - `"gpt-realtime-2025-08-28"`

~~333~~

334 - `"gpt-4o-realtime-preview"`

~~335~~

336 - `"gpt-4o-realtime-preview-2024-10-01"`

~~337~~

338 - `"gpt-4o-realtime-preview-2024-12-17"`

~~339~~

340 - `"gpt-4o-realtime-preview-2025-06-03"`

~~341~~

342 - `"gpt-4o-mini-realtime-preview"`

~~343~~

344 - `"gpt-4o-mini-realtime-preview-2024-12-17"`

~~345~~

346 - `"gpt-realtime-mini"`

~~347~~

348 - `"gpt-realtime-mini-2025-10-06"`

~~349~~

350 - `"gpt-realtime-mini-2025-12-15"`

~~351~~

352 - `"gpt-audio-1.5"`

~~353~~

354 - `"gpt-audio-mini"`

~~355~~

356 - `"gpt-audio-mini-2025-10-06"`

~~357~~

358 - `"gpt-audio-mini-2025-12-15"`

~~359~~

360- `output_modalities: Optional[List[Literal["text", "audio"]]]`

~~361~~

362 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

363 that the model will respond with audio plus a transcript. `["text"]` can be used to make

364 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~365~~

366 - `"text"`

~~367~~

368 - `"audio"`

~~369~~

370- `parallel_tool_calls: Optional[bool]`

~~371~~

372 Whether the model may call multiple tools in parallel. Only supported by

373 reasoning Realtime models such as `gpt-realtime-2`.

~~374~~

375- `prompt: Optional[ResponsePromptParam]`

~~376~~

377 Reference to a prompt template and its variables.

378 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~379~~

380 - `id: str`

~~381~~

382 The unique identifier of the prompt template to use.

~~383~~

384 - `variables: Optional[Dict[str, Variables]]`

~~385~~

386 Optional map of values to substitute in for variables in your

387 prompt. The substitution values can either be strings, or other

388 Response input types like images or files.

~~389~~

390 - `str`

~~391~~

392 - `class ResponseInputText: …`

~~393~~

394 A text input to the model.

~~395~~

396 - `text: str`

~~397~~

398 The text input to the model.

~~399~~

400 - `type: Literal["input_text"]`

~~401~~

402 The type of the input item. Always `input_text`.

~~403~~

404 - `"input_text"`

~~405~~

406 - `class ResponseInputImage: …`

~~407~~

408 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~409~~

410 - `detail: Literal["low", "high", "auto", "original"]`

~~411~~

412 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~413~~

414 - `"low"`

~~415~~

416 - `"high"`

~~417~~

418 - `"auto"`

~~419~~

420 - `"original"`

~~421~~

422 - `type: Literal["input_image"]`

~~423~~

424 The type of the input item. Always `input_image`.

~~425~~

426 - `"input_image"`

~~427~~

428 - `file_id: Optional[str]`

~~429~~

430 The ID of the file to be sent to the model.

~~431~~

432 - `image_url: Optional[str]`

~~433~~

434 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~435~~

436 - `class ResponseInputFile: …`

~~437~~

438 A file input to the model.

~~439~~

440 - `type: Literal["input_file"]`

~~441~~

442 The type of the input item. Always `input_file`.

~~443~~

444 - `"input_file"`

~~445~~

446 - `detail: Optional[Literal["low", "high"]]`

~~447~~

448 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~449~~

450 - `"low"`

~~451~~

452 - `"high"`

~~453~~

454 - `file_data: Optional[str]`

~~455~~

456 The content of the file to be sent to the model.

~~457~~

458 - `file_id: Optional[str]`

~~459~~

460 The ID of the file to be sent to the model.

~~461~~

462 - `file_url: Optional[str]`

~~463~~

464 The URL of the file to be sent to the model.

~~465~~

466 - `filename: Optional[str]`

~~467~~

468 The name of the file to be sent to the model.

~~469~~

470 - `version: Optional[str]`

~~471~~

472 Optional version of the prompt template.

~~473~~

474- `reasoning: Optional[RealtimeReasoningParam]`

~~475~~

476 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~477~~

478 - `effort: Optional[RealtimeReasoningEffort]`

~~479~~

480 Constrains effort on reasoning for reasoning-capable Realtime models such as

481 `gpt-realtime-2`.

~~482~~

483 - `"minimal"`

~~484~~

485 - `"low"`

~~486~~

487 - `"medium"`

~~488~~

489 - `"high"`

~~490~~

491 - `"xhigh"`

~~492~~

493- `tool_choice: Optional[RealtimeToolChoiceConfigParam]`

~~494~~

495 How the model chooses tools. Provide one of the string modes or force a specific

496 function/MCP tool.

~~497~~

498 - `Literal["none", "auto", "required"]`

~~499~~

500 - `"none"`

~~501~~

502 - `"auto"`

~~503~~

504 - `"required"`

~~505~~

506 - `class ToolChoiceFunction: …`

~~507~~

508 Use this option to force the model to call a specific function.

~~509~~

510 - `name: str`

~~511~~

512 The name of the function to call.

~~513~~

514 - `type: Literal["function"]`

~~515~~

516 For function calling, the type is always `function`.

~~517~~

518 - `"function"`

~~519~~

520 - `class ToolChoiceMcp: …`

~~521~~

522 Use this option to force the model to call a specific tool on a remote MCP server.

~~523~~

524 - `server_label: str`

~~525~~

526 The label of the MCP server to use.

~~527~~

528 - `type: Literal["mcp"]`

~~529~~

530 For MCP tools, the type is always `mcp`.

~~531~~

532 - `"mcp"`

~~533~~

534 - `name: Optional[str]`

~~535~~

536 The name of the tool to call on the server.

~~537~~

538- `tools: Optional[RealtimeToolsConfigParam]`

~~539~~

540 Tools available to the model.

~~541~~

542 - `class RealtimeFunctionTool: …`

~~543~~

544 - `description: Optional[str]`

~~545~~

546 The description of the function, including guidance on when and how

547 to call it, and guidance about what to tell the user when calling

548 (if anything).

~~549~~

550 - `name: Optional[str]`

~~551~~

552 The name of the function.

~~553~~

554 - `parameters: Optional[object]`

~~555~~

556 Parameters of the function in JSON Schema.

~~557~~

558 - `type: Optional[Literal["function"]]`

~~559~~

560 The type of the tool, i.e. `function`.

~~561~~

562 - `"function"`

~~563~~

564 - `class Mcp: …`

~~565~~

566 Give the model access to additional tools via remote Model Context Protocol

567 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~568~~

569 - `server_label: str`

~~570~~

571 A label for this MCP server, used to identify it in tool calls.

~~572~~

573 - `type: Literal["mcp"]`

~~574~~

575 The type of the MCP tool. Always `mcp`.

~~576~~

577 - `"mcp"`

~~578~~

579 - `allowed_tools: Optional[McpAllowedTools]`

~~580~~

581 List of allowed tool names or a filter object.

~~582~~

583 - `List[str]`

~~584~~

585 A string array of allowed tool names

~~586~~

587 - `class McpAllowedToolsMcpToolFilter: …`

~~588~~

589 A filter object to specify which tools are allowed.

~~590~~

591 - `read_only: Optional[bool]`

~~592~~

593 Indicates whether or not a tool modifies data or is read-only. If an

594 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

595 it will match this filter.

~~596~~

597 - `tool_names: Optional[List[str]]`

~~598~~

599 List of allowed tool names.

~~600~~

601 - `authorization: Optional[str]`

~~602~~

603 An OAuth access token that can be used with a remote MCP server, either

604 with a custom MCP server URL or a service connector. Your application

605 must handle the OAuth authorization flow and provide the token here.

~~606~~

607 - `connector_id: Optional[Literal["connector_dropbox", "connector_gmail", "connector_googlecalendar", 5 more]]`

~~608~~

609 Identifier for service connectors, like those available in ChatGPT. One of

610 `server_url` or `connector_id` must be provided. Learn more about service

611 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~612~~

613 Currently supported `connector_id` values are:

~~614~~

615 - Dropbox: `connector_dropbox`

616 - Gmail: `connector_gmail`

617 - Google Calendar: `connector_googlecalendar`

618 - Google Drive: `connector_googledrive`

619 - Microsoft Teams: `connector_microsoftteams`

620 - Outlook Calendar: `connector_outlookcalendar`

621 - Outlook Email: `connector_outlookemail`

622 - SharePoint: `connector_sharepoint`

~~623~~

624 - `"connector_dropbox"`

~~625~~

626 - `"connector_gmail"`

~~627~~

628 - `"connector_googlecalendar"`

~~629~~

630 - `"connector_googledrive"`

~~631~~

632 - `"connector_microsoftteams"`

~~633~~

634 - `"connector_outlookcalendar"`

~~635~~

636 - `"connector_outlookemail"`

~~637~~

638 - `"connector_sharepoint"`

~~639~~

640 - `defer_loading: Optional[bool]`

~~641~~

642 Whether this MCP tool is deferred and discovered via tool search.

~~643~~

644 - `headers: Optional[Dict[str, str]]`

~~645~~

646 Optional HTTP headers to send to the MCP server. Use for authentication

647 or other purposes.

~~648~~

649 - `require_approval: Optional[McpRequireApproval]`

~~650~~

651 Specify which of the MCP server's tools require approval.

~~652~~

653 - `class McpRequireApprovalMcpToolApprovalFilter: …`

~~654~~

655 Specify which of the MCP server's tools require approval. Can be

656 `always`, `never`, or a filter object associated with tools

657 that require approval.

~~658~~

659 - `always: Optional[McpRequireApprovalMcpToolApprovalFilterAlways]`

~~660~~

661 A filter object to specify which tools are allowed.

~~662~~

663 - `read_only: Optional[bool]`

~~664~~

665 Indicates whether or not a tool modifies data or is read-only. If an

666 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

667 it will match this filter.

~~668~~

669 - `tool_names: Optional[List[str]]`

~~670~~

671 List of allowed tool names.

~~672~~

673 - `never: Optional[McpRequireApprovalMcpToolApprovalFilterNever]`

~~674~~

675 A filter object to specify which tools are allowed.

~~676~~

677 - `read_only: Optional[bool]`

~~678~~

679 Indicates whether or not a tool modifies data or is read-only. If an

680 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

681 it will match this filter.

~~682~~

683 - `tool_names: Optional[List[str]]`

~~684~~

685 List of allowed tool names.

~~686~~

687 - `Literal["always", "never"]`

~~688~~

689 Specify a single approval policy for all tools. One of `always` or

690 `never`. When set to `always`, all tools will require approval. When

691 set to `never`, no tools will require approval.

~~692~~

693 - `"always"`

~~694~~

695 - `"never"`

~~696~~

697 - `server_description: Optional[str]`

~~698~~

699 Optional description of the MCP server, used to provide more context.

~~700~~

701 - `server_url: Optional[str]`

~~702~~

703 The URL for the MCP server. One of `server_url` or `connector_id` must be

704 provided.

~~705~~

706- `tracing: Optional[RealtimeTracingConfigParam]`

~~707~~

708 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

709 tracing is enabled for a session, the configuration cannot be modified.

~~710~~

711 `auto` will create a trace for the session with default values for the

712 workflow name, group id, and metadata.

~~713~~

714 - `Literal["auto"]`

~~715~~

716 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~717~~

718 - `"auto"`

~~719~~

720 - `class TracingConfiguration: …`

~~721~~

722 Granular configuration for tracing.

~~723~~

724 - `group_id: Optional[str]`

~~725~~

726 The group id to attach to this trace to enable filtering and

727 grouping in the Traces Dashboard.

~~728~~

729 - `metadata: Optional[object]`

~~730~~

731 The arbitrary metadata to attach to this trace to enable

732 filtering in the Traces Dashboard.

~~733~~

734 - `workflow_name: Optional[str]`

~~735~~

736 The name of the workflow to attach to this trace. This is used to

737 name the trace in the Traces Dashboard.

~~738~~

739- `truncation: Optional[RealtimeTruncationParam]`

~~740~~

741 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~742~~

743 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~744~~

745 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~746~~

747 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~748~~

749 - `Literal["auto", "disabled"]`

~~750~~

751 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~752~~

753 - `"auto"`

~~754~~

755 - `"disabled"`

~~756~~

757 - `class RealtimeTruncationRetentionRatio: …`

~~758~~

759 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~760~~

761 - `retention_ratio: float`

~~762~~

763 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~764~~

765 - `type: Literal["retention_ratio"]`

~~766~~

767 Use retention ratio truncation.

~~768~~

769 - `"retention_ratio"`

~~770~~

771 - `token_limits: Optional[TokenLimits]`

~~772~~

773 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~774~~

775 - `post_instructions: Optional[int]`

~~776~~

777 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
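The retention-ratio behavior described above can be modeled with a few lines of arithmetic. The sketch below is an illustrative model only, not the server's implementation: `truncate_oldest` and the per-message token counts are hypothetical, and it assumes truncation triggers once the conversation exceeds `post_instructions` and then drops the oldest messages until the total is at most `retention_ratio` times that limit.

```python
from typing import List


def truncate_oldest(message_tokens: List[int], post_instructions: int,
                    retention_ratio: float) -> List[int]:
    """Model retention-ratio truncation on per-message token counts.

    Hypothetical helper for illustration; the list is ordered oldest first.
    """
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    if sum(message_tokens) <= post_instructions:
        # Under the limit: nothing is truncated.
        return list(message_tokens)
    target = retention_ratio * post_instructions
    kept = list(message_tokens)
    # Drop the oldest messages until the remaining total fits the target.
    while kept and sum(kept) > target:
        kept.pop(0)
    return kept


# Truncation settings in the shape documented above (values are illustrative).
truncation = {
    "type": "retention_ratio",
    "retention_ratio": 0.8,
    "token_limits": {"post_instructions": 5000},
}

history = [1500, 1500, 1500, 1000]  # 5,500 tokens, over the 5,000 limit
kept = truncate_oldest(
    history,
    truncation["token_limits"]["post_instructions"],
    truncation["retention_ratio"],
)
print(kept, sum(kept))
```

Dropping below the limit rather than just to it leaves headroom for later turns, which is why the description ties a lower ratio to fewer truncations and a better cache rate.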

~~778~~

779### Example

~~780~~

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # This is the default and can be omitted
)
client.realtime.calls.accept(
    call_id="call_id",
    type="realtime",
)
```
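The example above passes only the required fields. As a rough sketch of how the optional audio and turn-detection parameters could be assembled, the block below builds a keyword-argument dict in plain Python. The helper names (`server_vad`, `semantic_vad`, `max_timeout_s`) and every value chosen are hypothetical; only the field names, defaults, and eagerness timeouts come from the parameter list above.

```python
from typing import Any, Dict, Optional

# Maximum timeouts per eagerness level, as stated in the parameter notes.
EAGERNESS_MAX_TIMEOUT_S = {"low": 8, "medium": 4, "high": 2}


def server_vad(threshold: float = 0.5, prefix_padding_ms: int = 300,
               silence_duration_ms: int = 500,
               idle_timeout_ms: Optional[int] = None) -> Dict[str, Any]:
    """Build a `server_vad` turn-detection dict (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    config: Dict[str, Any] = {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
    }
    if idle_timeout_ms is not None:
        # Idle timeout is documented as supported only in server_vad mode.
        config["idle_timeout_ms"] = idle_timeout_ms
    return config


def semantic_vad(eagerness: str = "auto") -> Dict[str, Any]:
    """Build a `semantic_vad` turn-detection dict (hypothetical helper)."""
    if eagerness not in ("low", "medium", "high", "auto"):
        raise ValueError(f"unknown eagerness: {eagerness!r}")
    return {"type": "semantic_vad", "eagerness": eagerness}


def max_timeout_s(eagerness: str) -> int:
    """Look up the documented max timeout; `auto` is equivalent to `medium`."""
    key = "medium" if eagerness == "auto" else eagerness
    return EAGERNESS_MAX_TIMEOUT_S[key]


accept_kwargs = {
    "call_id": "call_id",  # placeholder, as in the example above
    "type": "realtime",
    "audio": {
        "input": {
            "format": {"type": "audio/pcmu"},
            "noise_reduction": {"type": "far_field"},
            "turn_detection": server_vad(silence_duration_ms=400,
                                         idle_timeout_ms=6000),
        },
        "output": {"voice": "marin"},
    },
}
# With a client constructed as above, these could be passed as
# client.realtime.calls.accept(**accept_kwargs).
print(accept_kwargs["audio"]["input"]["turn_detection"])
```

Swapping in `semantic_vad("low")` for the `turn_detection` value would trade faster responses for more tolerance of pauses, per the eagerness description.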

~~793~~

794## Hang up call

~~795~~

796`realtime.calls.hangup(call_id: str)`

~~797~~

798**post** `/realtime/calls/{call_id}/hangup`

~~799~~

800End an active Realtime API call, whether it was initiated over SIP or

801WebRTC.

~~802~~

803### Parameters

~~804~~

805- `call_id: str`

~~806~~

807### Example

~~808~~

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # This is the default and can be omitted
)
client.realtime.calls.hangup(
    "call_id",
)
```

~~820~~

821## Refer call

~~822~~

823`realtime.calls.refer(call_id: str, **kwargs: CallReferParams)`

~~824~~

825**post** `/realtime/calls/{call_id}/refer`

~~826~~

827Transfer an active SIP call to a new destination using the SIP REFER verb.

~~828~~

829### Parameters

~~830~~

831- `call_id: str`

~~832~~

833- `target_uri: str`

~~834~~

835 URI that should appear in the SIP Refer-To header. Supports values like

836 `tel:+14155550123` or `sip:agent@example.com`.

~~837~~

838### Example

~~839~~

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # This is the default and can be omitted
)
client.realtime.calls.refer(
    call_id="call_id",
    target_uri="tel:+14155550123",
)
```
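Because `target_uri` ends up in the SIP Refer-To header, it can help to check its shape before issuing the request. The sketch below is a hypothetical client-side guard, not part of the SDK: `refer_kwargs` is an invented helper, and restricting the value to the `tel:` and `sip:` schemes is an assumption based on the two example values above (the API may accept other forms).

```python
from typing import Dict


def refer_kwargs(call_id: str, target_uri: str) -> Dict[str, str]:
    """Build keyword arguments for a refer request (hypothetical helper).

    Assumption: only the `tel:` and `sip:` schemes shown in the
    parameter description are allowed through.
    """
    if not target_uri.startswith(("tel:", "sip:")):
        raise ValueError(f"unsupported target_uri scheme: {target_uri!r}")
    return {"call_id": call_id, "target_uri": target_uri}


kwargs = refer_kwargs("call_id", "sip:agent@example.com")
# With a client constructed as above, these could be passed as
# client.realtime.calls.refer(**kwargs).
print(kwargs)
```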

~~852~~

853## Reject call

~~854~~

855`realtime.calls.reject(call_id: str, **kwargs: CallRejectParams)`

~~856~~

857**post** `/realtime/calls/{call_id}/reject`

~~858~~

859Decline an incoming SIP call by returning a SIP status code to the caller.

~~860~~

861### Parameters

~~862~~

863- `call_id: str`

~~864~~

865- `status_code: Optional[int]`

~~866~~

867 SIP response code to send back to the caller. Defaults to `603` (Decline)

868 when omitted.

~~869~~

870### Example

~~871~~

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # This is the default and can be omitted
)
client.realtime.calls.reject(
    call_id="call_id",
)
```
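The example above omits `status_code`, so the caller receives the default `603` (Decline). A minimal sketch of sending a different code follows; `reject_kwargs` is a hypothetical helper, the constants are standard SIP response codes from RFC 3261, and which codes the endpoint actually accepts is not stated in this reference.

```python
from typing import Any, Dict, Optional

# Standard SIP response codes (RFC 3261).
BUSY_HERE = 486
DECLINE = 603  # documented default when status_code is omitted


def reject_kwargs(call_id: str,
                  status_code: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for a reject request (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"call_id": call_id}
    if status_code is not None:
        # Only send the field when overriding the default.
        kwargs["status_code"] = status_code
    return kwargs


busy = reject_kwargs("call_id", BUSY_HERE)
# With a client constructed as above, these could be passed as
# client.realtime.calls.reject(**busy).
print(busy)
```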

~~883~~

884## Create call

~~885~~

886`realtime.calls.create(**kwargs: CallCreateParams) -> BinaryResponseContent`

~~887~~

888**post** `/realtime/calls`

~~889~~

890Create a new Realtime API call over WebRTC and receive the SDP answer needed

891to complete the peer connection.

~~892~~

893### Parameters

~~894~~

895- `sdp: str`

~~896~~

897 WebRTC Session Description Protocol (SDP) offer generated by the caller.

~~898~~

899- `session: Optional[RealtimeSessionCreateRequestParam]`

~~900~~

901 Realtime session object configuration.

~~902~~

903 - `type: Literal["realtime"]`

~~904~~

905 The type of session to create. Always `realtime` for the Realtime API.

~~906~~

907 - `"realtime"`

~~908~~

909 - `audio: Optional[RealtimeAudioConfig]`

~~910~~

911 Configuration for input and output audio.

~~912~~

913 - `input: Optional[RealtimeAudioConfigInput]`

~~914~~

915 - `format: Optional[RealtimeAudioFormats]`

~~916~~

917 The format of the input audio.

~~918~~

919 - `class AudioPCM: …`

~~920~~

921 The PCM audio format. Only a 24kHz sample rate is supported.

~~922~~

923 - `rate: Optional[Literal[24000]]`

~~924~~

925 The sample rate of the audio. Always `24000`.

~~926~~

927 - `24000`

~~928~~

929 - `type: Optional[Literal["audio/pcm"]]`

~~930~~

931 The audio format. Always `audio/pcm`.

~~932~~

933 - `"audio/pcm"`

~~934~~

935 - `class AudioPCMU: …`

~~936~~

937 The G.711 μ-law format.

~~938~~

939 - `type: Optional[Literal["audio/pcmu"]]`

~~940~~

941 The audio format. Always `audio/pcmu`.

~~942~~

943 - `"audio/pcmu"`

~~944~~

945 - `class AudioPCMA: …`

~~946~~

947 The G.711 A-law format.

~~948~~

949 - `type: Optional[Literal["audio/pcma"]]`

~~950~~

951 The audio format. Always `audio/pcma`.

~~952~~

953 - `"audio/pcma"`

~~954~~

955 - `noise_reduction: Optional[NoiseReduction]`

~~956~~

957 Configuration for input audio noise reduction. This can be set to `null` to turn off.

958 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

959 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~960~~

961 - `type: Optional[NoiseReductionType]`

~~962~~

Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

~~964~~

965 - `"near_field"`

~~966~~

967 - `"far_field"`

~~968~~

969 - `transcription: Optional[AudioTranscription]`

~~970~~

Configuration for input audio transcription. Transcription is off by default and can be set to `null` to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

~~972~~

973 - `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`

~~974~~

975 Controls how long the model waits before emitting transcription text.

976 Higher values can improve transcription accuracy at the cost of latency.

977 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~978~~

979 - `"minimal"`

~~980~~

981 - `"low"`

~~982~~

983 - `"medium"`

~~984~~

985 - `"high"`

~~986~~

987 - `"xhigh"`

~~988~~

989 - `language: Optional[str]`

~~990~~

991 The language of the input audio. Supplying the input language in

992 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

993 will improve accuracy and latency.

~~994~~

995 - `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

~~996~~

997 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~998~~

999 - `str`

~~1000~~

1001 - `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

~~1002~~

1003 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1004~~

1005 - `"whisper-1"`

~~1006~~

1007 - `"gpt-4o-mini-transcribe"`

~~1008~~

1009 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~1010~~

1011 - `"gpt-4o-transcribe"`

~~1012~~

1013 - `"gpt-4o-transcribe-diarize"`

~~1014~~

1015 - `"gpt-realtime-whisper"`

~~1016~~

1017 - `prompt: Optional[str]`

~~1018~~

1019 An optional text to guide the model's style or continue a previous audio

1020 segment.

1021 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

1022 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

1023 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1024~~

1025 - `turn_detection: Optional[RealtimeAudioInputTurnDetection]`

~~1026~~

Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.

~~1028~~

1029 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~1030~~

1031 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~1032~~

1033 For `gpt-realtime-whisper` transcription sessions, turn detection must be

1034 set to `null`; VAD is not supported.

~~1035~~

1036 - `class ServerVad: …`

~~1037~~

1038 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~1039~~

1040 - `type: Literal["server_vad"]`

~~1041~~

1042 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~1043~~

1044 - `"server_vad"`

~~1045~~

1046 - `create_response: Optional[bool]`

~~1047~~

1048 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~1049~~

1050 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1051~~

1052 - `idle_timeout_ms: Optional[int]`

~~1053~~

1054 Optional timeout after which a model response will be triggered automatically. This is

1055 useful for situations in which a long pause from the user is unexpected, such as a phone

1056 call. The model will effectively prompt the user to continue the conversation based

1057 on the current context.

~~1058~~

1059 The timeout value will be applied after the last model response's audio has finished playing,

1060 i.e. it's set to the `response.done` time plus audio playback duration.

~~1061~~

1062 An `input_audio_buffer.timeout_triggered` event (plus events

1063 associated with the Response) will be emitted when the timeout is reached.

1064 Idle timeout is currently only supported for `server_vad` mode.

~~1065~~

1066 - `interrupt_response: Optional[bool]`

~~1067~~

1068 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

1069 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~1070~~

1071 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1072~~

1073 - `prefix_padding_ms: Optional[int]`

~~1074~~

1075 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

1076 milliseconds). Defaults to 300ms.

~~1077~~

1078 - `silence_duration_ms: Optional[int]`

~~1079~~

1080 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

1081 to 500ms. With shorter values the model will respond more quickly,

1082 but may jump in on short pauses from the user.

~~1083~~

1084 - `threshold: Optional[float]`

~~1085~~

Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.

~~1089~~

1090 - `class SemanticVad: …`

~~1091~~

1092 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~1093~~

1094 - `type: Literal["semantic_vad"]`

~~1095~~

1096 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~1097~~

1098 - `"semantic_vad"`

~~1099~~

1100 - `create_response: Optional[bool]`

~~1101~~

1102 Whether or not to automatically generate a response when a VAD stop event occurs.

~~1103~~

1104 - `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

~~1105~~

1106 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~1107~~

1108 - `"low"`

~~1109~~

1110 - `"medium"`

~~1111~~

1112 - `"high"`

~~1113~~

1114 - `"auto"`

~~1115~~

1116 - `interrupt_response: Optional[bool]`

~~1117~~

1118 Whether or not to automatically interrupt any ongoing response with output to the default

1119 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~1120~~

1121 - `output: Optional[RealtimeAudioConfigOutput]`

~~1122~~

1123 - `format: Optional[RealtimeAudioFormats]`

~~1124~~

1125 The format of the output audio.

~~1126~~

1127 - `speed: Optional[float]`

~~1128~~

1129 The speed of the model's spoken response as a multiple of the original speed.

1130 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~1131~~

This parameter is a post-processing adjustment to the audio after it is generated; it is also possible to prompt the model to speak faster or slower.

~~1134~~

1135 - `voice: Optional[Voice]`

~~1136~~

1137 The voice the model uses to respond. Supported built-in voices are

1138 `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`,

1139 `marin`, and `cedar`. You may also provide a custom voice object with

1140 an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed

1141 during the session once the model has responded with audio at least once.

1142 We recommend `marin` and `cedar` for best quality.

~~1143~~

1144 - `str`

~~1145~~

1146 - `Literal["alloy", "ash", "ballad", 7 more]`

~~1147~~

1148 - `"alloy"`

~~1149~~

1150 - `"ash"`

~~1151~~

1152 - `"ballad"`

~~1153~~

1154 - `"coral"`

~~1155~~

1156 - `"echo"`

~~1157~~

1158 - `"sage"`

~~1159~~

1160 - `"shimmer"`

~~1161~~

1162 - `"verse"`

~~1163~~

1164 - `"marin"`

~~1165~~

1166 - `"cedar"`

~~1167~~

1168 - `class VoiceID: …`

~~1169~~

1170 Custom voice reference.

~~1171~~

1172 - `id: str`

~~1173~~

1174 The custom voice ID, e.g. `voice_1234`.

~~1175~~

1176 - `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

~~1177~~

1178 Additional fields to include in server outputs.

~~1179~~

1180 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1181~~

1182 - `"item.input_audio_transcription.logprobs"`

~~1183~~

1184 - `instructions: Optional[str]`

~~1185~~

The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~1187~~

1188 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~1189~~

1190 - `max_output_tokens: Optional[Union[int, Literal["inf"], null]]`

~~1191~~

1192 Maximum number of output tokens for a single assistant response,

1193 inclusive of tool calls. Provide an integer between 1 and 4096 to

1194 limit output tokens, or `inf` for the maximum available tokens for a

1195 given model. Defaults to `inf`.

~~1196~~

1197 - `int`

~~1198~~

1199 - `Literal["inf"]`

~~1200~~

1201 - `"inf"`

~~1202~~

1203 - `model: Optional[Union[str, Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more], null]]`

~~1204~~

1205 The Realtime model used for this session.

~~1206~~

1207 - `str`

~~1208~~

1209 - `Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]`

~~1210~~

1211 The Realtime model used for this session.

~~1212~~

1213 - `"gpt-realtime"`

~~1214~~

1215 - `"gpt-realtime-1.5"`

~~1216~~

1217 - `"gpt-realtime-2"`

~~1218~~

1219 - `"gpt-realtime-2025-08-28"`

~~1220~~

1221 - `"gpt-4o-realtime-preview"`

~~1222~~

1223 - `"gpt-4o-realtime-preview-2024-10-01"`

~~1224~~

1225 - `"gpt-4o-realtime-preview-2024-12-17"`

~~1226~~

1227 - `"gpt-4o-realtime-preview-2025-06-03"`

~~1228~~

1229 - `"gpt-4o-mini-realtime-preview"`

~~1230~~

1231 - `"gpt-4o-mini-realtime-preview-2024-12-17"`

~~1232~~

1233 - `"gpt-realtime-mini"`

~~1234~~

1235 - `"gpt-realtime-mini-2025-10-06"`

~~1236~~

1237 - `"gpt-realtime-mini-2025-12-15"`

~~1238~~

1239 - `"gpt-audio-1.5"`

~~1240~~

1241 - `"gpt-audio-mini"`

~~1242~~

1243 - `"gpt-audio-mini-2025-10-06"`

~~1244~~

1245 - `"gpt-audio-mini-2025-12-15"`

~~1246~~

1247 - `output_modalities: Optional[List[Literal["text", "audio"]]]`

~~1248~~

1249 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

1250 that the model will respond with audio plus a transcript. `["text"]` can be used to make

1251 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~1252~~

1253 - `"text"`

~~1254~~

1255 - `"audio"`

~~1256~~

1257 - `parallel_tool_calls: Optional[bool]`

~~1258~~

1259 Whether the model may call multiple tools in parallel. Only supported by

1260 reasoning Realtime models such as `gpt-realtime-2`.

~~1261~~

1262 - `prompt: Optional[ResponsePrompt]`

~~1263~~

1264 Reference to a prompt template and its variables.

1265 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~1266~~

1267 - `id: str`

~~1268~~

1269 The unique identifier of the prompt template to use.

~~1270~~

1271 - `variables: Optional[Dict[str, Variables]]`

~~1272~~

1273 Optional map of values to substitute in for variables in your

1274 prompt. The substitution values can either be strings, or other

1275 Response input types like images or files.

~~1276~~

1277 - `str`

~~1278~~

1279 - `class ResponseInputText: …`

~~1280~~

1281 A text input to the model.

~~1282~~

1283 - `text: str`

~~1284~~

1285 The text input to the model.

~~1286~~

1287 - `type: Literal["input_text"]`

~~1288~~

1289 The type of the input item. Always `input_text`.

~~1290~~

1291 - `"input_text"`

~~1292~~

1293 - `class ResponseInputImage: …`

~~1294~~

1295 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~1296~~

1297 - `detail: Literal["low", "high", "auto", "original"]`

~~1298~~

1299 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~1300~~

1301 - `"low"`

~~1302~~

1303 - `"high"`

~~1304~~

1305 - `"auto"`

~~1306~~

1307 - `"original"`

~~1308~~

1309 - `type: Literal["input_image"]`

~~1310~~

1311 The type of the input item. Always `input_image`.

~~1312~~

1313 - `"input_image"`

~~1314~~

1315 - `file_id: Optional[str]`

~~1316~~

1317 The ID of the file to be sent to the model.

~~1318~~

1319 - `image_url: Optional[str]`

~~1320~~

1321 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~1322~~

1323 - `class ResponseInputFile: …`

~~1324~~

1325 A file input to the model.

~~1326~~

1327 - `type: Literal["input_file"]`

~~1328~~

1329 The type of the input item. Always `input_file`.

~~1330~~

1331 - `"input_file"`

~~1332~~

1333 - `detail: Optional[Literal["low", "high"]]`

~~1334~~

1335 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~1336~~

1337 - `"low"`

~~1338~~

1339 - `"high"`

~~1340~~

1341 - `file_data: Optional[str]`

~~1342~~

1343 The content of the file to be sent to the model.

~~1344~~

1345 - `file_id: Optional[str]`

~~1346~~

1347 The ID of the file to be sent to the model.

~~1348~~

1349 - `file_url: Optional[str]`

~~1350~~

1351 The URL of the file to be sent to the model.

~~1352~~

1353 - `filename: Optional[str]`

~~1354~~

1355 The name of the file to be sent to the model.

~~1356~~

1357 - `version: Optional[str]`

~~1358~~

1359 Optional version of the prompt template.

~~1360~~

1361 - `reasoning: Optional[RealtimeReasoning]`

~~1362~~

1363 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~1364~~

1365 - `effort: Optional[RealtimeReasoningEffort]`

~~1366~~

1367 Constrains effort on reasoning for reasoning-capable Realtime models such as

1368 `gpt-realtime-2`.

~~1369~~

1370 - `"minimal"`

~~1371~~

1372 - `"low"`

~~1373~~

1374 - `"medium"`

~~1375~~

1376 - `"high"`

~~1377~~

1378 - `"xhigh"`

~~1379~~

1380 - `tool_choice: Optional[RealtimeToolChoiceConfig]`

~~1381~~

1382 How the model chooses tools. Provide one of the string modes or force a specific

1383 function/MCP tool.

~~1384~~

1385 - `Literal["none", "auto", "required"]`

~~1386~~

1387 - `"none"`

~~1388~~

1389 - `"auto"`

~~1390~~

1391 - `"required"`

~~1392~~

1393 - `class ToolChoiceFunction: …`

~~1394~~

1395 Use this option to force the model to call a specific function.

~~1396~~

1397 - `name: str`

~~1398~~

1399 The name of the function to call.

~~1400~~

1401 - `type: Literal["function"]`

~~1402~~

1403 For function calling, the type is always `function`.

~~1404~~

1405 - `"function"`

~~1406~~

1407 - `class ToolChoiceMcp: …`

~~1408~~

1409 Use this option to force the model to call a specific tool on a remote MCP server.

~~1410~~

1411 - `server_label: str`

~~1412~~

1413 The label of the MCP server to use.

~~1414~~

1415 - `type: Literal["mcp"]`

~~1416~~

1417 For MCP tools, the type is always `mcp`.

~~1418~~

1419 - `"mcp"`

~~1420~~

1421 - `name: Optional[str]`

~~1422~~

1423 The name of the tool to call on the server.

~~1424~~

1425 - `tools: Optional[RealtimeToolsConfig]`

~~1426~~

1427 Tools available to the model.

~~1428~~

1429 - `class RealtimeFunctionTool: …`

~~1430~~

1431 - `description: Optional[str]`

~~1432~~

1433 The description of the function, including guidance on when and how

1434 to call it, and guidance about what to tell the user when calling

1435 (if anything).

~~1436~~

1437 - `name: Optional[str]`

~~1438~~

1439 The name of the function.

~~1440~~

1441 - `parameters: Optional[object]`

~~1442~~

1443 Parameters of the function in JSON Schema.

~~1444~~

1445 - `type: Optional[Literal["function"]]`

~~1446~~

1447 The type of the tool, i.e. `function`.

~~1448~~

1449 - `"function"`

~~1450~~

1451 - `class Mcp: …`

~~1452~~

1453 Give the model access to additional tools via remote Model Context Protocol

1454 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~1455~~

1456 - `server_label: str`

~~1457~~

1458 A label for this MCP server, used to identify it in tool calls.

~~1459~~

1460 - `type: Literal["mcp"]`

~~1461~~

1462 The type of the MCP tool. Always `mcp`.

~~1463~~

1464 - `"mcp"`

~~1465~~

1466 - `allowed_tools: Optional[McpAllowedTools]`

~~1467~~

1468 List of allowed tool names or a filter object.

~~1469~~

1470 - `List[str]`

~~1471~~

1472 A string array of allowed tool names

~~1473~~

1474 - `class McpAllowedToolsMcpToolFilter: …`

~~1475~~

1476 A filter object to specify which tools are allowed.

~~1477~~

1478 - `read_only: Optional[bool]`

~~1479~~

1480 Indicates whether or not a tool modifies data or is read-only. If an

1481 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1482 it will match this filter.

~~1483~~

1484 - `tool_names: Optional[List[str]]`

~~1485~~

1486 List of allowed tool names.

~~1487~~

1488 - `authorization: Optional[str]`

~~1489~~

1490 An OAuth access token that can be used with a remote MCP server, either

1491 with a custom MCP server URL or a service connector. Your application

1492 must handle the OAuth authorization flow and provide the token here.

~~1493~~

1494 - `connector_id: Optional[Literal["connector_dropbox", "connector_gmail", "connector_googlecalendar", 5 more]]`

~~1495~~

1496 Identifier for service connectors, like those available in ChatGPT. One of

1497 `server_url` or `connector_id` must be provided. Learn more about service

1498 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~1499~~

1500 Currently supported `connector_id` values are:

~~1501~~

1502 - Dropbox: `connector_dropbox`

1503 - Gmail: `connector_gmail`

1504 - Google Calendar: `connector_googlecalendar`

1505 - Google Drive: `connector_googledrive`

1506 - Microsoft Teams: `connector_microsoftteams`

1507 - Outlook Calendar: `connector_outlookcalendar`

1508 - Outlook Email: `connector_outlookemail`

1509 - SharePoint: `connector_sharepoint`

~~1510~~

1511 - `"connector_dropbox"`

~~1512~~

1513 - `"connector_gmail"`

~~1514~~

1515 - `"connector_googlecalendar"`

~~1516~~

1517 - `"connector_googledrive"`

~~1518~~

1519 - `"connector_microsoftteams"`

~~1520~~

1521 - `"connector_outlookcalendar"`

~~1522~~

1523 - `"connector_outlookemail"`

~~1524~~

1525 - `"connector_sharepoint"`

~~1526~~

1527 - `defer_loading: Optional[bool]`

~~1528~~

1529 Whether this MCP tool is deferred and discovered via tool search.

~~1530~~

1531 - `headers: Optional[Dict[str, str]]`

~~1532~~

1533 Optional HTTP headers to send to the MCP server. Use for authentication

1534 or other purposes.

~~1535~~

1536 - `require_approval: Optional[McpRequireApproval]`

~~1537~~

1538 Specify which of the MCP server's tools require approval.

~~1539~~

1540 - `class McpRequireApprovalMcpToolApprovalFilter: …`

~~1541~~

1542 Specify which of the MCP server's tools require approval. Can be

1543 `always`, `never`, or a filter object associated with tools

1544 that require approval.

~~1545~~

1546 - `always: Optional[McpRequireApprovalMcpToolApprovalFilterAlways]`

~~1547~~

1548 A filter object to specify which tools are allowed.

~~1549~~

1550 - `read_only: Optional[bool]`

~~1551~~

1552 Indicates whether or not a tool modifies data or is read-only. If an

1553 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1554 it will match this filter.

~~1555~~

1556 - `tool_names: Optional[List[str]]`

~~1557~~

1558 List of allowed tool names.

~~1559~~

1560 - `never: Optional[McpRequireApprovalMcpToolApprovalFilterNever]`

~~1561~~

1562 A filter object to specify which tools are allowed.

~~1563~~

1564 - `read_only: Optional[bool]`

~~1565~~

1566 Indicates whether or not a tool modifies data or is read-only. If an

1567 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1568 it will match this filter.

~~1569~~

1570 - `tool_names: Optional[List[str]]`

~~1571~~

1572 List of allowed tool names.

~~1573~~

1574 - `Literal["always", "never"]`

~~1575~~

Specify a single approval policy for all tools. One of `always` or `never`. When set to `always`, all tools will require approval. When set to `never`, no tools will require approval.

~~1579~~

1580 - `"always"`

~~1581~~

1582 - `"never"`

~~1583~~

1584 - `server_description: Optional[str]`

~~1585~~

1586 Optional description of the MCP server, used to provide more context.

~~1587~~

1588 - `server_url: Optional[str]`

~~1589~~

1590 The URL for the MCP server. One of `server_url` or `connector_id` must be

1591 provided.

~~1592~~

1593 - `tracing: Optional[RealtimeTracingConfig]`

~~1594~~

1595 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

1596 tracing is enabled for a session, the configuration cannot be modified.

~~1597~~

1598 `auto` will create a trace for the session with default values for the

1599 workflow name, group id, and metadata.

~~1600~~

1601 - `Literal["auto"]`

~~1602~~

1603 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~1604~~

1605 - `"auto"`

~~1606~~

1607 - `class TracingConfiguration: …`

~~1608~~

1609 Granular configuration for tracing.

~~1610~~

1611 - `group_id: Optional[str]`

~~1612~~

1613 The group id to attach to this trace to enable filtering and

1614 grouping in the Traces Dashboard.

~~1615~~

1616 - `metadata: Optional[object]`

~~1617~~

1618 The arbitrary metadata to attach to this trace to enable

1619 filtering in the Traces Dashboard.

~~1620~~

1621 - `workflow_name: Optional[str]`

~~1622~~

1623 The name of the workflow to attach to this trace. This is used to

1624 name the trace in the Traces Dashboard.

~~1625~~

1626 - `truncation: Optional[RealtimeTruncation]`

~~1627~~

When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~1629~~

1630 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~1631~~

1632 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~1633~~

1634 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~1635~~

1636 - `Literal["auto", "disabled"]`

~~1637~~

1638 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~1639~~

1640 - `"auto"`

~~1641~~

1642 - `"disabled"`

~~1643~~

1644 - `class RealtimeTruncationRetentionRatio: …`

~~1645~~

1646 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~1647~~

1648 - `retention_ratio: float`

~~1649~~

1650 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~1651~~

1652 - `type: Literal["retention_ratio"]`

~~1653~~

1654 Use retention ratio truncation.

~~1655~~

1656 - `"retention_ratio"`

~~1657~~

1658 - `token_limits: Optional[TokenLimits]`

~~1659~~

1660 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~1661~~

1662 - `post_instructions: Optional[int]`

~~1663~~

Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
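The retention-ratio behavior described under `truncation` can be illustrated with a small simulation. This is only a sketch of the idea as stated in the parameter descriptions: the function `simulate_retention_truncation` and the per-message token counts are hypothetical, and the server's actual truncation algorithm is not specified in this reference. The `truncation` dict at the end uses only the field names documented above.

```python
from typing import List


def simulate_retention_truncation(
    message_tokens: List[int], token_limit: int, retention_ratio: float
) -> List[int]:
    """Drop oldest messages once the limit is exceeded, down to ratio * limit."""
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    if sum(message_tokens) <= token_limit:
        return list(message_tokens)  # under the limit: nothing is dropped
    target = retention_ratio * token_limit
    kept = list(message_tokens)
    while kept and sum(kept) > target:
        kept.pop(0)  # drop the oldest message first
    return kept


# A truncation setting built from the documented fields.
truncation = {
    "type": "retention_ratio",
    "retention_ratio": 0.8,
    "token_limits": {"post_instructions": 5000},
}
```

With messages of 2000, 2000, and 1500 tokens and a 5,000-token limit, the conversation exceeds the limit, so the oldest message is dropped and the remaining 3,500 tokens fall under the 4,000-token target (80% of the limit). Truncating below the limit in one step is what leaves headroom for later turns.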

### Returns

- `BinaryResponseContent`
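Since the endpoint is described as returning the SDP answer, the bytes read from the returned content can presumably be decoded as text before being handed to the WebRTC peer connection. The helper below is a sketch under that assumption; `decode_sdp_answer` is a hypothetical name, and the `v=0` check relies on the SDP rule that a session description starts with a version line.

```python
def decode_sdp_answer(raw: bytes) -> str:
    """Decode response bytes as UTF-8 and check that they look like SDP."""
    text = raw.decode("utf-8")
    # An SDP session description begins with the protocol version line "v=0".
    if not text.lstrip().startswith("v=0"):
        raise ValueError("response body does not look like an SDP answer")
    return text
```

Failing fast here gives a clearer error than passing an unexpected body to the peer connection as a remote description.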

### Example

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # This is the default and can be omitted
)
call = client.realtime.calls.create(
    sdp="sdp",
)
print(call)
content = call.read()
print(content)
```
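The example above passes only `sdp`. As a hedged sketch of what a fuller `session` argument might look like, the helper below assembles a dict from field names in the parameter list and checks the ranges documented for `speed` and `threshold`. The function `build_session` and its defaults are assumptions chosen for illustration, not SDK behavior.

```python
from typing import Any, Dict


def build_session(
    model: str = "gpt-realtime",
    voice: str = "marin",
    speed: float = 1.0,
    vad_threshold: float = 0.5,
    silence_duration_ms: int = 500,
) -> Dict[str, Any]:
    """Assemble a `session` dict from fields documented in the parameter list."""
    if not 0.25 <= speed <= 1.5:
        raise ValueError("speed must be between 0.25 and 1.5")
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return {
        "type": "realtime",
        "model": model,
        "output_modalities": ["audio"],
        "audio": {
            "input": {
                "format": {"type": "audio/pcm", "rate": 24000},
                "noise_reduction": {"type": "near_field"},
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": vad_threshold,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": silence_duration_ms,
                    "create_response": True,
                    "interrupt_response": True,
                },
            },
            "output": {"voice": voice, "speed": speed},
        },
    }
```

The result could then be passed as `client.realtime.calls.create(sdp=offer_sdp, session=build_session())`, where `offer_sdp` stands for an SDP offer produced by the caller's WebRTC stack.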