Reference Changes, 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC
python/resources/realtime/subresources/client_secrets/index.md


0 added, 3749 removed.


File deleted.

# Client Secrets

## Create client secret

`realtime.client_secrets.create(ClientSecretCreateParams**kwargs) -> ClientSecretCreateResponse`

**post** `/realtime/client_secrets`

Create a Realtime client secret with an associated session configuration.

Client secrets are short-lived tokens that can be passed to a client app, such as a web frontend or mobile client, and that grant access to the Realtime API without leaking your main API key. You can configure a custom TTL for each client secret.

You can also attach session configuration options to the client secret. These are applied to any sessions created using that client secret, but the client connection can override them.

[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).

Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.
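
As a rough illustration of how the `expires_after` and `session` parameters fit together, the sketch below assembles keyword arguments for `realtime.client_secrets.create()` and passes them to a client object. It is a minimal, hedged example, not an official recipe. The helper names `build_client_secret_params` and `mint_client_secret`, and the default `model` and `voice` values, are illustrative assumptions. Only the field names and the 10 to 7200 second TTL range come from this reference. Plain dicts are used for the nested configuration objects.

```python
from typing import Any, Dict, Optional

# Bounds for expires_after.seconds as stated in this reference (10 s to 2 hours).
MIN_TTL_SECONDS = 10
MAX_TTL_SECONDS = 7200


def build_client_secret_params(
    ttl_seconds: int = 600,
    model: str = "gpt-realtime",
    instructions: Optional[str] = None,
    voice: str = "marin",
) -> Dict[str, Any]:
    """Return keyword arguments for realtime.client_secrets.create().

    Hypothetical helper: the default model and voice are illustrative choices.
    """
    if not MIN_TTL_SECONDS <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(
            f"ttl_seconds must be between {MIN_TTL_SECONDS} and {MAX_TTL_SECONDS}"
        )
    session: Dict[str, Any] = {
        "type": "realtime",  # a Realtime session, not a transcription session
        "model": model,
        "output_modalities": ["audio"],  # audio plus a transcript
        "audio": {"output": {"voice": voice}},
    }
    if instructions is not None:
        session["instructions"] = instructions
    return {
        # `seconds` is added to the secret's created_at time to get its expiry.
        "expires_after": {"anchor": "created_at", "seconds": ttl_seconds},
        "session": session,
    }


def mint_client_secret(client: Any, **options: Any) -> Any:
    """Create a client secret server-side with an already configured client.

    Hypothetical wrapper: `client` is expected to expose
    client.realtime.client_secrets.create(...) as in the signature above.
    """
    params = build_client_secret_params(**options)
    return client.realtime.client_secrets.create(**params)
```

On a backend you might call it as `mint_client_secret(OpenAI(), ttl_seconds=300, instructions="Be concise.")` after `from openai import OpenAI` (the client reads `OPENAI_API_KEY` from the environment), then forward only the resulting `ek_...` secret to the browser or mobile app. The local TTL check only surfaces an out-of-range value before a request is sent; the server remains the source of truth for validation.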

### Parameters

- `expires_after: Optional[ExpiresAfter]`

  Configuration for the client secret expiration. Expiration refers to the time after which a client secret will no longer be valid for creating sessions. The session itself may continue after that time once started. A secret can be used to create multiple sessions until it expires.

  - `anchor: Optional[Literal["created_at"]]`

    The anchor point for the client secret expiration, meaning that `seconds` will be added to the `created_at` time of the client secret to produce an expiration timestamp. Only `created_at` is currently supported.

    - `"created_at"`

  - `seconds: Optional[int]`

    The number of seconds from the anchor point to the expiration. Select a value between `10` and `7200` (2 hours). Defaults to 600 seconds (10 minutes) if not specified.

- `session: Optional[Session]`

  Session configuration to use for the client secret. Choose either a realtime session or a transcription session.

  - `class RealtimeSessionCreateRequest: …`

    Realtime session object configuration.

    - `type: Literal["realtime"]`

      The type of session to create. Always `realtime` for the Realtime API.

      - `"realtime"`

    - `audio: Optional[RealtimeAudioConfig]`

      Configuration for input and output audio.

      - `input: Optional[RealtimeAudioConfigInput]`

        - `format: Optional[RealtimeAudioFormats]`

          The format of the input audio.

          - `class AudioPCM: …`

            The PCM audio format. Only a 24kHz sample rate is supported.

            - `rate: Optional[Literal[24000]]`

              The sample rate of the audio. Always `24000`.

              - `24000`

            - `type: Optional[Literal["audio/pcm"]]`

              The audio format. Always `audio/pcm`.

              - `"audio/pcm"`

          - `class AudioPCMU: …`

            The G.711 μ-law format.

            - `type: Optional[Literal["audio/pcmu"]]`

              The audio format. Always `audio/pcmu`.

              - `"audio/pcmu"`

          - `class AudioPCMA: …`

            The G.711 A-law format.

            - `type: Optional[Literal["audio/pcma"]]`

              The audio format. Always `audio/pcma`.

              - `"audio/pcma"`

        - `noise_reduction: Optional[NoiseReduction]`

          Configuration for input audio noise reduction. This can be set to `null` to turn it off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

          - `type: Optional[NoiseReductionType]`

            Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

            - `"near_field"`

            - `"far_field"`

        - `transcription: Optional[AudioTranscription]`

          Configuration for input audio transcription. Defaults to off and can be set to `null` to turn it off once on. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance about the input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

          - `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`

            Controls how long the model waits before emitting transcription text. Higher values can improve transcription accuracy at the cost of latency. Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

            - `"minimal"`

            - `"low"`

            - `"medium"`

            - `"high"`

            - `"xhigh"`

          - `language: Optional[str]`

            The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format will improve accuracy and latency.

          - `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

            The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

            - `str`

            - `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

              The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

              - `"whisper-1"`

              - `"gpt-4o-mini-transcribe"`

              - `"gpt-4o-mini-transcribe-2025-12-15"`

              - `"gpt-4o-transcribe"`

              - `"gpt-4o-transcribe-diarize"`

              - `"gpt-realtime-whisper"`

          - `prompt: Optional[str]`

            Optional text to guide the model's style or continue a previous audio segment. For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting). For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology". Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

        - `turn_detection: Optional[RealtimeAudioInputTurnDetection]`

          Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn it off, in which case the client must manually trigger a model response.

          Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

          Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

          For `gpt-realtime-whisper` transcription sessions, turn detection must be set to `null`; VAD is not supported.

          - `class ServerVad: …`

            Server-side voice activity detection (VAD) that flips on when user speech is detected and off after a period of silence.

            - `type: Literal["server_vad"]`

              Type of turn detection, `server_vad` to turn on simple Server VAD.

              - `"server_vad"`

            - `create_response: Optional[bool]`

              Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false`, this may fail to create a response if the model is already responding.

              If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

            - `idle_timeout_ms: Optional[int]`

              Optional timeout after which a model response will be triggered automatically. This is useful for situations in which a long pause from the user is unexpected, such as a phone call. The model will effectively prompt the user to continue the conversation based on the current context.

              The timeout value will be applied after the last model response's audio has finished playing, i.e. it's set to the `response.done` time plus audio playback duration.

              An `input_audio_buffer.timeout_triggered` event (plus events associated with the Response) will be emitted when the timeout is reached. Idle timeout is currently only supported for `server_vad` mode.

            - `interrupt_response: Optional[bool]`

              Whether or not to automatically interrupt (cancel) any ongoing response with output to the default conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true`, the response will be cancelled; otherwise, it will continue until complete.

              If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

            - `prefix_padding_ms: Optional[int]`

              Used only for `server_vad` mode. Amount of audio to include before the VAD-detected speech (in milliseconds). Defaults to 300ms.

            - `silence_duration_ms: Optional[int]`

              Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.

            - `threshold: Optional[float]`

              Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.

          - `class SemanticVad: …`

            Server-side semantic turn detection, which uses a model to determine when the user has finished speaking.

            - `type: Literal["semantic_vad"]`

              Type of turn detection, `semantic_vad` to turn on Semantic VAD.

              - `"semantic_vad"`

            - `create_response: Optional[bool]`

              Whether or not to automatically generate a response when a VAD stop event occurs.

            - `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

              Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking; `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

              - `"low"`

              - `"medium"`

              - `"high"`

              - `"auto"`

            - `interrupt_response: Optional[bool]`

              Whether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

      - `output: Optional[RealtimeAudioConfigOutput]`

        - `format: Optional[RealtimeAudioFormats]`

          The format of the output audio.

        - `speed: Optional[float]`

          The speed of the model's spoken response as a multiple of the original speed. 1.0 is the default speed, 0.25 is the minimum, and 1.5 is the maximum. This value can only be changed between model turns, not while a response is in progress.

          This parameter is a post-processing adjustment to the audio after it is generated; it's also possible to prompt the model to speak faster or slower.

        - `voice: Optional[Voice]`

          The voice the model uses to respond. Supported built-in voices are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`. You may also provide a custom voice object with an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed during the session once the model has responded with audio at least once. We recommend `marin` and `cedar` for best quality.

          - `str`

          - `Literal["alloy", "ash", "ballad", 7 more]`

            - `"alloy"`

            - `"ash"`

            - `"ballad"`

            - `"coral"`

            - `"echo"`

            - `"sage"`

            - `"shimmer"`

            - `"verse"`

            - `"marin"`

            - `"cedar"`

          - `class VoiceID: …`

            Custom voice reference.

            - `id: str`

              The custom voice ID, e.g. `voice_1234`.

    - `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

      Additional fields to include in server outputs.

      `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

      - `"item.input_audio_transcription.logprobs"`

    - `instructions: Optional[str]`

      The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

      Note that the server sets default instructions, which are used if this field is not set and are visible in the `session.created` event at the start of the session.

    - `max_output_tokens: Optional[Union[int, Literal["inf"], null]]`

      Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or `inf` for the maximum available tokens for a given model. Defaults to `inf`.

      - `int`

      - `Literal["inf"]`

        - `"inf"`

    - `model: Optional[Union[str, Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more], null]]`

      The Realtime model used for this session.

      - `str`

      - `Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]`

        The Realtime model used for this session.

        - `"gpt-realtime"`

        - `"gpt-realtime-1.5"`

        - `"gpt-realtime-2"`

        - `"gpt-realtime-2025-08-28"`

        - `"gpt-4o-realtime-preview"`

        - `"gpt-4o-realtime-preview-2024-10-01"`

        - `"gpt-4o-realtime-preview-2024-12-17"`

        - `"gpt-4o-realtime-preview-2025-06-03"`

        - `"gpt-4o-mini-realtime-preview"`

        - `"gpt-4o-mini-realtime-preview-2024-12-17"`

        - `"gpt-realtime-mini"`

        - `"gpt-realtime-mini-2025-10-06"`

        - `"gpt-realtime-mini-2025-12-15"`

        - `"gpt-audio-1.5"`

        - `"gpt-audio-mini"`

        - `"gpt-audio-mini-2025-10-06"`

        - `"gpt-audio-mini-2025-12-15"`

    - `output_modalities: Optional[List[Literal["text", "audio"]]]`

      The set of modalities the model can respond with. It defaults to `["audio"]`, indicating that the model will respond with audio plus a transcript. `["text"]` can be used to make the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

      - `"text"`

      - `"audio"`

    - `parallel_tool_calls: Optional[bool]`

      Whether the model may call multiple tools in parallel. Only supported by reasoning Realtime models such as `gpt-realtime-2`.

    - `prompt: Optional[ResponsePrompt]`

      Reference to a prompt template and its variables. [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

      - `id: str`

        The unique identifier of the prompt template to use.

      - `variables: Optional[Dict[str, Variables]]`

        Optional map of values to substitute in for variables in your prompt. The substitution values can be either strings or other Response input types like images or files.

        - `str`

        - `class ResponseInputText: …`

          A text input to the model.

          - `text: str`

            The text input to the model.

          - `type: Literal["input_text"]`

            The type of the input item. Always `input_text`.

            - `"input_text"`

        - `class ResponseInputImage: …`

          An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

          - `detail: Literal["low", "high", "auto", "original"]`

            The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

            - `"low"`

            - `"high"`

            - `"auto"`

            - `"original"`

          - `type: Literal["input_image"]`

            The type of the input item. Always `input_image`.

            - `"input_image"`

          - `file_id: Optional[str]`

            The ID of the file to be sent to the model.

          - `image_url: Optional[str]`

            The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

        - `class ResponseInputFile: …`

          A file input to the model.

          - `type: Literal["input_file"]`

            The type of the input item. Always `input_file`.

            - `"input_file"`

          - `detail: Optional[Literal["low", "high"]]`

            The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

            - `"low"`

            - `"high"`

          - `file_data: Optional[str]`

            The content of the file to be sent to the model.

          - `file_id: Optional[str]`

            The ID of the file to be sent to the model.

          - `file_url: Optional[str]`

            The URL of the file to be sent to the model.

          - `filename: Optional[str]`

            The name of the file to be sent to the model.

      - `version: Optional[str]`

        Optional version of the prompt template.

    - `reasoning: Optional[RealtimeReasoning]`

      Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

      - `effort: Optional[RealtimeReasoningEffort]`

        Constrains effort on reasoning for reasoning-capable Realtime models such as `gpt-realtime-2`.

        - `"minimal"`

        - `"low"`

        - `"medium"`

        - `"high"`

        - `"xhigh"`

    - `tool_choice: Optional[RealtimeToolChoiceConfig]`

      How the model chooses tools. Provide one of the string modes or force a specific function/MCP tool.

      - `Literal["none", "auto", "required"]`

        - `"none"`

        - `"auto"`

        - `"required"`

      - `class ToolChoiceFunction: …`

        Use this option to force the model to call a specific function.

        - `name: str`

          The name of the function to call.

        - `type: Literal["function"]`

          For function calling, the type is always `function`.

          - `"function"`

      - `class ToolChoiceMcp: …`

        Use this option to force the model to call a specific tool on a remote MCP server.

        - `server_label: str`

          The label of the MCP server to use.

        - `type: Literal["mcp"]`

          For MCP tools, the type is always `mcp`.

          - `"mcp"`

        - `name: Optional[str]`

          The name of the tool to call on the server.

    - `tools: Optional[RealtimeToolsConfig]`

      Tools available to the model.

      - `class RealtimeFunctionTool: …`

        - `description: Optional[str]`

          The description of the function, including guidance on when and how to call it, and guidance about what to tell the user when calling (if anything).

        - `name: Optional[str]`

          The name of the function.

        - `parameters: Optional[object]`

          Parameters of the function in JSON Schema.

        - `type: Optional[Literal["function"]]`

          The type of the tool, i.e. `function`.

          - `"function"`

      - `class Mcp: …`

        Give the model access to additional tools via remote Model Context Protocol (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

        - `server_label: str`

          A label for this MCP server, used to identify it in tool calls.

        - `type: Literal["mcp"]`

          The type of the MCP tool. Always `mcp`.

          - `"mcp"`

        - `allowed_tools: Optional[McpAllowedTools]`

          List of allowed tool names or a filter object.

          - `List[str]`

            A string array of allowed tool names.

          - `class McpAllowedToolsMcpToolFilter: …`

            A filter object to specify which tools are allowed.

            - `read_only: Optional[bool]`

              Indicates whether or not a tool modifies data or is read-only. If an MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint), it will match this filter.

            - `tool_names: Optional[List[str]]`

              List of allowed tool names.

        - `authorization: Optional[str]`

          An OAuth access token that can be used with a remote MCP server, either with a custom MCP server URL or a service connector. Your application must handle the OAuth authorization flow and provide the token here.

        - `connector_id: Optional[Literal["connector_dropbox", "connector_gmail", "connector_googlecalendar", 5 more]]`

          Identifier for service connectors, like those available in ChatGPT. One of `server_url` or `connector_id` must be provided. Learn more about service connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

          Currently supported `connector_id` values are:

          - Dropbox: `connector_dropbox`
          - Gmail: `connector_gmail`
          - Google Calendar: `connector_googlecalendar`
          - Google Drive: `connector_googledrive`
          - Microsoft Teams: `connector_microsoftteams`
          - Outlook Calendar: `connector_outlookcalendar`
          - Outlook Email: `connector_outlookemail`
          - SharePoint: `connector_sharepoint`

          - `"connector_dropbox"`

          - `"connector_gmail"`

          - `"connector_googlecalendar"`

          - `"connector_googledrive"`

          - `"connector_microsoftteams"`

          - `"connector_outlookcalendar"`

          - `"connector_outlookemail"`

          - `"connector_sharepoint"`

        - `defer_loading: Optional[bool]`

          Whether this MCP tool is deferred and discovered via tool search.

        - `headers: Optional[Dict[str, str]]`

          Optional HTTP headers to send to the MCP server. Use for authentication or other purposes.

        - `require_approval: Optional[McpRequireApproval]`

          Specify which of the MCP server's tools require approval.

          - `class McpRequireApprovalMcpToolApprovalFilter: …`

            Specify which of the MCP server's tools require approval. Can be `always`, `never`, or a filter object associated with tools that require approval.

            - `always: Optional[McpRequireApprovalMcpToolApprovalFilterAlways]`

              A filter object specifying which tools always require approval.

              - `read_only: Optional[bool]`

                Indicates whether or not a tool modifies data or is read-only. If an MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint), it will match this filter.

              - `tool_names: Optional[List[str]]`

                List of allowed tool names.

            - `never: Optional[McpRequireApprovalMcpToolApprovalFilterNever]`

              A filter object specifying which tools never require approval.

              - `read_only: Optional[bool]`

                Indicates whether or not a tool modifies data or is read-only. If an MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint), it will match this filter.

              - `tool_names: Optional[List[str]]`

                List of allowed tool names.

          - `Literal["always", "never"]`

            Specify a single approval policy for all tools. One of `always` or `never`. When set to `always`, all tools will require approval. When set to `never`, no tools will require approval.

            - `"always"`

            - `"never"`

        - `server_description: Optional[str]`

          Optional description of the MCP server, used to provide more context.

        - `server_url: Optional[str]`

          The URL for the MCP server. One of `server_url` or `connector_id` must be provided.

    - `tracing: Optional[RealtimeTracingConfig]`

      The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to `null` to disable tracing. Once tracing is enabled for a session, the configuration cannot be modified.

      `auto` will create a trace for the session with default values for the workflow name, group id, and metadata.

      - `Literal["auto"]`

        Enables tracing and sets default values for tracing configuration options. Always `auto`.

        - `"auto"`

      - `class TracingConfiguration: …`

        Granular configuration for tracing.

        - `group_id: Optional[str]`

          The group id to attach to this trace to enable filtering and grouping in the Traces Dashboard.

        - `metadata: Optional[object]`

          The arbitrary metadata to attach to this trace to enable filtering in the Traces Dashboard.

        - `workflow_name: Optional[str]`

          The name of the workflow to attach to this trace. This is used to name the trace in the Traces Dashboard.

    - `truncation: Optional[RealtimeTruncation]`

      When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

      Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

      Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

      Truncation can be disabled entirely, which means the server will never truncate but will instead return an error if the conversation exceeds the model's input token limit.

      - `Literal["auto", "disabled"]`

        The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

        - `"auto"`

        - `"disabled"`

      - `class RealtimeTruncationRetentionRatio: …`

        Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

        - `retention_ratio: float`

          Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

        - `type: Literal["retention_ratio"]`

          Use retention ratio truncation.

          - `"retention_ratio"`

        - `token_limits: Optional[TokenLimits]`

          Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

          - `post_instructions: Optional[int]`

            Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.

  - `class RealtimeTranscriptionSessionCreateRequest: …`

    Realtime transcription session object configuration.

    - `type: Literal["transcription"]`

      The type of session to create. Always `transcription` for transcription sessions.

      - `"transcription"`

    - `audio: Optional[RealtimeTranscriptionSessionAudio]`

      Configuration for input and output audio.

      - `input: Optional[RealtimeTranscriptionSessionAudioInput]`

        - `format: Optional[RealtimeAudioFormats]`

          The PCM audio format. Only a 24kHz sample rate is supported.

        - `noise_reduction: Optional[NoiseReduction]`

          Configuration for input audio noise reduction. This can be set to `null` to turn it off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

          - `type: Optional[NoiseReductionType]`

            Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

        - `transcription: Optional[AudioTranscription]`

          Configuration for input audio transcription. Defaults to off and can be set to `null` to turn it off once on. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance about the input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

        - `turn_detection: Optional[RealtimeTranscriptionSessionAudioInputTurnDetection]`

          Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn it off, in which case the client must manually trigger a model response.

          Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

          Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~855~~

856 For `gpt-realtime-whisper` transcription sessions, turn detection must be

857 set to `null`; VAD is not supported.

~~858~~

859 - `class ServerVad: …`

~~860~~

861 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~862~~

863 - `type: Literal["server_vad"]`

~~864~~

865 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~866~~

867 - `"server_vad"`

~~868~~

869 - `create_response: Optional[bool]`

~~870~~

871 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~872~~

873 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~874~~

875 - `idle_timeout_ms: Optional[int]`

~~876~~

877 Optional timeout after which a model response will be triggered automatically. This is

878 useful for situations in which a long pause from the user is unexpected, such as a phone

879 call. The model will effectively prompt the user to continue the conversation based

880 on the current context.

~~881~~

882 The timeout value will be applied after the last model response's audio has finished playing,

883 i.e. it's set to the `response.done` time plus audio playback duration.

~~884~~

885 An `input_audio_buffer.timeout_triggered` event (plus events

886 associated with the Response) will be emitted when the timeout is reached.

887 Idle timeout is currently only supported for `server_vad` mode.

~~888~~

889 - `interrupt_response: Optional[bool]`

~~890~~

891 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

892 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~893~~

894 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~895~~

896 - `prefix_padding_ms: Optional[int]`

~~897~~

898 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

899 milliseconds). Defaults to 300ms.

~~900~~

901 - `silence_duration_ms: Optional[int]`

~~902~~

903 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

904 to 500ms. With shorter values the model will respond more quickly,

905 but may jump in on short pauses from the user.

~~906~~

907 - `threshold: Optional[float]`

~~908~~

909 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0), this defaults to 0.5. A

910 higher threshold will require louder audio to activate the model, and

911 thus might perform better in noisy environments.
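The following is a minimal sketch of a Server VAD `turn_detection` value, built as a dictionary. The `threshold`, `prefix_padding_ms`, and `silence_duration_ms` defaults mirror the documented values. Defaulting `create_response` and `interrupt_response` to `True` is an assumption, because this section does not state their defaults. The helpers `server_vad` and `never_responds_automatically` are hypothetical, and the 6-second idle timeout is only an illustration.

```python
from typing import Any, Dict, Optional


def server_vad(
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
    create_response: bool = True,
    interrupt_response: bool = True,
    idle_timeout_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a Server VAD `turn_detection` value (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    config: Dict[str, Any] = {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
        "create_response": create_response,
        "interrupt_response": interrupt_response,
    }
    if idle_timeout_ms is not None:
        # Counted from the end of the last response's audio playback.
        config["idle_timeout_ms"] = idle_timeout_ms
    return config


def never_responds_automatically(config: Dict[str, Any]) -> bool:
    """True when both flags are false: VAD events still fire, but no automatic response."""
    return not config["create_response"] and not config["interrupt_response"]


# Phone-call style setup: a slightly higher threshold and a nudge after 6 s of silence.
phone_vad = server_vad(threshold=0.6, idle_timeout_ms=6000)
```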

- `class SemanticVad: …`

Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

- `type: Literal["semantic_vad"]`

Type of turn detection, `semantic_vad` to turn on Semantic VAD.

- `"semantic_vad"`

- `create_response: Optional[bool]`

Whether or not to automatically generate a response when a VAD stop event occurs.

- `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, while `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

- `"low"`

- `"medium"`

- `"high"`

- `"auto"`

- `interrupt_response: Optional[bool]`

Whether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
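The eagerness levels map directly to the maximum timeouts listed above, with `auto` resolving to `medium`. The following is a small sketch that encodes that table. The helper names `semantic_vad` and `max_timeout_seconds` are hypothetical.

```python
from typing import Dict

# Maximum timeouts per eagerness level, taken from the field description above.
MAX_TIMEOUT_SECONDS: Dict[str, int] = {"low": 8, "medium": 4, "high": 2}


def semantic_vad(eagerness: str = "auto") -> Dict[str, str]:
    """Build a Semantic VAD `turn_detection` value (hypothetical helper)."""
    if eagerness not in ("low", "medium", "high", "auto"):
        raise ValueError(f"unsupported eagerness: {eagerness!r}")
    return {"type": "semantic_vad", "eagerness": eagerness}


def max_timeout_seconds(eagerness: str) -> int:
    """Upper bound on how long the model waits for the user to continue speaking."""
    # "auto" is documented as equivalent to "medium".
    return MAX_TIMEOUT_SECONDS["medium" if eagerness == "auto" else eagerness]


print(semantic_vad("low"), max_timeout_seconds("low"))  # {'type': 'semantic_vad', 'eagerness': 'low'} 8
```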

- `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

Additional fields to include in server outputs.

`item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

- `"item.input_audio_transcription.logprobs"`
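The transcription-session fields above can be combined into a single `session` value. The following is a hedged sketch of requesting a client secret for such a session. It assumes that `client.realtime.client_secrets.create(...)` accepts the session dictionary through the `session` keyword, as the parameter list suggests, and that Python `None` is sent as JSON `null`. The helpers `transcription_session` and `create_transcription_secret` are hypothetical. The function accepts any object with that method, which keeps it testable without network access.

```python
from typing import Any, Dict, Optional


def transcription_session(
    model: str = "gpt-4o-transcribe", language: Optional[str] = None
) -> Dict[str, Any]:
    """Build a `session` value for a transcription-only session (hypothetical helper)."""
    transcription: Dict[str, Any] = {"model": model}
    if language is not None:
        transcription["language"] = language  # ISO-639-1 code such as "en"
    audio_input: Dict[str, Any] = {
        "format": {"type": "audio/pcm", "rate": 24000},  # only 24 kHz PCM is supported
        "noise_reduction": {"type": "near_field"},
        "transcription": transcription,
    }
    if model == "gpt-realtime-whisper":
        # VAD is not supported for this model, so turn detection must be null.
        audio_input["turn_detection"] = None
    else:
        audio_input["turn_detection"] = {"type": "server_vad"}
    return {
        "type": "transcription",
        "audio": {"input": audio_input},
        "include": ["item.input_audio_transcription.logprobs"],
    }


def create_transcription_secret(client: Any, **session_options: Any) -> Any:
    """Mint a short-lived client secret; `client` is expected to be an `openai.OpenAI` instance."""
    return client.realtime.client_secrets.create(session=transcription_session(**session_options))
```

In server-side code, you would call this with a real `OpenAI()` client and forward only the resulting secret string (which looks like `ek_1234`) to the browser or mobile app. The main API key stays on the server.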

### Returns

- `class ClientSecretCreateResponse: …`

Response from creating a session and client secret for the Realtime API.

- `expires_at: int`

Expiration timestamp for the client secret, in seconds since epoch.
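`expires_at` is a Unix timestamp in seconds, so a backend that hands secrets to clients can decide when to mint a fresh one with simple arithmetic. Expiry only limits creating new sessions. A session that has already started can continue. The following is a sketch only. The function names and the 5-second safety margin are illustrative assumptions.

```python
import time
from typing import Optional


def seconds_until_expiry(expires_at: int, now: Optional[float] = None) -> float:
    """Seconds left before the client secret expires; negative once it has expired."""
    current = time.time() if now is None else now
    return expires_at - current


def needs_refresh(expires_at: int, now: Optional[float] = None, margin_s: float = 5.0) -> bool:
    """Treat the secret as expired a little early to absorb clock skew and network delay."""
    return seconds_until_expiry(expires_at, now) <= margin_s
```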

- `session: Session`

The session configuration for either a realtime or transcription session.

- `class RealtimeSessionCreateResponse: …`

A Realtime session configuration object.

- `id: str`

Unique identifier for the session that looks like `sess_1234567890abcdef`.

- `object: Literal["realtime.session"]`

The object type. Always `realtime.session`.

- `"realtime.session"`

- `type: Literal["realtime"]`

The type of session to create. Always `realtime` for the Realtime API.

- `"realtime"`

- `audio: Optional[Audio]`

Configuration for input and output audio.

- `input: Optional[AudioInput]`

- `format: Optional[RealtimeAudioFormats]`

The format of the input audio.

- `class AudioPCM: …`

The PCM audio format. Only a 24kHz sample rate is supported.

- `rate: Optional[Literal[24000]]`

The sample rate of the audio. Always `24000`.

- `24000`

- `type: Optional[Literal["audio/pcm"]]`

The audio format. Always `audio/pcm`.

- `"audio/pcm"`

- `class AudioPCMU: …`

The G.711 μ-law format.

- `type: Optional[Literal["audio/pcmu"]]`

The audio format. Always `audio/pcmu`.

- `"audio/pcmu"`

- `class AudioPCMA: …`

The G.711 A-law format.

- `type: Optional[Literal["audio/pcma"]]`

The audio format. Always `audio/pcma`.

- `"audio/pcma"`
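When you stream audio in fixed-length chunks, it helps to know how many bytes each format produces per chunk. The following is a sketch under explicit assumptions that this reference does not state: `audio/pcm` is 16-bit mono at 24 kHz, and the G.711 formats use the standard 8-bit samples at 8 kHz. The constant names and `bytes_per_chunk` are hypothetical.

```python
from typing import Dict

# Format values matching the three documented variants.
PCM: Dict[str, object] = {"type": "audio/pcm", "rate": 24000}
PCMU: Dict[str, object] = {"type": "audio/pcmu"}  # G.711 μ-law
PCMA: Dict[str, object] = {"type": "audio/pcma"}  # G.711 A-law


def bytes_per_chunk(fmt: Dict[str, object], chunk_ms: int) -> int:
    """Bytes in `chunk_ms` milliseconds of audio.

    Assumption: audio/pcm is 16-bit mono at 24 kHz; G.711 is 8-bit mono at 8 kHz.
    """
    if fmt["type"] == "audio/pcm":
        bytes_per_second = 24000 * 2  # 2 bytes per 16-bit sample
    elif fmt["type"] in ("audio/pcmu", "audio/pcma"):
        bytes_per_second = 8000  # 1 byte per G.711 sample
    else:
        raise ValueError(f"unknown audio format: {fmt['type']}")
    return bytes_per_second * chunk_ms // 1000


print(bytes_per_chunk(PCM, 20))   # 960
print(bytes_per_chunk(PCMU, 20))  # 160
```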

- `noise_reduction: Optional[AudioInputNoiseReduction]`

Configuration for input audio noise reduction. This can be set to `null` to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

- `type: Optional[NoiseReductionType]`

Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

- `"near_field"`

- `"far_field"`

- `transcription: Optional[AudioTranscription]`

- `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`

Controls how long the model waits before emitting transcription text. Higher values can improve transcription accuracy at the cost of latency. Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

- `"minimal"`

- `"low"`

- `"medium"`

- `"high"`

- `"xhigh"`

- `language: Optional[str]`

The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format will improve accuracy and latency.

- `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

- `str`

- `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

- `"whisper-1"`

- `"gpt-4o-mini-transcribe"`

- `"gpt-4o-mini-transcribe-2025-12-15"`

- `"gpt-4o-transcribe"`

- `"gpt-4o-transcribe-diarize"`

- `"gpt-realtime-whisper"`

- `prompt: Optional[str]`

Optional text to guide the model's style or continue a previous audio segment. For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting). For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology". Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
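Several transcription fields depend on the chosen model. `delay` is only supported with `gpt-realtime-whisper`, and `prompt` is not supported with it. The following is a hedged sketch of a client-side check that enforces both documented rules before the request is sent. It applies the rules to every session, although the text scopes them to GA Realtime sessions. The helper `audio_transcription` is hypothetical.

```python
from typing import Any, Dict, Optional

WHISPER_REALTIME = "gpt-realtime-whisper"
DELAYS = ("minimal", "low", "medium", "high", "xhigh")


def audio_transcription(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    delay: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an `audio.input.transcription` value, enforcing documented limits (hypothetical helper)."""
    if delay is not None:
        if model != WHISPER_REALTIME:
            raise ValueError("delay is only supported with gpt-realtime-whisper")
        if delay not in DELAYS:
            raise ValueError(f"unsupported delay: {delay!r}")
    if prompt is not None and model == WHISPER_REALTIME:
        raise ValueError("prompt is not supported with gpt-realtime-whisper")
    config: Dict[str, Any] = {"model": model}
    for key, value in (("language", language), ("prompt", prompt), ("delay", delay)):
        if value is not None:
            config[key] = value
    return config


technical = audio_transcription(
    "gpt-4o-transcribe", language="en", prompt="expect words related to technology"
)
low_latency = audio_transcription(WHISPER_REALTIME, delay="low")
```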

- `turn_detection: Optional[AudioInputTurnDetection]`

Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

For `gpt-realtime-whisper` transcription sessions, turn detection must be set to `null`; VAD is not supported.

- `class AudioInputTurnDetectionServerVad: …`

Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

- `type: Literal["server_vad"]`

Type of turn detection, `server_vad` to turn on simple Server VAD.

- `"server_vad"`

- `create_response: Optional[bool]`

Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false`, this may fail to create a response if the model is already responding.

If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

- `idle_timeout_ms: Optional[int]`

Optional timeout after which a model response will be triggered automatically. This is useful for situations in which a long pause from the user is unexpected, such as a phone call. The model will effectively prompt the user to continue the conversation based on the current context.

The timeout value will be applied after the last model response's audio has finished playing, i.e. it's set to the `response.done` time plus audio playback duration.

An `input_audio_buffer.timeout_triggered` event (plus events associated with the Response) will be emitted when the timeout is reached. Idle timeout is currently only supported for `server_vad` mode.

- `interrupt_response: Optional[bool]`

Whether or not to automatically interrupt (cancel) any ongoing response with output to the default conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true`, the response will be cancelled; otherwise it will continue until complete.

If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

- `prefix_padding_ms: Optional[int]`

Used only for `server_vad` mode. Amount of audio to include before the VAD-detected speech (in milliseconds). Defaults to 300ms.

- `silence_duration_ms: Optional[int]`

Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.

- `threshold: Optional[float]`

Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); this defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.

- `class AudioInputTurnDetectionSemanticVad: …`

Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

- `type: Literal["semantic_vad"]`

Type of turn detection, `semantic_vad` to turn on Semantic VAD.

- `"semantic_vad"`

- `create_response: Optional[bool]`

Whether or not to automatically generate a response when a VAD stop event occurs.

- `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, while `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

- `"low"`

- `"medium"`

- `"high"`

- `"auto"`

- `interrupt_response: Optional[bool]`

Whether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

- `output: Optional[AudioOutput]`

- `format: Optional[RealtimeAudioFormats]`

The format of the output audio.

- `speed: Optional[float]`

The speed of the model's spoken response as a multiple of the original speed. 1.0 is the default speed, 0.25 is the minimum, and 1.5 is the maximum. This value can only be changed in between model turns, not while a response is in progress.

This parameter is a post-processing adjustment to the audio after it is generated; it's also possible to prompt the model to speak faster or slower.
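The following is a minimal sketch of preparing an `audio.output` speed change. It encodes the documented 0.25 to 1.5 range and the rule that speed can only change between model turns. Clamping out-of-range input, rather than rejecting it, is a design choice of this sketch and not documented server behavior. `output_speed_update` and its `response_in_progress` flag are hypothetical. Your application would have to track that state itself.

```python
from typing import Any, Dict

MIN_SPEED, DEFAULT_SPEED, MAX_SPEED = 0.25, 1.0, 1.5


def output_speed_update(
    speed: float = DEFAULT_SPEED, *, response_in_progress: bool = False
) -> Dict[str, Any]:
    """Build an `audio.output` fragment with `speed` clamped to the documented range (hypothetical helper)."""
    if response_in_progress:
        # Speed may only change between model turns.
        raise RuntimeError("wait for the current response to finish before changing speed")
    return {"output": {"speed": max(MIN_SPEED, min(MAX_SPEED, speed))}}
```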

- `voice: Optional[Union[str, Literal["alloy", "ash", "ballad", 7 more], null]]`

The voice the model uses to respond. Voice cannot be changed during the session once the model has responded with audio at least once. Current voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for best quality.

- `str`

- `Literal["alloy", "ash", "ballad", 7 more]`

The voice the model uses to respond. Voice cannot be changed during the session once the model has responded with audio at least once. Current voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for best quality.

- `"alloy"`

- `"ash"`

- `"ballad"`

- `"coral"`

- `"echo"`

- `"sage"`

- `"shimmer"`

- `"verse"`

- `"marin"`

- `"cedar"`
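Because the voice is locked once the model has replied with audio, a client may want to guard voice changes locally. The following is a sketch of such a guard. `VoiceState` is a hypothetical class, and your code must call `record_audio_response()` when it first receives audio from the model. The field also accepts arbitrary strings, so unknown names are reported through `is_builtin` rather than rejected.

```python
KNOWN_VOICES = frozenset(
    {"alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse", "marin", "cedar"}
)


class VoiceState:
    """Hypothetical client-side tracker for the rule that voice is fixed after the first audio reply."""

    def __init__(self, voice: str = "marin") -> None:
        self.voice = voice
        self.has_responded_with_audio = False

    def record_audio_response(self) -> None:
        self.has_responded_with_audio = True

    def change(self, voice: str) -> None:
        if self.has_responded_with_audio:
            raise RuntimeError("voice cannot be changed after the model has responded with audio")
        self.voice = voice

    @property
    def is_builtin(self) -> bool:
        # The field also accepts other strings, so an unknown name is not rejected here.
        return self.voice in KNOWN_VOICES
```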

- `expires_at: Optional[int]`

Expiration timestamp for the session, in seconds since epoch.

- `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

Additional fields to include in server outputs.

`item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

- `"item.input_audio_transcription.logprobs"`

- `instructions: Optional[str]`

The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

Note that the server sets default instructions, which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

- `max_output_tokens: Optional[Union[int, Literal["inf"], null]]`

Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or `inf` for the maximum available tokens for a given model. Defaults to `inf`.

- `int`

- `Literal["inf"]`

- `"inf"`
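`max_output_tokens` accepts either an integer in a fixed range or the literal `"inf"`. The following is a small sketch of a validator for that union. The function name is hypothetical. It rejects `bool` explicitly because `True` is an `int` subclass in Python.

```python
from typing import Literal, Union

MaxOutputTokens = Union[int, Literal["inf"]]


def validate_max_output_tokens(value: MaxOutputTokens) -> MaxOutputTokens:
    """Accept an int between 1 and 4096 or the string "inf", as documented (hypothetical helper)."""
    if value == "inf":
        return value
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError('max_output_tokens must be an int or "inf"')
    if not 1 <= value <= 4096:
        raise ValueError("max_output_tokens must be between 1 and 4096")
    return value
```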

- `model: Optional[Union[str, Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more], null]]`

The Realtime model used for this session.

- `str`

- `Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]`

The Realtime model used for this session.

- `"gpt-realtime"`

- `"gpt-realtime-1.5"`

- `"gpt-realtime-2"`

- `"gpt-realtime-2025-08-28"`

- `"gpt-4o-realtime-preview"`

- `"gpt-4o-realtime-preview-2024-10-01"`

- `"gpt-4o-realtime-preview-2024-12-17"`

- `"gpt-4o-realtime-preview-2025-06-03"`

- `"gpt-4o-mini-realtime-preview"`

- `"gpt-4o-mini-realtime-preview-2024-12-17"`

- `"gpt-realtime-mini"`

- `"gpt-realtime-mini-2025-10-06"`

- `"gpt-realtime-mini-2025-12-15"`

- `"gpt-audio-1.5"`

- `"gpt-audio-mini"`

- `"gpt-audio-mini-2025-10-06"`

- `"gpt-audio-mini-2025-12-15"`

- `output_modalities: Optional[List[Literal["text", "audio"]]]`

The set of modalities the model can respond with. It defaults to `["audio"]`, indicating that the model will respond with audio plus a transcript. `["text"]` can be used to make the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

- `"text"`

- `"audio"`
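The constraint that `text` and `audio` cannot be requested together is easy to violate when the list is built programmatically. The following is a hedged sketch of a builder and a check. Both function names are hypothetical.

```python
from typing import List


def output_modalities(text_only: bool = False) -> List[str]:
    """Return a valid `output_modalities` list ("audio" also yields a transcript)."""
    return ["text"] if text_only else ["audio"]


def check_modalities(modalities: List[str]) -> None:
    """Raise if the list names an unknown modality or asks for both at once."""
    if not set(modalities) <= {"text", "audio"}:
        raise ValueError(f"unknown modality in {modalities}")
    if len(set(modalities)) > 1:
        raise ValueError("cannot request both text and audio at the same time")
```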

- `prompt: Optional[ResponsePrompt]`

Reference to a prompt template and its variables. [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

- `id: str`

The unique identifier of the prompt template to use.

- `variables: Optional[Dict[str, Variables]]`

Optional map of values to substitute in for variables in your prompt. The substitution values can be either strings or other Response input types like images or files.

- `str`

- `class ResponseInputText: …`

A text input to the model.

- `text: str`

The text input to the model.

- `type: Literal["input_text"]`

The type of the input item. Always `input_text`.

- `"input_text"`

- `class ResponseInputImage: …`

An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

- `detail: Literal["low", "high", "auto", "original"]`

The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

- `"low"`

- `"high"`

- `"auto"`

- `"original"`

- `type: Literal["input_image"]`

The type of the input item. Always `input_image`.

- `"input_image"`

- `file_id: Optional[str]`

The ID of the file to be sent to the model.

- `image_url: Optional[str]`

The URL of the image to be sent to the model: either a fully qualified URL or a base64-encoded image in a data URL.

- `class ResponseInputFile: …`

A file input to the model.

- `type: Literal["input_file"]`

The type of the input item. Always `input_file`.

- `"input_file"`

- `detail: Optional[Literal["low", "high"]]`

The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

- `"low"`

- `"high"`

- `file_data: Optional[str]`

The content of the file to be sent to the model.

- `file_id: Optional[str]`

The ID of the file to be sent to the model.

- `file_url: Optional[str]`

The URL of the file to be sent to the model.

- `filename: Optional[str]`

The name of the file to be sent to the model.

- `version: Optional[str]`

Optional version of the prompt template.
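A prompt reference combines a template ID, optional variables, and an optional version. Variable values may be plain strings or Response input items. The following is a minimal sketch with hypothetical helper names. The ID `pmpt_123`, the variable names, and the image URL are placeholders, not real identifiers.

```python
from typing import Any, Dict, Optional


def input_text(text: str) -> Dict[str, str]:
    return {"type": "input_text", "text": text}


def input_image(image_url: str, detail: str = "auto") -> Dict[str, str]:
    # image_url may be a fully qualified URL or a base64 data URL.
    return {"type": "input_image", "image_url": image_url, "detail": detail}


def prompt_reference(
    prompt_id: str,
    variables: Optional[Dict[str, Any]] = None,
    version: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a `prompt` value pointing at a stored prompt template (hypothetical helper)."""
    ref: Dict[str, Any] = {"id": prompt_id}
    if variables:
        ref["variables"] = variables  # strings or Response input items
    if version is not None:
        ref["version"] = version
    return ref


prompt = prompt_reference(
    "pmpt_123",  # placeholder template ID
    variables={
        "customer_name": "Ada",
        "greeting": input_text("Welcome back!"),
        "logo": input_image("https://example.com/logo.png", detail="low"),
    },
    version="2",
)
```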

- `reasoning: Optional[RealtimeReasoning]`

Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

- `effort: Optional[RealtimeReasoningEffort]`

Constrains effort on reasoning for reasoning-capable Realtime models such as `gpt-realtime-2`.

- `"minimal"`

- `"low"`

- `"medium"`

- `"high"`

- `"xhigh"`

- `tool_choice: Optional[ToolChoice]`

How the model chooses tools. Provide one of the string modes or force a specific function/MCP tool.

- `Literal["none", "auto", "required"]`

- `"none"`

- `"auto"`

- `"required"`

- `class ToolChoiceFunction: …`

Use this option to force the model to call a specific function.

- `name: str`

The name of the function to call.

- `type: Literal["function"]`

For function calling, the type is always `function`.

- `"function"`

- `class ToolChoiceMcp: …`

Use this option to force the model to call a specific tool on a remote MCP server.

- `server_label: str`

The label of the MCP server to use.

- `type: Literal["mcp"]`

For MCP tools, the type is always `mcp`.

- `"mcp"`

- `name: Optional[str]`

The name of the tool to call on the server.
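`tool_choice` is either one of three string modes or an object that forces a specific function or MCP tool. The following is a hedged sketch that produces each shape. The helper `tool_choice` and the names `get_weather`, `docs`, and `search` are hypothetical.

```python
from typing import Dict, Optional, Union

ToolChoice = Union[str, Dict[str, str]]


def tool_choice(
    mode: str = "auto",
    *,
    function: Optional[str] = None,
    mcp_server: Optional[str] = None,
    mcp_tool: Optional[str] = None,
) -> ToolChoice:
    """Return a string mode, or an object forcing one function or MCP tool (hypothetical helper)."""
    if function is not None and mcp_server is not None:
        raise ValueError("force either a function or an MCP tool, not both")
    if function is not None:
        return {"type": "function", "name": function}
    if mcp_server is not None:
        choice = {"type": "mcp", "server_label": mcp_server}
        if mcp_tool is not None:
            choice["name"] = mcp_tool  # optional: a specific tool on that server
        return choice
    if mode not in ("none", "auto", "required"):
        raise ValueError(f"unsupported tool_choice mode: {mode!r}")
    return mode
```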

- `tools: Optional[List[Tool]]`

Tools available to the model.

- `class RealtimeFunctionTool: …`

- `description: Optional[str]`

The description of the function, including guidance on when and how to call it, and guidance about what to tell the user when calling (if anything).

- `name: Optional[str]`

The name of the function.

- `parameters: Optional[object]`

Parameters of the function in JSON Schema.

- `type: Optional[Literal["function"]]`

The type of the tool, i.e. `function`.

- `"function"`
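A function tool is a name, a description, and a JSON Schema for its arguments. Following the guidance above, the description should say when to call the tool and what to tell the user. The following is a minimal sketch. The `function_tool` helper and the `get_weather` tool are hypothetical examples, not part of the API.

```python
import json
from typing import Any, Dict


def function_tool(name: str, description: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Build a function tool definition for the session's `tools` list (hypothetical helper)."""
    return {
        "type": "function",
        "name": name,
        # Say when to call the tool and what to tell the user while it runs.
        "description": description,
        "parameters": parameters,  # a JSON Schema object
    }


get_weather = function_tool(
    name="get_weather",
    description="Look up current weather for a city. Tell the user you are checking first.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
)

# The definition must be JSON-serializable to travel in the request body.
print(json.dumps({"tools": [get_weather], "tool_choice": "auto"}, indent=2))
```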

~~1521~~

1522 - `class ToolMcpTool: …`

~~1523~~

1524 Give the model access to additional tools via remote Model Context Protocol

1525 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~1526~~

1527 - `server_label: str`

~~1528~~

1529 A label for this MCP server, used to identify it in tool calls.

~~1530~~

1531 - `type: Literal["mcp"]`

~~1532~~

1533 The type of the MCP tool. Always `mcp`.

~~1534~~

1535 - `"mcp"`

~~1536~~

1537 - `allowed_tools: Optional[ToolMcpToolAllowedTools]`

~~1538~~

1539 List of allowed tool names or a filter object.

~~1540~~

1541 - `List[str]`

~~1542~~

1543 A string array of allowed tool names

~~1544~~

1545 - `class ToolMcpToolAllowedToolsMcpToolFilter: …`

~~1546~~

1547 A filter object to specify which tools are allowed.

~~1548~~

1549 - `read_only: Optional[bool]`

~~1550~~

1551 Indicates whether or not a tool modifies data or is read-only. If an

1552 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1553 it will match this filter.

~~1554~~

1555 - `tool_names: Optional[List[str]]`

~~1556~~

1557 List of allowed tool names.

~~1558~~

1559 - `authorization: Optional[str]`

~~1560~~

1561 An OAuth access token that can be used with a remote MCP server, either

1562 with a custom MCP server URL or a service connector. Your application

1563 must handle the OAuth authorization flow and provide the token here.

~~1564~~

1565 - `connector_id: Optional[Literal["connector_dropbox", "connector_gmail", "connector_googlecalendar", 5 more]]`

~~1566~~

1567 Identifier for service connectors, like those available in ChatGPT. One of

1568 `server_url` or `connector_id` must be provided. Learn more about service

1569 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~1570~~

1571 Currently supported `connector_id` values are:

~~1572~~

1573 - Dropbox: `connector_dropbox`

1574 - Gmail: `connector_gmail`

1575 - Google Calendar: `connector_googlecalendar`

1576 - Google Drive: `connector_googledrive`

1577 - Microsoft Teams: `connector_microsoftteams`

1578 - Outlook Calendar: `connector_outlookcalendar`

1579 - Outlook Email: `connector_outlookemail`

1580 - SharePoint: `connector_sharepoint`

~~1581~~

1582 - `"connector_dropbox"`

~~1583~~

1584 - `"connector_gmail"`

~~1585~~

1586 - `"connector_googlecalendar"`

~~1587~~

1588 - `"connector_googledrive"`

~~1589~~

1590 - `"connector_microsoftteams"`

~~1591~~

1592 - `"connector_outlookcalendar"`

~~1593~~

1594 - `"connector_outlookemail"`

~~1595~~

1596 - `"connector_sharepoint"`

~~1597~~

1598 - `defer_loading: Optional[bool]`

~~1599~~

1600 Whether this MCP tool is deferred and discovered via tool search.

~~1601~~

1602 - `headers: Optional[Dict[str, str]]`

~~1603~~

1604 Optional HTTP headers to send to the MCP server. Use for authentication

1605 or other purposes.

~~1606~~

1607 - `require_approval: Optional[ToolMcpToolRequireApproval]`

~~1608~~

1609 Specify which of the MCP server's tools require approval.

~~1610~~

1611 - `class ToolMcpToolRequireApprovalMcpToolApprovalFilter: …`

~~1612~~

1613 Specify which of the MCP server's tools require approval. Can be

1614 `always`, `never`, or a filter object associated with tools

1615 that require approval.

~~1616~~

1617 - `always: Optional[ToolMcpToolRequireApprovalMcpToolApprovalFilterAlways]`

~~1618~~

1619 A filter object to specify which tools are allowed.

~~1620~~

1621 - `read_only: Optional[bool]`

~~1622~~

1623 Indicates whether or not a tool modifies data or is read-only. If an

1624 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1625 it will match this filter.

~~1626~~

1627 - `tool_names: Optional[List[str]]`

~~1628~~

1629 List of allowed tool names.

~~1630~~

1631 - `never: Optional[ToolMcpToolRequireApprovalMcpToolApprovalFilterNever]`

~~1632~~

1633 A filter object to specify which tools are allowed.

~~1634~~

1635 - `read_only: Optional[bool]`

~~1636~~

1637 Indicates whether or not a tool modifies data or is read-only. If an

1638 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1639 it will match this filter.

~~1640~~

1641 - `tool_names: Optional[List[str]]`

~~1642~~

1643 List of allowed tool names.

~~1644~~

1645 - `Literal["always", "never"]`

~~1646~~

1647 Specify a single approval policy for all tools. One of `always` or

1648 `never`. When set to `always`, all tools will require approval. When

1649 set to `never`, no tools will require approval.

~~1650~~

1651 - `"always"`

~~1652~~

1653 - `"never"`

~~1654~~

1655 - `server_description: Optional[str]`

~~1656~~

1657 Optional description of the MCP server, used to provide more context.

~~1658~~

1659 - `server_url: Optional[str]`

~~1660~~

1661 The URL for the MCP server. One of `server_url` or `connector_id` must be

1662 provided.

~~1663~~

1664 - `tracing: Optional[Tracing]`

~~1665~~

1666 The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to `null` to disable tracing. Once

1667 tracing is enabled for a session, the configuration cannot be modified.

~~1668~~

1669 `auto` will create a trace for the session with default values for the

1670 workflow name, group id, and metadata.

~~1671~~

1672 - `Literal["auto"]`

~~1673~~

1674 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~1675~~

1676 - `"auto"`

~~1677~~

1678 - `class TracingTracingConfiguration: …`

~~1679~~

1680 Granular configuration for tracing.

~~1681~~

1682 - `group_id: Optional[str]`

~~1683~~

1684 The group id to attach to this trace to enable filtering and

1685 grouping in the Traces Dashboard.

~~1686~~

1687 - `metadata: Optional[object]`

~~1688~~

1689 The arbitrary metadata to attach to this trace to enable

1690 filtering in the Traces Dashboard.

~~1691~~

1692 - `workflow_name: Optional[str]`

~~1693~~

1694 The name of the workflow to attach to this trace. This is used to

1695 name the trace in the Traces Dashboard.

~~1696~~

1697 - `truncation: Optional[RealtimeTruncation]`

~~1698~~

1699 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~1700~~

1701 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~1702~~

1703 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~1704~~

1705 Truncation can be disabled entirely, which means the server will never truncate but will instead return an error if the conversation exceeds the model's input token limit.

~~1706~~

1707 - `Literal["auto", "disabled"]`

~~1708~~

1709 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~1710~~

1711 - `"auto"`

~~1712~~

1713 - `"disabled"`

~~1714~~

1715 - `class RealtimeTruncationRetentionRatio: …`

~~1716~~

1717 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~1718~~

1719 - `retention_ratio: float`

~~1720~~

1721 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~1722~~

1723 - `type: Literal["retention_ratio"]`

~~1724~~

1725 Use retention ratio truncation.

~~1726~~

1727 - `"retention_ratio"`

~~1728~~

1729 - `token_limits: Optional[TokenLimits]`

~~1730~~

1731 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~1732~~

1733 - `post_instructions: Optional[int]`

~~1734~~

1735 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 means that truncation occurs when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.

~~1736~~

1737 - `class RealtimeTranscriptionSessionCreateResponse: …`

~~1738~~

1739 A Realtime transcription session configuration object.

~~1740~~

1741 - `id: str`

~~1742~~

1743 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1744~~

1745 - `object: str`

~~1746~~

1747 The object type. Always `realtime.transcription_session`.

~~1748~~

1749 - `type: Literal["transcription"]`

~~1750~~

1751 The type of session. Always `transcription` for transcription sessions.

~~1752~~

1753 - `"transcription"`

~~1754~~

1755 - `audio: Optional[Audio]`

~~1756~~

1757 Configuration for input audio for the session.

~~1758~~

1759 - `input: Optional[AudioInput]`

~~1760~~

1761 - `format: Optional[RealtimeAudioFormats]`

~~1762~~

1763 The PCM audio format. Only a 24kHz sample rate is supported.

~~1764~~

1765 - `noise_reduction: Optional[AudioInputNoiseReduction]`

~~1766~~

1767 Configuration for input audio noise reduction.

~~1768~~

1769 - `type: Optional[NoiseReductionType]`

~~1770~~

1771 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1772~~

1773 - `transcription: Optional[AudioTranscription]`

~~1774~~

1775 - `turn_detection: Optional[RealtimeTranscriptionSessionTurnDetection]`

~~1776~~

1777 Configuration for turn detection. Can be set to `null` to turn off. Server

1778 VAD means that the model will detect the start and end of speech based on

1779 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~1780~~

1781 - `prefix_padding_ms: Optional[int]`

~~1782~~

1783 Amount of audio to include before the VAD detected speech (in

1784 milliseconds). Defaults to 300ms.

~~1785~~

1786 - `silence_duration_ms: Optional[int]`

~~1787~~

1788 Duration of silence to detect speech stop (in milliseconds). Defaults

1789 to 500ms. With shorter values the model will respond more quickly,

1790 but may jump in on short pauses from the user.

~~1791~~

1792 - `threshold: Optional[float]`

~~1793~~

1794 Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

1795 higher threshold will require louder audio to activate the model, and

1796 thus might perform better in noisy environments.

~~1797~~

1798 - `type: Optional[str]`

~~1799~~

1800 Type of turn detection. Only `server_vad` is currently supported.

~~1801~~

1802 - `expires_at: Optional[int]`

~~1803~~

1804 Expiration timestamp for the session, in seconds since epoch.

~~1805~~

1806 - `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

~~1807~~

1808 Additional fields to include in server outputs.

~~1809~~

1810 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1811~~

1812 - `"item.input_audio_transcription.logprobs"`

~~1813~~

1814 - `value: str`

~~1815~~

1816 The generated client secret value.
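The `retention_ratio` truncation strategy documented above combines a ratio with an optional `token_limits.post_instructions` cap. A minimal sketch, assuming the session's `truncation` field accepts a plain dict with exactly these keys; `retention_ratio_truncation` and `approx_tokens_retained` are hypothetical helpers, not SDK functions, and the retained-token figure is only an estimate of what the server does.

```python
from typing import Optional


def retention_ratio_truncation(ratio: float, post_instructions: Optional[int] = None) -> dict:
    """Hypothetical helper: build a `retention_ratio` truncation config as a plain dict.

    Mirrors the documented fields: `type`, `retention_ratio` (0.0 - 1.0) and the
    optional `token_limits.post_instructions` cap.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    config: dict = {"type": "retention_ratio", "retention_ratio": ratio}
    if post_instructions is not None:
        if post_instructions <= 0:
            raise ValueError("post_instructions must be a positive token count")
        config["token_limits"] = {"post_instructions": post_instructions}
    return config


def approx_tokens_retained(post_instructions: int, ratio: float) -> int:
    """Rough estimate of post-instruction tokens left after one truncation.

    Assumes messages are dropped (oldest first) until about `ratio * post_instructions`
    tokens remain; the exact accounting happens server-side and may differ.
    """
    return round(post_instructions * ratio)


# With a 5,000-token cap and a 0.8 ratio, truncation fires past 5,000 tokens
# and trims the conversation back to roughly 4,000.
truncation = retention_ratio_truncation(0.8, post_instructions=5000)
print(truncation)
print(approx_tokens_retained(5000, 0.8))
```

Keeping the ratio below 1.0 leaves headroom so that several turns can pass before the next truncation, which is the cache-rate benefit the field description mentions.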

~~1817~~

1818### Example

~~1819~~

1820```python

1821import os

1822from openai import OpenAI

~~1823~~

1824client = OpenAI(

1825 api_key=os.environ.get("OPENAI_API_KEY"), # This is the default and can be omitted

1826)

1827client_secret = client.realtime.client_secrets.create()

1828print(client_secret.expires_at)

1829```
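The example above creates a secret with no arguments. A minimal sketch of building arguments for a transcription-only secret instead, assuming `create()` accepts a `session` dict shaped like the transcription-session fields documented above and an `expires_after` dict of the form `{"anchor": "created_at", "seconds": N}` (both shapes are assumptions here). `transcription_secret_kwargs` is a hypothetical helper; it enforces the documented rule that `gpt-realtime-whisper` requires turn detection to be `null`.

```python
from typing import Optional


def transcription_secret_kwargs(
    model: str,
    turn_detection: Optional[dict] = None,
    language: Optional[str] = None,
    ttl_seconds: Optional[int] = None,
) -> dict:
    """Hypothetical helper: keyword arguments for client.realtime.client_secrets.create()."""
    if model == "gpt-realtime-whisper" and turn_detection is not None:
        # gpt-realtime-whisper does not support VAD, so turn detection must be null (None).
        raise ValueError("turn_detection must be None for gpt-realtime-whisper")

    transcription: dict = {"model": model}
    if language is not None:
        transcription["language"] = language  # ISO-639-1 code such as "en"

    kwargs: dict = {
        "session": {
            "type": "transcription",
            "audio": {"input": {"transcription": transcription, "turn_detection": turn_detection}},
        }
    }
    if ttl_seconds is not None:
        # Assumed expires_after shape: TTL counted from the secret's creation time.
        kwargs["expires_after"] = {"anchor": "created_at", "seconds": ttl_seconds}
    return kwargs


print(transcription_secret_kwargs("gpt-realtime-whisper", language="en", ttl_seconds=600))
```

With a configured `OpenAI()` client, the dict would be unpacked as `client.realtime.client_secrets.create(**kwargs)`, and only the returned `value` (a string like `ek_1234`) would be sent to the browser or mobile app, keeping the main API key on the server.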

~~1830~~

1831#### Response

~~1832~~

1833```json

1834{

1835 "expires_at": 0,

1836 "session": {

1837 "id": "id",

1838 "object": "realtime.session",

1839 "type": "realtime",

1840 "audio": {

1841 "input": {

1842 "format": {

1843 "rate": 24000,

1844 "type": "audio/pcm"

1845 },

1846 "noise_reduction": {

1847 "type": "near_field"

1848 },

1849 "transcription": {

1850 "delay": "minimal",

1851 "language": "language",

1852 "model": "string",

1853 "prompt": "prompt"

1854 },

1855 "turn_detection": {

1856 "type": "server_vad",

1857 "create_response": true,

1858 "idle_timeout_ms": 5000,

1859 "interrupt_response": true,

1860 "prefix_padding_ms": 0,

1861 "silence_duration_ms": 0,

1862 "threshold": 0

1863 }

1864 },

1865 "output": {

1866 "format": {

1867 "rate": 24000,

1868 "type": "audio/pcm"

1869 },

1870 "speed": 0.25,

1871 "voice": "ash"

1872 }

1873 },

1874 "expires_at": 0,

1875 "include": [

1876 "item.input_audio_transcription.logprobs"

1877 ],

1878 "instructions": "instructions",

1879 "max_output_tokens": 0,

1880 "model": "string",

1881 "output_modalities": [

1882 "text"

1883 ],

1884 "prompt": {

1885 "id": "id",

1886 "variables": {

1887 "foo": "string"

1888 },

1889 "version": "version"

1890 },

1891 "reasoning": {

1892 "effort": "minimal"

1893 },

1894 "tool_choice": "none",

1895 "tools": [

1896 {

1897 "description": "description",

1898 "name": "name",

1899 "parameters": {},

1900 "type": "function"

1901 }

1902 ],

1903 "tracing": "auto",

1904 "truncation": "auto"

1905 },

1906 "value": "value"

1907}

1908```
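A backend that relays the raw response above will usually forward only `value`, and may want to refuse a secret that is about to expire. A minimal sketch, assuming the top-level `expires_at` is in seconds since the epoch like the session's `expires_at`; `usable_client_secret` and the 30-second margin are hypothetical choices, and the sample payload is made up and trimmed to a few fields.

```python
import json
import time
from typing import Optional

# Made-up payload trimmed to a few top-level fields of the response shape above.
SAMPLE_RESPONSE = json.dumps(
    {
        "expires_at": 1700000600,
        "session": {"object": "realtime.session", "type": "realtime"},
        "value": "ek_1234",
    }
)


def usable_client_secret(raw: str, now: Optional[float] = None, min_remaining: int = 30) -> str:
    """Return the client secret `value` if it stays valid for at least `min_remaining` seconds."""
    body = json.loads(raw)
    current = time.time() if now is None else now
    remaining = body["expires_at"] - current  # seconds left before the secret stops working
    if remaining < min_remaining:
        raise ValueError(f"client secret expires in {remaining:.0f}s; create a new one")
    return body["value"]


print(usable_client_secret(SAMPLE_RESPONSE, now=1700000000))  # prints ek_1234
```

With the Python SDK, the same check can read `client_secret.expires_at` and `client_secret.value` from the returned object instead of parsing JSON. Expiry only limits creating new sessions; a session that has already started may continue past it.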

~~1909~~

1910## Domain Types

~~1911~~

1912### Realtime Session Create Response

~~1913~~

1914- `class RealtimeSessionCreateResponse: …`

~~1915~~

1916 A Realtime session configuration object.

~~1917~~

1918 - `id: str`

~~1919~~

1920 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1921~~

1922 - `object: Literal["realtime.session"]`

~~1923~~

1924 The object type. Always `realtime.session`.

~~1925~~

1926 - `"realtime.session"`

~~1927~~

1928 - `type: Literal["realtime"]`

~~1929~~

1930 The type of session to create. Always `realtime` for the Realtime API.

~~1931~~

1932 - `"realtime"`

~~1933~~

1934 - `audio: Optional[Audio]`

~~1935~~

1936 Configuration for input and output audio.

~~1937~~

1938 - `input: Optional[AudioInput]`

~~1939~~

1940 - `format: Optional[RealtimeAudioFormats]`

~~1941~~

1942 The format of the input audio.

~~1943~~

1944 - `class AudioPCM: …`

~~1945~~

1946 The PCM audio format. Only a 24kHz sample rate is supported.

~~1947~~

1948 - `rate: Optional[Literal[24000]]`

~~1949~~

1950 The sample rate of the audio. Always `24000`.

~~1951~~

1952 - `24000`

~~1953~~

1954 - `type: Optional[Literal["audio/pcm"]]`

~~1955~~

1956 The audio format. Always `audio/pcm`.

~~1957~~

1958 - `"audio/pcm"`

~~1959~~

1960 - `class AudioPCMU: …`

~~1961~~

1962 The G.711 μ-law format.

~~1963~~

1964 - `type: Optional[Literal["audio/pcmu"]]`

~~1965~~

1966 The audio format. Always `audio/pcmu`.

~~1967~~

1968 - `"audio/pcmu"`

~~1969~~

1970 - `class AudioPCMA: …`

~~1971~~

1972 The G.711 A-law format.

~~1973~~

1974 - `type: Optional[Literal["audio/pcma"]]`

~~1975~~

1976 The audio format. Always `audio/pcma`.

~~1977~~

1978 - `"audio/pcma"`

~~1979~~

1980 - `noise_reduction: Optional[AudioInputNoiseReduction]`

~~1981~~

1982 Configuration for input audio noise reduction. This can be set to `null` to turn off.

1983 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

1984 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~1985~~

1986 - `type: Optional[NoiseReductionType]`

~~1987~~

1988 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1989~~

1990 - `"near_field"`

~~1991~~

1992 - `"far_field"`

~~1993~~

1994 - `transcription: Optional[AudioTranscription]`

~~1995~~

1996 - `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`

~~1997~~

1998 Controls how long the model waits before emitting transcription text.

1999 Higher values can improve transcription accuracy at the cost of latency.

2000 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2001~~

2002 - `"minimal"`

~~2003~~

2004 - `"low"`

~~2005~~

2006 - `"medium"`

~~2007~~

2008 - `"high"`

~~2009~~

2010 - `"xhigh"`

~~2011~~

2012 - `language: Optional[str]`

~~2013~~

2014 The language of the input audio. Supplying the input language in

2015 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

2016 will improve accuracy and latency.

~~2017~~

2018 - `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

~~2019~~

2020 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2021~~

2022 - `str`

~~2023~~

2024 - `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

~~2025~~

2026 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2027~~

2028 - `"whisper-1"`

~~2029~~

2030 - `"gpt-4o-mini-transcribe"`

~~2031~~

2032 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~2033~~

2034 - `"gpt-4o-transcribe"`

~~2035~~

2036 - `"gpt-4o-transcribe-diarize"`

~~2037~~

2038 - `"gpt-realtime-whisper"`

~~2039~~

2040 - `prompt: Optional[str]`

~~2041~~

2042 An optional text to guide the model's style or continue a previous audio

2043 segment.

2044 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

2045 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

2046 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2047~~

2048 - `turn_detection: Optional[AudioInputTurnDetection]`

~~2049~~

2050 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~2051~~

2052 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~2053~~

2054 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~2055~~

2056 For `gpt-realtime-whisper` transcription sessions, turn detection must be

2057 set to `null`; VAD is not supported.

~~2058~~

2059 - `class AudioInputTurnDetectionServerVad: …`

~~2060~~

2061 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~2062~~

2063 - `type: Literal["server_vad"]`

~~2064~~

2065 Type of turn detection. Use `server_vad` to turn on simple Server VAD.

~~2066~~

2067 - `"server_vad"`

~~2068~~

2069 - `create_response: Optional[bool]`

~~2070~~

2071 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false`, this may fail to create a response if the model is already responding.

~~2072~~

2073 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~2074~~

2075 - `idle_timeout_ms: Optional[int]`

~~2076~~

2077 Optional timeout after which a model response will be triggered automatically. This is

2078 useful for situations in which a long pause from the user is unexpected, such as a phone

2079 call. The model will effectively prompt the user to continue the conversation based

2080 on the current context.

~~2081~~

2082 The timeout value will be applied after the last model response's audio has finished playing,

2083 i.e. it's set to the `response.done` time plus audio playback duration.

~~2084~~

2085 An `input_audio_buffer.timeout_triggered` event (plus events

2086 associated with the Response) will be emitted when the timeout is reached.

2087 Idle timeout is currently only supported for `server_vad` mode.

~~2088~~

2089 - `interrupt_response: Optional[bool]`

~~2090~~

2091 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

2092 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~2093~~

2094 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~2095~~

2096 - `prefix_padding_ms: Optional[int]`

~~2097~~

2098 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

2099 milliseconds). Defaults to 300ms.

~~2100~~

2101 - `silence_duration_ms: Optional[int]`

~~2102~~

2103 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

2104 to 500ms. With shorter values the model will respond more quickly,

2105 but may jump in on short pauses from the user.

~~2106~~

2107 - `threshold: Optional[float]`

~~2108~~

2109 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

2110 higher threshold will require louder audio to activate the model, and

2111 thus might perform better in noisy environments.

~~2112~~

2113 - `class AudioInputTurnDetectionSemanticVad: …`

~~2114~~

2115 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~2116~~

2117 - `type: Literal["semantic_vad"]`

~~2118~~

2119 Type of turn detection. Use `semantic_vad` to turn on Semantic VAD.

~~2120~~

2121 - `"semantic_vad"`

~~2122~~

2123 - `create_response: Optional[bool]`

~~2124~~

2125 Whether or not to automatically generate a response when a VAD stop event occurs.

~~2126~~

2127 - `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

~~2128~~

2129 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~2130~~

2131 - `"low"`

~~2132~~

2133 - `"medium"`

~~2134~~

2135 - `"high"`

~~2136~~

2137 - `"auto"`

~~2138~~

2139 - `interrupt_response: Optional[bool]`

~~2140~~

2141 Whether or not to automatically interrupt any ongoing response with output to the default

2142 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~2143~~

2144 - `output: Optional[AudioOutput]`

~~2145~~

2146 - `format: Optional[RealtimeAudioFormats]`

~~2147~~

2148 The format of the output audio.

~~2149~~

2150 - `speed: Optional[float]`

~~2151~~

2152 The speed of the model's spoken response as a multiple of the original speed.

2153 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~2154~~

2155 This parameter is a post-processing adjustment to the audio after it is generated; it is

2156 also possible to prompt the model to speak faster or slower.

~~2157~~

2158 - `voice: Optional[Union[str, Literal["alloy", "ash", "ballad", 7 more], null]]`

~~2159~~

2160 The voice the model uses to respond. Voice cannot be changed during the

2161 session once the model has responded with audio at least once. Current

2162 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

2163 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

2164 best quality.

~~2165~~

2166 - `str`

~~2167~~

2168 - `Literal["alloy", "ash", "ballad", 7 more]`

~~2169~~

2170 The voice the model uses to respond. Voice cannot be changed during the

2171 session once the model has responded with audio at least once. Current

2172 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

2173 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

2174 best quality.

~~2175~~

2176 - `"alloy"`

~~2177~~

2178 - `"ash"`

~~2179~~

2180 - `"ballad"`

~~2181~~

2182 - `"coral"`

~~2183~~

2184 - `"echo"`

~~2185~~

2186 - `"sage"`

~~2187~~

2188 - `"shimmer"`

~~2189~~

2190 - `"verse"`

~~2191~~

2192 - `"marin"`

~~2193~~

2194 - `"cedar"`

~~2195~~

2196 - `expires_at: Optional[int]`

~~2197~~

2198 Expiration timestamp for the session, in seconds since epoch.

~~2199~~

2200 - `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

~~2201~~

2202 Additional fields to include in server outputs.

~~2203~~

2204 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~2205~~

2206 - `"item.input_audio_transcription.logprobs"`

~~2207~~

2208 - `instructions: Optional[str]`

~~2209~~

2210 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~2211~~

2212 Note that the server sets default instructions, which are used if this field is not set and are visible in the `session.created` event at the start of the session.

~~2213~~

2214 - `max_output_tokens: Optional[Union[int, Literal["inf"], null]]`

~~2215~~

2216 Maximum number of output tokens for a single assistant response,

2217 inclusive of tool calls. Provide an integer between 1 and 4096 to

2218 limit output tokens, or `inf` for the maximum available tokens for a

2219 given model. Defaults to `inf`.

~~2220~~

2221 - `int`

~~2222~~

2223 - `Literal["inf"]`

~~2224~~

2225 - `"inf"`

~~2226~~

2227 - `model: Optional[Union[str, Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more], null]]`

~~2228~~

2229 The Realtime model used for this session.

~~2230~~

2231 - `str`

~~2232~~

2233 - `Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]`

~~2234~~

2235 The Realtime model used for this session.

~~2236~~

2237 - `"gpt-realtime"`

~~2238~~

2239 - `"gpt-realtime-1.5"`

~~2240~~

2241 - `"gpt-realtime-2"`

~~2242~~

2243 - `"gpt-realtime-2025-08-28"`

~~2244~~

2245 - `"gpt-4o-realtime-preview"`

~~2246~~

2247 - `"gpt-4o-realtime-preview-2024-10-01"`

~~2248~~

2249 - `"gpt-4o-realtime-preview-2024-12-17"`

~~2250~~

2251 - `"gpt-4o-realtime-preview-2025-06-03"`

~~2252~~

2253 - `"gpt-4o-mini-realtime-preview"`

~~2254~~

2255 - `"gpt-4o-mini-realtime-preview-2024-12-17"`

~~2256~~

2257 - `"gpt-realtime-mini"`

~~2258~~

2259 - `"gpt-realtime-mini-2025-10-06"`

~~2260~~

2261 - `"gpt-realtime-mini-2025-12-15"`

~~2262~~

2263 - `"gpt-audio-1.5"`

~~2264~~

2265 - `"gpt-audio-mini"`

~~2266~~

2267 - `"gpt-audio-mini-2025-10-06"`

~~2268~~

2269 - `"gpt-audio-mini-2025-12-15"`

~~2270~~

2271 - `output_modalities: Optional[List[Literal["text", "audio"]]]`

~~2272~~

2273 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

2274 that the model will respond with audio plus a transcript. `["text"]` can be used to make

2275 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~2276~~

2277 - `"text"`

~~2278~~

2279 - `"audio"`

~~2280~~

2281 - `prompt: Optional[ResponsePrompt]`

~~2282~~

2283 Reference to a prompt template and its variables.

2284 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~2285~~

2286 - `id: str`

~~2287~~

2288 The unique identifier of the prompt template to use.

~~2289~~

2290 - `variables: Optional[Dict[str, Variables]]`

~~2291~~

2292 Optional map of values to substitute in for variables in your

2293 prompt. The substitution values can either be strings, or other

2294 Response input types like images or files.

~~2295~~

2296 - `str`

~~2297~~

2298 - `class ResponseInputText: …`

~~2299~~

2300 A text input to the model.

~~2301~~

2302 - `text: str`

~~2303~~

2304 The text input to the model.

~~2305~~

2306 - `type: Literal["input_text"]`

~~2307~~

2308 The type of the input item. Always `input_text`.

~~2309~~

2310 - `"input_text"`

~~2311~~

2312 - `class ResponseInputImage: …`

~~2313~~

2314 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~2315~~

2316 - `detail: Literal["low", "high", "auto", "original"]`

~~2317~~

2318 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~2319~~

2320 - `"low"`

~~2321~~

2322 - `"high"`

~~2323~~

2324 - `"auto"`

~~2325~~

2326 - `"original"`

~~2327~~

2328 - `type: Literal["input_image"]`

~~2329~~

2330 The type of the input item. Always `input_image`.

~~2331~~

2332 - `"input_image"`

~~2333~~

2334 - `file_id: Optional[str]`

~~2335~~

2336 The ID of the file to be sent to the model.

~~2337~~

2338 - `image_url: Optional[str]`

~~2339~~

2340 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~2341~~

2342 - `class ResponseInputFile: …`

~~2343~~

2344 A file input to the model.

~~2345~~

2346 - `type: Literal["input_file"]`

~~2347~~

2348 The type of the input item. Always `input_file`.

~~2349~~

2350 - `"input_file"`

~~2351~~

2352 - `detail: Optional[Literal["low", "high"]]`

~~2353~~

2354 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~2355~~

2356 - `"low"`

~~2357~~

2358 - `"high"`

~~2359~~

2360 - `file_data: Optional[str]`

~~2361~~

2362 The content of the file to be sent to the model.

~~2363~~

2364 - `file_id: Optional[str]`

~~2365~~

2366 The ID of the file to be sent to the model.

~~2367~~

2368 - `file_url: Optional[str]`

~~2369~~

2370 The URL of the file to be sent to the model.

~~2371~~

2372 - `filename: Optional[str]`

~~2373~~

2374 The name of the file to be sent to the model.

~~2375~~

2376 - `version: Optional[str]`

~~2377~~

2378 Optional version of the prompt template.

~~2379~~

2380 - `reasoning: Optional[RealtimeReasoning]`

~~2381~~

2382 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~2383~~

2384 - `effort: Optional[RealtimeReasoningEffort]`

~~2385~~

2386 Constrains effort on reasoning for reasoning-capable Realtime models such as

2387 `gpt-realtime-2`.

~~2388~~

2389 - `"minimal"`

~~2390~~

2391 - `"low"`

~~2392~~

2393 - `"medium"`

~~2394~~

2395 - `"high"`

~~2396~~

2397 - `"xhigh"`

~~2398~~

2399 - `tool_choice: Optional[ToolChoice]`

~~2400~~

2401 How the model chooses tools. Provide one of the string modes or force a specific

2402 function/MCP tool.

~~2403~~

2404 - `Literal["none", "auto", "required"]`

~~2405~~

2406 - `"none"`

~~2407~~

2408 - `"auto"`

~~2409~~

2410 - `"required"`

~~2411~~

2412 - `class ToolChoiceFunction: …`

~~2413~~

2414 Use this option to force the model to call a specific function.

~~2415~~

2416 - `name: str`

~~2417~~

2418 The name of the function to call.

~~2419~~

2420 - `type: Literal["function"]`

~~2421~~

2422 For function calling, the type is always `function`.

~~2423~~

2424 - `"function"`

~~2425~~

2426 - `class ToolChoiceMcp: …`

~~2427~~

2428 Use this option to force the model to call a specific tool on a remote MCP server.

~~2429~~

2430 - `server_label: str`

~~2431~~

2432 The label of the MCP server to use.

~~2433~~

2434 - `type: Literal["mcp"]`

~~2435~~

2436 For MCP tools, the type is always `mcp`.

~~2437~~

2438 - `"mcp"`

~~2439~~

2440 - `name: Optional[str]`

~~2441~~

2442 The name of the tool to call on the server.

~~2443~~

2444 - `tools: Optional[List[Tool]]`

~~2445~~

2446 Tools available to the model.

~~2447~~

2448 - `class RealtimeFunctionTool: …`

~~2449~~

2450 - `description: Optional[str]`

~~2451~~

2452 The description of the function, including guidance on when and how

2453 to call it, and guidance about what to tell the user when calling

2454 (if anything).

~~2455~~

2456 - `name: Optional[str]`

~~2457~~

2458 The name of the function.

~~2459~~

2460 - `parameters: Optional[object]`

~~2461~~

2462 Parameters of the function in JSON Schema.

~~2463~~

2464 - `type: Optional[Literal["function"]]`

~~2465~~

2466 The type of the tool, i.e. `function`.

~~2467~~

2468 - `"function"`

~~2469~~

2470 - `class ToolMcpTool: …`

~~2471~~

2472 Give the model access to additional tools via remote Model Context Protocol

2473 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~2474~~

2475 - `server_label: str`

~~2476~~

2477 A label for this MCP server, used to identify it in tool calls.

~~2478~~

2479 - `type: Literal["mcp"]`

~~2480~~

2481 The type of the MCP tool. Always `mcp`.

~~2482~~

2483 - `"mcp"`

~~2484~~

2485 - `allowed_tools: Optional[ToolMcpToolAllowedTools]`

~~2486~~

2487 List of allowed tool names or a filter object.

~~2488~~

2489 - `List[str]`

~~2490~~

2491 A string array of allowed tool names.

~~2492~~

2493 - `class ToolMcpToolAllowedToolsMcpToolFilter: …`

~~2494~~

2495 A filter object to specify which tools are allowed.

~~2496~~

2497 - `read_only: Optional[bool]`

~~2498~~

2499 Indicates whether or not a tool modifies data or is read-only. If an

2500 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2501 it will match this filter.

~~2502~~

2503 - `tool_names: Optional[List[str]]`

~~2504~~

2505 List of allowed tool names.

~~2506~~

2507 - `authorization: Optional[str]`

~~2508~~

2509 An OAuth access token that can be used with a remote MCP server, either

2510 with a custom MCP server URL or a service connector. Your application

2511 must handle the OAuth authorization flow and provide the token here.

~~2512~~

2513 - `connector_id: Optional[Literal["connector_dropbox", "connector_gmail", "connector_googlecalendar", 5 more]]`

~~2514~~

2515 Identifier for service connectors, like those available in ChatGPT. One of

2516 `server_url` or `connector_id` must be provided. Learn more about service

2517 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~2518~~

2519 Currently supported `connector_id` values are:

~~2520~~

2521 - Dropbox: `connector_dropbox`

2522 - Gmail: `connector_gmail`

2523 - Google Calendar: `connector_googlecalendar`

2524 - Google Drive: `connector_googledrive`

2525 - Microsoft Teams: `connector_microsoftteams`

2526 - Outlook Calendar: `connector_outlookcalendar`

2527 - Outlook Email: `connector_outlookemail`

2528 - SharePoint: `connector_sharepoint`

~~2529~~

2530 - `"connector_dropbox"`

~~2531~~

2532 - `"connector_gmail"`

~~2533~~

2534 - `"connector_googlecalendar"`

~~2535~~

2536 - `"connector_googledrive"`

~~2537~~

2538 - `"connector_microsoftteams"`

~~2539~~

2540 - `"connector_outlookcalendar"`

~~2541~~

2542 - `"connector_outlookemail"`

~~2543~~

2544 - `"connector_sharepoint"`

~~2545~~

2546 - `defer_loading: Optional[bool]`

~~2547~~

2548 Whether this MCP tool is deferred and discovered via tool search.

~~2549~~

2550 - `headers: Optional[Dict[str, str]]`

~~2551~~

2552 Optional HTTP headers to send to the MCP server. Use for authentication

2553 or other purposes.

~~2554~~

2555 - `require_approval: Optional[ToolMcpToolRequireApproval]`

~~2556~~

2557 Specify which of the MCP server's tools require approval.

~~2558~~

2559 - `class ToolMcpToolRequireApprovalMcpToolApprovalFilter: …`

~~2560~~

2561 Specify which of the MCP server's tools require approval. Can be

2562 `always`, `never`, or a filter object associated with tools

2563 that require approval.

~~2564~~

2565 - `always: Optional[ToolMcpToolRequireApprovalMcpToolApprovalFilterAlways]`

~~2566~~

2567 A filter object to specify which tools always require approval.

~~2568~~

2569 - `read_only: Optional[bool]`

~~2570~~

2571 Indicates whether or not a tool modifies data or is read-only. If an

2572 MCP tool is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2573 it will match this filter.

~~2574~~

2575 - `tool_names: Optional[List[str]]`

~~2576~~

2577 List of allowed tool names.

~~2578~~

2579 - `never: Optional[ToolMcpToolRequireApprovalMcpToolApprovalFilterNever]`

~~2580~~

2581 A filter object to specify which tools never require approval.

~~2582~~

2583 - `read_only: Optional[bool]`

~~2584~~

2585 Indicates whether or not a tool modifies data or is read-only. If an

2586 MCP tool is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2587 it will match this filter.

~~2588~~

2589 - `tool_names: Optional[List[str]]`

~~2590~~

2591 List of allowed tool names.

~~2592~~

2593 - `Literal["always", "never"]`

~~2594~~

2595 Specify a single approval policy for all tools. One of `always` or

2596 `never`. When set to `always`, all tools will require approval. When

2597 set to `never`, no tools will require approval.

~~2598~~

2599 - `"always"`

~~2600~~

2601 - `"never"`

~~2602~~

2603 - `server_description: Optional[str]`

~~2604~~

2605 Optional description of the MCP server, used to provide more context.

~~2606~~

2607 - `server_url: Optional[str]`

~~2608~~

2609 The URL for the MCP server. One of `server_url` or `connector_id` must be

2610 provided.
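The MCP fields above interact in a few constraint-bearing ways: one of `server_url` or `connector_id` must be present, and `require_approval` is either a single string policy or a filter object. The sketch below assembles a tool entry as a plain dict whose keys mirror the documented field names. `build_mcp_tool` and `SUPPORTED_CONNECTORS` are hypothetical helpers, not part of the SDK, and the OAuth token is a placeholder.

```python
from typing import Any, Dict, List, Optional, Union

# Connector IDs enumerated in the reference above.
SUPPORTED_CONNECTORS = {
    "connector_dropbox",
    "connector_gmail",
    "connector_googlecalendar",
    "connector_googledrive",
    "connector_microsoftteams",
    "connector_outlookcalendar",
    "connector_outlookemail",
    "connector_sharepoint",
}


def build_mcp_tool(
    server_label: str,
    *,
    server_url: Optional[str] = None,
    connector_id: Optional[str] = None,
    allowed_tools: Optional[Union[List[str], Dict[str, Any]]] = None,
    require_approval: Optional[Union[str, Dict[str, Any]]] = None,
    authorization: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an MCP tool entry as a plain dict (hypothetical helper)."""
    # The reference requires one of server_url or connector_id.
    if server_url is None and connector_id is None:
        raise ValueError("one of server_url or connector_id must be provided")
    if connector_id is not None and connector_id not in SUPPORTED_CONNECTORS:
        raise ValueError(f"unsupported connector_id: {connector_id!r}")
    # A string policy must be "always" or "never"; a dict is a filter object.
    if isinstance(require_approval, str) and require_approval not in ("always", "never"):
        raise ValueError("require_approval must be 'always', 'never', or a filter object")

    tool: Dict[str, Any] = {"type": "mcp", "server_label": server_label}
    optional_fields = {
        "server_url": server_url,
        "connector_id": connector_id,
        "allowed_tools": allowed_tools,
        "require_approval": require_approval,
        "authorization": authorization,
    }
    # Leave unset optional fields out instead of sending explicit nulls.
    tool.update({k: v for k, v in optional_fields.items() if v is not None})
    return tool


# Tools annotated as read-only never require approval.
calendar_tool = build_mcp_tool(
    "calendar",
    connector_id="connector_googlecalendar",
    authorization="<oauth-access-token>",  # placeholder; your app runs the OAuth flow
    require_approval={"never": {"read_only": True}},
)
```

Assuming the SDK accepts plain dicts for these typed parameters, a dict like `calendar_tool` could be placed in the session's tools list passed to `client.realtime.client_secrets.create(session=...)`; check the SDK's type definitions before relying on that.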

~~2611~~

2612 - `tracing: Optional[Tracing]`

~~2613~~

2614 The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

2615 tracing is enabled for a session, the configuration cannot be modified.

~~2616~~

2617 `auto` will create a trace for the session with default values for the

2618 workflow name, group id, and metadata.

~~2619~~

2620 - `Literal["auto"]`

~~2621~~

2622 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~2623~~

2624 - `"auto"`

~~2625~~

2626 - `class TracingTracingConfiguration: …`

~~2627~~

2628 Granular configuration for tracing.

~~2629~~

2630 - `group_id: Optional[str]`

~~2631~~

2632 The group id to attach to this trace to enable filtering and

2633 grouping in the Traces Dashboard.

~~2634~~

2635 - `metadata: Optional[object]`

~~2636~~

2637 Arbitrary metadata to attach to this trace to enable

2638 filtering in the Traces Dashboard.

~~2639~~

2640 - `workflow_name: Optional[str]`

~~2641~~

2642 The name of the workflow to attach to this trace. This is used to

2643 name the trace in the Traces Dashboard.
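Because tracing configuration cannot be modified once it is enabled for a session, it helps to decide the value up front. A minimal sketch, assuming a hypothetical `build_tracing` helper: it returns `None` to disable tracing, `"auto"` for server defaults, or a granular object containing only the fields you set.

```python
from typing import Any, Dict, Optional, Union


def build_tracing(
    workflow_name: Optional[str] = None,
    group_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    enabled: bool = True,
) -> Optional[Union[str, Dict[str, Any]]]:
    """Return a value for the session's `tracing` field (hypothetical helper)."""
    if not enabled:
        return None  # null disables tracing
    fields = {"workflow_name": workflow_name, "group_id": group_id, "metadata": metadata}
    granular = {k: v for k, v in fields.items() if v is not None}
    # With no custom fields, "auto" lets the server fill in default values.
    return granular if granular else "auto"
```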

~~2644~~

2645 - `truncation: Optional[RealtimeTruncation]`

~~2646~~

2647 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~2648~~

2649 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~2650~~

2651 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~2652~~

2653 Truncation can be disabled entirely, which means the server will never truncate but will instead return an error if the conversation exceeds the model's input token limit.

~~2654~~

2655 - `Literal["auto", "disabled"]`

~~2656~~

2657 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~2658~~

2659 - `"auto"`

~~2660~~

2661 - `"disabled"`

~~2662~~

2663 - `class RealtimeTruncationRetentionRatio: …`

~~2664~~

2665 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~2666~~

2667 - `retention_ratio: float`

~~2668~~

2669 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~2670~~

2671 - `type: Literal["retention_ratio"]`

~~2672~~

2673 Use retention ratio truncation.

~~2674~~

2675 - `"retention_ratio"`

~~2676~~

2677 - `token_limits: Optional[TokenLimits]`

~~2678~~

2679 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~2680~~

2681 - `post_instructions: Optional[int]`

~~2682~~

2683 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
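The truncation arithmetic above can be sketched in a few lines. The helper names are hypothetical and the model sizes are illustrative: the input budget is the context window minus the reserved output tokens, and the retention target is the `post_instructions` limit scaled by `retention_ratio`. The string strategies `"auto"` and `"disabled"` need no helper and can be passed directly.

```python
from typing import Any, Dict, Optional


def input_token_budget(context_window: int, max_output_tokens: int) -> int:
    """Tokens left for input once output tokens are reserved."""
    return context_window - max_output_tokens


def retention_ratio_truncation(
    retention_ratio: float,
    post_instructions: Optional[int] = None,
    budget: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a `retention_ratio` truncation object (hypothetical helper)."""
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    if post_instructions is not None and budget is not None and post_instructions > budget:
        # The reference caps post_instructions at context window minus max output tokens.
        raise ValueError("post_instructions exceeds the model's input token budget")
    config: Dict[str, Any] = {"type": "retention_ratio", "retention_ratio": retention_ratio}
    if post_instructions is not None:
        config["token_limits"] = {"post_instructions": post_instructions}
    return config


def retention_target(limit: int, retention_ratio: float) -> int:
    """Approximate token count that truncation drops the conversation down to."""
    return int(limit * retention_ratio)


budget = input_token_budget(context_window=10_000, max_output_tokens=2_000)  # illustrative sizes
truncation = retention_ratio_truncation(0.75, post_instructions=5_000, budget=budget)
```

Because messages are dropped whole, the real post-truncation size can land somewhat below `retention_target`; treat it as a ceiling, not an exact figure.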

~~2684~~

2685### Realtime Transcription Session Create Response

~~2686~~

2687- `class RealtimeTranscriptionSessionCreateResponse: …`

~~2688~~

2689 A Realtime transcription session configuration object.

~~2690~~

2691 - `id: str`

~~2692~~

2693 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~2694~~

2695 - `object: str`

~~2696~~

2697 The object type. Always `realtime.transcription_session`.

~~2698~~

2699 - `type: Literal["transcription"]`

~~2700~~

2701 The type of session. Always `transcription` for transcription sessions.

~~2702~~

2703 - `"transcription"`

~~2704~~

2705 - `audio: Optional[Audio]`

~~2706~~

2707 Configuration for input audio for the session.

~~2708~~

2709 - `input: Optional[AudioInput]`

~~2710~~

2711 - `format: Optional[RealtimeAudioFormats]`

~~2712~~

2713 The format of the input audio.

~~2714~~

2715 - `class AudioPCM: …`

~~2716~~

2717 The PCM audio format. Only a 24kHz sample rate is supported.

~~2718~~

2719 - `rate: Optional[Literal[24000]]`

~~2720~~

2721 The sample rate of the audio. Always `24000`.

~~2722~~

2723 - `24000`

~~2724~~

2725 - `type: Optional[Literal["audio/pcm"]]`

~~2726~~

2727 The audio format. Always `audio/pcm`.

~~2728~~

2729 - `"audio/pcm"`

~~2730~~

2731 - `class AudioPCMU: …`

~~2732~~

2733 The G.711 μ-law format.

~~2734~~

2735 - `type: Optional[Literal["audio/pcmu"]]`

~~2736~~

2737 The audio format. Always `audio/pcmu`.

~~2738~~

2739 - `"audio/pcmu"`

~~2740~~

2741 - `class AudioPCMA: …`

~~2742~~

2743 The G.711 A-law format.

~~2744~~

2745 - `type: Optional[Literal["audio/pcma"]]`

~~2746~~

2747 The audio format. Always `audio/pcma`.

~~2748~~

2749 - `"audio/pcma"`

~~2750~~

2751 - `noise_reduction: Optional[AudioInputNoiseReduction]`

~~2752~~

2753 Configuration for input audio noise reduction.

~~2754~~

2755 - `type: Optional[NoiseReductionType]`

~~2756~~

2757 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~2758~~

2759 - `"near_field"`

~~2760~~

2761 - `"far_field"`

~~2762~~

2763 - `transcription: Optional[AudioTranscription]`

~~2764~~

2765 - `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`

~~2766~~

2767 Controls how long the model waits before emitting transcription text.

2768 Higher values can improve transcription accuracy at the cost of latency.

2769 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2770~~

2771 - `"minimal"`

~~2772~~

2773 - `"low"`

~~2774~~

2775 - `"medium"`

~~2776~~

2777 - `"high"`

~~2778~~

2779 - `"xhigh"`

~~2780~~

2781 - `language: Optional[str]`

~~2782~~

2783 The language of the input audio. Supplying the input language in

2784 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

2785 will improve accuracy and latency.

~~2786~~

2787 - `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

~~2788~~

2789 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2790~~

2791 - `str`

~~2792~~

2793 - `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

~~2794~~

2795 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2796~~

2797 - `"whisper-1"`

~~2798~~

2799 - `"gpt-4o-mini-transcribe"`

~~2800~~

2801 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~2802~~

2803 - `"gpt-4o-transcribe"`

~~2804~~

2805 - `"gpt-4o-transcribe-diarize"`

~~2806~~

2807 - `"gpt-realtime-whisper"`

~~2808~~

2809 - `prompt: Optional[str]`

~~2810~~

2811 An optional text to guide the model's style or continue a previous audio

2812 segment.

2813 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

2814 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

2815 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2816~~

2817 - `turn_detection: Optional[RealtimeTranscriptionSessionTurnDetection]`

~~2818~~

2819 Configuration for turn detection. Can be set to `null` to turn off. Server

2820 VAD means that the model will detect the start and end of speech based on

2821 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~2822~~

2823 - `prefix_padding_ms: Optional[int]`

~~2824~~

2825 Amount of audio to include before the VAD detected speech (in

2826 milliseconds). Defaults to 300ms.

~~2827~~

2828 - `silence_duration_ms: Optional[int]`

~~2829~~

2830 Duration of silence to detect speech stop (in milliseconds). Defaults

2831 to 500ms. With shorter values the model will respond more quickly,

2832 but may jump in on short pauses from the user.

~~2833~~

2834 - `threshold: Optional[float]`

~~2835~~

2836 Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

2837 higher threshold will require louder audio to activate the model, and

2838 thus might perform better in noisy environments.

~~2839~~

2840 - `type: Optional[str]`

~~2841~~

2842 Type of turn detection. Only `server_vad` is currently supported.

~~2843~~

2844 - `expires_at: Optional[int]`

~~2845~~

2846 Expiration timestamp for the session, in seconds since epoch.

~~2847~~

2848 - `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

~~2849~~

2850 Additional fields to include in server outputs.

~~2851~~

2852 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~2853~~

2854 - `"item.input_audio_transcription.logprobs"`
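Several transcription fields above carry model-specific rules: `delay` only works with `gpt-realtime-whisper`, while that model rejects `prompt` and requires `turn_detection` to be `null`. The sketch below enforces those rules while building an `audio.input` dict. `build_transcription_audio_input` is a hypothetical helper, and placing `turn_detection` under `audio.input` is an assumption borrowed from the layout of the Realtime session object.

```python
from typing import Any, Dict, Optional

REALTIME_WHISPER = "gpt-realtime-whisper"


def build_transcription_audio_input(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    delay: Optional[str] = None,
    noise_reduction: Optional[str] = "near_field",
    turn_detection: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an `audio.input` dict for a transcription session (hypothetical helper)."""
    # Model-specific rules taken from the field descriptions above.
    if model == REALTIME_WHISPER:
        if turn_detection is not None:
            raise ValueError("turn_detection must be null for gpt-realtime-whisper")
        if prompt is not None:
            raise ValueError("prompt is not supported with gpt-realtime-whisper")
    elif delay is not None:
        raise ValueError("delay is only supported with gpt-realtime-whisper")

    transcription: Dict[str, Any] = {"model": model}
    if language is not None:
        transcription["language"] = language  # ISO-639-1 code, e.g. "en"
    if prompt is not None:
        transcription["prompt"] = prompt
    if delay is not None:
        transcription["delay"] = delay

    return {
        "format": {"type": "audio/pcm", "rate": 24000},  # only 24 kHz PCM is supported
        "noise_reduction": {"type": noise_reduction} if noise_reduction else None,
        "transcription": transcription,
        "turn_detection": turn_detection,
    }


whisper_input = build_transcription_audio_input(REALTIME_WHISPER, language="en", delay="low")
vad_input = build_transcription_audio_input(
    "gpt-4o-transcribe",
    prompt="expect words related to technology",
    turn_detection={"type": "server_vad", "silence_duration_ms": 500},
)
```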

~~2855~~

2856### Realtime Transcription Session Turn Detection

~~2857~~

2858- `class RealtimeTranscriptionSessionTurnDetection: …`

~~2859~~

2860 Configuration for turn detection. Can be set to `null` to turn off. Server

2861 VAD means that the model will detect the start and end of speech based on

2862 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~2863~~

2864 - `prefix_padding_ms: Optional[int]`

~~2865~~

2866 Amount of audio to include before the VAD detected speech (in

2867 milliseconds). Defaults to 300ms.

~~2868~~

2869 - `silence_duration_ms: Optional[int]`

~~2870~~

2871 Duration of silence to detect speech stop (in milliseconds). Defaults

2872 to 500ms. With shorter values the model will respond more quickly,

2873 but may jump in on short pauses from the user.

~~2874~~

2875 - `threshold: Optional[float]`

~~2876~~

2877 Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

2878 higher threshold will require louder audio to activate the model, and

2879 thus might perform better in noisy environments.

~~2880~~

2881 - `type: Optional[str]`

~~2882~~

2883 Type of turn detection. Only `server_vad` is currently supported.

~~2884~~

2885### Client Secret Create Response

~~2886~~

2887- `class ClientSecretCreateResponse: …`

~~2888~~

2889 Response from creating a session and client secret for the Realtime API.

~~2890~~

2891 - `expires_at: int`

~~2892~~

2893 Expiration timestamp for the client secret, in seconds since epoch.
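Since `expires_at` is in seconds since the epoch, a client can compute how long a secret remains usable and fetch a new one before it lapses. Expiry only stops the secret from creating new sessions; sessions already started may continue. The helper names and the 30-second margin below are illustrative assumptions.

```python
import time
from typing import Optional


def seconds_until_expiry(expires_at: int, now: Optional[float] = None) -> int:
    """Whole seconds before the secret can no longer create sessions (never negative)."""
    current = time.time() if now is None else now
    return max(0, int(expires_at - current))


def should_refresh(expires_at: int, margin_s: int = 30, now: Optional[float] = None) -> bool:
    """True once the secret is within `margin_s` of expiring (the margin is arbitrary)."""
    return seconds_until_expiry(expires_at, now) <= margin_s
```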

~~2894~~

2895 - `session: Session`

~~2896~~

2897 The session configuration for either a realtime or transcription session.

~~2898~~

2899 - `class RealtimeSessionCreateResponse: …`

~~2900~~

2901 A Realtime session configuration object.

~~2902~~

2903 - `id: str`

~~2904~~

2905 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~2906~~

2907 - `object: Literal["realtime.session"]`

~~2908~~

2909 The object type. Always `realtime.session`.

~~2910~~

2911 - `"realtime.session"`

~~2912~~

2913 - `type: Literal["realtime"]`

~~2914~~

2915 The type of session to create. Always `realtime` for the Realtime API.

~~2916~~

2917 - `"realtime"`

~~2918~~

2919 - `audio: Optional[Audio]`

~~2920~~

2921 Configuration for input and output audio.

~~2922~~

2923 - `input: Optional[AudioInput]`

~~2924~~

2925 - `format: Optional[RealtimeAudioFormats]`

~~2926~~

2927 The format of the input audio.

~~2928~~

2929 - `class AudioPCM: …`

~~2930~~

2931 The PCM audio format. Only a 24kHz sample rate is supported.

~~2932~~

2933 - `rate: Optional[Literal[24000]]`

~~2934~~

2935 The sample rate of the audio. Always `24000`.

~~2936~~

2937 - `24000`

~~2938~~

2939 - `type: Optional[Literal["audio/pcm"]]`

~~2940~~

2941 The audio format. Always `audio/pcm`.

~~2942~~

2943 - `"audio/pcm"`

~~2944~~

2945 - `class AudioPCMU: …`

~~2946~~

2947 The G.711 μ-law format.

~~2948~~

2949 - `type: Optional[Literal["audio/pcmu"]]`

~~2950~~

2951 The audio format. Always `audio/pcmu`.

~~2952~~

2953 - `"audio/pcmu"`

~~2954~~

2955 - `class AudioPCMA: …`

~~2956~~

2957 The G.711 A-law format.

~~2958~~

2959 - `type: Optional[Literal["audio/pcma"]]`

~~2960~~

2961 The audio format. Always `audio/pcma`.

~~2962~~

2963 - `"audio/pcma"`

~~2964~~

2965 - `noise_reduction: Optional[AudioInputNoiseReduction]`

~~2966~~

2967 Configuration for input audio noise reduction. This can be set to `null` to turn off.

2968 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

2969 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~2970~~

2971 - `type: Optional[NoiseReductionType]`

~~2972~~

2973 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~2974~~

2975 - `"near_field"`

~~2976~~

2977 - `"far_field"`

~~2978~~

2979 - `transcription: Optional[AudioTranscription]`

~~2980~~

2981 - `delay: Optional[Literal["minimal", "low", "medium", 2 more]]`

~~2982~~

2983 Controls how long the model waits before emitting transcription text.

2984 Higher values can improve transcription accuracy at the cost of latency.

2985 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2986~~

2987 - `"minimal"`

~~2988~~

2989 - `"low"`

~~2990~~

2991 - `"medium"`

~~2992~~

2993 - `"high"`

~~2994~~

2995 - `"xhigh"`

~~2996~~

2997 - `language: Optional[str]`

~~2998~~

2999 The language of the input audio. Supplying the input language in

3000 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

3001 will improve accuracy and latency.

~~3002~~

3003 - `model: Optional[Union[str, Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more], null]]`

~~3004~~

3005 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~3006~~

3007 - `str`

~~3008~~

3009 - `Literal["whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-mini-transcribe-2025-12-15", 3 more]`

~~3010~~

3011 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~3012~~

3013 - `"whisper-1"`

~~3014~~

3015 - `"gpt-4o-mini-transcribe"`

~~3016~~

3017 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~3018~~

3019 - `"gpt-4o-transcribe"`

~~3020~~

3021 - `"gpt-4o-transcribe-diarize"`

~~3022~~

3023 - `"gpt-realtime-whisper"`

~~3024~~

3025 - `prompt: Optional[str]`

~~3026~~

3027 An optional text to guide the model's style or continue a previous audio

3028 segment.

3029 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

3030 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

3031 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~3032~~

3033 - `turn_detection: Optional[AudioInputTurnDetection]`

~~3034~~

3035 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~3036~~

3037 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~3038~~

3039 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~3040~~

3041 For `gpt-realtime-whisper` transcription sessions, turn detection must be

3042 set to `null`; VAD is not supported.

~~3043~~

3044 - `class AudioInputTurnDetectionServerVad: …`

~~3045~~

3046 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~3047~~

3048 - `type: Literal["server_vad"]`

~~3049~~

3050 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~3051~~

3052 - `"server_vad"`

~~3053~~

3054 - `create_response: Optional[bool]`

~~3055~~

3056 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false`, this may fail to create a response if the model is already responding.

~~3057~~

3058 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~3059~~

3060 - `idle_timeout_ms: Optional[int]`

~~3061~~

3062 Optional timeout after which a model response will be triggered automatically. This is

3063 useful for situations in which a long pause from the user is unexpected, such as a phone

3064 call. The model will effectively prompt the user to continue the conversation based

3065 on the current context.

~~3066~~

3067 The timeout value will be applied after the last model response's audio has finished playing,

3068 i.e. it's set to the `response.done` time plus audio playback duration.

~~3069~~

3070 An `input_audio_buffer.timeout_triggered` event (plus events

3071 associated with the Response) will be emitted when the timeout is reached.

3072 Idle timeout is currently only supported for `server_vad` mode.

~~3073~~

3074 - `interrupt_response: Optional[bool]`

~~3075~~

3076 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

3077 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~3078~~

3079 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~3080~~

3081 - `prefix_padding_ms: Optional[int]`

~~3082~~

3083 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

3084 milliseconds). Defaults to 300ms.

~~3085~~

3086 - `silence_duration_ms: Optional[int]`

~~3087~~

3088 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

3089 to 500ms. With shorter values the model will respond more quickly,

3090 but may jump in on short pauses from the user.

~~3091~~

3092 - `threshold: Optional[float]`

~~3093~~

3094 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

3095 higher threshold will require louder audio to activate the model, and

3096 thus might perform better in noisy environments.
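The idle-timeout rule above (the timer starts at the `response.done` time plus audio playback duration) can be estimated on the client. This sketch assumes `audio/pcm` output is 16-bit mono at 24 kHz, so one second of audio is 48,000 bytes; the helper names and the example `server_vad` values are hypothetical.

```python
PCM_SAMPLE_RATE_HZ = 24_000  # the only sample rate listed for audio/pcm
PCM_BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono samples


def pcm_duration_ms(num_bytes: int) -> float:
    """Playback length of a raw audio/pcm buffer in milliseconds."""
    samples = num_bytes / PCM_BYTES_PER_SAMPLE
    return samples / PCM_SAMPLE_RATE_HZ * 1000


def idle_timeout_deadline_ms(response_done_ms: float, audio_bytes: int, idle_timeout_ms: int) -> float:
    """Estimated time at which input_audio_buffer.timeout_triggered would fire."""
    # The timer starts when playback ends: response.done time plus playback duration.
    playback_end_ms = response_done_ms + pcm_duration_ms(audio_bytes)
    return playback_end_ms + idle_timeout_ms


# Example server_vad settings (values are illustrative, not recommendations).
server_vad = {
    "type": "server_vad",
    "idle_timeout_ms": 5_000,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
    "threshold": 0.5,
}
```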

~~3097~~

3098 - `class AudioInputTurnDetectionSemanticVad: …`

~~3099~~

3100 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~3101~~

3102 - `type: Literal["semantic_vad"]`

~~3103~~

3104 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~3105~~

3106 - `"semantic_vad"`

~~3107~~

3108 - `create_response: Optional[bool]`

~~3109~~

3110 Whether or not to automatically generate a response when a VAD stop event occurs.

~~3111~~

3112 - `eagerness: Optional[Literal["low", "medium", "high", "auto"]]`

~~3113~~

3114 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~3115~~

3116 - `"low"`

~~3117~~

3118 - `"medium"`

~~3119~~

3120 - `"high"`

~~3121~~

3122 - `"auto"`

~~3123~~

3124 - `interrupt_response: Optional[bool]`

~~3125~~

3126 Whether or not to automatically interrupt any ongoing response with output to the default

3127 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
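The `eagerness` values map to the maximum timeouts listed above, with `auto` treated as `medium`. A small lookup makes that mapping explicit and validates the value before a `semantic_vad` object is built; both function names are hypothetical.

```python
from typing import Any, Dict

# Maximum timeouts from the reference; "auto" is documented as equivalent to "medium".
EAGERNESS_MAX_TIMEOUT_S: Dict[str, int] = {"low": 8, "medium": 4, "high": 2}


def max_turn_timeout_s(eagerness: str = "auto") -> int:
    """Maximum timeout, in seconds, for the given semantic VAD eagerness."""
    resolved = "medium" if eagerness == "auto" else eagerness
    if resolved not in EAGERNESS_MAX_TIMEOUT_S:
        raise ValueError(f"unknown eagerness: {eagerness!r}")
    return EAGERNESS_MAX_TIMEOUT_S[resolved]


def semantic_vad(
    eagerness: str = "auto", create_response: bool = True, interrupt_response: bool = True
) -> Dict[str, Any]:
    """Build a `semantic_vad` turn-detection object (hypothetical helper)."""
    max_turn_timeout_s(eagerness)  # validates the eagerness value
    return {
        "type": "semantic_vad",
        "eagerness": eagerness,
        "create_response": create_response,
        "interrupt_response": interrupt_response,
    }
```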

~~3128~~

3129 - `output: Optional[AudioOutput]`

~~3130~~

3131 - `format: Optional[RealtimeAudioFormats]`

~~3132~~

3133 The format of the output audio.

~~3134~~

3135 - `speed: Optional[float]`

~~3136~~

3137 The speed of the model's spoken response as a multiple of the original speed.

3138 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~3139~~

3140 This parameter is a post-processing adjustment to the audio after it is generated; it's

3141 also possible to prompt the model to speak faster or slower.

~~3142~~

3143 - `voice: Optional[Union[str, Literal["alloy", "ash", "ballad", 7 more], null]]`

~~3144~~

3145 The voice the model uses to respond. Voice cannot be changed during the

3146 session once the model has responded with audio at least once. Current

3147 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

3148 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

3149 best quality.

~~3150~~

3151 - `str`

~~3152~~

3153 - `Literal["alloy", "ash", "ballad", 7 more]`

~~3154~~

3155 The voice the model uses to respond. Voice cannot be changed during the

3156 session once the model has responded with audio at least once. Current

3157 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

3158 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

3159 best quality.

~~3160~~

3161 - `"alloy"`

~~3162~~

3163 - `"ash"`

~~3164~~

3165 - `"ballad"`

~~3166~~

3167 - `"coral"`

~~3168~~

3169 - `"echo"`

~~3170~~

3171 - `"sage"`

~~3172~~

3173 - `"shimmer"`

~~3174~~

3175 - `"verse"`

~~3176~~

3177 - `"marin"`

~~3178~~

3179 - `"cedar"`
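The output fields above can be checked before a session starts. This matters most for `voice`, which cannot be changed after the model's first audio response. The sketch (a hypothetical `build_audio_output` helper) validates the documented `speed` range and format types, and passes `voice` through unchanged because the field also accepts arbitrary strings.

```python
from typing import Any, Dict

AUDIO_FORMAT_TYPES = {"audio/pcm", "audio/pcmu", "audio/pcma"}


def build_audio_output(
    voice: str = "marin", speed: float = 1.0, audio_format: str = "audio/pcm"
) -> Dict[str, Any]:
    """Build an `audio.output` dict (hypothetical helper)."""
    if audio_format not in AUDIO_FORMAT_TYPES:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    if not 0.25 <= speed <= 1.5:
        raise ValueError("speed must be between 0.25 and 1.5")
    fmt: Dict[str, Any] = {"type": audio_format}
    if audio_format == "audio/pcm":
        fmt["rate"] = 24000  # the only supported PCM sample rate
    # voice is passed through as-is because the field also accepts arbitrary strings.
    return {"format": fmt, "voice": voice, "speed": speed}
```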

~~3180~~

3181 - `expires_at: Optional[int]`

~~3182~~

3183 Expiration timestamp for the session, in seconds since epoch.

~~3184~~

3185 - `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

~~3186~~

3187 Additional fields to include in server outputs.

~~3188~~

3189 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~3190~~

3191 - `"item.input_audio_transcription.logprobs"`

~~3192~~

3193 - `instructions: Optional[str]`

~~3194~~

3195 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~3196~~

3197 Note that the server sets default instructions, which are used if this field is not set and are visible in the `session.created` event at the start of the session.

~~3198~~

3199 - `max_output_tokens: Optional[Union[int, Literal["inf"], null]]`

~~3200~~

3201 Maximum number of output tokens for a single assistant response,

3202 inclusive of tool calls. Provide an integer between 1 and 4096 to

3203 limit output tokens, or `inf` for the maximum available tokens for a

3204 given model. Defaults to `inf`.

~~3205~~

3206 - `int`

~~3207~~

3208 - `Literal["inf"]`

~~3209~~

3210 - `"inf"`
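A sketch of client-side normalization for `max_output_tokens`. It assumes you want to mirror the documented default (`inf`) and the 1 to 4096 integer range; `normalize_max_output_tokens` is a hypothetical helper.

```python
from typing import Literal, Optional, Union

MaxTokens = Union[int, Literal["inf"]]


def normalize_max_output_tokens(value: Optional[Union[int, str]]) -> MaxTokens:
    """Default to "inf" and enforce the documented 1-4096 integer range."""
    if value is None or value == "inf":
        return "inf"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("max_output_tokens must be an int or 'inf'")
    if not 1 <= value <= 4096:
        raise ValueError("max_output_tokens must be between 1 and 4096")
    return value
```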

~~3211~~

3212 - `model: Optional[Union[str, Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more], null]]`

~~3213~~

3214 The Realtime model used for this session.

~~3215~~

3216 - `str`

~~3217~~

3218 - `Literal["gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", 14 more]`

~~3219~~

3220 The Realtime model used for this session.

~~3221~~

3222 - `"gpt-realtime"`

~~3223~~

3224 - `"gpt-realtime-1.5"`

~~3225~~

3226 - `"gpt-realtime-2"`

~~3227~~

3228 - `"gpt-realtime-2025-08-28"`

~~3229~~

3230 - `"gpt-4o-realtime-preview"`

~~3231~~

3232 - `"gpt-4o-realtime-preview-2024-10-01"`

~~3233~~

3234 - `"gpt-4o-realtime-preview-2024-12-17"`

~~3235~~

3236 - `"gpt-4o-realtime-preview-2025-06-03"`

~~3237~~

3238 - `"gpt-4o-mini-realtime-preview"`

~~3239~~

3240 - `"gpt-4o-mini-realtime-preview-2024-12-17"`

~~3241~~

3242 - `"gpt-realtime-mini"`

~~3243~~

3244 - `"gpt-realtime-mini-2025-10-06"`

~~3245~~

3246 - `"gpt-realtime-mini-2025-12-15"`

~~3247~~

3248 - `"gpt-audio-1.5"`

~~3249~~

3250 - `"gpt-audio-mini"`

~~3251~~

3252 - `"gpt-audio-mini-2025-10-06"`

~~3253~~

3254 - `"gpt-audio-mini-2025-12-15"`

~~3255~~

3256 - `output_modalities: Optional[List[Literal["text", "audio"]]]`

~~3257~~

3258 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

3259 that the model will respond with audio plus a transcript. `["text"]` can be used to make

3260 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~3261~~

3262 - `"text"`

~~3263~~

3264 - `"audio"`
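`output_modalities` is a list, but only one modality can be requested at a time, so a quick check catches invalid combinations early. The helper below is hypothetical and applies the documented default of `["audio"]` when the field is omitted.

```python
from typing import List, Optional


def validate_output_modalities(modalities: Optional[List[str]]) -> List[str]:
    """Return a valid `output_modalities` list (hypothetical helper)."""
    if modalities is None:
        return ["audio"]  # documented default: audio plus a transcript
    unknown = set(modalities) - {"text", "audio"}
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    if len(set(modalities)) != 1:
        raise ValueError("request exactly one of 'text' or 'audio'")
    return [modalities[0]]
```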

~~3265~~

3266 - `prompt: Optional[ResponsePrompt]`

~~3267~~

3268 Reference to a prompt template and its variables.

3269 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~3270~~

3271 - `id: str`

~~3272~~

The unique identifier of the prompt template to use.

- `variables: Optional[Dict[str, Variables]]`

Optional map of values to substitute in for variables in your
prompt. The substitution values can be either strings or other
Response input types, such as images or files.

- `str`

- `class ResponseInputText: …`

A text input to the model.

- `text: str`

The text input to the model.

- `type: Literal["input_text"]`

The type of the input item. Always `input_text`.

- `"input_text"`

- `class ResponseInputImage: …`

An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

- `detail: Literal["low", "high", "auto", "original"]`

The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

- `"low"`

- `"high"`

- `"auto"`

- `"original"`

- `type: Literal["input_image"]`

The type of the input item. Always `input_image`.

- `"input_image"`

- `file_id: Optional[str]`

The ID of the file to be sent to the model.

- `image_url: Optional[str]`

The URL of the image to be sent to the model. Either a fully qualified URL or a base64-encoded image in a data URL.

- `class ResponseInputFile: …`

A file input to the model.

- `type: Literal["input_file"]`

The type of the input item. Always `input_file`.

- `"input_file"`

- `detail: Optional[Literal["low", "high"]]`

The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

- `"low"`

- `"high"`

- `file_data: Optional[str]`

The content of the file to be sent to the model.

- `file_id: Optional[str]`

The ID of the file to be sent to the model.

- `file_url: Optional[str]`

The URL of the file to be sent to the model.

- `filename: Optional[str]`

The name of the file to be sent to the model.

- `version: Optional[str]`

Optional version of the prompt template.
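
The `prompt` fields above can be sketched as a plain dictionary. This is a minimal sketch, assuming the dictionary keys mirror the documented field names. The template ID `pmpt_example`, the variable names, the image URL, and the helper `build_prompt` are hypothetical. The local type check only restates the three input types listed above.

```python
from typing import Any, Dict, Optional, Union

# A variable value is either a plain string or one of the documented input
# item shapes (input_text, input_image, input_file) expressed as a dict.
VariableValue = Union[str, Dict[str, Any]]

_ALLOWED_INPUT_TYPES = {"input_text", "input_image", "input_file"}


def build_prompt(
    prompt_id: str,
    variables: Optional[Dict[str, VariableValue]] = None,
    version: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a prompt reference dict (hypothetical helper, not part of the SDK)."""
    prompt: Dict[str, Any] = {"id": prompt_id}
    if variables:
        for name, value in variables.items():
            # Non-string values must carry one of the documented `type` literals.
            if not isinstance(value, str) and value.get("type") not in _ALLOWED_INPUT_TYPES:
                raise ValueError(f"variable {name!r} has unsupported type {value.get('type')!r}")
        prompt["variables"] = dict(variables)
    if version is not None:
        prompt["version"] = version
    return prompt


prompt = build_prompt(
    "pmpt_example",  # hypothetical prompt template ID
    variables={
        "customer_name": "Ada",
        "greeting": {"type": "input_text", "text": "Welcome back!"},
        "logo": {
            "type": "input_image",
            "image_url": "https://example.com/logo.png",  # placeholder URL
            "detail": "auto",
        },
    },
    version="2",
)
print(prompt["variables"]["logo"]["detail"])
```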

- `reasoning: Optional[RealtimeReasoning]`

Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

- `effort: Optional[RealtimeReasoningEffort]`

Constrains effort on reasoning for reasoning-capable Realtime models such as
`gpt-realtime-2`.

- `"minimal"`

- `"low"`

- `"medium"`

- `"high"`

- `"xhigh"`
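
A `reasoning` fragment can be built and checked locally. This is a small sketch, assuming the session configuration is sent as a plain dict. `reasoning_config` is a hypothetical helper, and its check only mirrors the effort literals listed above. The server still decides which efforts a given model accepts.

```python
from typing import Dict, Literal, get_args

# Mirrors the effort literals listed above.
ReasoningEffort = Literal["minimal", "low", "medium", "high", "xhigh"]
ALLOWED_EFFORTS = frozenset(get_args(ReasoningEffort))


def reasoning_config(effort: str) -> Dict[str, str]:
    """Return a `reasoning` dict, rejecting efforts not in the documented list."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {"effort": effort}


# Illustrative session fragment for a reasoning-capable model.
session_fragment = {
    "model": "gpt-realtime-2",
    "reasoning": reasoning_config("low"),
}
print(session_fragment["reasoning"])
```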

- `tool_choice: Optional[ToolChoice]`

How the model chooses tools. Provide one of the string modes or force a specific
function/MCP tool.

- `Literal["none", "auto", "required"]`

- `"none"`

- `"auto"`

- `"required"`

- `class ToolChoiceFunction: …`

Use this option to force the model to call a specific function.

- `name: str`

The name of the function to call.

- `type: Literal["function"]`

For function calling, the type is always `function`.

- `"function"`

- `class ToolChoiceMcp: …`

Use this option to force the model to call a specific tool on a remote MCP server.

- `server_label: str`

The label of the MCP server to use.

- `type: Literal["mcp"]`

For MCP tools, the type is always `mcp`.

- `"mcp"`

- `name: Optional[str]`

The name of the tool to call on the server.
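
The three `tool_choice` shapes above can be produced by one small function. This is a sketch, assuming plain-dict request payloads. `tool_choice` and the names `get_weather`, `docs`, and `search` are hypothetical.

```python
from typing import Any, Dict, Optional, Union

# A tool_choice is either a string mode or a dict forcing a specific tool.
ToolChoice = Union[str, Dict[str, Any]]


def tool_choice(
    mode: Optional[str] = None,
    *,
    function: Optional[str] = None,
    mcp_server: Optional[str] = None,
    mcp_tool: Optional[str] = None,
) -> ToolChoice:
    """Build one of the documented tool_choice shapes (hypothetical helper)."""
    if function is not None and mcp_server is not None:
        raise ValueError("force either a function or an MCP tool, not both")
    if mcp_tool is not None and mcp_server is None:
        raise ValueError("mcp_tool requires mcp_server")
    if function is not None:
        # ToolChoiceFunction: the type is always "function".
        return {"type": "function", "name": function}
    if mcp_server is not None:
        # ToolChoiceMcp: server_label is required; the tool name is optional.
        choice: Dict[str, Any] = {"type": "mcp", "server_label": mcp_server}
        if mcp_tool is not None:
            choice["name"] = mcp_tool
        return choice
    if mode not in ("none", "auto", "required"):
        raise ValueError(f"unknown tool_choice mode: {mode!r}")
    return mode


print(tool_choice("required"))
print(tool_choice(function="get_weather"))
print(tool_choice(mcp_server="docs", mcp_tool="search"))
```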

- `tools: Optional[List[Tool]]`

Tools available to the model.

- `class RealtimeFunctionTool: …`

- `description: Optional[str]`

The description of the function, including guidance on when and how
to call it, and guidance about what to tell the user when calling
(if anything).

- `name: Optional[str]`

The name of the function.

- `parameters: Optional[object]`

Parameters of the function in JSON Schema.

- `type: Optional[Literal["function"]]`

The type of the tool, i.e. `function`.

- `"function"`

- `class ToolMcpTool: …`

Give the model access to additional tools via remote Model Context Protocol
(MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

- `server_label: str`

A label for this MCP server, used to identify it in tool calls.

- `type: Literal["mcp"]`

The type of the MCP tool. Always `mcp`.

- `"mcp"`

- `allowed_tools: Optional[ToolMcpToolAllowedTools]`

List of allowed tool names or a filter object.

- `List[str]`

A string array of allowed tool names.

- `class ToolMcpToolAllowedToolsMcpToolFilter: …`

A filter object to specify which tools are allowed.

- `read_only: Optional[bool]`

Indicates whether or not a tool modifies data or is read-only. If a tool on the
MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
it will match this filter.

- `tool_names: Optional[List[str]]`

List of allowed tool names.

- `authorization: Optional[str]`

An OAuth access token that can be used with a remote MCP server, either
with a custom MCP server URL or a service connector. Your application
must handle the OAuth authorization flow and provide the token here.

- `connector_id: Optional[Literal["connector_dropbox", "connector_gmail", "connector_googlecalendar", 5 more]]`

Identifier for service connectors, like those available in ChatGPT. One of
`server_url` or `connector_id` must be provided. Learn more about service
connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

Currently supported `connector_id` values are:

- Dropbox: `connector_dropbox`
- Gmail: `connector_gmail`
- Google Calendar: `connector_googlecalendar`
- Google Drive: `connector_googledrive`
- Microsoft Teams: `connector_microsoftteams`
- Outlook Calendar: `connector_outlookcalendar`
- Outlook Email: `connector_outlookemail`
- SharePoint: `connector_sharepoint`

- `"connector_dropbox"`

- `"connector_gmail"`

- `"connector_googlecalendar"`

- `"connector_googledrive"`

- `"connector_microsoftteams"`

- `"connector_outlookcalendar"`

- `"connector_outlookemail"`

- `"connector_sharepoint"`

- `defer_loading: Optional[bool]`

Whether this MCP tool is deferred and discovered via tool search.

- `headers: Optional[Dict[str, str]]`

Optional HTTP headers to send to the MCP server. Use for authentication
or other purposes.

- `require_approval: Optional[ToolMcpToolRequireApproval]`

Specify which of the MCP server's tools require approval.

- `class ToolMcpToolRequireApprovalMcpToolApprovalFilter: …`

Specify which of the MCP server's tools require approval. Can be
`always`, `never`, or a filter object associated with tools
that require approval.

- `always: Optional[ToolMcpToolRequireApprovalMcpToolApprovalFilterAlways]`

A filter object to specify which tools always require approval.

- `read_only: Optional[bool]`

Indicates whether or not a tool modifies data or is read-only. If a tool on the
MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
it will match this filter.

- `tool_names: Optional[List[str]]`

List of tool names that always require approval.

- `never: Optional[ToolMcpToolRequireApprovalMcpToolApprovalFilterNever]`

A filter object to specify which tools never require approval.

- `read_only: Optional[bool]`

Indicates whether or not a tool modifies data or is read-only. If a tool on the
MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
it will match this filter.

- `tool_names: Optional[List[str]]`

List of tool names that never require approval.

- `Literal["always", "never"]`

Specify a single approval policy for all tools. One of `always` or
`never`. When set to `always`, all tools will require approval. When
set to `never`, no tools will require approval.

- `"always"`

- `"never"`

- `server_description: Optional[str]`

Optional description of the MCP server, used to provide more context.

- `server_url: Optional[str]`

The URL for the MCP server. One of `server_url` or `connector_id` must be
provided.
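
The `tools` list above can hold both kinds of tool side by side. This is a sketch, assuming plain-dict payloads. The function `get_weather`, the server labels, the MCP tool names, the server URL, and the OAuth token are all placeholders. `check_mcp_tool` is a hypothetical helper that restates only the constraints written above, such as "one of `server_url` or `connector_id` must be provided". It does not reproduce the server's own validation.

```python
import json
from typing import Any, Dict, List

# Function tool: `parameters` is a JSON Schema object describing the arguments.
weather_tool: Dict[str, Any] = {
    "type": "function",
    "name": "get_weather",  # hypothetical function
    "description": (
        "Get the current weather for a city. Call this when the user asks "
        "about weather, and tell them you are checking."
    ),
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# MCP tool backed by a custom server URL (placeholder), limited to two tools,
# with `search` exempted from approval via the `never` filter.
docs_tool: Dict[str, Any] = {
    "type": "mcp",
    "server_label": "docs",
    "server_url": "https://mcp.example.com/mcp",
    "allowed_tools": ["search", "fetch_page"],
    "require_approval": {"never": {"tool_names": ["search"]}},
}

# MCP tool backed by a service connector; your app supplies the OAuth token.
calendar_tool: Dict[str, Any] = {
    "type": "mcp",
    "server_label": "calendar",
    "connector_id": "connector_googlecalendar",
    "authorization": "<oauth-access-token>",  # placeholder
    "require_approval": "always",
}


def check_mcp_tool(tool: Dict[str, Any]) -> None:
    """Local checks that mirror the documented constraints (hypothetical helper)."""
    if tool.get("type") != "mcp":
        raise ValueError("type must be 'mcp'")
    if not tool.get("server_label"):
        raise ValueError("server_label is required")
    if not tool.get("server_url") and not tool.get("connector_id"):
        raise ValueError("one of server_url or connector_id must be provided")
    approval = tool.get("require_approval")
    if isinstance(approval, str) and approval not in ("always", "never"):
        raise ValueError("a string require_approval must be 'always' or 'never'")


for mcp_tool in (docs_tool, calendar_tool):
    check_mcp_tool(mcp_tool)

tools: List[Dict[str, Any]] = [weather_tool, docs_tool, calendar_tool]
print(json.dumps(tools, indent=2))
```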

- `tracing: Optional[Tracing]`

The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to `null` to disable tracing. Once
tracing is enabled for a session, the configuration cannot be modified.

`auto` will create a trace for the session with default values for the
workflow name, group ID, and metadata.

- `Literal["auto"]`

Enables tracing and sets default values for tracing configuration options. Always `auto`.

- `"auto"`

- `class TracingTracingConfiguration: …`

Granular configuration for tracing.

- `group_id: Optional[str]`

The group ID to attach to this trace to enable filtering and
grouping in the Traces Dashboard.

- `metadata: Optional[object]`

The arbitrary metadata to attach to this trace to enable
filtering in the Traces Dashboard.

- `workflow_name: Optional[str]`

The name of the workflow to attach to this trace. This is used to
name the trace in the Traces Dashboard.
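
The helper below picks between the two non-null `tracing` forms: `"auto"` and a granular configuration that carries only the fields you set. It is a minimal sketch. `tracing_config` is hypothetical, and so are the workflow name and group ID values. Disabling tracing with `null` is left to the caller.

```python
from typing import Any, Dict, Optional, Union

# "auto" or a granular configuration dict.
TracingSetting = Union[str, Dict[str, Any]]


def tracing_config(
    workflow_name: Optional[str] = None,
    group_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> TracingSetting:
    """Return "auto" when nothing is overridden, else only the fields that were set."""
    fields = {"workflow_name": workflow_name, "group_id": group_id, "metadata": metadata}
    config = {name: value for name, value in fields.items() if value is not None}
    return config if config else "auto"


print(tracing_config())
print(tracing_config(workflow_name="support-bot", group_id="customer-42"))
```

Once tracing is enabled, its configuration cannot be modified, so these values need to be settled before the session starts.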

- `truncation: Optional[RealtimeTruncation]`

When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

Truncation can be disabled entirely, which means the server will never truncate but will instead return an error if the conversation exceeds the model's input token limit.

- `Literal["auto", "disabled"]`

The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

- `"auto"`

- `"disabled"`

- `class RealtimeTruncationRetentionRatio: …`

Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

- `retention_ratio: float`

Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

- `type: Literal["retention_ratio"]`

Use retention ratio truncation.

- `"retention_ratio"`

- `token_limits: Optional[TokenLimits]`

Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

- `post_instructions: Optional[int]`

Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
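
The retention-ratio arithmetic can be illustrated with a small simulation. This is a sketch of the documented behavior only. It assumes that per-message token counts are known up front and that truncation removes whole messages from the oldest end. `truncate_with_retention` and the token counts are hypothetical, and the server may count or drop tokens differently.

```python
from typing import Any, Dict, List

# Configuration shape from the fields above: keep 80% of a 5,000-token budget.
truncation: Dict[str, Any] = {
    "type": "retention_ratio",
    "retention_ratio": 0.8,
    "token_limits": {"post_instructions": 5000},
}


def truncate_with_retention(
    message_tokens: List[int], post_instructions_limit: int, retention_ratio: float
) -> List[int]:
    """Model of the documented behavior: once the post-instruction conversation
    exceeds the limit, drop the oldest messages until at most
    retention_ratio * limit tokens remain. Not the server's actual algorithm."""
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    total = sum(message_tokens)
    if total <= post_instructions_limit:
        return list(message_tokens)  # under the limit: nothing is dropped
    target = retention_ratio * post_instructions_limit
    start = 0
    while start < len(message_tokens) and total > target:
        total -= message_tokens[start]  # drop the oldest remaining message
        start += 1
    return message_tokens[start:]


history = [1000, 1500, 1200, 900, 800]  # hypothetical per-message token counts
kept = truncate_with_retention(
    history,
    truncation["token_limits"]["post_instructions"],
    truncation["retention_ratio"],
)
print(kept, sum(kept))  # [1200, 900, 800] 2900
```

In this model, the 5,400-token history exceeds the 5,000-token limit and is cut to 2,900 tokens, below the 4,000-token target. The conversation can then grow by another 2,100 tokens before the next truncation. This is the amortization effect that the retention ratio is meant to provide.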

- `class RealtimeTranscriptionSessionCreateResponse: …`

A Realtime transcription session configuration object.

- `id: str`

Unique identifier for the session that looks like `sess_1234567890abcdef`.

- `object: str`

The object type. Always `realtime.transcription_session`.

- `type: Literal["transcription"]`

The type of session. Always `transcription` for transcription sessions.

- `"transcription"`

- `audio: Optional[Audio]`

Configuration for input audio for the session.

- `input: Optional[AudioInput]`

- `format: Optional[RealtimeAudioFormats]`

The PCM audio format. Only a 24kHz sample rate is supported.

- `noise_reduction: Optional[AudioInputNoiseReduction]`

Configuration for input audio noise reduction.

- `type: Optional[NoiseReductionType]`

Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

- `transcription: Optional[AudioTranscription]`

- `turn_detection: Optional[RealtimeTranscriptionSessionTurnDetection]`

Configuration for turn detection. Can be set to `null` to turn off. Server
VAD means that the model will detect the start and end of speech based on
audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

- `prefix_padding_ms: Optional[int]`

Amount of audio to include before the VAD-detected speech (in
milliseconds). Defaults to 300ms.

- `silence_duration_ms: Optional[int]`

Duration of silence to detect speech stop (in milliseconds). Defaults
to 500ms. With shorter values the model will respond more quickly,
but may jump in on short pauses from the user.

- `threshold: Optional[float]`

Activation threshold for VAD (0.0 to 1.0); it defaults to 0.5. A
higher threshold will require louder audio to activate the model, and
thus might perform better in noisy environments.

- `type: Optional[str]`

Type of turn detection; only `server_vad` is currently supported.
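
The `audio.input` options above can be combined into a single configuration. This is a minimal sketch, assuming plain-dict payloads. It also assumes that `AudioTranscription` takes a `model` field, which this section does not show. `server_vad` and `transcription_audio_input` are hypothetical helpers, and `gpt-4o-transcribe` is used only as an example model name. The VAD defaults repeat the documented ones, and `turn_detection` is `None` (serialized as `null`) for `gpt-realtime-whisper`.

```python
from typing import Any, Dict, Optional


def server_vad(
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
    threshold: float = 0.5,
) -> Dict[str, Any]:
    """Server VAD settings; the defaults restate the documented defaults."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return {
        "type": "server_vad",
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
        "threshold": threshold,
    }


def transcription_audio_input(model: str, far_field: bool = False) -> Dict[str, Any]:
    """Build an audio.input fragment for a transcription session (hypothetical helper)."""
    # gpt-realtime-whisper does not support VAD, so turn_detection must be null (None).
    turn_detection: Optional[Dict[str, Any]] = (
        None if model == "gpt-realtime-whisper" else server_vad()
    )
    return {
        "noise_reduction": {"type": "far_field" if far_field else "near_field"},
        "transcription": {"model": model},  # assumed AudioTranscription field
        "turn_detection": turn_detection,
    }


print(transcription_audio_input("gpt-4o-transcribe", far_field=True))
print(transcription_audio_input("gpt-realtime-whisper")["turn_detection"])
```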

- `expires_at: Optional[int]`

Expiration timestamp for the session, in seconds since epoch.

- `include: Optional[List[Literal["item.input_audio_transcription.logprobs"]]]`

Additional fields to include in server outputs.

- `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

- `"item.input_audio_transcription.logprobs"`

- `value: str`

The generated client secret value.
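
A server endpoint typically returns only `value` to the client app, keeping the main API key on the server. This is a sketch under several assumptions. The `expires_after` shape (`anchor`/`seconds`) and the minimal `session` payload (`type`, `model`) are guesses based on the parameter names, not confirmed by this section. `mint_client_secret` is a hypothetical helper. An offline stand-in replaces the real client so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any


def mint_client_secret(client: Any, ttl_seconds: int = 600) -> str:
    """Create a short-lived client secret and return only its `value`.

    `client` is expected to behave like `openai.OpenAI()`; the `expires_after`
    and `session` payloads below are assumptions, not confirmed shapes.
    """
    secret = client.realtime.client_secrets.create(
        expires_after={"anchor": "created_at", "seconds": ttl_seconds},
        session={"type": "realtime", "model": "gpt-realtime"},
    )
    return secret.value  # looks like "ek_..."; pass this to the client app, not your API key


# Offline stand-in for the SDK client so the sketch runs without network access.
class _FakeClientSecrets:
    def create(self, **kwargs: Any) -> SimpleNamespace:
        self.last_kwargs = kwargs
        return SimpleNamespace(value="ek_test_123", session=kwargs["session"])


fake_client = SimpleNamespace(realtime=SimpleNamespace(client_secrets=_FakeClientSecrets()))
print(mint_client_secret(fake_client, ttl_seconds=120))
```

With the real SDK, the same helper would be called as `mint_client_secret(OpenAI())` after `from openai import OpenAI`.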