Reference Changes, 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC
cli/resources/realtime/subresources/client_secrets/index.md

0 added, 1938 removed.

This document has no rendered page for this history range.

cli/resources/realtime/subresources/client_secrets/index.md +0 −1938 deleted

~~1# Client Secrets~~

~~3## Create client secret~~

~~5`$ openai realtime:client-secrets create`~~

~~7**post** `/realtime/client_secrets`~~

~~9Create a Realtime client secret with an associated session configuration.~~

~~11Client secrets are short-lived tokens that can be passed to a client app,~~

~~12such as a web frontend or mobile client, which grants access to the Realtime API without~~

~~13leaking your main API key. You can configure a custom TTL for each client secret.~~

~~15You can also attach session configuration options to the client secret, which will be~~

~~16applied to any sessions created using that client secret, but these can also be overridden~~

~~17by the client connection.~~

~~19[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).~~

~~21Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.~~

~~23### Parameters~~

~~25- `--expires-after: optional object { anchor, seconds }`~~

~~27 Configuration for the client secret expiration. Expiration refers to the time after which~~

~~28 a client secret will no longer be valid for creating sessions. The session itself may~~

~~29 continue after that time once started. A secret can be used to create multiple sessions~~

~~30 until it expires.~~

~~32- `--session: optional RealtimeSessionCreateRequest or RealtimeTranscriptionSessionCreateRequest`~~

~~34 Session configuration to use for the client secret. Choose either a realtime~~

~~35 session or a transcription session.~~
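The two parameters above correspond to a JSON request body for `POST /realtime/client_secrets`. A minimal Python sketch of assembling that body follows. It assumes that `created_at` is an accepted `anchor` value and that the session request takes a `type` of `realtime` or `transcription`; this page lists only the field names, so both are assumptions. The helper name `build_client_secret_request` is hypothetical, and the sketch sends no request.

```python
import json


def build_client_secret_request(ttl_seconds, session_type="realtime"):
    """Assemble a request body with an expiration and a session config.

    Assumptions (not confirmed by this page): "created_at" is a valid
    anchor, and the session object carries a "type" field.
    """
    if session_type not in ("realtime", "transcription"):
        raise ValueError("session_type must be 'realtime' or 'transcription'")
    return {
        "expires_after": {"anchor": "created_at", "seconds": ttl_seconds},
        "session": {"type": session_type},
    }


body = build_client_secret_request(600)
print(json.dumps(body, indent=2))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect what will be attached to the client secret before anything is sent.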

~~37### Returns~~

~~39- `ClientSecretNewResponse: object { expires_at, session, value }`~~

~~41 Response from creating a session and client secret for the Realtime API.~~

~~43 - `expires_at: number`~~

~~45 Expiration timestamp for the client secret, in seconds since epoch.~~

~~47 - `session: RealtimeSessionCreateResponse or RealtimeTranscriptionSessionCreateResponse`~~

~~49 The session configuration for either a realtime or transcription session.~~

~~51 - `realtime_session_create_response: object { id, object, type, 13 more }`~~

~~53 A Realtime session configuration object.~~

~~55 - `id: string`~~

~~57 Unique identifier for the session that looks like `sess_1234567890abcdef`.~~

~~59 - `object: "realtime.session"`~~

~~61 The object type. Always `realtime.session`.~~

~~63 - `type: "realtime"`~~

~~65 The type of session to create. Always `realtime` for the Realtime API.~~

~~67 - `audio: optional object { input, output }`~~

~~69 Configuration for input and output audio.~~

~~71 - `input: optional object { format, noise_reduction, transcription, turn_detection }`~~

~~73 - `format: optional object { rate, type } or object { type } or object { type }`~~

~~75 The format of the input audio.~~

~~77 - `audio/pcm: object { rate, type }`~~

~~79 The PCM audio format. Only a 24kHz sample rate is supported.~~

~~81 - `rate: optional 24000`~~

~~83 The sample rate of the audio. Always `24000`.~~

~~85 - `24000`~~

~~87 - `type: optional "audio/pcm"`~~

~~89 The audio format. Always `audio/pcm`.~~

~~91 - `"audio/pcm"`~~

~~93 - `audio/pcmu: object { type }`~~

~~95 The G.711 μ-law format.~~

~~97 - `type: optional "audio/pcmu"`~~

~~99 The audio format. Always `audio/pcmu`.~~

~~100~~

101 - `"audio/pcmu"`

~~102~~

103 - `audio/pcma: object { type }`

~~104~~

105 The G.711 A-law format.

~~106~~

107 - `type: optional "audio/pcma"`

~~108~~

109 The audio format. Always `audio/pcma`.

~~110~~

111 - `"audio/pcma"`

~~112~~

113 - `noise_reduction: optional object { type }`

~~114~~

115 Configuration for input audio noise reduction. This can be set to `null` to turn off.

116 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

117 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~118~~

119 - `type: optional "near_field" or "far_field"`

~~120~~

121 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~122~~

123 - `"near_field"`

~~124~~

125 - `"far_field"`

~~126~~

127 - `transcription: optional object { delay, language, model, prompt }`

~~128~~

129 - `delay: optional "minimal" or "low" or "medium" or 2 more`

~~130~~

131 Controls how long the model waits before emitting transcription text.

132 Higher values can improve transcription accuracy at the cost of latency.

133 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~134~~

135 - `"minimal"`

~~136~~

137 - `"low"`

~~138~~

139 - `"medium"`

~~140~~

141 - `"high"`

~~142~~

143 - `"xhigh"`

~~144~~

145 - `language: optional string`

~~146~~

147 The language of the input audio. Supplying the input language in

148 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

149 will improve accuracy and latency.

~~150~~

151 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`

~~152~~

153 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~154~~

155 - `"whisper-1"`

~~156~~

157 - `"gpt-4o-mini-transcribe"`

~~158~~

159 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~160~~

161 - `"gpt-4o-transcribe"`

~~162~~

163 - `"gpt-4o-transcribe-diarize"`

~~164~~

165 - `"gpt-realtime-whisper"`

~~166~~

167 - `prompt: optional string`

~~168~~

169 An optional text to guide the model's style or continue a previous audio

170 segment.

171 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

172 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

173 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~174~~

175 - `turn_detection: optional object { type, create_response, idle_timeout_ms, 4 more } or object { type, create_response, eagerness, interrupt_response }`

~~176~~

177 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~178~~

179 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~180~~

181 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~182~~

183 For `gpt-realtime-whisper` transcription sessions, turn detection must be

184 set to `null`; VAD is not supported.

~~185~~

186 - `server_vad: object { type, create_response, idle_timeout_ms, 4 more }`

~~187~~

188 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~189~~

190 - `type: "server_vad"`

~~191~~

192 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~193~~

194 - `create_response: optional boolean`

~~195~~

196 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~197~~

198 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~199~~

200 - `idle_timeout_ms: optional number`

~~201~~

202 Optional timeout after which a model response will be triggered automatically. This is

203 useful for situations in which a long pause from the user is unexpected, such as a phone

204 call. The model will effectively prompt the user to continue the conversation based

205 on the current context.

~~206~~

207 The timeout value will be applied after the last model response's audio has finished playing,

208 i.e. it's set to the `response.done` time plus audio playback duration.

~~209~~

210 An `input_audio_buffer.timeout_triggered` event (plus events

211 associated with the Response) will be emitted when the timeout is reached.

212 Idle timeout is currently only supported for `server_vad` mode.

~~213~~

214 - `interrupt_response: optional boolean`

~~215~~

216 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

217 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~218~~

219 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~220~~

221 - `prefix_padding_ms: optional number`

~~222~~

223 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

224 milliseconds). Defaults to 300ms.

~~225~~

226 - `silence_duration_ms: optional number`

~~227~~

228 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

229 to 500ms. With shorter values the model will respond more quickly,

230 but may jump in on short pauses from the user.

~~231~~

232 - `threshold: optional number`

~~233~~

234 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

235 higher threshold will require louder audio to activate the model, and

236 thus might perform better in noisy environments.
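The `server_vad` fields and their stated defaults (300 ms prefix padding, 500 ms silence duration, 0.5 threshold) can be collected into one configuration object. A minimal Python sketch follows; the helper name `server_vad_config` is hypothetical, and the range check is a client-side guard added for illustration, not documented server behavior.

```python
# Defaults as stated in the field descriptions above.
SERVER_VAD_DEFAULTS = {
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
    "threshold": 0.5,
}


def server_vad_config(**overrides):
    """Return a server_vad turn_detection object with defaults applied."""
    config = {"type": "server_vad", **SERVER_VAD_DEFAULTS, **overrides}
    # The threshold is documented as a value from 0.0 to 1.0.
    if not 0.0 <= config["threshold"] <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return config


# A higher threshold and longer silence window for a noisy environment.
noisy_room = server_vad_config(threshold=0.8, silence_duration_ms=700)
print(noisy_room)
```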

~~237~~

238 - `semantic_vad: object { type, create_response, eagerness, interrupt_response }`

~~239~~

240 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~241~~

242 - `type: "semantic_vad"`

~~243~~

244 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~245~~

246 - `create_response: optional boolean`

~~247~~

248 Whether or not to automatically generate a response when a VAD stop event occurs.

~~249~~

250 - `eagerness: optional "low" or "medium" or "high" or "auto"`

~~251~~

252 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~253~~

254 - `"low"`

~~255~~

256 - `"medium"`

~~257~~

258 - `"high"`

~~259~~

260 - `"auto"`

~~261~~

262 - `interrupt_response: optional boolean`

~~263~~

264 Whether or not to automatically interrupt any ongoing response with output to the default

265 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
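The `eagerness` description above gives maximum timeouts of 8s, 4s, and 2s for `low`, `medium`, and `high`, with `auto` equivalent to `medium`. A minimal Python sketch of that mapping, using the hypothetical helper name `max_timeout_seconds`:

```python
# Maximum timeouts per eagerness level, as stated in the description above.
MAX_TIMEOUT_SECONDS = {"low": 8, "medium": 4, "high": 2}


def max_timeout_seconds(eagerness="auto"):
    """Return the maximum wait, in seconds, for a semantic_vad eagerness."""
    if eagerness == "auto":
        # `auto` is the default and is documented as equivalent to `medium`.
        eagerness = "medium"
    if eagerness not in MAX_TIMEOUT_SECONDS:
        raise ValueError("eagerness must be low, medium, high, or auto")
    return MAX_TIMEOUT_SECONDS[eagerness]


for level in ("low", "medium", "high", "auto"):
    print(level, max_timeout_seconds(level))
```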

~~266~~

267 - `output: optional object { format, speed, voice }`

~~268~~

269 - `format: optional object { rate, type } or object { type } or object { type }`

~~270~~

271 The format of the output audio.

~~272~~

273 - `audio/pcm: object { rate, type }`

~~274~~

275 The PCM audio format. Only a 24kHz sample rate is supported.

~~276~~

277 - `audio/pcmu: object { type }`

~~278~~

279 The G.711 μ-law format.

~~280~~

281 - `audio/pcma: object { type }`

~~282~~

283 The G.711 A-law format.

~~284~~

285 - `speed: optional number`

~~286~~

287 The speed of the model's spoken response as a multiple of the original speed.

288 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~289~~

290 This parameter is a post-processing adjustment to the audio after it is generated; it's

291 also possible to prompt the model to speak faster or slower.
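The `speed` bounds above (minimum 0.25, maximum 1.5, default 1.0) can be enforced client-side before a value is sent. A minimal Python sketch follows; the helper name `clamp_speed` is hypothetical, and clamping out-of-range values instead of rejecting them is a choice made for this sketch, not documented server behavior.

```python
MIN_SPEED = 0.25
MAX_SPEED = 1.5
DEFAULT_SPEED = 1.0


def clamp_speed(speed=DEFAULT_SPEED):
    """Clamp a requested playback speed into the documented range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


print(clamp_speed(2.0))
print(clamp_speed(0.1))
```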

~~292~~

293 - `voice: optional string or "alloy" or "ash" or "ballad" or 7 more`

~~294~~

295 The voice the model uses to respond. Voice cannot be changed during the

296 session once the model has responded with audio at least once. Current

297 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

298 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

299 best quality.

~~300~~

301 - `"alloy"`

~~302~~

303 - `"ash"`

~~304~~

305 - `"ballad"`

~~306~~

307 - `"coral"`

~~308~~

309 - `"echo"`

~~310~~

311 - `"sage"`

~~312~~

313 - `"shimmer"`

~~314~~

315 - `"verse"`

~~316~~

317 - `"marin"`

~~318~~

319 - `"cedar"`

~~320~~

321 - `expires_at: optional number`

~~322~~

323 Expiration timestamp for the session, in seconds since epoch.

~~324~~

325 - `include: optional array of "item.input_audio_transcription.logprobs"`

~~326~~

327 Additional fields to include in server outputs.

~~328~~

329 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~330~~

331 - `"item.input_audio_transcription.logprobs"`

~~332~~

333 - `instructions: optional string`

~~334~~

335 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~336~~

337 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~338~~

339 - `max_output_tokens: optional number or "inf"`

~~340~~

341 Maximum number of output tokens for a single assistant response,

342 inclusive of tool calls. Provide an integer between 1 and 4096 to

343 limit output tokens, or `inf` for the maximum available tokens for a

344 given model. Defaults to `inf`.

~~345~~

346 - `union_member_0: number`

~~347~~

348 - `union_member_1: "inf"`
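The `max_output_tokens` field accepts either an integer between 1 and 4096 or the string `inf`. A minimal Python sketch of a client-side check for that union follows; the helper name `validate_max_output_tokens` is hypothetical.

```python
def validate_max_output_tokens(value="inf"):
    """Accept the string "inf" or an integer from 1 to 4096 inclusive."""
    if value == "inf":
        return value
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_plain_int = isinstance(value, int) and not isinstance(value, bool)
    if is_plain_int and 1 <= value <= 4096:
        return value
    raise ValueError("max_output_tokens must be 'inf' or an integer in 1..4096")


print(validate_max_output_tokens(1024))
print(validate_max_output_tokens())
```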

~~349~~

350 - `model: optional string or "gpt-realtime" or "gpt-realtime-1.5" or "gpt-realtime-2" or 14 more`

~~351~~

352 The Realtime model used for this session.

~~353~~

354 - `"gpt-realtime"`

~~355~~

356 - `"gpt-realtime-1.5"`

~~357~~

358 - `"gpt-realtime-2"`

~~359~~

360 - `"gpt-realtime-2025-08-28"`

~~361~~

362 - `"gpt-4o-realtime-preview"`

~~363~~

364 - `"gpt-4o-realtime-preview-2024-10-01"`

~~365~~

366 - `"gpt-4o-realtime-preview-2024-12-17"`

~~367~~

368 - `"gpt-4o-realtime-preview-2025-06-03"`

~~369~~

370 - `"gpt-4o-mini-realtime-preview"`

~~371~~

372 - `"gpt-4o-mini-realtime-preview-2024-12-17"`

~~373~~

374 - `"gpt-realtime-mini"`

~~375~~

376 - `"gpt-realtime-mini-2025-10-06"`

~~377~~

378 - `"gpt-realtime-mini-2025-12-15"`

~~379~~

380 - `"gpt-audio-1.5"`

~~381~~

382 - `"gpt-audio-mini"`

~~383~~

384 - `"gpt-audio-mini-2025-10-06"`

~~385~~

386 - `"gpt-audio-mini-2025-12-15"`

~~387~~

388 - `output_modalities: optional array of "text" or "audio"`

~~389~~

390 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

391 that the model will respond with audio plus a transcript. `["text"]` can be used to make

392 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~393~~

394 - `"text"`

~~395~~

396 - `"audio"`

~~397~~

398 - `prompt: optional object { id, variables, version }`

~~399~~

400 Reference to a prompt template and its variables.

401 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~402~~

403 - `id: string`

~~404~~

405 The unique identifier of the prompt template to use.

~~406~~

407 - `variables: optional map[string or ResponseInputText or ResponseInputImage or ResponseInputFile]`

~~408~~

409 Optional map of values to substitute in for variables in your

410 prompt. The substitution values can either be strings, or other

411 Response input types like images or files.

~~412~~

413 - `union_member_0: string`

~~414~~

415 - `response_input_text: object { text, type }`

~~416~~

417 A text input to the model.

~~418~~

419 - `text: string`

~~420~~

421 The text input to the model.

~~422~~

423 - `type: "input_text"`

~~424~~

425 The type of the input item. Always `input_text`.

~~426~~

427 - `response_input_image: object { detail, type, file_id, image_url }`

~~428~~

429 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~430~~

431 - `detail: "low" or "high" or "auto" or "original"`

~~432~~

433 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~434~~

435 - `"low"`

~~436~~

437 - `"high"`

~~438~~

439 - `"auto"`

~~440~~

441 - `"original"`

~~442~~

443 - `type: "input_image"`

~~444~~

445 The type of the input item. Always `input_image`.

~~446~~

447 - `file_id: optional string`

~~448~~

449 The ID of the file to be sent to the model.

~~450~~

451 - `image_url: optional string`

~~452~~

453 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~454~~

455 - `response_input_file: object { type, detail, file_data, 3 more }`

~~456~~

457 A file input to the model.

~~458~~

459 - `type: "input_file"`

~~460~~

461 The type of the input item. Always `input_file`.

~~462~~

463 - `detail: optional "low" or "high"`

~~464~~

465 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~466~~

467 - `"low"`

~~468~~

469 - `"high"`

~~470~~

471 - `file_data: optional string`

~~472~~

473 The content of the file to be sent to the model.

~~474~~

475 - `file_id: optional string`

~~476~~

477 The ID of the file to be sent to the model.

~~478~~

479 - `file_url: optional string`

~~480~~

481 The URL of the file to be sent to the model.

~~482~~

483 - `filename: optional string`

~~484~~

485 The name of the file to be sent to the model.

~~486~~

487 - `version: optional string`

~~488~~

489 Optional version of the prompt template.

~~490~~

491 - `reasoning: optional object { effort }`

~~492~~

493 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~494~~

495 - `effort: optional "minimal" or "low" or "medium" or 2 more`

~~496~~

497 Constrains effort on reasoning for reasoning-capable Realtime models such as

498 `gpt-realtime-2`.

~~499~~

500 - `"minimal"`

~~501~~

502 - `"low"`

~~503~~

504 - `"medium"`

~~505~~

506 - `"high"`

~~507~~

508 - `"xhigh"`

~~509~~

510 - `tool_choice: optional ToolChoiceOptions or ToolChoiceFunction or ToolChoiceMcp`

~~511~~

512 How the model chooses tools. Provide one of the string modes or force a specific

513 function/MCP tool.

~~514~~

515 - `tool_choice_options: "none" or "auto" or "required"`

~~516~~

517 Controls which (if any) tool is called by the model.

~~518~~

519 `none` means the model will not call any tool and instead generates a message.

~~520~~

521 `auto` means the model can pick between generating a message or calling one or

522 more tools.

~~523~~

524 `required` means the model must call one or more tools.

~~525~~

526 - `"none"`

~~527~~

528 - `"auto"`

~~529~~

530 - `"required"`

~~531~~

532 - `tool_choice_function: object { name, type }`

~~533~~

534 Use this option to force the model to call a specific function.

~~535~~

536 - `name: string`

~~537~~

538 The name of the function to call.

~~539~~

540 - `type: "function"`

~~541~~

542 For function calling, the type is always `function`.

~~543~~

544 - `tool_choice_mcp: object { server_label, type, name }`

~~545~~

546 Use this option to force the model to call a specific tool on a remote MCP server.

~~547~~

548 - `server_label: string`

~~549~~

550 The label of the MCP server to use.

~~551~~

552 - `type: "mcp"`

~~553~~

554 For MCP tools, the type is always `mcp`.

~~555~~

556 - `name: optional string`

~~557~~

558 The name of the tool to call on the server.

~~559~~

560 - `tools: optional array of RealtimeFunctionTool or object { server_label, type, allowed_tools, 7 more }`

~~561~~

562 Tools available to the model.

~~563~~

564 - `realtime_function_tool: object { description, name, parameters, type }`

~~565~~

566 - `description: optional string`

~~567~~

568 The description of the function, including guidance on when and how

569 to call it, and guidance about what to tell the user when calling

570 (if anything).

~~571~~

572 - `name: optional string`

~~573~~

574 The name of the function.

~~575~~

576 - `parameters: optional unknown`

~~577~~

578 Parameters of the function in JSON Schema.

~~579~~

580 - `type: optional "function"`

~~581~~

582 The type of the tool, i.e. `function`.

~~583~~

584 - `"function"`

~~585~~

586 - `MCP tool: object { server_label, type, allowed_tools, 7 more }`

~~587~~

588 Give the model access to additional tools via remote Model Context Protocol

589 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~590~~

591 - `server_label: string`

~~592~~

593 A label for this MCP server, used to identify it in tool calls.

~~594~~

595 - `type: "mcp"`

~~596~~

597 The type of the MCP tool. Always `mcp`.

~~598~~

599 - `allowed_tools: optional array of string or object { read_only, tool_names }`

~~600~~

601 List of allowed tool names or a filter object.

~~602~~

603 - `MCP allowed tools: array of string`

~~604~~

605 A string array of allowed tool names

~~606~~

607 - `MCP tool filter: object { read_only, tool_names }`

~~608~~

609 A filter object to specify which tools are allowed.

~~610~~

611 - `read_only: optional boolean`

~~612~~

613 Indicates whether or not a tool modifies data or is read-only. If an

614 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

615 it will match this filter.

~~616~~

617 - `tool_names: optional array of string`

~~618~~

619 List of allowed tool names.

~~620~~

621 - `authorization: optional string`

~~622~~

623 An OAuth access token that can be used with a remote MCP server, either

624 with a custom MCP server URL or a service connector. Your application

625 must handle the OAuth authorization flow and provide the token here.

~~626~~

627 - `connector_id: optional "connector_dropbox" or "connector_gmail" or "connector_googlecalendar" or 5 more`

~~628~~

629 Identifier for service connectors, like those available in ChatGPT. One of

630 `server_url` or `connector_id` must be provided. Learn more about service

631 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~632~~

633 Currently supported `connector_id` values are:

~~634~~

635 - Dropbox: `connector_dropbox`

636 - Gmail: `connector_gmail`

637 - Google Calendar: `connector_googlecalendar`

638 - Google Drive: `connector_googledrive`

639 - Microsoft Teams: `connector_microsoftteams`

640 - Outlook Calendar: `connector_outlookcalendar`

641 - Outlook Email: `connector_outlookemail`

642 - SharePoint: `connector_sharepoint`

~~643~~

644 - `"connector_dropbox"`

~~645~~

646 - `"connector_gmail"`

~~647~~

648 - `"connector_googlecalendar"`

~~649~~

650 - `"connector_googledrive"`

~~651~~

652 - `"connector_microsoftteams"`

~~653~~

654 - `"connector_outlookcalendar"`

~~655~~

656 - `"connector_outlookemail"`

~~657~~

658 - `"connector_sharepoint"`

~~659~~

660 - `defer_loading: optional boolean`

~~661~~

662 Whether this MCP tool is deferred and discovered via tool search.

~~663~~

664 - `headers: optional map[string]`

~~665~~

666 Optional HTTP headers to send to the MCP server. Use for authentication

667 or other purposes.

~~668~~

669 - `require_approval: optional object { always, never } or "always" or "never"`

~~670~~

671 Specify which of the MCP server's tools require approval.

~~672~~

673 - `MCP tool approval filter: object { always, never }`

~~674~~

675 Specify which of the MCP server's tools require approval. Can be

676 `always`, `never`, or a filter object associated with tools

677 that require approval.

~~678~~

679 - `always: optional object { read_only, tool_names }`

~~680~~

681 A filter object to specify which tools are allowed.

~~682~~

683 - `read_only: optional boolean`

~~684~~

685 Indicates whether or not a tool modifies data or is read-only. If an

686 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

687 it will match this filter.

~~688~~

689 - `tool_names: optional array of string`

~~690~~

691 List of allowed tool names.

~~692~~

693 - `never: optional object { read_only, tool_names }`

~~694~~

695 A filter object to specify which tools are allowed.

~~696~~

697 - `read_only: optional boolean`

~~698~~

699 Indicates whether or not a tool modifies data or is read-only. If an

700 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

701 it will match this filter.

~~702~~

703 - `tool_names: optional array of string`

~~704~~

705 List of allowed tool names.

~~706~~

707 - `MCP tool approval setting: "always" or "never"`

~~708~~

709 Specify a single approval policy for all tools. One of `always` or

710 `never`. When set to `always`, all tools will require approval. When

711 set to `never`, no tools will require approval.

~~712~~

713 - `"always"`

~~714~~

715 - `"never"`

~~716~~

717 - `server_description: optional string`

~~718~~

719 Optional description of the MCP server, used to provide more context.

~~720~~

721 - `server_url: optional string`

~~722~~

723 The URL for the MCP server. One of `server_url` or `connector_id` must be

724 provided.
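The MCP tool fields above can be combined into a single tool entry, subject to the rule that one of `server_url` or `connector_id` must be provided. A minimal Python sketch follows; the helper name `mcp_tool` and the example URL are hypothetical, and the checks are client-side guards for illustration only.

```python
# Connector identifiers listed in the `connector_id` description above.
CONNECTOR_IDS = {
    "connector_dropbox",
    "connector_gmail",
    "connector_googlecalendar",
    "connector_googledrive",
    "connector_microsoftteams",
    "connector_outlookcalendar",
    "connector_outlookemail",
    "connector_sharepoint",
}


def mcp_tool(server_label, server_url=None, connector_id=None,
             require_approval=None):
    """Build an MCP tool entry, requiring server_url or connector_id."""
    if server_url is None and connector_id is None:
        raise ValueError("one of server_url or connector_id must be provided")
    if connector_id is not None and connector_id not in CONNECTOR_IDS:
        raise ValueError("unsupported connector_id: " + connector_id)
    tool = {"type": "mcp", "server_label": server_label}
    if server_url is not None:
        tool["server_url"] = server_url
    if connector_id is not None:
        tool["connector_id"] = connector_id
    if require_approval is not None:
        tool["require_approval"] = require_approval
    return tool


# Hypothetical server URL, used only to show the resulting shape.
print(mcp_tool("docs", server_url="https://example.com/mcp",
               require_approval="never"))
```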

~~725~~

726 - `tracing: optional "auto" or object { group_id, metadata, workflow_name }`

~~727~~

728 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

729 tracing is enabled for a session, the configuration cannot be modified.

~~730~~

731 `auto` will create a trace for the session with default values for the

732 workflow name, group id, and metadata.

~~733~~

734 - `auto: "auto"`

~~735~~

736 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~737~~

738 - `Tracing Configuration: object { group_id, metadata, workflow_name }`

~~739~~

740 Granular configuration for tracing.

~~741~~

742 - `group_id: optional string`

~~743~~

744 The group id to attach to this trace to enable filtering and

745 grouping in the Traces Dashboard.

~~746~~

747 - `metadata: optional unknown`

~~748~~

749 The arbitrary metadata to attach to this trace to enable

750 filtering in the Traces Dashboard.

~~751~~

752 - `workflow_name: optional string`

~~753~~

754 The name of the workflow to attach to this trace. This is used to

755 name the trace in the Traces Dashboard.

~~756~~

757 - `truncation: optional "auto" or "disabled" or RealtimeTruncationRetentionRatio`

~~758~~

759 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~760~~

761 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~762~~

763 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~764~~

765 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~766~~

767 - `RealtimeTruncationStrategy: "auto" or "disabled"`

~~768~~

769 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~770~~

771 - `"auto"`

~~772~~

773 - `"disabled"`

~~774~~

775 - `realtime_truncation_retention_ratio: object { retention_ratio, type, token_limits }`

~~776~~

777 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~778~~

779 - `retention_ratio: number`

~~780~~

781 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~782~~

783 - `type: "retention_ratio"`

~~784~~

785 Use retention ratio truncation.

~~786~~

787 - `token_limits: optional object { post_instructions }`

~~788~~

789 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~790~~

791 - `post_instructions: optional number`

~~792~~

793 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
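The retention-ratio behavior described above can be illustrated with a small simulation: once the conversation exceeds the token limit, the oldest messages are dropped until the total fits within `retention_ratio` times the limit. This is a Python sketch of the described behavior, not the server's implementation; the helper name `truncate_oldest` and the per-message token counts are hypothetical.

```python
def truncate_oldest(message_tokens, token_limit, retention_ratio=1.0):
    """Drop oldest messages once the conversation exceeds token_limit.

    message_tokens lists per-message token counts, oldest first.
    Messages are removed from the front until the total is at most
    retention_ratio * token_limit.
    """
    kept = list(message_tokens)
    if sum(kept) <= token_limit:
        return kept  # under the limit, nothing is truncated
    target = token_limit * retention_ratio
    while kept and sum(kept) > target:
        kept.pop(0)
    return kept


history = [400, 300, 200, 200]  # 1,100 tokens, over a 1,000-token limit
print(truncate_oldest(history, 1000))
print(truncate_oldest(history, 1000, retention_ratio=0.5))
```

With a ratio below 1.0, each truncation removes more than the minimum, so the next few turns can proceed without truncating again. This is the amortization effect the description credits with improving cache rates.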

~~794~~

795 - `realtime_transcription_session_create_response: object { id, object, type, 3 more }`

~~796~~

797 A Realtime transcription session configuration object.

~~798~~

799 - `id: string`

~~800~~

801 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~802~~

803 - `object: string`

~~804~~

805 The object type. Always `realtime.transcription_session`.

~~806~~

807 - `type: "transcription"`

~~808~~

809 The type of session. Always `transcription` for transcription sessions.

~~810~~

811 - `audio: optional object { input }`

~~812~~

813 Configuration for input audio for the session.

~~814~~

815 - `input: optional object { format, noise_reduction, transcription, turn_detection }`

~~816~~

817 - `format: optional object { rate, type } or object { type } or object { type }`

~~818~~

819 The PCM audio format. Only a 24kHz sample rate is supported.

~~820~~

821 - `audio/pcm: object { rate, type }`

~~822~~

823 The PCM audio format. Only a 24kHz sample rate is supported.

~~824~~

825 - `audio/pcmu: object { type }`

~~826~~

827 The G.711 μ-law format.

~~828~~

829 - `audio/pcma: object { type }`

~~830~~

831 The G.711 A-law format.

~~832~~

833 - `noise_reduction: optional object { type }`

~~834~~

835 Configuration for input audio noise reduction.

~~836~~

837 - `type: optional "near_field" or "far_field"`

~~838~~

839 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

~~840~~

841 - `"near_field"`

~~842~~

843 - `"far_field"`

~~844~~

845 - `transcription: optional object { delay, language, model, prompt }`

~~846~~

847 - `delay: optional "minimal" or "low" or "medium" or 2 more`

~~848~~

849 Controls how long the model waits before emitting transcription text.

850 Higher values can improve transcription accuracy at the cost of latency.

851 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~852~~

853 - `language: optional string`

~~854~~

855 The language of the input audio. Supplying the input language in

856 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

857 will improve accuracy and latency.

~~858~~

859 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`

~~860~~

861 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~862~~

863 - `prompt: optional string`

~~864~~

865 An optional text to guide the model's style or continue a previous audio

866 segment.

867 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

868 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

869 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~870~~

871 - `turn_detection: optional object { prefix_padding_ms, silence_duration_ms, threshold, type }`

~~872~~

873 Configuration for turn detection. Can be set to `null` to turn off. Server

874 VAD means that the model will detect the start and end of speech based on

875 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~876~~

877 - `prefix_padding_ms: optional number`

~~878~~

879 Amount of audio to include before the VAD detected speech (in

880 milliseconds). Defaults to 300ms.

~~881~~

882 - `silence_duration_ms: optional number`

~~883~~

884 Duration of silence to detect speech stop (in milliseconds). Defaults

885 to 500ms. With shorter values the model will respond more quickly,

886 but may jump in on short pauses from the user.

~~887~~

888 - `threshold: optional number`

~~889~~

890 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

891 higher threshold will require louder audio to activate the model, and

892 thus might perform better in noisy environments.

~~893~~

894 - `type: optional string`

~~895~~

896 Type of turn detection, only `server_vad` is currently supported.

~~897~~

898 - `expires_at: optional number`

~~899~~

900 Expiration timestamp for the session, in seconds since epoch.

~~901~~

902 - `include: optional array of "item.input_audio_transcription.logprobs"`

~~903~~

904 Additional fields to include in server outputs.

~~905~~

906 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~907~~

908 - `"item.input_audio_transcription.logprobs"`

~~909~~

910 - `value: string`

~~911~~

912 The generated client secret value.

~~913~~

914### Example

~~915~~

916```cli

917openai realtime:client-secrets create \

918 --api-key 'My API Key'

919```

~~920~~

921#### Response

~~922~~

923```json

924{

925 "expires_at": 0,

926 "session": {

927 "id": "id",

928 "object": "realtime.session",

929 "type": "realtime",

930 "audio": {

931 "input": {

932 "format": {

933 "rate": 24000,

934 "type": "audio/pcm"

935 },

936 "noise_reduction": {

937 "type": "near_field"

938 },

939 "transcription": {

940 "delay": "minimal",

941 "language": "language",

942 "model": "string",

943 "prompt": "prompt"

944 },

945 "turn_detection": {

946 "type": "server_vad",

947 "create_response": true,

948 "idle_timeout_ms": 5000,

949 "interrupt_response": true,

950 "prefix_padding_ms": 0,

951 "silence_duration_ms": 0,

952 "threshold": 0

953 }

954 },

955 "output": {

956 "format": {

957 "rate": 24000,

958 "type": "audio/pcm"

959 },

960 "speed": 0.25,

961 "voice": "ash"

962 }

963 },

964 "expires_at": 0,

965 "include": [

966 "item.input_audio_transcription.logprobs"

967 ],

968 "instructions": "instructions",

969 "max_output_tokens": 0,

970 "model": "string",

971 "output_modalities": [

972 "text"

973 ],

974 "prompt": {

975 "id": "id",

976 "variables": {

977 "foo": "string"

978 },

979 "version": "version"

980 },

981 "reasoning": {

982 "effort": "minimal"

983 },

984 "tool_choice": "none",

985 "tools": [

986 {

987 "description": "description",

988 "name": "name",

989 "parameters": {},

990 "type": "function"

991 }

992 ],

993 "tracing": "auto",

994 "truncation": "auto"

995 },

996 "value": "value"

997}

998```
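As a rough illustration of consuming this response, the Python sketch below reads the top-level `value`, `expires_at`, and `session` fields shown in the example. The helper name `read_client_secret` and the sample values are hypothetical. It assumes the top-level `expires_at` is a Unix timestamp in seconds, like the session-level `expires_at`, and that the JSON text has already been captured from the command's output.

```python
import json
import time


def read_client_secret(response_text, now=None):
    """Extract the fields a backend would typically hand to a client app."""
    data = json.loads(response_text)
    now = time.time() if now is None else now
    return {
        "value": data["value"],                   # the client secret itself
        "expires_at": data["expires_at"],         # assumed: seconds since epoch
        "session_type": data["session"]["type"],  # "realtime" or "transcription"
        "expired": data["expires_at"] <= now,
    }


# Hypothetical response body with only the fields used above.
sample = '{"expires_at": 1767225600, "session": {"type": "realtime"}, "value": "ek_1234"}'
info = read_client_secret(sample, now=1767225000)
```

Checking `expired` before handing the secret to a client avoids a failed connection attempt, since an expired secret can no longer be used to create sessions.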

~~999~~

1000## Domain Types

~~1001~~

1002### Realtime Session Create Response

~~1003~~

1004- `realtime_session_create_response: object { id, object, type, 13 more }`

~~1005~~

1006 A Realtime session configuration object.

~~1007~~

1008 - `id: string`

~~1009~~

1010 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1011~~

1012 - `object: "realtime.session"`

~~1013~~

1014 The object type. Always `realtime.session`.

~~1015~~

1016 - `type: "realtime"`

~~1017~~

1018 The type of session to create. Always `realtime` for the Realtime API.

~~1019~~

1020 - `audio: optional object { input, output }`

~~1021~~

1022 Configuration for input and output audio.

~~1023~~

1024 - `input: optional object { format, noise_reduction, transcription, turn_detection }`

~~1025~~

1026 - `format: optional object { rate, type } or object { type } or object { type }`

~~1027~~

1028 The format of the input audio.

~~1029~~

1030 - `audio/pcm: object { rate, type }`

~~1031~~

1032 The PCM audio format. Only a 24kHz sample rate is supported.

~~1033~~

1034 - `rate: optional 24000`

~~1035~~

1036 The sample rate of the audio. Always `24000`.

~~1037~~

1038 - `24000`

~~1039~~

1040 - `type: optional "audio/pcm"`

~~1041~~

1042 The audio format. Always `audio/pcm`.

~~1043~~

1044 - `"audio/pcm"`

~~1045~~

1046 - `audio/pcmu: object { type }`

~~1047~~

1048 The G.711 μ-law format.

~~1049~~

1050 - `type: optional "audio/pcmu"`

~~1051~~

1052 The audio format. Always `audio/pcmu`.

~~1053~~

1054 - `"audio/pcmu"`

~~1055~~

1056 - `audio/pcma: object { type }`

~~1057~~

1058 The G.711 A-law format.

~~1059~~

1060 - `type: optional "audio/pcma"`

~~1061~~

1062 The audio format. Always `audio/pcma`.

~~1063~~

1064 - `"audio/pcma"`

~~1065~~

1066 - `noise_reduction: optional object { type }`

~~1067~~

1068 Configuration for input audio noise reduction. This can be set to `null` to turn off.

1069 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

1070 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~1071~~

1072 - `type: optional "near_field" or "far_field"`

~~1073~~

1074 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1075~~

1076 - `"near_field"`

~~1077~~

1078 - `"far_field"`

~~1079~~

1080 - `transcription: optional object { delay, language, model, prompt }`

~~1081~~

1082 - `delay: optional "minimal" or "low" or "medium" or 2 more`

~~1083~~

1084 Controls how long the model waits before emitting transcription text.

1085 Higher values can improve transcription accuracy at the cost of latency.

1086 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1087~~

1088 - `"minimal"`

~~1089~~

1090 - `"low"`

~~1091~~

1092 - `"medium"`

~~1093~~

1094 - `"high"`

~~1095~~

1096 - `"xhigh"`

~~1097~~

1098 - `language: optional string`

~~1099~~

1100 The language of the input audio. Supplying the input language in

1101 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

1102 will improve accuracy and latency.

~~1103~~

1104 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`

~~1105~~

1106 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1107~~

1108 - `"whisper-1"`

~~1109~~

1110 - `"gpt-4o-mini-transcribe"`

~~1111~~

1112 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~1113~~

1114 - `"gpt-4o-transcribe"`

~~1115~~

1116 - `"gpt-4o-transcribe-diarize"`

~~1117~~

1118 - `"gpt-realtime-whisper"`

~~1119~~

1120 - `prompt: optional string`

~~1121~~

1122 An optional text to guide the model's style or continue a previous audio

1123 segment.

1124 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

1125 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

1126 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1127~~

1128 - `turn_detection: optional object { type, create_response, idle_timeout_ms, 4 more } or object { type, create_response, eagerness, interrupt_response }`

~~1129~~

1130 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger the model response.

~~1131~~

1132 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~1133~~

1134 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~1135~~

1136 For `gpt-realtime-whisper` transcription sessions, turn detection must be

1137 set to `null`; VAD is not supported.

~~1138~~

1139 - `server_vad: object { type, create_response, idle_timeout_ms, 4 more }`

~~1140~~

1141 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~1142~~

1143 - `type: "server_vad"`

~~1144~~

1145 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~1146~~

1147 - `create_response: optional boolean`

~~1148~~

1149 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~1150~~

1151 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1152~~

1153 - `idle_timeout_ms: optional number`

~~1154~~

1155 Optional timeout after which a model response will be triggered automatically. This is

1156 useful for situations in which a long pause from the user is unexpected, such as a phone

1157 call. The model will effectively prompt the user to continue the conversation based

1158 on the current context.

~~1159~~

1160 The timeout value will be applied after the last model response's audio has finished playing,

1161 i.e. it's set to the `response.done` time plus audio playback duration.

~~1162~~

1163 An `input_audio_buffer.timeout_triggered` event (plus events

1164 associated with the Response) will be emitted when the timeout is reached.

1165 Idle timeout is currently only supported for `server_vad` mode.

~~1166~~

1167 - `interrupt_response: optional boolean`

~~1168~~

1169 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

1170 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~1171~~

1172 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1173~~

1174 - `prefix_padding_ms: optional number`

~~1175~~

1176 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

1177 milliseconds). Defaults to 300ms.

~~1178~~

1179 - `silence_duration_ms: optional number`

~~1180~~

1181 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

1182 to 500ms. With shorter values the model will respond more quickly,

1183 but may jump in on short pauses from the user.

~~1184~~

1185 - `threshold: optional number`

~~1186~~

1187 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

1188 higher threshold will require louder audio to activate the model, and

1189 thus might perform better in noisy environments.

~~1190~~

1191 - `semantic_vad: object { type, create_response, eagerness, interrupt_response }`

~~1192~~

1193 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~1194~~

1195 - `type: "semantic_vad"`

~~1196~~

1197 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~1198~~

1199 - `create_response: optional boolean`

~~1200~~

1201 Whether or not to automatically generate a response when a VAD stop event occurs.

~~1202~~

1203 - `eagerness: optional "low" or "medium" or "high" or "auto"`

~~1204~~

1205 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~1206~~

1207 - `"low"`

~~1208~~

1209 - `"medium"`

~~1210~~

1211 - `"high"`

~~1212~~

1213 - `"auto"`

~~1214~~

1215 - `interrupt_response: optional boolean`

~~1216~~

1217 Whether or not to automatically interrupt any ongoing response with output to the default

1218 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
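The two turn detection variants can be sketched as plain Python dictionaries. This is a minimal illustration, not part of the CLI: the helper names `server_vad`, `semantic_vad`, and `max_timeout_seconds` are hypothetical, while the field names, default values, and eagerness timeouts come from the descriptions above.

```python
# Maximum timeouts per eagerness level, in seconds, as listed for semantic_vad.
SEMANTIC_VAD_MAX_TIMEOUT_S = {"low": 8, "medium": 4, "high": 2}


def server_vad(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500, **optional):
    """Build a server_vad turn_detection object using the documented defaults."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    config = {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
    }
    # Optional fields such as create_response, interrupt_response, idle_timeout_ms.
    config.update(optional)
    return config


def semantic_vad(eagerness="auto", **optional):
    """Build a semantic_vad turn_detection object."""
    if eagerness not in ("low", "medium", "high", "auto"):
        raise ValueError("eagerness must be low, medium, high, or auto")
    config = {"type": "semantic_vad", "eagerness": eagerness}
    config.update(optional)
    return config


def max_timeout_seconds(eagerness):
    """`auto` is documented as equivalent to `medium`."""
    level = "medium" if eagerness == "auto" else eagerness
    return SEMANTIC_VAD_MAX_TIMEOUT_S[level]


phone_call = server_vad(silence_duration_ms=700, idle_timeout_ms=5000)
patient = semantic_vad("low", interrupt_response=True)
```

Server VAD exposes audio-level tuning knobs, whereas Semantic VAD only exposes `eagerness`, so the two builders deliberately accept different required arguments.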

~~1219~~

1220 - `output: optional object { format, speed, voice }`

~~1221~~

1222 - `format: optional object { rate, type } or object { type } or object { type }`

~~1223~~

1224 The format of the output audio.

~~1225~~

1226 - `audio/pcm: object { rate, type }`

~~1227~~

1228 The PCM audio format. Only a 24kHz sample rate is supported.

~~1229~~

1230 - `audio/pcmu: object { type }`

~~1231~~

1232 The G.711 μ-law format.

~~1233~~

1234 - `audio/pcma: object { type }`

~~1235~~

1236 The G.711 A-law format.

~~1237~~

1238 - `speed: optional number`

~~1239~~

1240 The speed of the model's spoken response as a multiple of the original speed.

1241 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~1242~~

1243 This parameter is a post-processing adjustment to the audio after it is generated; it's

1244 also possible to prompt the model to speak faster or slower.

~~1245~~

1246 - `voice: optional string or "alloy" or "ash" or "ballad" or 7 more`

~~1247~~

1248 The voice the model uses to respond. Voice cannot be changed during the

1249 session once the model has responded with audio at least once. Current

1250 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

1251 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

1252 best quality.

~~1253~~

1254 - `"alloy"`

~~1255~~

1256 - `"ash"`

~~1257~~

1258 - `"ballad"`

~~1259~~

1260 - `"coral"`

~~1261~~

1262 - `"echo"`

~~1263~~

1264 - `"sage"`

~~1265~~

1266 - `"shimmer"`

~~1267~~

1268 - `"verse"`

~~1269~~

1270 - `"marin"`

~~1271~~

1272 - `"cedar"`

~~1273~~

1274 - `expires_at: optional number`

~~1275~~

1276 Expiration timestamp for the session, in seconds since epoch.

~~1277~~

1278 - `include: optional array of "item.input_audio_transcription.logprobs"`

~~1279~~

1280 Additional fields to include in server outputs.

~~1281~~

1282 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1283~~

1284 - `"item.input_audio_transcription.logprobs"`

~~1285~~

1286 - `instructions: optional string`

~~1287~~

1288 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format, (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~1289~~

1290 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~1291~~

1292 - `max_output_tokens: optional number or "inf"`

~~1293~~

1294 Maximum number of output tokens for a single assistant response,

1295 inclusive of tool calls. Provide an integer between 1 and 4096 to

1296 limit output tokens, or `inf` for the maximum available tokens for a

1297 given model. Defaults to `inf`.

~~1298~~

1299 - `union_member_0: number`

~~1300~~

1301 - `union_member_1: "inf"`
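A client may want to check a `max_output_tokens` value before sending it. The Python sketch below mirrors the rule stated above (an integer between 1 and 4096, or the string `inf`, which is the default). The function name `validate_max_output_tokens` is hypothetical, and the server remains the authority on what it accepts.

```python
def validate_max_output_tokens(value="inf"):
    """Return the value unchanged if it matches the documented range."""
    if value == "inf":
        return value
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError('max_output_tokens must be an integer or "inf"')
    if not 1 <= value <= 4096:
        raise ValueError("max_output_tokens must be between 1 and 4096")
    return value
```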

~~1302~~

1303 - `model: optional string or "gpt-realtime" or "gpt-realtime-1.5" or "gpt-realtime-2" or 14 more`

~~1304~~

1305 The Realtime model used for this session.

~~1306~~

1307 - `"gpt-realtime"`

~~1308~~

1309 - `"gpt-realtime-1.5"`

~~1310~~

1311 - `"gpt-realtime-2"`

~~1312~~

1313 - `"gpt-realtime-2025-08-28"`

~~1314~~

1315 - `"gpt-4o-realtime-preview"`

~~1316~~

1317 - `"gpt-4o-realtime-preview-2024-10-01"`

~~1318~~

1319 - `"gpt-4o-realtime-preview-2024-12-17"`

~~1320~~

1321 - `"gpt-4o-realtime-preview-2025-06-03"`

~~1322~~

1323 - `"gpt-4o-mini-realtime-preview"`

~~1324~~

1325 - `"gpt-4o-mini-realtime-preview-2024-12-17"`

~~1326~~

1327 - `"gpt-realtime-mini"`

~~1328~~

1329 - `"gpt-realtime-mini-2025-10-06"`

~~1330~~

1331 - `"gpt-realtime-mini-2025-12-15"`

~~1332~~

1333 - `"gpt-audio-1.5"`

~~1334~~

1335 - `"gpt-audio-mini"`

~~1336~~

1337 - `"gpt-audio-mini-2025-10-06"`

~~1338~~

1339 - `"gpt-audio-mini-2025-12-15"`

~~1340~~

1341 - `output_modalities: optional array of "text" or "audio"`

~~1342~~

1343 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

1344 that the model will respond with audio plus a transcript. `["text"]` can be used to make

1345 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~1346~~

1347 - `"text"`

~~1348~~

1349 - `"audio"`

~~1350~~

1351 - `prompt: optional object { id, variables, version }`

~~1352~~

1353 Reference to a prompt template and its variables.

1354 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~1355~~

1356 - `id: string`

~~1357~~

1358 The unique identifier of the prompt template to use.

~~1359~~

1360 - `variables: optional map[string or ResponseInputText or ResponseInputImage or ResponseInputFile]`

~~1361~~

1362 Optional map of values to substitute in for variables in your

1363 prompt. The substitution values can either be strings, or other

1364 Response input types like images or files.

~~1365~~

1366 - `union_member_0: string`

~~1367~~

1368 - `response_input_text: object { text, type }`

~~1369~~

1370 A text input to the model.

~~1371~~

1372 - `text: string`

~~1373~~

1374 The text input to the model.

~~1375~~

1376 - `type: "input_text"`

~~1377~~

1378 The type of the input item. Always `input_text`.

~~1379~~

1380 - `response_input_image: object { detail, type, file_id, image_url }`

~~1381~~

1382 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~1383~~

1384 - `detail: "low" or "high" or "auto" or "original"`

~~1385~~

1386 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~1387~~

1388 - `"low"`

~~1389~~

1390 - `"high"`

~~1391~~

1392 - `"auto"`

~~1393~~

1394 - `"original"`

~~1395~~

1396 - `type: "input_image"`

~~1397~~

1398 The type of the input item. Always `input_image`.

~~1399~~

1400 - `file_id: optional string`

~~1401~~

1402 The ID of the file to be sent to the model.

~~1403~~

1404 - `image_url: optional string`

~~1405~~

1406 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~1407~~

1408 - `response_input_file: object { type, detail, file_data, 3 more }`

~~1409~~

1410 A file input to the model.

~~1411~~

1412 - `type: "input_file"`

~~1413~~

1414 The type of the input item. Always `input_file`.

~~1415~~

1416 - `detail: optional "low" or "high"`

~~1417~~

1418 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~1419~~

1420 - `"low"`

~~1421~~

1422 - `"high"`

~~1423~~

1424 - `file_data: optional string`

~~1425~~

1426 The content of the file to be sent to the model.

~~1427~~

1428 - `file_id: optional string`

~~1429~~

1430 The ID of the file to be sent to the model.

~~1431~~

1432 - `file_url: optional string`

~~1433~~

1434 The URL of the file to be sent to the model.

~~1435~~

1436 - `filename: optional string`

~~1437~~

1438 The name of the file to be sent to the model.

~~1439~~

1440 - `version: optional string`

~~1441~~

1442 Optional version of the prompt template.

~~1443~~

1444 - `reasoning: optional object { effort }`

~~1445~~

1446 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~1447~~

1448 - `effort: optional "minimal" or "low" or "medium" or 2 more`

~~1449~~

1450 Constrains effort on reasoning for reasoning-capable Realtime models such as

1451 `gpt-realtime-2`.

~~1452~~

1453 - `"minimal"`

~~1454~~

1455 - `"low"`

~~1456~~

1457 - `"medium"`

~~1458~~

1459 - `"high"`

~~1460~~

1461 - `"xhigh"`

~~1462~~

1463 - `tool_choice: optional ToolChoiceOptions or ToolChoiceFunction or ToolChoiceMcp`

~~1464~~

1465 How the model chooses tools. Provide one of the string modes or force a specific

1466 function/MCP tool.

~~1467~~

1468 - `tool_choice_options: "none" or "auto" or "required"`

~~1469~~

1470 Controls which (if any) tool is called by the model.

~~1471~~

1472 `none` means the model will not call any tool and instead generates a message.

~~1473~~

1474 `auto` means the model can pick between generating a message or calling one or

1475 more tools.

~~1476~~

1477 `required` means the model must call one or more tools.

~~1478~~

1479 - `"none"`

~~1480~~

1481 - `"auto"`

~~1482~~

1483 - `"required"`

~~1484~~

1485 - `tool_choice_function: object { name, type }`

~~1486~~

1487 Use this option to force the model to call a specific function.

~~1488~~

1489 - `name: string`

~~1490~~

1491 The name of the function to call.

~~1492~~

1493 - `type: "function"`

~~1494~~

1495 For function calling, the type is always `function`.

~~1496~~

1497 - `tool_choice_mcp: object { server_label, type, name }`

~~1498~~

1499 Use this option to force the model to call a specific tool on a remote MCP server.

~~1500~~

1501 - `server_label: string`

~~1502~~

1503 The label of the MCP server to use.

~~1504~~

1505 - `type: "mcp"`

~~1506~~

1507 For MCP tools, the type is always `mcp`.

~~1508~~

1509 - `name: optional string`

~~1510~~

1511 The name of the tool to call on the server.

~~1512~~

1513 - `tools: optional array of RealtimeFunctionTool or object { server_label, type, allowed_tools, 7 more }`

~~1514~~

1515 Tools available to the model.

~~1516~~

1517 - `realtime_function_tool: object { description, name, parameters, type }`

~~1518~~

1519 - `description: optional string`

~~1520~~

1521 The description of the function, including guidance on when and how

1522 to call it, and guidance about what to tell the user when calling

1523 (if anything).

~~1524~~

1525 - `name: optional string`

~~1526~~

1527 The name of the function.

~~1528~~

1529 - `parameters: optional unknown`

~~1530~~

1531 Parameters of the function in JSON Schema.

~~1532~~

1533 - `type: optional "function"`

~~1534~~

1535 The type of the tool, i.e. `function`.

~~1536~~

1537 - `"function"`

~~1538~~

1539 - `MCP tool: object { server_label, type, allowed_tools, 7 more }`

~~1540~~

1541 Give the model access to additional tools via remote Model Context Protocol

1542 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~1543~~

1544 - `server_label: string`

~~1545~~

1546 A label for this MCP server, used to identify it in tool calls.

~~1547~~

1548 - `type: "mcp"`

~~1549~~

1550 The type of the MCP tool. Always `mcp`.

~~1551~~

1552 - `allowed_tools: optional array of string or object { read_only, tool_names }`

~~1553~~

1554 List of allowed tool names or a filter object.

~~1555~~

1556 - `MCP allowed tools: array of string`

~~1557~~

1558 A string array of allowed tool names.

~~1559~~

1560 - `MCP tool filter: object { read_only, tool_names }`

~~1561~~

1562 A filter object to specify which tools are allowed.

~~1563~~

1564 - `read_only: optional boolean`

~~1565~~

1566 Indicates whether or not a tool modifies data or is read-only. If an

1567 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1568 it will match this filter.

~~1569~~

1570 - `tool_names: optional array of string`

~~1571~~

1572 List of allowed tool names.

~~1573~~

1574 - `authorization: optional string`

~~1575~~

1576 An OAuth access token that can be used with a remote MCP server, either

1577 with a custom MCP server URL or a service connector. Your application

1578 must handle the OAuth authorization flow and provide the token here.

~~1579~~

1580 - `connector_id: optional "connector_dropbox" or "connector_gmail" or "connector_googlecalendar" or 5 more`

~~1581~~

1582 Identifier for service connectors, like those available in ChatGPT. One of

1583 `server_url` or `connector_id` must be provided. Learn more about service

1584 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~1585~~

1586 Currently supported `connector_id` values are:

~~1587~~

1588 - Dropbox: `connector_dropbox`

1589 - Gmail: `connector_gmail`

1590 - Google Calendar: `connector_googlecalendar`

1591 - Google Drive: `connector_googledrive`

1592 - Microsoft Teams: `connector_microsoftteams`

1593 - Outlook Calendar: `connector_outlookcalendar`

1594 - Outlook Email: `connector_outlookemail`

1595 - SharePoint: `connector_sharepoint`

~~1596~~

1597 - `"connector_dropbox"`

~~1598~~

1599 - `"connector_gmail"`

~~1600~~

1601 - `"connector_googlecalendar"`

~~1602~~

1603 - `"connector_googledrive"`

~~1604~~

1605 - `"connector_microsoftteams"`

~~1606~~

1607 - `"connector_outlookcalendar"`

~~1608~~

1609 - `"connector_outlookemail"`

~~1610~~

1611 - `"connector_sharepoint"`

~~1612~~

1613 - `defer_loading: optional boolean`

~~1614~~

1615 Whether this MCP tool is deferred and discovered via tool search.

~~1616~~

1617 - `headers: optional map[string]`

~~1618~~

1619 Optional HTTP headers to send to the MCP server. Use for authentication

1620 or other purposes.

~~1621~~

1622 - `require_approval: optional object { always, never } or "always" or "never"`

~~1623~~

1624 Specify which of the MCP server's tools require approval.

~~1625~~

1626 - `MCP tool approval filter: object { always, never }`

~~1627~~

1628 Specify which of the MCP server's tools require approval. Can be

1629 `always`, `never`, or a filter object associated with tools

1630 that require approval.

~~1631~~

1632 - `always: optional object { read_only, tool_names }`

~~1633~~

1634 A filter object to specify which tools are allowed.

~~1635~~

1636 - `read_only: optional boolean`

~~1637~~

1638 Indicates whether or not a tool modifies data or is read-only. If an

1639 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1640 it will match this filter.

~~1641~~

1642 - `tool_names: optional array of string`

~~1643~~

1644 List of allowed tool names.

~~1645~~

1646 - `never: optional object { read_only, tool_names }`

~~1647~~

1648 A filter object to specify which tools are allowed.

~~1649~~

1650 - `read_only: optional boolean`

~~1651~~

1652 Indicates whether or not a tool modifies data or is read-only. If an

1653 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1654 it will match this filter.

~~1655~~

1656 - `tool_names: optional array of string`

~~1657~~

1658 List of allowed tool names.

~~1659~~

1660 - `MCP tool approval setting: "always" or "never"`

~~1661~~

1662 Specify a single approval policy for all tools. One of `always` or

1663 `never`. When set to `always`, all tools will require approval. When

1664 set to `never`, no tools will require approval.

~~1665~~

1666 - `"always"`

~~1667~~

1668 - `"never"`

~~1669~~

1670 - `server_description: optional string`

~~1671~~

1672 Optional description of the MCP server, used to provide more context.

~~1673~~

1674 - `server_url: optional string`

~~1675~~

1676 The URL for the MCP server. One of `server_url` or `connector_id` must be

1677 provided.

~~1678~~

1679 - `tracing: optional "auto" or object { group_id, metadata, workflow_name }`

~~1680~~

1681 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

1682 tracing is enabled for a session, the configuration cannot be modified.

~~1683~~

1684 `auto` will create a trace for the session with default values for the

1685 workflow name, group id, and metadata.

~~1686~~

1687 - `auto: "auto"`

~~1688~~

1689 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~1690~~

1691 - `Tracing Configuration: object { group_id, metadata, workflow_name }`

~~1692~~

1693 Granular configuration for tracing.

~~1694~~

1695 - `group_id: optional string`

~~1696~~

1697 The group id to attach to this trace to enable filtering and

1698 grouping in the Traces Dashboard.

~~1699~~

1700 - `metadata: optional unknown`

~~1701~~

1702 Arbitrary metadata to attach to this trace to enable

1703 filtering in the Traces Dashboard.

~~1704~~

1705 - `workflow_name: optional string`

~~1706~~

1707 The name of the workflow to attach to this trace. This is used to

1708 name the trace in the Traces Dashboard.

~~1709~~

1710 - `truncation: optional "auto" or "disabled" or RealtimeTruncationRetentionRatio`

~~1711~~

1712 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~1713~~

1714 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~1715~~

1716 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~1717~~

1718 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~1719~~

1720 - `RealtimeTruncationStrategy: "auto" or "disabled"`

~~1721~~

1722 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~1723~~

1724 - `"auto"`

~~1725~~

1726 - `"disabled"`

~~1727~~

1728 - `realtime_truncation_retention_ratio: object { retention_ratio, type, token_limits }`

~~1729~~

1730 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~1731~~

1732 - `retention_ratio: number`

~~1733~~

1734 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~1735~~

1736 - `type: "retention_ratio"`

~~1737~~

1738 Use retention ratio truncation.

~~1739~~

1740 - `token_limits: optional object { post_instructions }`

~~1741~~

1742 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~1743~~

1744 - `post_instructions: optional number`

~~1745~~

1746 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
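
As a rough illustration of the retention-ratio truncation described above, the following Python sketch drops the oldest messages until the conversation fits within `retention_ratio` times the post-instruction token limit. The helper name `truncate_conversation`, the per-message token counts, and the exact trigger condition are assumptions made for illustration; the server's actual algorithm is not specified in this reference.

```python
from typing import List, Tuple


def truncate_conversation(
    message_tokens: List[int],
    post_instructions_limit: int,
    retention_ratio: float,
) -> Tuple[List[int], int]:
    """Drop the oldest messages once the token limit is exceeded.

    Hypothetical helper: returns the retained per-message token counts and
    the number of messages dropped from the front of the conversation.
    """
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")

    total = sum(message_tokens)
    if total <= post_instructions_limit:
        # Still under the limit: nothing is truncated.
        return list(message_tokens), 0

    # Over the limit: drop from the oldest message until the total is at or
    # below the retained fraction of the limit (assumed behavior).
    target = retention_ratio * post_instructions_limit
    dropped = 0
    while dropped < len(message_tokens) and total > target:
        total -= message_tokens[dropped]
        dropped += 1
    return list(message_tokens[dropped:]), dropped


# Truncation settings using the field names documented above.
truncation_config = {
    "type": "retention_ratio",
    "retention_ratio": 0.8,
    "token_limits": {"post_instructions": 5000},
}

kept, dropped = truncate_conversation(
    [1200, 1500, 900, 1000, 800],  # made-up token counts, oldest first
    truncation_config["token_limits"]["post_instructions"],
    truncation_config["retention_ratio"],
)
print(kept, dropped)
```

In this made-up conversation the total of 5,400 tokens exceeds the 5,000-token limit, so the two oldest messages are dropped to get below the 4,000-token target. Dropping below the limit rather than just to it is what leaves headroom for later turns before the next truncation.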

~~1747~~

1748### Realtime Transcription Session Create Response

~~1749~~

1750- `realtime_transcription_session_create_response: object { id, object, type, 3 more }`

~~1751~~

1752 A Realtime transcription session configuration object.

~~1753~~

1754 - `id: string`

~~1755~~

1756 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1757~~

1758 - `object: string`

~~1759~~

1760 The object type. Always `realtime.transcription_session`.

~~1761~~

1762 - `type: "transcription"`

~~1763~~

1764 The type of session. Always `transcription` for transcription sessions.

~~1765~~

1766 - `audio: optional object { input }`

~~1767~~

1768 Configuration for input audio for the session.

~~1769~~

1770 - `input: optional object { format, noise_reduction, transcription, turn_detection }`

~~1771~~

1772 - `format: optional object { rate, type } or object { type } or object { type }`

~~1773~~

1774 The PCM audio format. Only a 24kHz sample rate is supported.

~~1775~~

1776 - `audio/pcm: object { rate, type }`

~~1777~~

1778 The PCM audio format. Only a 24kHz sample rate is supported.

~~1779~~

1780 - `rate: optional 24000`

~~1781~~

1782 The sample rate of the audio. Always `24000`.

~~1783~~

1784 - `24000`

~~1785~~

1786 - `type: optional "audio/pcm"`

~~1787~~

1788 The audio format. Always `audio/pcm`.

~~1789~~

1790 - `"audio/pcm"`

~~1791~~

1792 - `audio/pcmu: object { type }`

~~1793~~

1794 The G.711 μ-law format.

~~1795~~

1796 - `type: optional "audio/pcmu"`

~~1797~~

1798 The audio format. Always `audio/pcmu`.

~~1799~~

1800 - `"audio/pcmu"`

~~1801~~

1802 - `audio/pcma: object { type }`

~~1803~~

1804 The G.711 A-law format.

~~1805~~

1806 - `type: optional "audio/pcma"`

~~1807~~

1808 The audio format. Always `audio/pcma`.

~~1809~~

1810 - `"audio/pcma"`

~~1811~~

1812 - `noise_reduction: optional object { type }`

~~1813~~

1814 Configuration for input audio noise reduction.

~~1815~~

1816 - `type: optional "near_field" or "far_field"`

~~1817~~

1818 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1819~~

1820 - `"near_field"`

~~1821~~

1822 - `"far_field"`

~~1823~~

1824 - `transcription: optional object { delay, language, model, prompt }`

~~1825~~

1826 - `delay: optional "minimal" or "low" or "medium" or 2 more`

~~1827~~

1828 Controls how long the model waits before emitting transcription text.

1829 Higher values can improve transcription accuracy at the cost of latency.

1830 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1831~~

1832 - `"minimal"`

~~1833~~

1834 - `"low"`

~~1835~~

1836 - `"medium"`

~~1837~~

1838 - `"high"`

~~1839~~

1840 - `"xhigh"`

~~1841~~

1842 - `language: optional string`

~~1843~~

1844 The language of the input audio. Supplying the input language in

1845 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

1846 will improve accuracy and latency.

~~1847~~

1848 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`

~~1849~~

1850 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1851~~

1852 - `"whisper-1"`

~~1853~~

1854 - `"gpt-4o-mini-transcribe"`

~~1855~~

1856 - `"gpt-4o-mini-transcribe-2025-12-15"`

~~1857~~

1858 - `"gpt-4o-transcribe"`

~~1859~~

1860 - `"gpt-4o-transcribe-diarize"`

~~1861~~

1862 - `"gpt-realtime-whisper"`

~~1863~~

1864 - `prompt: optional string`

~~1865~~

1866 Optional text to guide the model's style or continue a previous audio

1867 segment.

1868 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

1869 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

1870 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1871~~

1872 - `turn_detection: optional object { prefix_padding_ms, silence_duration_ms, threshold, type }`

~~1873~~

1874 Configuration for turn detection. Can be set to `null` to turn off. Server

1875 VAD means that the model will detect the start and end of speech based on

1876 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~1877~~

1878 - `prefix_padding_ms: optional number`

~~1879~~

1880 Amount of audio to include before the VAD-detected speech (in

1881 milliseconds). Defaults to 300ms.

~~1882~~

1883 - `silence_duration_ms: optional number`

~~1884~~

1885 Duration of silence to detect speech stop (in milliseconds). Defaults

1886 to 500ms. With shorter values the model will respond more quickly,

1887 but may jump in on short pauses from the user.

~~1888~~

1889 - `threshold: optional number`

~~1890~~

1891 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

1892 higher threshold will require louder audio to activate the model, and

1893 thus might perform better in noisy environments.

~~1894~~

1895 - `type: optional string`

~~1896~~

1897 Type of turn detection. Only `server_vad` is currently supported.

~~1898~~

1899 - `expires_at: optional number`

~~1900~~

1901 Expiration timestamp for the session, in seconds since epoch.

~~1902~~

1903 - `include: optional array of "item.input_audio_transcription.logprobs"`

~~1904~~

1905 Additional fields to include in server outputs.

~~1906~~

1907 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1908~~

1909 - `"item.input_audio_transcription.logprobs"`
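
The field descriptions above spell out several constraints around `gpt-realtime-whisper`: `delay` is only supported with that model, `prompt` is not supported with it, and `turn_detection` must be `null`. A minimal client-side check might look like the sketch below. The function `check_transcription_input` is hypothetical and not part of any SDK or the CLI, and the server may apply different or additional validation.

```python
from typing import Any, Dict, List

# Delay values listed in the reference above.
DELAY_VALUES = {"minimal", "low", "medium", "high", "xhigh"}


def check_transcription_input(audio_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an `audio.input` object.

    Hypothetical check derived only from the field descriptions above.
    """
    problems: List[str] = []
    transcription = audio_input.get("transcription") or {}
    is_realtime_whisper = transcription.get("model") == "gpt-realtime-whisper"

    delay = transcription.get("delay")
    if delay is not None:
        if delay not in DELAY_VALUES:
            problems.append(f"unknown delay value: {delay!r}")
        if not is_realtime_whisper:
            problems.append("delay is only supported with gpt-realtime-whisper")

    if is_realtime_whisper:
        if transcription.get("prompt") is not None:
            problems.append("prompt is not supported with gpt-realtime-whisper")
        # Assumption: a missing key is treated the same as an explicit null.
        if audio_input.get("turn_detection") is not None:
            problems.append("turn_detection must be null for gpt-realtime-whisper")

    return problems


example = {
    "format": {"type": "audio/pcm", "rate": 24000},
    "transcription": {
        "model": "gpt-realtime-whisper",
        "delay": "low",
        "prompt": "expect words related to technology",
    },
    "turn_detection": {"type": "server_vad"},
}
problems = check_transcription_input(example)
for problem in problems:
    print(problem)
```

Here the example configuration is flagged twice, once for the prompt and once for the non-null turn detection, while the `delay` setting passes because it is paired with `gpt-realtime-whisper`.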

~~1910~~

1911### Realtime Transcription Session Turn Detection

~~1912~~

1913- `realtime_transcription_session_turn_detection: object { prefix_padding_ms, silence_duration_ms, threshold, type }`

~~1914~~

1915 Configuration for turn detection. Can be set to `null` to turn off. Server

1916 VAD means that the model will detect the start and end of speech based on

1917 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~1918~~

1919 - `prefix_padding_ms: optional number`

~~1920~~

1921 Amount of audio to include before the VAD-detected speech (in

1922 milliseconds). Defaults to 300ms.

~~1923~~

1924 - `silence_duration_ms: optional number`

~~1925~~

1926 Duration of silence to detect speech stop (in milliseconds). Defaults

1927 to 500ms. With shorter values the model will respond more quickly,

1928 but may jump in on short pauses from the user.

~~1929~~

1930 - `threshold: optional number`

~~1931~~

1932 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

1933 higher threshold will require louder audio to activate the model, and

1934 thus might perform better in noisy environments.

~~1935~~

1936 - `type: optional string`

~~1937~~

1938 Type of turn detection. Only `server_vad` is currently supported.
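
To make the interaction of `threshold`, `prefix_padding_ms`, and `silence_duration_ms` more concrete, here is a toy volume-based detector in Python. It is only a sketch: the function `detect_speech_segments`, the 20 ms frame size, and the use of per-frame levels between 0.0 and 1.0 are all assumptions for illustration, and the server's real VAD is not described in this reference and is likely more sophisticated.

```python
from typing import List, Optional, Tuple


def detect_speech_segments(
    levels: List[float],
    frame_ms: int = 20,
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame levels."""
    segments: List[Tuple[int, int]] = []
    start_ms: Optional[int] = None
    silence_ms = 0
    for i, level in enumerate(levels):
        t = i * frame_ms
        if level >= threshold:
            if start_ms is None:
                # Speech starts: include some audio from before the trigger.
                start_ms = max(0, t - prefix_padding_ms)
            silence_ms = 0
        elif start_ms is not None:
            silence_ms += frame_ms
            if silence_ms >= silence_duration_ms:
                # Enough silence: close the segment where the silence began.
                segments.append((start_ms, t + frame_ms - silence_ms))
                start_ms = None
                silence_ms = 0
    if start_ms is not None:
        # Input ended while speech was still open.
        segments.append((start_ms, len(levels) * frame_ms - silence_ms))
    return segments


# 400 ms of quiet, 200 ms of speech, 600 ms of quiet (20 ms frames).
levels = [0.1] * 20 + [0.9] * 10 + [0.1] * 30
segments = detect_speech_segments(levels)
print(segments)  # -> [(100, 600)]
```

In this toy model, a lower `silence_duration_ms` closes the segment sooner, which matches the note above that shorter values respond more quickly but may cut in on short pauses.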