Reference Changes, 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC
java/resources/realtime/subresources/client_secrets/index.md


0 added, 2803 removed.


This document has no rendered page for this history range.

java/resources/realtime/subresources/client_secrets/index.md +0 −2803 deleted


~~1# Client Secrets~~

~~3## Create client secret~~

~~5`ClientSecretCreateResponse realtime().clientSecrets().create(ClientSecretCreateParams params = ClientSecretCreateParams.none(), RequestOptions requestOptions = RequestOptions.none())`~~

~~7**post** `/realtime/client_secrets`~~

~~9Create a Realtime client secret with an associated session configuration.~~

~~11Client secrets are short-lived tokens that can be passed to a client app,~~

~~12such as a web frontend or mobile client, which grants access to the Realtime API without~~

~~13leaking your main API key. You can configure a custom TTL for each client secret.~~

~~15You can also attach session configuration options to the client secret, which will be~~

~~16applied to any sessions created using that client secret, but these can also be overridden~~

~~17by the client connection.~~

~~19[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).~~

~~21Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.~~

~~23### Parameters~~

~~25- `ClientSecretCreateParams params`~~

~~27 - `Optional<ExpiresAfter> expiresAfter`~~

~~29 Configuration for the client secret expiration. Expiration refers to the time after which~~

~~30 a client secret will no longer be valid for creating sessions. The session itself may~~

~~31 continue after that time once started. A secret can be used to create multiple sessions~~

~~32 until it expires.~~

~~34 - `Optional<Anchor> anchor`~~

36 The anchor point for the client secret expiration, meaning that `seconds` will be added to the `created_at` time of the client secret to produce an expiration timestamp. Only `created_at` is currently supported.

~~38 - `CREATED_AT("created_at")`~~

~~40 - `Optional<Long> seconds`~~

~~42 The number of seconds from the anchor point to the expiration. Select a value between `10` and `7200` (2 hours). Defaults to 600 seconds (10 minutes) if not specified.~~
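The expiration rule described above (`seconds` added to `created_at`, bounded to 10 through 7200, defaulting to 600) can be sketched as plain arithmetic. This is only an illustration: `ClientSecretExpiry` and `expiresAt` are hypothetical names, not part of the OpenAI Java SDK, and the server remains the authority on the actual expiration timestamp.

```java
import java.time.Instant;

/** Hypothetical helper for illustration; not part of the OpenAI Java SDK. */
final class ClientSecretExpiry {
    static final long MIN_SECONDS = 10;       // lower bound stated in the reference
    static final long MAX_SECONDS = 7200;     // upper bound (2 hours)
    static final long DEFAULT_SECONDS = 600;  // default (10 minutes)

    /**
     * Returns created_at + seconds as epoch seconds.
     * A null `seconds` falls back to the documented default.
     */
    static long expiresAt(long createdAtEpochSeconds, Long seconds) {
        long ttl = (seconds == null) ? DEFAULT_SECONDS : seconds;
        if (ttl < MIN_SECONDS || ttl > MAX_SECONDS) {
            throw new IllegalArgumentException(
                "seconds must be between 10 and 7200, got " + ttl);
        }
        return createdAtEpochSeconds + ttl;
    }

    public static void main(String[] args) {
        // Assumed creation time, chosen only for the example.
        long createdAt = Instant.parse("2026-05-18T22:01:00Z").getEpochSecond();
        System.out.println(Instant.ofEpochSecond(expiresAt(createdAt, null)));
    }
}
```

Validating the range on the client before sending the request is a design choice made here for clarity; the API may report out-of-range values differently.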

~~44 - `Optional<Session> session`~~

~~46 Session configuration to use for the client secret. Choose either a realtime~~

~~47 session or a transcription session.~~

~~49 - `class RealtimeSessionCreateRequest:`~~

~~51 Realtime session object configuration.~~

~~53 - `JsonValue; type "realtime" constant`~~

~~55 The type of session to create. Always `realtime` for the Realtime API.~~

~~57 - `REALTIME("realtime")`~~

~~59 - `Optional<RealtimeAudioConfig> audio`~~

~~61 Configuration for input and output audio.~~

~~63 - `Optional<RealtimeAudioConfigInput> input`~~

~~65 - `Optional<RealtimeAudioFormats> format`~~

~~67 The format of the input audio.~~

~~69 - `AudioPcm`~~

~~71 - `Optional<Rate> rate`~~

~~73 The sample rate of the audio. Always `24000`.~~

~~75 - `_24000(24000)`~~

~~77 - `Optional<Type> type`~~

~~79 The audio format. Always `audio/pcm`.~~

~~81 - `AUDIO_PCM("audio/pcm")`~~

~~83 - `AudioPcmu`~~

~~85 - `Optional<Type> type`~~

~~87 The audio format. Always `audio/pcmu`.~~

~~89 - `AUDIO_PCMU("audio/pcmu")`~~

~~91 - `AudioPcma`~~

~~93 - `Optional<Type> type`~~

~~95 The audio format. Always `audio/pcma`.~~

~~97 - `AUDIO_PCMA("audio/pcma")`~~

~~99 - `Optional<NoiseReduction> noiseReduction`~~

~~100~~

101 Configuration for input audio noise reduction. This can be set to `null` to turn off.

102 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

103 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~104~~

105 - `Optional<NoiseReductionType> type`

~~106~~

107 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~108~~

109 - `NEAR_FIELD("near_field")`

~~110~~

111 - `FAR_FIELD("far_field")`

~~112~~

113 - `Optional<AudioTranscription> transcription`

~~114~~

115 Configuration for input audio transcription. Defaults to off and can be set to `null` to turn off once on. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance of input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

~~116~~

117 - `Optional<Delay> delay`

~~118~~

119 Controls how long the model waits before emitting transcription text.

120 Higher values can improve transcription accuracy at the cost of latency.

121 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~122~~

123 - `MINIMAL("minimal")`

~~124~~

125 - `LOW("low")`

~~126~~

127 - `MEDIUM("medium")`

~~128~~

129 - `HIGH("high")`

~~130~~

131 - `XHIGH("xhigh")`

~~132~~

133 - `Optional<String> language`

~~134~~

135 The language of the input audio. Supplying the input language in

136 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

137 will improve accuracy and latency.

~~138~~

139 - `Optional<Model> model`

~~140~~

141 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~142~~

143 - `WHISPER_1("whisper-1")`

~~144~~

145 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`

~~146~~

147 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`

~~148~~

149 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`

~~150~~

151 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`

~~152~~

153 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`

~~154~~

155 - `Optional<String> prompt`

~~156~~

157 An optional text to guide the model's style or continue a previous audio

158 segment.

159 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

160 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

161 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~162~~

163 - `Optional<RealtimeAudioInputTurnDetection> turnDetection`

~~164~~

165 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~166~~

167 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~168~~

169 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~170~~

171 For `gpt-realtime-whisper` transcription sessions, turn detection must be

172 set to `null`; VAD is not supported.

~~173~~

174 - `ServerVad`

~~175~~

176 - `JsonValue; type "server_vad" constant`

~~177~~

178 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~179~~

180 - `SERVER_VAD("server_vad")`

~~181~~

182 - `Optional<Boolean> createResponse`

~~183~~

184 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~185~~

186 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~187~~

188 - `Optional<Long> idleTimeoutMs`

~~189~~

190 Optional timeout after which a model response will be triggered automatically. This is

191 useful for situations in which a long pause from the user is unexpected, such as a phone

192 call. The model will effectively prompt the user to continue the conversation based

193 on the current context.

~~194~~

195 The timeout value will be applied after the last model response's audio has finished playing,

196 i.e. it's set to the `response.done` time plus audio playback duration.

~~197~~

198 An `input_audio_buffer.timeout_triggered` event (plus events

199 associated with the Response) will be emitted when the timeout is reached.

200 Idle timeout is currently only supported for `server_vad` mode.

~~201~~

202 - `Optional<Boolean> interruptResponse`

~~203~~

204 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

205 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~206~~

207 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~208~~

209 - `Optional<Long> prefixPaddingMs`

~~210~~

211 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

212 milliseconds). Defaults to 300ms.

~~213~~

214 - `Optional<Long> silenceDurationMs`

~~215~~

216 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

217 to 500ms. With shorter values the model will respond more quickly,

218 but may jump in on short pauses from the user.

~~219~~

220 - `Optional<Double> threshold`

~~221~~

222 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0), this defaults to 0.5. A

223 higher threshold will require louder audio to activate the model, and

224 thus might perform better in noisy environments.

~~225~~

226 - `SemanticVad`

~~227~~

228 - `JsonValue; type "semantic_vad" constant`

~~229~~

230 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~231~~

232 - `SEMANTIC_VAD("semantic_vad")`

~~233~~

234 - `Optional<Boolean> createResponse`

~~235~~

236 Whether or not to automatically generate a response when a VAD stop event occurs.

~~237~~

238 - `Optional<Eagerness> eagerness`

~~239~~

240 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~241~~

242 - `LOW("low")`

~~243~~

244 - `MEDIUM("medium")`

~~245~~

246 - `HIGH("high")`

~~247~~

248 - `AUTO("auto")`
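The eagerness levels above map to maximum timeouts of 8s, 4s, and 2s, with `auto` equivalent to `medium`. A minimal sketch of that mapping follows; the `SemanticVadEagerness` class and its nested enum are hypothetical stand-ins written for this example and are not the SDK's own `Eagerness` type.

```java
import java.time.Duration;

/** Hypothetical illustration of the documented eagerness-to-timeout mapping. */
final class SemanticVadEagerness {
    enum Eagerness { LOW, MEDIUM, HIGH, AUTO }

    /** Maximum time the turn detector waits, per the reference text. */
    static Duration maxTimeout(Eagerness eagerness) {
        switch (eagerness) {
            case LOW:
                return Duration.ofSeconds(8); // waits longest for the user to continue
            case HIGH:
                return Duration.ofSeconds(2); // responds most quickly
            case MEDIUM:
            case AUTO:                        // auto is documented as equivalent to medium
            default:
                return Duration.ofSeconds(4);
        }
    }

    public static void main(String[] args) {
        for (Eagerness e : Eagerness.values()) {
            System.out.println(e + " -> " + maxTimeout(e).getSeconds() + "s");
        }
    }
}
```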

~~249~~

250 - `Optional<Boolean> interruptResponse`

~~251~~

252 Whether or not to automatically interrupt any ongoing response with output to the default

253 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~254~~

255 - `Optional<RealtimeAudioConfigOutput> output`

~~256~~

257 - `Optional<RealtimeAudioFormats> format`

~~258~~

259 The format of the output audio.

~~260~~

261 - `Optional<Double> speed`

~~262~~

263 The speed of the model's spoken response as a multiple of the original speed.

264 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~265~~

266 This parameter is a post-processing adjustment to the audio after it is generated, it's

267 also possible to prompt the model to speak faster or slower.

~~268~~

269 - `Optional<Voice> voice`

~~270~~

271 The voice the model uses to respond. Supported built-in voices are

272 `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`,

273 `marin`, and `cedar`. You may also provide a custom voice object with

274 an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed

275 during the session once the model has responded with audio at least once.

276 We recommend `marin` and `cedar` for best quality.

~~277~~

278 - `String`

~~279~~

280 - `enum UnionMember1:`

~~281~~

282 - `ALLOY("alloy")`

~~283~~

284 - `ASH("ash")`

~~285~~

286 - `BALLAD("ballad")`

~~287~~

288 - `CORAL("coral")`

~~289~~

290 - `ECHO("echo")`

~~291~~

292 - `SAGE("sage")`

~~293~~

294 - `SHIMMER("shimmer")`

~~295~~

296 - `VERSE("verse")`

~~297~~

298 - `MARIN("marin")`

~~299~~

300 - `CEDAR("cedar")`

~~301~~

302 - `class Id:`

~~303~~

304 Custom voice reference.

~~305~~

306 - `String id`

~~307~~

308 The custom voice ID, e.g. `voice_1234`.

~~309~~

310 - `Optional<List<Include>> include`

~~311~~

312 Additional fields to include in server outputs.

~~313~~

314 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~315~~

316 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`

~~317~~

318 - `Optional<String> instructions`

~~319~~

320 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format, (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~321~~

322 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~323~~

324 - `Optional<MaxOutputTokens> maxOutputTokens`

~~325~~

326 Maximum number of output tokens for a single assistant response,

327 inclusive of tool calls. Provide an integer between 1 and 4096 to

328 limit output tokens, or `inf` for the maximum available tokens for a

329 given model. Defaults to `inf`.

~~330~~

331 - `long`

~~332~~

333 - `JsonValue;`

~~334~~

335 - `INF("inf")`
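Because `maxOutputTokens` accepts either an integer from 1 to 4096 or the literal `inf`, client code may want to normalize a raw setting before building the request. The sketch below assumes the value arrives as a string (for example from a config file); `MaxOutputTokensSketch` and `parse` are hypothetical names, and an empty `OptionalLong` is used here to stand for `inf`.

```java
import java.util.OptionalLong;

/** Hypothetical parser for illustration; not part of the OpenAI Java SDK. */
final class MaxOutputTokensSketch {
    static final long MIN = 1;
    static final long MAX = 4096;

    /**
     * Parses a raw setting. Returns an empty OptionalLong for "inf"
     * (or for a missing value, since the documented default is `inf`).
     */
    static OptionalLong parse(String raw) {
        if (raw == null || raw.equals("inf")) {
            return OptionalLong.empty();
        }
        long n = Long.parseLong(raw.trim()); // throws NumberFormatException on non-numeric input
        if (n < MIN || n > MAX) {
            throw new IllegalArgumentException(
                "maxOutputTokens must be between 1 and 4096 or \"inf\", got " + n);
        }
        return OptionalLong.of(n);
    }

    public static void main(String[] args) {
        System.out.println(parse("1024"));
        System.out.println(parse("inf"));
    }
}
```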

~~336~~

337 - `Optional<Model> model`

~~338~~

339 The Realtime model used for this session.

~~340~~

341 - `GPT_REALTIME("gpt-realtime")`

~~342~~

343 - `GPT_REALTIME_1_5("gpt-realtime-1.5")`

~~344~~

345 - `GPT_REALTIME_2("gpt-realtime-2")`

~~346~~

347 - `GPT_REALTIME_2025_08_28("gpt-realtime-2025-08-28")`

~~348~~

349 - `GPT_4O_REALTIME_PREVIEW("gpt-4o-realtime-preview")`

~~350~~

351 - `GPT_4O_REALTIME_PREVIEW_2024_10_01("gpt-4o-realtime-preview-2024-10-01")`

~~352~~

353 - `GPT_4O_REALTIME_PREVIEW_2024_12_17("gpt-4o-realtime-preview-2024-12-17")`

~~354~~

355 - `GPT_4O_REALTIME_PREVIEW_2025_06_03("gpt-4o-realtime-preview-2025-06-03")`

~~356~~

357 - `GPT_4O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview")`

~~358~~

359 - `GPT_4O_MINI_REALTIME_PREVIEW_2024_12_17("gpt-4o-mini-realtime-preview-2024-12-17")`

~~360~~

361 - `GPT_REALTIME_MINI("gpt-realtime-mini")`

~~362~~

363 - `GPT_REALTIME_MINI_2025_10_06("gpt-realtime-mini-2025-10-06")`

~~364~~

365 - `GPT_REALTIME_MINI_2025_12_15("gpt-realtime-mini-2025-12-15")`

~~366~~

367 - `GPT_AUDIO_1_5("gpt-audio-1.5")`

~~368~~

369 - `GPT_AUDIO_MINI("gpt-audio-mini")`

~~370~~

371 - `GPT_AUDIO_MINI_2025_10_06("gpt-audio-mini-2025-10-06")`

~~372~~

373 - `GPT_AUDIO_MINI_2025_12_15("gpt-audio-mini-2025-12-15")`

~~374~~

375 - `Optional<List<OutputModality>> outputModalities`

~~376~~

377 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

378 that the model will respond with audio plus a transcript. `["text"]` can be used to make

379 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~380~~

381 - `TEXT("text")`

~~382~~

383 - `AUDIO("audio")`

~~384~~

385 - `Optional<Boolean> parallelToolCalls`

~~386~~

387 Whether the model may call multiple tools in parallel. Only supported by

388 reasoning Realtime models such as `gpt-realtime-2`.

~~389~~

390 - `Optional<ResponsePrompt> prompt`

~~391~~

392 Reference to a prompt template and its variables.

393 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~394~~

395 - `String id`

~~396~~

397 The unique identifier of the prompt template to use.

~~398~~

399 - `Optional<Variables> variables`

~~400~~

401 Optional map of values to substitute in for variables in your

402 prompt. The substitution values can either be strings, or other

403 Response input types like images or files.

~~404~~

405 - `String`

~~406~~

407 - `class ResponseInputText:`

~~408~~

409 A text input to the model.

~~410~~

411 - `String text`

~~412~~

413 The text input to the model.

~~414~~

415 - `JsonValue; type "input_text" constant`

~~416~~

417 The type of the input item. Always `input_text`.

~~418~~

419 - `INPUT_TEXT("input_text")`

~~420~~

421 - `class ResponseInputImage:`

~~422~~

423 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~424~~

425 - `Detail detail`

~~426~~

427 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~428~~

429 - `LOW("low")`

~~430~~

431 - `HIGH("high")`

~~432~~

433 - `AUTO("auto")`

~~434~~

435 - `ORIGINAL("original")`

~~436~~

437 - `JsonValue; type "input_image" constant`

~~438~~

439 The type of the input item. Always `input_image`.

~~440~~

441 - `INPUT_IMAGE("input_image")`

~~442~~

443 - `Optional<String> fileId`

~~444~~

445 The ID of the file to be sent to the model.

~~446~~

447 - `Optional<String> imageUrl`

~~448~~

449 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~450~~

451 - `class ResponseInputFile:`

~~452~~

453 A file input to the model.

~~454~~

455 - `JsonValue; type "input_file" constant`

~~456~~

457 The type of the input item. Always `input_file`.

~~458~~

459 - `INPUT_FILE("input_file")`

~~460~~

461 - `Optional<Detail> detail`

~~462~~

463 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~464~~

465 - `LOW("low")`

~~466~~

467 - `HIGH("high")`

~~468~~

469 - `Optional<String> fileData`

~~470~~

471 The content of the file to be sent to the model.

~~472~~

473 - `Optional<String> fileId`

~~474~~

475 The ID of the file to be sent to the model.

~~476~~

477 - `Optional<String> fileUrl`

~~478~~

479 The URL of the file to be sent to the model.

~~480~~

481 - `Optional<String> filename`

~~482~~

483 The name of the file to be sent to the model.

~~484~~

485 - `Optional<String> version`

~~486~~

487 Optional version of the prompt template.

~~488~~

489 - `Optional<RealtimeReasoning> reasoning`

~~490~~

491 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~492~~

493 - `Optional<RealtimeReasoningEffort> effort`

~~494~~

495 Constrains effort on reasoning for reasoning-capable Realtime models such as

496 `gpt-realtime-2`.

~~497~~

498 - `MINIMAL("minimal")`

~~499~~

500 - `LOW("low")`

~~501~~

502 - `MEDIUM("medium")`

~~503~~

504 - `HIGH("high")`

~~505~~

506 - `XHIGH("xhigh")`

~~507~~

508 - `Optional<RealtimeToolChoiceConfig> toolChoice`

~~509~~

510 How the model chooses tools. Provide one of the string modes or force a specific

511 function/MCP tool.

~~512~~

513 - `enum ToolChoiceOptions:`

~~514~~

515 Controls which (if any) tool is called by the model.

~~516~~

517 `none` means the model will not call any tool and instead generates a message.

~~518~~

519 `auto` means the model can pick between generating a message or calling one or

520 more tools.

~~521~~

522 `required` means the model must call one or more tools.

~~523~~

524 - `NONE("none")`

~~525~~

526 - `AUTO("auto")`

~~527~~

528 - `REQUIRED("required")`

~~529~~

530 - `class ToolChoiceFunction:`

~~531~~

532 Use this option to force the model to call a specific function.

~~533~~

534 - `String name`

~~535~~

536 The name of the function to call.

~~537~~

538 - `JsonValue; type "function" constant`

~~539~~

540 For function calling, the type is always `function`.

~~541~~

542 - `FUNCTION("function")`

~~543~~

544 - `class ToolChoiceMcp:`

~~545~~

546 Use this option to force the model to call a specific tool on a remote MCP server.

~~547~~

548 - `String serverLabel`

~~549~~

550 The label of the MCP server to use.

~~551~~

552 - `JsonValue; type "mcp" constant`

~~553~~

554 For MCP tools, the type is always `mcp`.

~~555~~

556 - `MCP("mcp")`

~~557~~

558 - `Optional<String> name`

~~559~~

560 The name of the tool to call on the server.

~~561~~

562 - `Optional<List<RealtimeToolsConfigUnion>> tools`

~~563~~

564 Tools available to the model.

~~565~~

566 - `class RealtimeFunctionTool:`

~~567~~

568 - `Optional<String> description`

~~569~~

570 The description of the function, including guidance on when and how

571 to call it, and guidance about what to tell the user when calling

572 (if anything).

~~573~~

574 - `Optional<String> name`

~~575~~

576 The name of the function.

~~577~~

578 - `Optional<JsonValue> parameters`

~~579~~

580 Parameters of the function in JSON Schema.

~~581~~

582 - `Optional<Type> type`

~~583~~

584 The type of the tool, i.e. `function`.

~~585~~

586 - `FUNCTION("function")`

~~587~~

588 - `Mcp`

~~589~~

590 - `String serverLabel`

~~591~~

592 A label for this MCP server, used to identify it in tool calls.

~~593~~

594 - `JsonValue; type "mcp" constant`

~~595~~

596 The type of the MCP tool. Always `mcp`.

~~597~~

598 - `MCP("mcp")`

~~599~~

600 - `Optional<AllowedTools> allowedTools`

~~601~~

602 List of allowed tool names or a filter object.

~~603~~

604 - `List<String>`

~~605~~

606 - `class McpToolFilter:`

~~607~~

608 A filter object to specify which tools are allowed.

~~609~~

610 - `Optional<Boolean> readOnly`

~~611~~

612 Indicates whether or not a tool modifies data or is read-only. If an

613 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

614 it will match this filter.

~~615~~

616 - `Optional<List<String>> toolNames`

~~617~~

618 List of allowed tool names.

~~619~~

620 - `Optional<String> authorization`

~~621~~

622 An OAuth access token that can be used with a remote MCP server, either

623 with a custom MCP server URL or a service connector. Your application

624 must handle the OAuth authorization flow and provide the token here.

~~625~~

626 - `Optional<ConnectorId> connectorId`

~~627~~

628 Identifier for service connectors, like those available in ChatGPT. One of

629 `server_url` or `connector_id` must be provided. Learn more about service

630 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~631~~

632 Currently supported `connector_id` values are:

~~633~~

634 - Dropbox: `connector_dropbox`

635 - Gmail: `connector_gmail`

636 - Google Calendar: `connector_googlecalendar`

637 - Google Drive: `connector_googledrive`

638 - Microsoft Teams: `connector_microsoftteams`

639 - Outlook Calendar: `connector_outlookcalendar`

640 - Outlook Email: `connector_outlookemail`

641 - SharePoint: `connector_sharepoint`

~~642~~

643 - `CONNECTOR_DROPBOX("connector_dropbox")`

~~644~~

645 - `CONNECTOR_GMAIL("connector_gmail")`

~~646~~

647 - `CONNECTOR_GOOGLECALENDAR("connector_googlecalendar")`

~~648~~

649 - `CONNECTOR_GOOGLEDRIVE("connector_googledrive")`

~~650~~

651 - `CONNECTOR_MICROSOFTTEAMS("connector_microsoftteams")`

~~652~~

653 - `CONNECTOR_OUTLOOKCALENDAR("connector_outlookcalendar")`

~~654~~

655 - `CONNECTOR_OUTLOOKEMAIL("connector_outlookemail")`

~~656~~

657 - `CONNECTOR_SHAREPOINT("connector_sharepoint")`

~~658~~

659 - `Optional<Boolean> deferLoading`

~~660~~

661 Whether this MCP tool is deferred and discovered via tool search.

~~662~~

663 - `Optional<Headers> headers`

~~664~~

665 Optional HTTP headers to send to the MCP server. Use for authentication

666 or other purposes.

~~667~~

668 - `Optional<RequireApproval> requireApproval`

~~669~~

670 Specify which of the MCP server's tools require approval.

~~671~~

672 - `class McpToolApprovalFilter:`

~~673~~

674 Specify which of the MCP server's tools require approval. Can be

675 `always`, `never`, or a filter object associated with tools

676 that require approval.

~~677~~

678 - `Optional<Always> always`

~~679~~

680 A filter object to specify which tools are allowed.

~~681~~

682 - `Optional<Boolean> readOnly`

~~683~~

684 Indicates whether or not a tool modifies data or is read-only. If an

685 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

686 it will match this filter.

~~687~~

688 - `Optional<List<String>> toolNames`

~~689~~

690 List of allowed tool names.

~~691~~

692 - `Optional<Never> never`

~~693~~

694 A filter object to specify which tools are allowed.

~~695~~

696 - `Optional<Boolean> readOnly`

~~697~~

698 Indicates whether or not a tool modifies data or is read-only. If an

699 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

700 it will match this filter.

~~701~~

702 - `Optional<List<String>> toolNames`

~~703~~

704 List of allowed tool names.

~~705~~

706 - `enum McpToolApprovalSetting:`

~~707~~

708 Specify a single approval policy for all tools. One of `always` or

709 `never`. When set to `always`, all tools will require approval. When

710 set to `never`, no tools will require approval.

~~711~~

712 - `ALWAYS("always")`

~~713~~

714 - `NEVER("never")`

~~715~~

716 - `Optional<String> serverDescription`

~~717~~

718 Optional description of the MCP server, used to provide more context.

~~719~~

720 - `Optional<String> serverUrl`

~~721~~

722 The URL for the MCP server. One of `server_url` or `connector_id` must be

723 provided.

~~724~~

725 - `Optional<RealtimeTracingConfig> tracing`

~~726~~

727 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

728 tracing is enabled for a session, the configuration cannot be modified.

~~729~~

730 `auto` will create a trace for the session with default values for the

731 workflow name, group id, and metadata.

~~732~~

733 - `JsonValue;`

~~734~~

735 - `AUTO("auto")`

~~736~~

737 - `TracingConfiguration`

~~738~~

739 - `Optional<String> groupId`

~~740~~

741 The group id to attach to this trace to enable filtering and

742 grouping in the Traces Dashboard.

~~743~~

744 - `Optional<JsonValue> metadata`

~~745~~

746 The arbitrary metadata to attach to this trace to enable

747 filtering in the Traces Dashboard.

~~748~~

749 - `Optional<String> workflowName`

~~750~~

751 The name of the workflow to attach to this trace. This is used to

752 name the trace in the Traces Dashboard.

~~753~~

754 - `Optional<RealtimeTruncation> truncation`

~~755~~

756 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~757~~

758 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~759~~

760 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~761~~

762 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~763~~

764 - `RealtimeTruncationStrategy`

~~765~~

766 - `AUTO("auto")`

~~767~~

768 - `DISABLED("disabled")`

~~769~~

770 - `class RealtimeTruncationRetentionRatio:`

~~771~~

772 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~773~~

774 - `double retentionRatio`

~~775~~

776 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~777~~

778 - `JsonValue; type "retention_ratio" constant`

~~779~~

780 Use retention ratio truncation.

~~781~~

782 - `RETENTION_RATIO("retention_ratio")`

~~783~~

784 - `Optional<TokenLimits> tokenLimits`

~~785~~

786 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~787~~

788 - `Optional<Long> postInstructions`

~~789~~

790 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
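The retention-ratio behavior described above can be illustrated with a small simulation: once the conversation exceeds the token limit, the oldest messages are dropped until the total falls to `retentionRatio` times the limit. This is a sketch under assumptions, not the server's actual algorithm. `RetentionRatioSketch`, `truncate`, and the per-message token counts are hypothetical, and real token accounting is done by the API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical simulation of retention-ratio truncation; not SDK code. */
final class RetentionRatioSketch {
    /**
     * @param messageTokens  token counts per message, oldest first (assumed input)
     * @param tokenLimit     post-instruction token limit for the conversation
     * @param retentionRatio fraction of the limit to retain, between 0.0 and 1.0
     * @return token counts of the messages that remain, oldest first
     */
    static Deque<Integer> truncate(List<Integer> messageTokens,
                                   long tokenLimit,
                                   double retentionRatio) {
        Deque<Integer> kept = new ArrayDeque<>(messageTokens);
        long total = 0;
        for (int tokens : messageTokens) {
            total += tokens;
        }
        if (total <= tokenLimit) {
            return kept; // under the limit, nothing is dropped
        }
        long target = (long) Math.floor(tokenLimit * retentionRatio);
        // Drop from the oldest end until the retained total fits the target.
        while (!kept.isEmpty() && total > target) {
            total -= kept.removeFirst();
        }
        return kept;
    }

    public static void main(String[] args) {
        // 1,100 tokens against a 1,000-token limit with a 0.8 ratio.
        System.out.println(truncate(List.of(400, 300, 200, 200), 1000, 0.8));
    }
}
```

Dropping below the limit rather than exactly to it is what lets later turns proceed without truncating again, which is the cache-rate benefit the reference text describes.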

~~791~~

792 - `class RealtimeTranscriptionSessionCreateRequest:`

~~793~~

794 Realtime transcription session object configuration.

~~795~~

796 - `JsonValue; type "transcription" constant`

~~797~~

798 The type of session to create. Always `transcription` for transcription sessions.

~~799~~

800 - `TRANSCRIPTION("transcription")`

~~801~~

802 - `Optional<RealtimeTranscriptionSessionAudio> audio`

~~803~~

804 Configuration for input and output audio.

~~805~~

806 - `Optional<RealtimeTranscriptionSessionAudioInput> input`

~~807~~

808 - `Optional<RealtimeAudioFormats> format`

~~809~~

810 The PCM audio format. Only a 24kHz sample rate is supported.

~~811~~

812 - `Optional<NoiseReduction> noiseReduction`

~~813~~

814 Configuration for input audio noise reduction. This can be set to `null` to turn off.

815 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

816 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~817~~

818 - `Optional<NoiseReductionType> type`

~~819~~

820 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~821~~

822 - `Optional<AudioTranscription> transcription`

~~823~~

824 Configuration for input audio transcription. Transcription defaults to off and can be set to `null` to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

~~825~~

826 - `Optional<RealtimeTranscriptionSessionAudioInputTurnDetection> turnDetection`

~~827~~

828 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~829~~

830 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~831~~

832 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~833~~

834 For `gpt-realtime-whisper` transcription sessions, turn detection must be

835 set to `null`; VAD is not supported.

~~836~~

837 - `ServerVad`

~~838~~

839 - `JsonValue; type "server_vad"constant`

~~840~~

841 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~842~~

843 - `SERVER_VAD("server_vad")`

~~844~~

845 - `Optional<Boolean> createResponse`

~~846~~

847 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~848~~

849 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~850~~

851 - `Optional<Long> idleTimeoutMs`

~~852~~

853 Optional timeout after which a model response will be triggered automatically. This is

854 useful for situations in which a long pause from the user is unexpected, such as a phone

855 call. The model will effectively prompt the user to continue the conversation based

856 on the current context.

~~857~~

858 The timeout value will be applied after the last model response's audio has finished playing,

859 i.e. it's set to the `response.done` time plus audio playback duration.

~~860~~

861 An `input_audio_buffer.timeout_triggered` event (plus events

862 associated with the Response) will be emitted when the timeout is reached.

863 Idle timeout is currently only supported for `server_vad` mode.

~~864~~

865 - `Optional<Boolean> interruptResponse`

~~866~~

867 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

868 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~869~~

870 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~871~~

872 - `Optional<Long> prefixPaddingMs`

~~873~~

874 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

875 milliseconds). Defaults to 300ms.

~~876~~

877 - `Optional<Long> silenceDurationMs`

~~878~~

879 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

880 to 500ms. With shorter values the model will respond more quickly,

881 but may jump in on short pauses from the user.

~~882~~

883 - `Optional<Double> threshold`

~~884~~

885 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

886 higher threshold will require louder audio to activate the model, and

887 thus might perform better in noisy environments.

~~888~~

889 - `SemanticVad`

~~890~~

891 - `JsonValue; type "semantic_vad"constant`

~~892~~

893 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~894~~

895 - `SEMANTIC_VAD("semantic_vad")`

~~896~~

897 - `Optional<Boolean> createResponse`

~~898~~

899 Whether or not to automatically generate a response when a VAD stop event occurs.

~~900~~

901 - `Optional<Eagerness> eagerness`

~~902~~

903 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~904~~

905 - `LOW("low")`

~~906~~

907 - `MEDIUM("medium")`

~~908~~

909 - `HIGH("high")`

~~910~~

911 - `AUTO("auto")`

~~912~~

913 - `Optional<Boolean> interruptResponse`

~~914~~

915 Whether or not to automatically interrupt any ongoing response with output to the default

916 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~917~~

918 - `Optional<List<Include>> include`

~~919~~

920 Additional fields to include in server outputs.

~~921~~

922 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~923~~

924 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
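One constraint stated in the parameters above is that `gpt-realtime-whisper` transcription sessions require turn detection to be `null`. A minimal pre-flight check might look like the sketch below. `TranscriptionConfigCheck` is a hypothetical helper that works on plain strings and is not an SDK type. An empty `Optional` stands in for a `null` turn detection setting.

```java
import java.util.Optional;

// Hypothetical pre-flight check; names are illustrative, not SDK types.
final class TranscriptionConfigCheck {
    /** turnDetectionType is empty when turn detection is null (turned off). */
    static boolean isValid(String transcriptionModel, Optional<String> turnDetectionType) {
        if ("gpt-realtime-whisper".equals(transcriptionModel)) {
            // The reference states VAD is not supported for this model.
            return !turnDetectionType.isPresent();
        }
        // Otherwise only the two documented turn detection types are accepted.
        return !turnDetectionType.isPresent()
                || turnDetectionType.get().equals("server_vad")
                || turnDetectionType.get().equals("semantic_vad");
    }

    public static void main(String[] args) {
        System.out.println(isValid("gpt-realtime-whisper", Optional.of("server_vad")));
        System.out.println(isValid("gpt-4o-transcribe", Optional.of("semantic_vad")));
    }
}
```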

~~925~~

926### Returns

~~927~~

928- `class ClientSecretCreateResponse:`

~~929~~

930 Response from creating a session and client secret for the Realtime API.

~~931~~

932 - `long expiresAt`

~~933~~

934 Expiration timestamp for the client secret, in seconds since epoch.
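Because `expiresAt` is expressed in seconds since the epoch, it maps directly onto `java.time.Instant`. The sketch below shows one way to test whether a secret is still usable. `ClientSecretExpiry` is a hypothetical helper, and the `expiresAt` value would come from the response object.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of the OpenAI Java SDK.
final class ClientSecretExpiry {
    static Instant toInstant(long expiresAtEpochSeconds) {
        return Instant.ofEpochSecond(expiresAtEpochSeconds);
    }

    /** A secret counts as expired once "now" reaches the expiration instant. */
    static boolean isExpired(long expiresAtEpochSeconds, Instant now) {
        return !now.isBefore(toInstant(expiresAtEpochSeconds));
    }

    /** Time left before expiry, never negative. */
    static Duration remaining(long expiresAtEpochSeconds, Instant now) {
        Duration left = Duration.between(now, toInstant(expiresAtEpochSeconds));
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        long expiresAt = Instant.now().plusSeconds(600).getEpochSecond(); // stand-in value
        System.out.println(isExpired(expiresAt, Instant.now()));
    }
}
```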

~~935~~

936 - `Session session`

~~937~~

938 The session configuration for either a realtime or transcription session.

~~939~~

940 - `class RealtimeSessionCreateResponse:`

~~941~~

942 A Realtime session configuration object.

~~943~~

944 - `String id`

~~945~~

946 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~947~~

948 - `JsonValue; object_ "realtime.session"constant`

~~949~~

950 The object type. Always `realtime.session`.

~~951~~

952 - `REALTIME_SESSION("realtime.session")`

~~953~~

954 - `JsonValue; type "realtime"constant`

~~955~~

956 The type of session to create. Always `realtime` for the Realtime API.

~~957~~

958 - `REALTIME("realtime")`

~~959~~

960 - `Optional<Audio> audio`

~~961~~

962 Configuration for input and output audio.

~~963~~

964 - `Optional<Input> input`

~~965~~

966 - `Optional<RealtimeAudioFormats> format`

~~967~~

968 The format of the input audio.

~~969~~

970 - `AudioPcm`

~~971~~

972 - `Optional<Rate> rate`

~~973~~

974 The sample rate of the audio. Always `24000`.

~~975~~

976 - `_24000(24000)`

~~977~~

978 - `Optional<Type> type`

~~979~~

980 The audio format. Always `audio/pcm`.

~~981~~

982 - `AUDIO_PCM("audio/pcm")`

~~983~~

984 - `AudioPcmu`

~~985~~

986 - `Optional<Type> type`

~~987~~

988 The audio format. Always `audio/pcmu`.

~~989~~

990 - `AUDIO_PCMU("audio/pcmu")`

~~991~~

992 - `AudioPcma`

~~993~~

994 - `Optional<Type> type`

~~995~~

996 The audio format. Always `audio/pcma`.

~~997~~

998 - `AUDIO_PCMA("audio/pcma")`

~~999~~

1000 - `Optional<NoiseReduction> noiseReduction`

~~1001~~

1002 Configuration for input audio noise reduction. This can be set to `null` to turn off.

1003 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

1004 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~1005~~

1006 - `Optional<NoiseReductionType> type`

~~1007~~

1008 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1009~~

1010 - `NEAR_FIELD("near_field")`

~~1011~~

1012 - `FAR_FIELD("far_field")`

~~1013~~

1014 - `Optional<AudioTranscription> transcription`

~~1015~~

1016 - `Optional<Delay> delay`

~~1017~~

1018 Controls how long the model waits before emitting transcription text.

1019 Higher values can improve transcription accuracy at the cost of latency.

1020 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1021~~

1022 - `MINIMAL("minimal")`

~~1023~~

1024 - `LOW("low")`

~~1025~~

1026 - `MEDIUM("medium")`

~~1027~~

1028 - `HIGH("high")`

~~1029~~

1030 - `XHIGH("xhigh")`

~~1031~~

1032 - `Optional<String> language`

~~1033~~

1034 The language of the input audio. Supplying the input language in

1035 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

1036 will improve accuracy and latency.

~~1037~~

1038 - `Optional<Model> model`

~~1039~~

1040 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1041~~

1042 - `WHISPER_1("whisper-1")`

~~1043~~

1044 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`

~~1045~~

1046 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`

~~1047~~

1048 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`

~~1049~~

1050 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`

~~1051~~

1052 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`

~~1053~~

1054 - `Optional<String> prompt`

~~1055~~

1056 An optional text to guide the model's style or continue a previous audio

1057 segment.

1058 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

1059 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

1060 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1061~~

1062 - `Optional<TurnDetection> turnDetection`

~~1063~~

1064 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~1065~~

1066 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~1067~~

1068 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~1069~~

1070 For `gpt-realtime-whisper` transcription sessions, turn detection must be

1071 set to `null`; VAD is not supported.

~~1072~~

1073 - `class ServerVad:`

~~1074~~

1075 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~1076~~

1077 - `JsonValue; type "server_vad"constant`

~~1078~~

1079 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~1080~~

1081 - `SERVER_VAD("server_vad")`

~~1082~~

1083 - `Optional<Boolean> createResponse`

~~1084~~

1085 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~1086~~

1087 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1088~~

1089 - `Optional<Long> idleTimeoutMs`

~~1090~~

1091 Optional timeout after which a model response will be triggered automatically. This is

1092 useful for situations in which a long pause from the user is unexpected, such as a phone

1093 call. The model will effectively prompt the user to continue the conversation based

1094 on the current context.

~~1095~~

1096 The timeout value will be applied after the last model response's audio has finished playing,

1097 i.e. it's set to the `response.done` time plus audio playback duration.

~~1098~~

1099 An `input_audio_buffer.timeout_triggered` event (plus events

1100 associated with the Response) will be emitted when the timeout is reached.

1101 Idle timeout is currently only supported for `server_vad` mode.
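The timing rule above (the timeout starts at the `response.done` time plus the audio playback duration) can be estimated on the client. This is only a sketch: the server computes the real deadline, and `IdleTimeoutEstimate` and its parameter names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side estimate; the server decides the actual trigger time.
final class IdleTimeoutEstimate {
    /** response.done time + playback duration + idle timeout. */
    static Instant expectedTrigger(Instant responseDone, Duration playback, long idleTimeoutMs) {
        return responseDone.plus(playback).plusMillis(idleTimeoutMs);
    }

    public static void main(String[] args) {
        Instant done = Instant.parse("2026-01-01T00:00:00Z"); // illustrative timestamp
        System.out.println(expectedTrigger(done, Duration.ofSeconds(3), 5_000));
    }
}
```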

~~1102~~

1103 - `Optional<Boolean> interruptResponse`

~~1104~~

1105 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

1106 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~1107~~

1108 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1109~~

1110 - `Optional<Long> prefixPaddingMs`

~~1111~~

1112 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

1113 milliseconds). Defaults to 300ms.

~~1114~~

1115 - `Optional<Long> silenceDurationMs`

~~1116~~

1117 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

1118 to 500ms. With shorter values the model will respond more quickly,

1119 but may jump in on short pauses from the user.

~~1120~~

1121 - `Optional<Double> threshold`

~~1122~~

1123 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

1124 higher threshold will require louder audio to activate the model, and

1125 thus might perform better in noisy environments.
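To build intuition for how `threshold`, `prefixPaddingMs`, and `silenceDurationMs` interact, the toy detector below walks over per-frame loudness values. It is an illustrative sketch only and makes no claim about how the server's VAD is implemented. `ToyServerVad`, the frame representation, and the fixed frame length are all assumptions.

```java
// Toy energy-based detector for intuition only; not the server's algorithm.
final class ToyServerVad {
    /**
     * Returns {startFrame, stopFrame} for the first speech segment, or null if none.
     * levels holds one loudness value (0.0 to 1.0) per frame of frameMs milliseconds.
     */
    static int[] detect(double[] levels, int frameMs, double threshold,
                        int prefixPaddingMs, int silenceDurationMs) {
        int padFrames = prefixPaddingMs / frameMs;                    // audio kept before speech
        int silenceFrames = Math.max(1, silenceDurationMs / frameMs); // quiet frames that end a turn
        int start = -1;
        int quiet = 0;
        for (int i = 0; i < levels.length; i++) {
            boolean loud = levels[i] >= threshold;
            if (start < 0) {
                if (loud) start = Math.max(0, i - padFrames);
            } else if (loud) {
                quiet = 0; // speech resumed, reset the silence counter
            } else if (++quiet >= silenceFrames) {
                return new int[] {start, i};
            }
        }
        return start < 0 ? null : new int[] {start, levels.length - 1};
    }

    public static void main(String[] args) {
        double[] levels = {0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1};
        int[] segment = detect(levels, 100, 0.5, 300, 500);
        System.out.println(segment[0] + ".." + segment[1]);
    }
}
```

In this sketch a shorter `silenceDurationMs` ends the segment sooner, which mirrors the documented trade-off between quick responses and cutting in on short pauses.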

~~1126~~

1127 - `class SemanticVad:`

~~1128~~

1129 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~1130~~

1131 - `JsonValue; type "semantic_vad"constant`

~~1132~~

1133 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~1134~~

1135 - `SEMANTIC_VAD("semantic_vad")`

~~1136~~

1137 - `Optional<Boolean> createResponse`

~~1138~~

1139 Whether or not to automatically generate a response when a VAD stop event occurs.

~~1140~~

1141 - `Optional<Eagerness> eagerness`

~~1142~~

1143 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~1144~~

1145 - `LOW("low")`

~~1146~~

1147 - `MEDIUM("medium")`

~~1148~~

1149 - `HIGH("high")`

~~1150~~

1151 - `AUTO("auto")`

~~1152~~

1153 - `Optional<Boolean> interruptResponse`

~~1154~~

1155 Whether or not to automatically interrupt any ongoing response with output to the default

1156 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
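The eagerness values and their maximum timeouts can be captured in a small lookup. The enum below is a hypothetical client-side mirror of the documented values (`auto` is treated as `medium`, per the description above). It is not the SDK's `Eagerness` type.

```java
import java.time.Duration;

// Hypothetical mirror of the documented eagerness values; not the SDK enum.
enum EagernessTimeout {
    LOW("low", 8),
    MEDIUM("medium", 4),
    HIGH("high", 2),
    AUTO("auto", 4); // documented as equivalent to medium

    private final String wireValue;
    private final Duration maxTimeout;

    EagernessTimeout(String wireValue, long seconds) {
        this.wireValue = wireValue;
        this.maxTimeout = Duration.ofSeconds(seconds);
    }

    Duration maxTimeout() {
        return maxTimeout;
    }

    /** Looks up an entry by its wire value, e.g. "low". */
    static EagernessTimeout fromWire(String value) {
        for (EagernessTimeout e : values()) {
            if (e.wireValue.equals(value)) return e;
        }
        throw new IllegalArgumentException("unknown eagerness: " + value);
    }

    public static void main(String[] args) {
        System.out.println(fromWire("auto").maxTimeout().getSeconds());
    }
}
```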

~~1157~~

1158 - `Optional<Output> output`

~~1159~~

1160 - `Optional<RealtimeAudioFormats> format`

~~1161~~

1162 The format of the output audio.

~~1163~~

1164 - `Optional<Double> speed`

~~1165~~

1166 The speed of the model's spoken response as a multiple of the original speed.

1167 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~1168~~

1169 This parameter is a post-processing adjustment to the audio after it is generated; it's

1170 also possible to prompt the model to speak faster or slower.
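The documented speed range (0.25 minimum, 1.5 maximum, 1.0 default) lends itself to a simple range check before a session update is sent. `OutputSpeed` below is a hypothetical helper, and clamping rather than rejecting out-of-range values is a design choice assumed for this sketch.

```java
// Hypothetical helper; the bounds come from the reference text above.
final class OutputSpeed {
    static final double MIN = 0.25;
    static final double MAX = 1.5;
    static final double DEFAULT = 1.0;

    static boolean isValid(double speed) {
        return speed >= MIN && speed <= MAX;
    }

    /** Pulls an out-of-range value back to the nearest allowed bound. */
    static double clamp(double speed) {
        return Math.max(MIN, Math.min(MAX, speed));
    }

    public static void main(String[] args) {
        System.out.println(clamp(2.0));
    }
}
```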

~~1171~~

1172 - `Optional<Voice> voice`

~~1173~~

1174 The voice the model uses to respond. Voice cannot be changed during the

1175 session once the model has responded with audio at least once. Current

1176 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

1177 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

1178 best quality.

~~1179~~

1180 - `ALLOY("alloy")`

~~1181~~

1182 - `ASH("ash")`

~~1183~~

1184 - `BALLAD("ballad")`

~~1185~~

1186 - `CORAL("coral")`

~~1187~~

1188 - `ECHO("echo")`

~~1189~~

1190 - `SAGE("sage")`

~~1191~~

1192 - `SHIMMER("shimmer")`

~~1193~~

1194 - `VERSE("verse")`

~~1195~~

1196 - `MARIN("marin")`

~~1197~~

1198 - `CEDAR("cedar")`

~~1199~~

1200 - `Optional<Long> expiresAt`

~~1201~~

1202 Expiration timestamp for the session, in seconds since epoch.

~~1203~~

1204 - `Optional<List<Include>> include`

~~1205~~

1206 Additional fields to include in server outputs.

~~1207~~

1208 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1209~~

1210 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`

~~1211~~

1212 - `Optional<String> instructions`

~~1213~~

1214 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~1215~~

1216 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~1217~~

1218 - `Optional<MaxOutputTokens> maxOutputTokens`

~~1219~~

1220 Maximum number of output tokens for a single assistant response,

1221 inclusive of tool calls. Provide an integer between 1 and 4096 to

1222 limit output tokens, or `inf` for the maximum available tokens for a

1223 given model. Defaults to `inf`.

~~1224~~

1225 - `long`

~~1226~~

1227 - `JsonValue;`

~~1228~~

1229 - `INF("inf")`
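Since `maxOutputTokens` is either an integer from 1 to 4096 or the string `inf`, client code that accepts this setting from configuration needs to handle both shapes. The sketch below is one way to do that with an `OptionalLong`, where an empty value stands for `inf`. `MaxOutputTokensParser` is a hypothetical helper, not an SDK class.

```java
import java.util.OptionalLong;

// Hypothetical parser for a config value; not part of the OpenAI Java SDK.
final class MaxOutputTokensParser {
    /** Returns empty for "inf", otherwise the validated integer limit. */
    static OptionalLong parse(String raw) {
        if ("inf".equals(raw)) {
            return OptionalLong.empty();
        }
        long n = Long.parseLong(raw); // throws NumberFormatException on non-numeric input
        if (n < 1 || n > 4096) {
            throw new IllegalArgumentException("maxOutputTokens must be between 1 and 4096 or inf");
        }
        return OptionalLong.of(n);
    }

    public static void main(String[] args) {
        System.out.println(parse("inf").isPresent());
        System.out.println(parse("1024").getAsLong());
    }
}
```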

~~1230~~

1231 - `Optional<Model> model`

~~1232~~

1233 The Realtime model used for this session.

~~1234~~

1235 - `GPT_REALTIME("gpt-realtime")`

~~1236~~

1237 - `GPT_REALTIME_1_5("gpt-realtime-1.5")`

~~1238~~

1239 - `GPT_REALTIME_2("gpt-realtime-2")`

~~1240~~

1241 - `GPT_REALTIME_2025_08_28("gpt-realtime-2025-08-28")`

~~1242~~

1243 - `GPT_4O_REALTIME_PREVIEW("gpt-4o-realtime-preview")`

~~1244~~

1245 - `GPT_4O_REALTIME_PREVIEW_2024_10_01("gpt-4o-realtime-preview-2024-10-01")`

~~1246~~

1247 - `GPT_4O_REALTIME_PREVIEW_2024_12_17("gpt-4o-realtime-preview-2024-12-17")`

~~1248~~

1249 - `GPT_4O_REALTIME_PREVIEW_2025_06_03("gpt-4o-realtime-preview-2025-06-03")`

~~1250~~

1251 - `GPT_4O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview")`

~~1252~~

1253 - `GPT_4O_MINI_REALTIME_PREVIEW_2024_12_17("gpt-4o-mini-realtime-preview-2024-12-17")`

~~1254~~

1255 - `GPT_REALTIME_MINI("gpt-realtime-mini")`

~~1256~~

1257 - `GPT_REALTIME_MINI_2025_10_06("gpt-realtime-mini-2025-10-06")`

~~1258~~

1259 - `GPT_REALTIME_MINI_2025_12_15("gpt-realtime-mini-2025-12-15")`

~~1260~~

1261 - `GPT_AUDIO_1_5("gpt-audio-1.5")`

~~1262~~

1263 - `GPT_AUDIO_MINI("gpt-audio-mini")`

~~1264~~

1265 - `GPT_AUDIO_MINI_2025_10_06("gpt-audio-mini-2025-10-06")`

~~1266~~

1267 - `GPT_AUDIO_MINI_2025_12_15("gpt-audio-mini-2025-12-15")`

~~1268~~

1269 - `Optional<List<OutputModality>> outputModalities`

~~1270~~

1271 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

1272 that the model will respond with audio plus a transcript. `["text"]` can be used to make

1273 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~1274~~

1275 - `TEXT("text")`

~~1276~~

1277 - `AUDIO("audio")`
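Because `text` and `audio` cannot be requested together, a client can reject invalid combinations early. The check below is a sketch on plain strings. `OutputModalityCheck` is hypothetical, and treating a `null` list as "use the server default" is an assumption of this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical validation helper; not an SDK type.
final class OutputModalityCheck {
    static boolean isValid(List<String> modalities) {
        if (modalities == null) {
            return true; // assumed: unset means the default ["audio"] applies
        }
        // Exactly one modality, and it must be one of the two documented values.
        return modalities.size() == 1
                && (modalities.get(0).equals("text") || modalities.get(0).equals("audio"));
    }

    public static void main(String[] args) {
        System.out.println(isValid(Arrays.asList("text")));
        System.out.println(isValid(Arrays.asList("text", "audio")));
    }
}
```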

~~1278~~

1279 - `Optional<ResponsePrompt> prompt`

~~1280~~

1281 Reference to a prompt template and its variables.

1282 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~1283~~

1284 - `String id`

~~1285~~

1286 The unique identifier of the prompt template to use.

~~1287~~

1288 - `Optional<Variables> variables`

~~1289~~

1290 Optional map of values to substitute in for variables in your

1291 prompt. The substitution values can either be strings, or other

1292 Response input types like images or files.

~~1293~~

1294 - `String`

~~1295~~

1296 - `class ResponseInputText:`

~~1297~~

1298 A text input to the model.

~~1299~~

1300 - `String text`

~~1301~~

1302 The text input to the model.

~~1303~~

1304 - `JsonValue; type "input_text"constant`

~~1305~~

1306 The type of the input item. Always `input_text`.

~~1307~~

1308 - `INPUT_TEXT("input_text")`

~~1309~~

1310 - `class ResponseInputImage:`

~~1311~~

1312 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~1313~~

1314 - `Detail detail`

~~1315~~

1316 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~1317~~

1318 - `LOW("low")`

~~1319~~

1320 - `HIGH("high")`

~~1321~~

1322 - `AUTO("auto")`

~~1323~~

1324 - `ORIGINAL("original")`

~~1325~~

1326 - `JsonValue; type "input_image"constant`

~~1327~~

1328 The type of the input item. Always `input_image`.

~~1329~~

1330 - `INPUT_IMAGE("input_image")`

~~1331~~

1332 - `Optional<String> fileId`

~~1333~~

1334 The ID of the file to be sent to the model.

~~1335~~

1336 - `Optional<String> imageUrl`

~~1337~~

1338 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~1339~~

1340 - `class ResponseInputFile:`

~~1341~~

1342 A file input to the model.

~~1343~~

1344 - `JsonValue; type "input_file"constant`

~~1345~~

1346 The type of the input item. Always `input_file`.

~~1347~~

1348 - `INPUT_FILE("input_file")`

~~1349~~

1350 - `Optional<Detail> detail`

~~1351~~

1352 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~1353~~

1354 - `LOW("low")`

~~1355~~

1356 - `HIGH("high")`

~~1357~~

1358 - `Optional<String> fileData`

~~1359~~

1360 The content of the file to be sent to the model.

~~1361~~

1362 - `Optional<String> fileId`

~~1363~~

1364 The ID of the file to be sent to the model.

~~1365~~

1366 - `Optional<String> fileUrl`

~~1367~~

1368 The URL of the file to be sent to the model.

~~1369~~

1370 - `Optional<String> filename`

~~1371~~

1372 The name of the file to be sent to the model.

~~1373~~

1374 - `Optional<String> version`

~~1375~~

1376 Optional version of the prompt template.

~~1377~~

1378 - `Optional<RealtimeReasoning> reasoning`

~~1379~~

1380 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~1381~~

1382 - `Optional<RealtimeReasoningEffort> effort`

~~1383~~

1384 Constrains effort on reasoning for reasoning-capable Realtime models such as

1385 `gpt-realtime-2`.

~~1386~~

1387 - `MINIMAL("minimal")`

~~1388~~

1389 - `LOW("low")`

~~1390~~

1391 - `MEDIUM("medium")`

~~1392~~

1393 - `HIGH("high")`

~~1394~~

1395 - `XHIGH("xhigh")`

~~1396~~

1397 - `Optional<ToolChoice> toolChoice`

~~1398~~

1399 How the model chooses tools. Provide one of the string modes or force a specific

1400 function/MCP tool.

~~1401~~

1402 - `enum ToolChoiceOptions:`

~~1403~~

1404 Controls which (if any) tool is called by the model.

~~1405~~

1406 `none` means the model will not call any tool and instead generates a message.

~~1407~~

1408 `auto` means the model can pick between generating a message or calling one or

1409 more tools.

~~1410~~

1411 `required` means the model must call one or more tools.

~~1412~~

1413 - `NONE("none")`

~~1414~~

1415 - `AUTO("auto")`

~~1416~~

1417 - `REQUIRED("required")`

~~1418~~

1419 - `class ToolChoiceFunction:`

~~1420~~

1421 Use this option to force the model to call a specific function.

~~1422~~

1423 - `String name`

~~1424~~

1425 The name of the function to call.

~~1426~~

1427 - `JsonValue; type "function"constant`

~~1428~~

1429 For function calling, the type is always `function`.

~~1430~~

1431 - `FUNCTION("function")`

~~1432~~

1433 - `class ToolChoiceMcp:`

~~1434~~

1435 Use this option to force the model to call a specific tool on a remote MCP server.

~~1436~~

1437 - `String serverLabel`

~~1438~~

1439 The label of the MCP server to use.

~~1440~~

1441 - `JsonValue; type "mcp"constant`

~~1442~~

1443 For MCP tools, the type is always `mcp`.

~~1444~~

1445 - `MCP("mcp")`

~~1446~~

1447 - `Optional<String> name`

~~1448~~

1449 The name of the tool to call on the server.

~~1450~~

1451 - `Optional<List<Tool>> tools`

~~1452~~

1453 Tools available to the model.

~~1454~~

1455 - `class RealtimeFunctionTool:`

~~1456~~

1457 - `Optional<String> description`

~~1458~~

1459 The description of the function, including guidance on when and how

1460 to call it, and guidance about what to tell the user when calling

1461 (if anything).

~~1462~~

1463 - `Optional<String> name`

~~1464~~

1465 The name of the function.

~~1466~~

1467 - `Optional<JsonValue> parameters`

~~1468~~

1469 Parameters of the function in JSON Schema.

~~1470~~

1471 - `Optional<Type> type`

~~1472~~

1473 The type of the tool, i.e. `function`.

~~1474~~

1475 - `FUNCTION("function")`

~~1476~~

1477 - `class McpTool:`

~~1478~~

1479 Give the model access to additional tools via remote Model Context Protocol

1480 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~1481~~

1482 - `String serverLabel`

~~1483~~

1484 A label for this MCP server, used to identify it in tool calls.

~~1485~~

1486 - `JsonValue; type "mcp"constant`

~~1487~~

1488 The type of the MCP tool. Always `mcp`.

~~1489~~

1490 - `MCP("mcp")`

~~1491~~

1492 - `Optional<AllowedTools> allowedTools`

~~1493~~

1494 List of allowed tool names or a filter object.

~~1495~~

1496 - `List<String>`

~~1497~~

1498 - `class McpToolFilter:`

~~1499~~

1500 A filter object to specify which tools are allowed.

~~1501~~

1502 - `Optional<Boolean> readOnly`

~~1503~~

1504 Indicates whether or not a tool modifies data or is read-only. If an

1505 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1506 it will match this filter.

~~1507~~

1508 - `Optional<List<String>> toolNames`

~~1509~~

1510 List of allowed tool names.

~~1511~~

1512 - `Optional<String> authorization`

~~1513~~

1514 An OAuth access token that can be used with a remote MCP server, either

1515 with a custom MCP server URL or a service connector. Your application

1516 must handle the OAuth authorization flow and provide the token here.

~~1517~~

1518 - `Optional<ConnectorId> connectorId`

~~1519~~

1520 Identifier for service connectors, like those available in ChatGPT. One of

1521 `server_url` or `connector_id` must be provided. Learn more about service

1522 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~1523~~

1524 Currently supported `connector_id` values are:

~~1525~~

1526 - Dropbox: `connector_dropbox`

1527 - Gmail: `connector_gmail`

1528 - Google Calendar: `connector_googlecalendar`

1529 - Google Drive: `connector_googledrive`

1530 - Microsoft Teams: `connector_microsoftteams`

1531 - Outlook Calendar: `connector_outlookcalendar`

1532 - Outlook Email: `connector_outlookemail`

1533 - SharePoint: `connector_sharepoint`

~~1534~~

1535 - `CONNECTOR_DROPBOX("connector_dropbox")`

~~1536~~

1537 - `CONNECTOR_GMAIL("connector_gmail")`

~~1538~~

1539 - `CONNECTOR_GOOGLECALENDAR("connector_googlecalendar")`

~~1540~~

1541 - `CONNECTOR_GOOGLEDRIVE("connector_googledrive")`

~~1542~~

1543 - `CONNECTOR_MICROSOFTTEAMS("connector_microsoftteams")`

~~1544~~

1545 - `CONNECTOR_OUTLOOKCALENDAR("connector_outlookcalendar")`

~~1546~~

1547 - `CONNECTOR_OUTLOOKEMAIL("connector_outlookemail")`

~~1548~~

1549 - `CONNECTOR_SHAREPOINT("connector_sharepoint")`

~~1550~~

1551 - `Optional<Boolean> deferLoading`

~~1552~~

1553 Whether this MCP tool is deferred and discovered via tool search.

~~1554~~

1555 - `Optional<Headers> headers`

~~1556~~

1557 Optional HTTP headers to send to the MCP server. Use for authentication

1558 or other purposes.

~~1559~~

1560 - `Optional<RequireApproval> requireApproval`

~~1561~~

1562 Specify which of the MCP server's tools require approval.

~~1563~~

1564 - `class McpToolApprovalFilter:`

~~1565~~

1566 Specify which of the MCP server's tools require approval. Can be

1567 `always`, `never`, or a filter object associated with tools

1568 that require approval.

~~1569~~

1570 - `Optional<Always> always`

~~1571~~

1572 A filter object to specify which tools are allowed.

~~1573~~

1574 - `Optional<Boolean> readOnly`

~~1575~~

1576 Indicates whether or not a tool modifies data or is read-only. If an

1577 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1578 it will match this filter.

~~1579~~

1580 - `Optional<List<String>> toolNames`

~~1581~~

1582 List of allowed tool names.

~~1583~~

1584 - `Optional<Never> never`

~~1585~~

1586 A filter object to specify which tools are allowed.

~~1587~~

1588 - `Optional<Boolean> readOnly`

~~1589~~

1590 Indicates whether or not a tool modifies data or is read-only. If an

1591 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1592 it will match this filter.

~~1593~~

1594 - `Optional<List<String>> toolNames`

~~1595~~

1596 List of allowed tool names.

~~1597~~

1598 - `enum McpToolApprovalSetting:`

~~1599~~

1600 Specify a single approval policy for all tools. One of `always` or

1601 `never`. When set to `always`, all tools will require approval. When

1602 set to `never`, no tools will require approval.

~~1603~~

1604 - `ALWAYS("always")`

~~1605~~

1606 - `NEVER("never")`

~~1607~~

1608 - `Optional<String> serverDescription`

~~1609~~

1610 Optional description of the MCP server, used to provide more context.

~~1611~~

1612 - `Optional<String> serverUrl`

~~1613~~

1614 The URL for the MCP server. One of `server_url` or `connector_id` must be

1615 provided.

~~1616~~

1617 - `Optional<Tracing> tracing`

~~1618~~

1619 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

1620 tracing is enabled for a session, the configuration cannot be modified.

~~1621~~

1622 `auto` will create a trace for the session with default values for the

1623 workflow name, group id, and metadata.

~~1624~~

1625 - `JsonValue;`

~~1626~~

1627 - `AUTO("auto")`

~~1628~~

1629 - `class TracingConfiguration:`

~~1630~~

1631 Granular configuration for tracing.

~~1632~~

1633 - `Optional<String> groupId`

~~1634~~

1635 The group id to attach to this trace to enable filtering and

1636 grouping in the Traces Dashboard.

~~1637~~

1638 - `Optional<JsonValue> metadata`

~~1639~~

1640 The arbitrary metadata to attach to this trace to enable

1641 filtering in the Traces Dashboard.

~~1642~~

1643 - `Optional<String> workflowName`

~~1644~~

1645 The name of the workflow to attach to this trace. This is used to

1646 name the trace in the Traces Dashboard.

~~1647~~

1648 - `Optional<RealtimeTruncation> truncation`

~~1649~~

1650 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~1651~~

1652 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~1653~~

1654 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~1655~~

1656 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~1657~~

1658 - `RealtimeTruncationStrategy`

~~1659~~

1660 - `AUTO("auto")`

~~1661~~

1662 - `DISABLED("disabled")`

~~1663~~

1664 - `class RealtimeTruncationRetentionRatio:`

~~1665~~

1666 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~1667~~

1668 - `double retentionRatio`

~~1669~~

1670 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~1671~~

1672 - `JsonValue; type "retention_ratio"constant`

~~1673~~

1674 Use retention ratio truncation.

~~1675~~

1676 - `RETENTION_RATIO("retention_ratio")`

~~1677~~

1678 - `Optional<TokenLimits> tokenLimits`

~~1679~~

1680 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~1681~~

1682 - `Optional<Long> postInstructions`

~~1683~~

1684 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
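The interplay between `postInstructions` and `retentionRatio` can be illustrated with a small standalone sketch. It is not the server's implementation: the class `TruncationSketch`, its method, and the per-message token counts are hypothetical, and it assumes whole messages are dropped oldest-first until the total falls to `retentionRatio` times the limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of retention-ratio truncation; not SDK or server code. */
final class TruncationSketch {
    private TruncationSketch() {}

    /**
     * Returns the token counts of the messages that stay in context.
     *
     * @param messageTokens        per-message token counts, oldest first (assumed input)
     * @param postInstructionLimit maximum post-instruction tokens (cf. postInstructions)
     * @param retentionRatio       fraction of the limit to retain, within [0.0, 1.0]
     */
    static List<Long> truncate(List<Long> messageTokens, long postInstructionLimit, double retentionRatio) {
        if (retentionRatio < 0.0 || retentionRatio > 1.0) {
            throw new IllegalArgumentException("retentionRatio must be within [0.0, 1.0]");
        }
        List<Long> kept = new ArrayList<>(messageTokens);
        long total = 0;
        for (long tokens : kept) {
            total += tokens;
        }
        // Nothing is dropped until the conversation exceeds the limit.
        if (total <= postInstructionLimit) {
            return kept;
        }
        long target = (long) Math.floor(postInstructionLimit * retentionRatio);
        // Drop the oldest messages until the total is at or below the target.
        while (!kept.isEmpty() && total > target) {
            total -= kept.remove(0);
        }
        return kept;
    }

    public static void main(String[] args) {
        // 5,500 tokens exceed the 5,000 limit, so the oldest message is dropped to reach 4,000.
        System.out.println(truncate(List.of(1500L, 1500L, 1500L, 1000L), 5000, 0.8)); // prints [1500, 1500, 1000]
    }
}
```

Truncating below the limit rather than exactly to it leaves headroom, so the next few turns do not each trigger another truncation.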

~~1685~~

1686 - `class RealtimeTranscriptionSessionCreateResponse:`

~~1687~~

1688 A Realtime transcription session configuration object.

~~1689~~

1690 - `String id`

~~1691~~

1692 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1693~~

1694 - `String object_`

~~1695~~

1696 The object type. Always `realtime.transcription_session`.

~~1697~~

1698 - `JsonValue; type "transcription"constant`

~~1699~~

1700 The type of session. Always `transcription` for transcription sessions.

~~1701~~

1702 - `TRANSCRIPTION("transcription")`

~~1703~~

1704 - `Optional<Audio> audio`

~~1705~~

1706 Configuration for input audio for the session.

~~1707~~

1708 - `Optional<Input> input`

~~1709~~

1710 - `Optional<RealtimeAudioFormats> format`

~~1711~~

1712 The PCM audio format. Only a 24kHz sample rate is supported.

~~1713~~

1714 - `Optional<NoiseReduction> noiseReduction`

~~1715~~

1716 Configuration for input audio noise reduction.

~~1717~~

1718 - `Optional<NoiseReductionType> type`

~~1719~~

1720 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1721~~

1722 - `Optional<AudioTranscription> transcription`

~~1723~~

1724 - `Optional<RealtimeTranscriptionSessionTurnDetection> turnDetection`

~~1725~~

1726 Configuration for turn detection. Can be set to `null` to turn off. Server

1727 VAD means that the model will detect the start and end of speech based on

1728 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~1729~~

1730 - `Optional<Long> prefixPaddingMs`

~~1731~~

1732 Amount of audio to include before the VAD detected speech (in

1733 milliseconds). Defaults to 300ms.

~~1734~~

1735 - `Optional<Long> silenceDurationMs`

~~1736~~

1737 Duration of silence to detect speech stop (in milliseconds). Defaults

1738 to 500ms. With shorter values the model will respond more quickly,

1739 but may jump in on short pauses from the user.

~~1740~~

1741 - `Optional<Double> threshold`

~~1742~~

1743 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

1744 higher threshold will require louder audio to activate the model, and

1745 thus might perform better in noisy environments.

~~1746~~

1747 - `Optional<String> type`

~~1748~~

1749 Type of turn detection, only `server_vad` is currently supported.
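To make the roles of `threshold`, `prefixPaddingMs`, and `silenceDurationMs` concrete, here is a minimal volume-based sketch. It is only an illustration under stated assumptions: the class `ServerVadSketch`, the fixed frame size, and the normalized per-frame levels are hypothetical, and the real server-side detector is not described in this reference.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical volume-based VAD illustration; not the server's algorithm. */
final class ServerVadSketch {
    private ServerVadSketch() {}

    /**
     * Returns detected turns as {startMs, stopMs} pairs.
     *
     * @param levels            normalized volume per frame, 0.0 to 1.0 (assumed input)
     * @param frameMs           duration of each frame in milliseconds (assumed)
     * @param threshold         activation threshold (cf. threshold)
     * @param prefixPaddingMs   audio to include before detected speech (cf. prefixPaddingMs)
     * @param silenceDurationMs silence needed to detect speech stop (cf. silenceDurationMs)
     */
    static List<long[]> detectTurns(double[] levels, long frameMs, double threshold,
                                    long prefixPaddingMs, long silenceDurationMs) {
        List<long[]> turns = new ArrayList<>();
        boolean speaking = false;
        long startMs = 0;
        long silenceMs = 0;
        for (int i = 0; i < levels.length; i++) {
            long frameStart = i * frameMs;
            boolean loud = levels[i] >= threshold;
            if (!speaking) {
                if (loud) {
                    // Speech start: back-date the turn by the prefix padding.
                    speaking = true;
                    silenceMs = 0;
                    startMs = Math.max(0, frameStart - prefixPaddingMs);
                }
            } else if (loud) {
                silenceMs = 0; // speech continues, reset the silence counter
            } else {
                silenceMs += frameMs;
                if (silenceMs >= silenceDurationMs) {
                    // Enough silence accumulated: the turn ends at the end of this frame.
                    turns.add(new long[] {startMs, frameStart + frameMs});
                    speaking = false;
                }
            }
        }
        return turns;
    }

    public static void main(String[] args) {
        double[] levels = {0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.8, 0.8, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1};
        for (long[] turn : detectTurns(levels, 100, 0.5, 300, 500)) {
            System.out.println(turn[0] + " ms to " + turn[1] + " ms"); // prints "200 ms to 1400 ms"
        }
    }
}
```

In this sketch, lowering the silence duration from 500 to 200 ends the same turn at 1100 ms, which mirrors the documented trade-off between faster responses and jumping in on short pauses.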

~~1750~~

1751 - `Optional<Long> expiresAt`

~~1752~~

1753 Expiration timestamp for the session, in seconds since epoch.

~~1754~~

1755 - `Optional<List<Include>> include`

~~1756~~

1757 Additional fields to include in server outputs.

~~1758~~

1759 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1760~~

1761 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`

~~1762~~

1763 - `String value`

~~1764~~

1765 The generated client secret value.

~~1766~~

1767### Example

~~1768~~

```java
package com.openai.example;

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.realtime.clientsecrets.ClientSecretCreateParams;
import com.openai.models.realtime.clientsecrets.ClientSecretCreateResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Build a client configured from the environment.
        OpenAIClient client = OpenAIOkHttpClient.fromEnv();

        // Create a client secret with default parameters.
        ClientSecretCreateResponse clientSecret = client.realtime().clientSecrets().create();
    }
}
```

~~1787~~

1788#### Response

~~1789~~

```json
{
  "expires_at": 0,
  "session": {
    "id": "id",
    "object": "realtime.session",
    "type": "realtime",
    "audio": {
      "input": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "noise_reduction": {
          "type": "near_field"
        },
        "transcription": {
          "delay": "minimal",
          "language": "language",
          "model": "string",
          "prompt": "prompt"
        },
        "turn_detection": {
          "type": "server_vad",
          "create_response": true,
          "idle_timeout_ms": 5000,
          "interrupt_response": true,
          "prefix_padding_ms": 0,
          "silence_duration_ms": 0,
          "threshold": 0
        }
      },
      "output": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "speed": 0.25,
        "voice": "ash"
      }
    },
    "expires_at": 0,
    "include": [
      "item.input_audio_transcription.logprobs"
    ],
    "instructions": "instructions",
    "max_output_tokens": 0,
    "model": "string",
    "output_modalities": [
      "text"
    ],
    "prompt": {
      "id": "id",
      "variables": {
        "foo": "string"
      },
      "version": "version"
    },
    "reasoning": {
      "effort": "minimal"
    },
    "tool_choice": "none",
    "tools": [
      {
        "description": "description",
        "name": "name",
        "parameters": {},
        "type": "function"
      }
    ],
    "tracing": "auto",
    "truncation": "auto"
  },
  "value": "value"
}
```

~~1866~~

1867## Domain Types

~~1868~~

1869### Realtime Session Create Response

~~1870~~

1871- `class RealtimeSessionCreateResponse:`

~~1872~~

1873 A Realtime session configuration object.

~~1874~~

1875 - `String id`

~~1876~~

1877 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1878~~

1879 - `JsonValue; object_ "realtime.session"constant`

~~1880~~

1881 The object type. Always `realtime.session`.

~~1882~~

1883 - `REALTIME_SESSION("realtime.session")`

~~1884~~

1885 - `JsonValue; type "realtime"constant`

~~1886~~

1887 The type of session to create. Always `realtime` for the Realtime API.

~~1888~~

1889 - `REALTIME("realtime")`

~~1890~~

1891 - `Optional<Audio> audio`

~~1892~~

1893 Configuration for input and output audio.

~~1894~~

1895 - `Optional<Input> input`

~~1896~~

1897 - `Optional<RealtimeAudioFormats> format`

~~1898~~

1899 The format of the input audio.

~~1900~~

1901 - `AudioPcm`

~~1902~~

1903 - `Optional<Rate> rate`

~~1904~~

1905 The sample rate of the audio. Always `24000`.

~~1906~~

1907 - `_24000(24000)`

~~1908~~

1909 - `Optional<Type> type`

~~1910~~

1911 The audio format. Always `audio/pcm`.

~~1912~~

1913 - `AUDIO_PCM("audio/pcm")`

~~1914~~

1915 - `AudioPcmu`

~~1916~~

1917 - `Optional<Type> type`

~~1918~~

1919 The audio format. Always `audio/pcmu`.

~~1920~~

1921 - `AUDIO_PCMU("audio/pcmu")`

~~1922~~

1923 - `AudioPcma`

~~1924~~

1925 - `Optional<Type> type`

~~1926~~

1927 The audio format. Always `audio/pcma`.

~~1928~~

1929 - `AUDIO_PCMA("audio/pcma")`

~~1930~~

1931 - `Optional<NoiseReduction> noiseReduction`

~~1932~~

1933 Configuration for input audio noise reduction. This can be set to `null` to turn off.

1934 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

1935 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~1936~~

1937 - `Optional<NoiseReductionType> type`

~~1938~~

1939 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1940~~

1941 - `NEAR_FIELD("near_field")`

~~1942~~

1943 - `FAR_FIELD("far_field")`

~~1944~~

1945 - `Optional<AudioTranscription> transcription`

~~1946~~

1947 - `Optional<Delay> delay`

~~1948~~

1949 Controls how long the model waits before emitting transcription text.

1950 Higher values can improve transcription accuracy at the cost of latency.

1951 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1952~~

1953 - `MINIMAL("minimal")`

~~1954~~

1955 - `LOW("low")`

~~1956~~

1957 - `MEDIUM("medium")`

~~1958~~

1959 - `HIGH("high")`

~~1960~~

1961 - `XHIGH("xhigh")`

~~1962~~

1963 - `Optional<String> language`

~~1964~~

1965 The language of the input audio. Supplying the input language in

1966 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

1967 will improve accuracy and latency.

~~1968~~

1969 - `Optional<Model> model`

~~1970~~

1971 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1972~~

1973 - `WHISPER_1("whisper-1")`

~~1974~~

1975 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`

~~1976~~

1977 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`

~~1978~~

1979 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`

~~1980~~

1981 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`

~~1982~~

1983 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`

~~1984~~

1985 - `Optional<String> prompt`

~~1986~~

1987 An optional text to guide the model's style or continue a previous audio

1988 segment.

1989 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

1990 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

1991 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1992~~

1993 - `Optional<TurnDetection> turnDetection`

~~1994~~

1995 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~1996~~

1997 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~1998~~

1999 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~2000~~

2001 For `gpt-realtime-whisper` transcription sessions, turn detection must be

2002 set to `null`; VAD is not supported.

~~2003~~

2004 - `class ServerVad:`

~~2005~~

2006 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~2007~~

2008 - `JsonValue; type "server_vad"constant`

~~2009~~

2010 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~2011~~

2012 - `SERVER_VAD("server_vad")`

~~2013~~

2014 - `Optional<Boolean> createResponse`

~~2015~~

2016 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~2017~~

2018 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~2019~~

2020 - `Optional<Long> idleTimeoutMs`

~~2021~~

2022 Optional timeout after which a model response will be triggered automatically. This is

2023 useful for situations in which a long pause from the user is unexpected, such as a phone

2024 call. The model will effectively prompt the user to continue the conversation based

2025 on the current context.

~~2026~~

2027 The timeout value will be applied after the last model response's audio has finished playing,

2028 i.e. it's set to the `response.done` time plus audio playback duration.

~~2029~~

2030 An `input_audio_buffer.timeout_triggered` event (plus events

2031 associated with the Response) will be emitted when the timeout is reached.

2032 Idle timeout is currently only supported for `server_vad` mode.

~~2033~~

2034 - `Optional<Boolean> interruptResponse`

~~2035~~

2036 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

2037 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~2038~~

2039 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~2040~~

2041 - `Optional<Long> prefixPaddingMs`

~~2042~~

2043 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

2044 milliseconds). Defaults to 300ms.

~~2045~~

2046 - `Optional<Long> silenceDurationMs`

~~2047~~

2048 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

2049 to 500ms. With shorter values the model will respond more quickly,

2050 but may jump in on short pauses from the user.

~~2051~~

2052 - `Optional<Double> threshold`

~~2053~~

2054 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

2055 higher threshold will require louder audio to activate the model, and

2056 thus might perform better in noisy environments.

~~2057~~

2058 - `class SemanticVad:`

~~2059~~

2060 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~2061~~

2062 - `JsonValue; type "semantic_vad"constant`

~~2063~~

2064 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~2065~~

2066 - `SEMANTIC_VAD("semantic_vad")`

~~2067~~

2068 - `Optional<Boolean> createResponse`

~~2069~~

2070 Whether or not to automatically generate a response when a VAD stop event occurs.

~~2071~~

2072 - `Optional<Eagerness> eagerness`

~~2073~~

2074 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~2075~~

2076 - `LOW("low")`

~~2077~~

2078 - `MEDIUM("medium")`

~~2079~~

2080 - `HIGH("high")`

~~2081~~

2082 - `AUTO("auto")`

~~2083~~

2084 - `Optional<Boolean> interruptResponse`

~~2085~~

2086 Whether or not to automatically interrupt any ongoing response with output to the default

2087 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
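The documented eagerness levels and their maximum timeouts can be captured in a small lookup. This is a sketch for reference only: the class `EagernessSketch` and its method are hypothetical and not part of the SDK.

```java
import java.time.Duration;

/** Hypothetical lookup of the documented semantic VAD maximum timeouts. */
final class EagernessSketch {
    private EagernessSketch() {}

    /** Maps an eagerness value to its documented maximum timeout. */
    static Duration maxTimeout(String eagerness) {
        switch (eagerness) {
            case "low":
                return Duration.ofSeconds(8);
            case "auto": // documented as equivalent to medium
            case "medium":
                return Duration.ofSeconds(4);
            case "high":
                return Duration.ofSeconds(2);
            default:
                throw new IllegalArgumentException("Unknown eagerness: " + eagerness);
        }
    }

    public static void main(String[] args) {
        System.out.println(maxTimeout("auto").getSeconds()); // prints 4
    }
}
```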

~~2088~~

2089 - `Optional<Output> output`

~~2090~~

2091 - `Optional<RealtimeAudioFormats> format`

~~2092~~

2093 The format of the output audio.

~~2094~~

2095 - `Optional<Double> speed`

~~2096~~

2097 The speed of the model's spoken response as a multiple of the original speed.

2098 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~2099~~

2100 This parameter is a post-processing adjustment to the audio after it is generated; it's

2101 also possible to prompt the model to speak faster or slower.
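A client might guard the documented speed range before sending a value. The sketch below is one possible approach, not SDK behavior: the class `SpeedSketch` is hypothetical, and clamping (rather than rejecting) out-of-range input is an assumption, since this reference does not say how the API treats such values.

```java
/** Hypothetical client-side guard for the documented speed range. */
final class SpeedSketch {
    static final double MIN_SPEED = 0.25;     // documented minimum
    static final double MAX_SPEED = 1.5;      // documented maximum
    static final double DEFAULT_SPEED = 1.0;  // documented default

    private SpeedSketch() {}

    /** Clamps a requested speed into [MIN_SPEED, MAX_SPEED]; NaN falls back to the default. */
    static double clamp(double requested) {
        if (Double.isNaN(requested)) {
            return DEFAULT_SPEED;
        }
        return Math.max(MIN_SPEED, Math.min(MAX_SPEED, requested));
    }

    public static void main(String[] args) {
        System.out.println(clamp(2.0)); // prints 1.5
    }
}
```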

~~2102~~

2103 - `Optional<Voice> voice`

~~2104~~

2105 The voice the model uses to respond. Voice cannot be changed during the

2106 session once the model has responded with audio at least once. Current

2107 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

2108 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

2109 best quality.

~~2110~~

2111 - `ALLOY("alloy")`

~~2112~~

2113 - `ASH("ash")`

~~2114~~

2115 - `BALLAD("ballad")`

~~2116~~

2117 - `CORAL("coral")`

~~2118~~

2119 - `ECHO("echo")`

~~2120~~

2121 - `SAGE("sage")`

~~2122~~

2123 - `SHIMMER("shimmer")`

~~2124~~

2125 - `VERSE("verse")`

~~2126~~

2127 - `MARIN("marin")`

~~2128~~

2129 - `CEDAR("cedar")`

~~2130~~

2131 - `Optional<Long> expiresAt`

~~2132~~

2133 Expiration timestamp for the session, in seconds since epoch.
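Because `expiresAt` is expressed in seconds since the epoch, it converts directly to a `java.time.Instant`. The helper below is a sketch with hypothetical names (`ExpirySketch`, `isExpired`, `remaining`); it takes the raw epoch value rather than an SDK object so that it stays self-contained.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helpers for working with an epoch-seconds expiration timestamp. */
final class ExpirySketch {
    private ExpirySketch() {}

    /** True once the given moment has reached or passed the expiration. */
    static boolean isExpired(long expiresAtEpochSeconds, Instant now) {
        return !now.isBefore(Instant.ofEpochSecond(expiresAtEpochSeconds));
    }

    /** Time left until expiration, never negative. */
    static Duration remaining(long expiresAtEpochSeconds, Instant now) {
        Duration left = Duration.between(now, Instant.ofEpochSecond(expiresAtEpochSeconds));
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        Instant now = Instant.ofEpochSecond(1_000);
        System.out.println(remaining(1_060, now).getSeconds()); // prints 60
    }
}
```

Passing the current time as a parameter keeps the helper deterministic in tests; production code would typically supply `Instant.now()`.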

~~2134~~

2135 - `Optional<List<Include>> include`

~~2136~~

2137 Additional fields to include in server outputs.

~~2138~~

2139 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~2140~~

2141 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`

~~2142~~

2143 - `Optional<String> instructions`

~~2144~~

2145 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~2146~~

2147 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~2148~~

2149 - `Optional<MaxOutputTokens> maxOutputTokens`

~~2150~~

2151 Maximum number of output tokens for a single assistant response,

2152 inclusive of tool calls. Provide an integer between 1 and 4096 to

2153 limit output tokens, or `inf` for the maximum available tokens for a

2154 given model. Defaults to `inf`.

~~2155~~

2156 - `long`

~~2157~~

2158 - `JsonValue;`

~~2159~~

2160 - `INF("inf")`
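Since `maxOutputTokens` is either an integer between 1 and 4096 or the literal `inf`, a client reading it from configuration might normalize it as sketched below. The class `MaxOutputTokensSketch` and the choice to represent `inf` as an empty `OptionalLong` are assumptions made for illustration, not the SDK's representation.

```java
import java.util.OptionalLong;

/** Hypothetical parser for the documented max output token setting. */
final class MaxOutputTokensSketch {
    private MaxOutputTokensSketch() {}

    /**
     * Parses "inf" or an integer within 1..4096.
     * An empty result stands for "inf" (no explicit limit) in this sketch.
     */
    static OptionalLong parse(String raw) {
        if ("inf".equals(raw)) {
            return OptionalLong.empty();
        }
        long value = Long.parseLong(raw); // throws NumberFormatException for non-numeric input
        if (value < 1 || value > 4096) {
            throw new IllegalArgumentException("max output tokens must be between 1 and 4096, or inf");
        }
        return OptionalLong.of(value);
    }

    public static void main(String[] args) {
        System.out.println(parse("1024").getAsLong()); // prints 1024
        System.out.println(parse("inf").isPresent());  // prints false
    }
}
```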

~~2161~~

2162 - `Optional<Model> model`

~~2163~~

2164 The Realtime model used for this session.

~~2165~~

2166 - `GPT_REALTIME("gpt-realtime")`

~~2167~~

2168 - `GPT_REALTIME_1_5("gpt-realtime-1.5")`

~~2169~~

2170 - `GPT_REALTIME_2("gpt-realtime-2")`

~~2171~~

2172 - `GPT_REALTIME_2025_08_28("gpt-realtime-2025-08-28")`

~~2173~~

2174 - `GPT_4O_REALTIME_PREVIEW("gpt-4o-realtime-preview")`

~~2175~~

2176 - `GPT_4O_REALTIME_PREVIEW_2024_10_01("gpt-4o-realtime-preview-2024-10-01")`

~~2177~~

2178 - `GPT_4O_REALTIME_PREVIEW_2024_12_17("gpt-4o-realtime-preview-2024-12-17")`

~~2179~~

2180 - `GPT_4O_REALTIME_PREVIEW_2025_06_03("gpt-4o-realtime-preview-2025-06-03")`

~~2181~~

2182 - `GPT_4O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview")`

~~2183~~

2184 - `GPT_4O_MINI_REALTIME_PREVIEW_2024_12_17("gpt-4o-mini-realtime-preview-2024-12-17")`

~~2185~~

2186 - `GPT_REALTIME_MINI("gpt-realtime-mini")`

~~2187~~

2188 - `GPT_REALTIME_MINI_2025_10_06("gpt-realtime-mini-2025-10-06")`

~~2189~~

2190 - `GPT_REALTIME_MINI_2025_12_15("gpt-realtime-mini-2025-12-15")`

~~2191~~

2192 - `GPT_AUDIO_1_5("gpt-audio-1.5")`

~~2193~~

2194 - `GPT_AUDIO_MINI("gpt-audio-mini")`

~~2195~~

2196 - `GPT_AUDIO_MINI_2025_10_06("gpt-audio-mini-2025-10-06")`

~~2197~~

2198 - `GPT_AUDIO_MINI_2025_12_15("gpt-audio-mini-2025-12-15")`

~~2199~~

2200 - `Optional<List<OutputModality>> outputModalities`

~~2201~~

2202 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

2203 that the model will respond with audio plus a transcript. `["text"]` can be used to make

2204 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~2205~~

2206 - `TEXT("text")`

~~2207~~

2208 - `AUDIO("audio")`
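The modality rules above (default of `["audio"]`, and no combining `text` with `audio`) can be checked client-side before a request is built. This is a sketch with a hypothetical class name, `OutputModalitySketch`, operating on plain strings rather than the SDK enum.

```java
import java.util.List;

/** Hypothetical client-side check of the documented output modality rules. */
final class OutputModalitySketch {
    private OutputModalitySketch() {}

    /** Returns the effective modalities, applying the documented default and constraint. */
    static List<String> effective(List<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return List.of("audio"); // documented default
        }
        for (String modality : requested) {
            if (!modality.equals("text") && !modality.equals("audio")) {
                throw new IllegalArgumentException("Unknown modality: " + modality);
            }
        }
        if (requested.contains("text") && requested.contains("audio")) {
            throw new IllegalArgumentException("text and audio cannot be requested together");
        }
        return List.copyOf(requested);
    }

    public static void main(String[] args) {
        System.out.println(effective(List.of("text"))); // prints [text]
    }
}
```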

~~2209~~

2210 - `Optional<ResponsePrompt> prompt`

~~2211~~

2212 Reference to a prompt template and its variables.

2213 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~2214~~

2215 - `String id`

~~2216~~

2217 The unique identifier of the prompt template to use.

~~2218~~

2219 - `Optional<Variables> variables`

~~2220~~

2221 Optional map of values to substitute in for variables in your

2222 prompt. The substitution values can either be strings, or other

2223 Response input types like images or files.

~~2224~~

2225 - `String`

~~2226~~

2227 - `class ResponseInputText:`

~~2228~~

2229 A text input to the model.

~~2230~~

2231 - `String text`

~~2232~~

2233 The text input to the model.

~~2234~~

2235 - `JsonValue; type "input_text"constant`

~~2236~~

2237 The type of the input item. Always `input_text`.

~~2238~~

2239 - `INPUT_TEXT("input_text")`

~~2240~~

2241 - `class ResponseInputImage:`

~~2242~~

2243 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~2244~~

2245 - `Detail detail`

~~2246~~

2247 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~2248~~

2249 - `LOW("low")`

~~2250~~

2251 - `HIGH("high")`

~~2252~~

2253 - `AUTO("auto")`

~~2254~~

2255 - `ORIGINAL("original")`

~~2256~~

2257 - `JsonValue; type "input_image"constant`

~~2258~~

2259 The type of the input item. Always `input_image`.

~~2260~~

2261 - `INPUT_IMAGE("input_image")`

~~2262~~

2263 - `Optional<String> fileId`

~~2264~~

2265 The ID of the file to be sent to the model.

~~2266~~

2267 - `Optional<String> imageUrl`

~~2268~~

2269 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~2270~~

2271 - `class ResponseInputFile:`

~~2272~~

2273 A file input to the model.

~~2274~~

2275 - `JsonValue; type "input_file"constant`

~~2276~~

2277 The type of the input item. Always `input_file`.

~~2278~~

2279 - `INPUT_FILE("input_file")`

~~2280~~

2281 - `Optional<Detail> detail`

~~2282~~

2283 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~2284~~

2285 - `LOW("low")`

~~2286~~

2287 - `HIGH("high")`

~~2288~~

2289 - `Optional<String> fileData`

~~2290~~

2291 The content of the file to be sent to the model.

~~2292~~

2293 - `Optional<String> fileId`

~~2294~~

2295 The ID of the file to be sent to the model.

~~2296~~

2297 - `Optional<String> fileUrl`

~~2298~~

2299 The URL of the file to be sent to the model.

~~2300~~

2301 - `Optional<String> filename`

~~2302~~

2303 The name of the file to be sent to the model.

~~2304~~

2305 - `Optional<String> version`

~~2306~~

2307 Optional version of the prompt template.

~~2308~~

2309 - `Optional<RealtimeReasoning> reasoning`

~~2310~~

2311 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~2312~~

2313 - `Optional<RealtimeReasoningEffort> effort`

~~2314~~

2315 Constrains effort on reasoning for reasoning-capable Realtime models such as

2316 `gpt-realtime-2`.

~~2317~~

2318 - `MINIMAL("minimal")`

~~2319~~

2320 - `LOW("low")`

~~2321~~

2322 - `MEDIUM("medium")`

~~2323~~

2324 - `HIGH("high")`

~~2325~~

2326 - `XHIGH("xhigh")`

~~2327~~

2328 - `Optional<ToolChoice> toolChoice`

~~2329~~

2330 How the model chooses tools. Provide one of the string modes or force a specific

2331 function/MCP tool.

~~2332~~

2333 - `enum ToolChoiceOptions:`

~~2334~~

2335 Controls which (if any) tool is called by the model.

~~2336~~

2337 `none` means the model will not call any tool and instead generates a message.

~~2338~~

2339 `auto` means the model can pick between generating a message or calling one or

2340 more tools.

~~2341~~

2342 `required` means the model must call one or more tools.

~~2343~~

2344 - `NONE("none")`

~~2345~~

2346 - `AUTO("auto")`

~~2347~~

2348 - `REQUIRED("required")`

~~2349~~

2350 - `class ToolChoiceFunction:`

~~2351~~

2352 Use this option to force the model to call a specific function.

~~2353~~

2354 - `String name`

~~2355~~

2356 The name of the function to call.

~~2357~~

2358 - `JsonValue; type "function"constant`

~~2359~~

2360 For function calling, the type is always `function`.

~~2361~~

2362 - `FUNCTION("function")`

~~2363~~

2364 - `class ToolChoiceMcp:`

~~2365~~

2366 Use this option to force the model to call a specific tool on a remote MCP server.

~~2367~~

2368 - `String serverLabel`

~~2369~~

2370 The label of the MCP server to use.

~~2371~~

2372 - `JsonValue; type "mcp"constant`

~~2373~~

2374 For MCP tools, the type is always `mcp`.

~~2375~~

2376 - `MCP("mcp")`

~~2377~~

2378 - `Optional<String> name`

~~2379~~

2380 The name of the tool to call on the server.

~~2381~~

2382 - `Optional<List<Tool>> tools`

~~2383~~

2384 Tools available to the model.

~~2385~~

2386 - `class RealtimeFunctionTool:`

~~2387~~

2388 - `Optional<String> description`

~~2389~~

2390 The description of the function, including guidance on when and how

2391 to call it, and guidance about what to tell the user when calling

2392 (if anything).

~~2393~~

2394 - `Optional<String> name`

~~2395~~

2396 The name of the function.

~~2397~~

2398 - `Optional<JsonValue> parameters`

~~2399~~

2400 Parameters of the function in JSON Schema.

~~2401~~

2402 - `Optional<Type> type`

~~2403~~

2404 The type of the tool, i.e. `function`.

~~2405~~

2406 - `FUNCTION("function")`

~~2407~~

2408 - `class McpTool:`

~~2409~~

2410 Give the model access to additional tools via remote Model Context Protocol

2411 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~2412~~

2413 - `String serverLabel`

~~2414~~

2415 A label for this MCP server, used to identify it in tool calls.

~~2416~~

2417 - `JsonValue; type "mcp"constant`

~~2418~~

2419 The type of the MCP tool. Always `mcp`.

~~2420~~

2421 - `MCP("mcp")`

~~2422~~

2423 - `Optional<AllowedTools> allowedTools`

~~2424~~

2425 List of allowed tool names or a filter object.

~~2426~~

2427 - `List<String>`

~~2428~~

2429 - `class McpToolFilter:`

~~2430~~

2431 A filter object to specify which tools are allowed.

~~2432~~

2433 - `Optional<Boolean> readOnly`

~~2434~~

2435 Indicates whether or not a tool modifies data or is read-only. If an

2436 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2437 it will match this filter.

~~2438~~

2439 - `Optional<List<String>> toolNames`

~~2440~~

2441 List of allowed tool names.

~~2442~~

2443 - `Optional<String> authorization`

~~2444~~

2445 An OAuth access token that can be used with a remote MCP server, either

2446 with a custom MCP server URL or a service connector. Your application

2447 must handle the OAuth authorization flow and provide the token here.

~~2448~~

2449 - `Optional<ConnectorId> connectorId`

~~2450~~

2451 Identifier for service connectors, like those available in ChatGPT. One of

2452 `server_url` or `connector_id` must be provided. Learn more about service

2453 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~2454~~

2455 Currently supported `connector_id` values are:

~~2456~~

2457 - Dropbox: `connector_dropbox`

2458 - Gmail: `connector_gmail`

2459 - Google Calendar: `connector_googlecalendar`

2460 - Google Drive: `connector_googledrive`

2461 - Microsoft Teams: `connector_microsoftteams`

2462 - Outlook Calendar: `connector_outlookcalendar`

2463 - Outlook Email: `connector_outlookemail`

2464 - SharePoint: `connector_sharepoint`

~~2465~~

2466 - `CONNECTOR_DROPBOX("connector_dropbox")`

~~2467~~

2468 - `CONNECTOR_GMAIL("connector_gmail")`

~~2469~~

2470 - `CONNECTOR_GOOGLECALENDAR("connector_googlecalendar")`

~~2471~~

2472 - `CONNECTOR_GOOGLEDRIVE("connector_googledrive")`

~~2473~~

2474 - `CONNECTOR_MICROSOFTTEAMS("connector_microsoftteams")`

~~2475~~

2476 - `CONNECTOR_OUTLOOKCALENDAR("connector_outlookcalendar")`

~~2477~~

2478 - `CONNECTOR_OUTLOOKEMAIL("connector_outlookemail")`

~~2479~~

2480 - `CONNECTOR_SHAREPOINT("connector_sharepoint")`

~~2481~~

2482 - `Optional<Boolean> deferLoading`

~~2483~~

2484 Whether this MCP tool is deferred and discovered via tool search.

~~2485~~

2486 - `Optional<Headers> headers`

~~2487~~

2488 Optional HTTP headers to send to the MCP server. Use for authentication

2489 or other purposes.

~~2490~~

2491 - `Optional<RequireApproval> requireApproval`

~~2492~~

2493 Specify which of the MCP server's tools require approval.

~~2494~~

2495 - `class McpToolApprovalFilter:`

~~2496~~

2497 Specify which of the MCP server's tools require approval. Can be

2498 `always`, `never`, or a filter object associated with tools

2499 that require approval.

~~2500~~

2501 - `Optional<Always> always`

~~2502~~

2503 A filter object to specify which tools are allowed.

~~2504~~

2505 - `Optional<Boolean> readOnly`

~~2506~~

2507 Indicates whether or not a tool modifies data or is read-only. If an

2508 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2509 it will match this filter.

~~2510~~

2511 - `Optional<List<String>> toolNames`

~~2512~~

2513 List of allowed tool names.

~~2514~~

2515 - `Optional<Never> never`

~~2516~~

2517 A filter object to specify which tools are allowed.

~~2518~~

2519 - `Optional<Boolean> readOnly`

~~2520~~

2521 Indicates whether or not a tool modifies data or is read-only. If an

2522 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2523 it will match this filter.

~~2524~~

2525 - `Optional<List<String>> toolNames`

~~2526~~

2527 List of allowed tool names.

~~2528~~

2529 - `enum McpToolApprovalSetting:`

~~2530~~

2531 Specify a single approval policy for all tools. One of `always` or

2532 `never`. When set to `always`, all tools will require approval. When

2533 set to `never`, no tools will require approval.

~~2534~~

2535 - `ALWAYS("always")`

~~2536~~

2537 - `NEVER("never")`

~~2538~~

2539 - `Optional<String> serverDescription`

~~2540~~

2541 Optional description of the MCP server, used to provide more context.

~~2542~~

2543 - `Optional<String> serverUrl`

~~2544~~

2545 The URL for the MCP server. One of `server_url` or `connector_id` must be

2546 provided.

~~2547~~

2548 - `Optional<Tracing> tracing`

~~2549~~

2550 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

2551 tracing is enabled for a session, the configuration cannot be modified.

~~2552~~

2553 `auto` will create a trace for the session with default values for the

2554 workflow name, group id, and metadata.

~~2555~~

2556 - `JsonValue;`

~~2557~~

2558 - `AUTO("auto")`

~~2559~~

2560 - `class TracingConfiguration:`

~~2561~~

2562 Granular configuration for tracing.

~~2563~~

2564 - `Optional<String> groupId`

~~2565~~

2566 The group id to attach to this trace to enable filtering and

2567 grouping in the Traces Dashboard.

~~2568~~

2569 - `Optional<JsonValue> metadata`

~~2570~~

2571 The arbitrary metadata to attach to this trace to enable

2572 filtering in the Traces Dashboard.

~~2573~~

2574 - `Optional<String> workflowName`

~~2575~~

2576 The name of the workflow to attach to this trace. This is used to

2577 name the trace in the Traces Dashboard.

~~2578~~

2579 - `Optional<RealtimeTruncation> truncation`

~~2580~~

2581 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~2582~~

2583 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~2584~~

2585 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~2586~~

2587 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~2588~~

2589 - `RealtimeTruncationStrategy`

~~2590~~

2591 - `AUTO("auto")`

~~2592~~

2593 - `DISABLED("disabled")`

~~2594~~

2595 - `class RealtimeTruncationRetentionRatio:`

~~2596~~

2597 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~2598~~

2599 - `double retentionRatio`

~~2600~~

2601 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~2602~~

2603 - `JsonValue; type "retention_ratio" constant`

~~2604~~

2605 Use retention ratio truncation.

~~2606~~

2607 - `RETENTION_RATIO("retention_ratio")`

~~2608~~

2609 - `Optional<TokenLimits> tokenLimits`

~~2610~~

2611 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~2612~~

2613 - `Optional<Long> postInstructions`

~~2614~~

2615 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
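
The retention-ratio behavior described above can be sketched in plain Java. This is a minimal illustration, not the server's implementation: the class `RetentionRatioSketch`, the method `truncate`, and the representation of a conversation as a list of per-message token counts are all assumptions made for the example. It drops messages from the oldest end once the conversation exceeds the limit, and keeps dropping until the total is at or below `retentionRatio` times the limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: models only the arithmetic described in the text,
// not the actual Realtime API truncation logic.
final class RetentionRatioSketch {

    /**
     * @param messageTokens  token count of each message, oldest first (assumed shape)
     * @param limit          post-instruction token limit (e.g. tokenLimits.postInstructions)
     * @param retentionRatio fraction of the limit to retain, between 0.0 and 1.0
     * @return token counts of the messages that remain after truncation
     */
    static List<Integer> truncate(List<Integer> messageTokens, long limit, double retentionRatio) {
        if (retentionRatio < 0.0 || retentionRatio > 1.0) {
            throw new IllegalArgumentException("retentionRatio must be within [0.0, 1.0]");
        }
        long total = 0;
        for (int tokens : messageTokens) {
            total += tokens;
        }
        // No truncation is needed while the conversation fits within the limit.
        if (total <= limit) {
            return List.copyOf(messageTokens);
        }
        // Drop the oldest messages until the total is at or below the target.
        long target = (long) Math.floor(limit * retentionRatio);
        Deque<Integer> kept = new ArrayDeque<>(messageTokens);
        while (!kept.isEmpty() && total > target) {
            total -= kept.removeFirst();
        }
        return List.copyOf(kept);
    }

    public static void main(String[] args) {
        // 6,500 tokens exceed the 5,000 limit, so messages are dropped
        // until at most 4,000 tokens (80% of the limit) remain.
        System.out.println(truncate(List.of(2000, 2000, 1500, 1000), 5000, 0.8)); // prints [1500, 1000]
    }
}
```

Dropping below the limit rather than just to it leaves headroom, so several turns can pass before the next truncation, which is the amortization the description refers to.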

~~2616~~

2617### Realtime Transcription Session Create Response

~~2618~~

2619- `class RealtimeTranscriptionSessionCreateResponse:`

~~2620~~

2621 A Realtime transcription session configuration object.

~~2622~~

2623 - `String id`

~~2624~~

2625 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~2626~~

2627 - `String object_`

~~2628~~

2629 The object type. Always `realtime.transcription_session`.

~~2630~~

2631 - `JsonValue; type "transcription" constant`

~~2632~~

2633 The type of session. Always `transcription` for transcription sessions.

~~2634~~

2635 - `TRANSCRIPTION("transcription")`

~~2636~~

2637 - `Optional<Audio> audio`

~~2638~~

2639 Configuration for input audio for the session.

~~2640~~

2641 - `Optional<Input> input`

~~2642~~

2643 - `Optional<RealtimeAudioFormats> format`

~~2644~~

2645 The PCM audio format. Only a 24kHz sample rate is supported.

~~2646~~

2647 - `AudioPcm`

~~2648~~

2649 - `Optional<Rate> rate`

~~2650~~

2651 The sample rate of the audio. Always `24000`.

~~2652~~

2653 - `_24000(24000)`

~~2654~~

2655 - `Optional<Type> type`

~~2656~~

2657 The audio format. Always `audio/pcm`.

~~2658~~

2659 - `AUDIO_PCM("audio/pcm")`

~~2660~~

2661 - `AudioPcmu`

~~2662~~

2663 - `Optional<Type> type`

~~2664~~

2665 The audio format. Always `audio/pcmu`.

~~2666~~

2667 - `AUDIO_PCMU("audio/pcmu")`

~~2668~~

2669 - `AudioPcma`

~~2670~~

2671 - `Optional<Type> type`

~~2672~~

2673 The audio format. Always `audio/pcma`.

~~2674~~

2675 - `AUDIO_PCMA("audio/pcma")`

~~2676~~

2677 - `Optional<NoiseReduction> noiseReduction`

~~2678~~

2679 Configuration for input audio noise reduction.

~~2680~~

2681 - `Optional<NoiseReductionType> type`

~~2682~~

2683 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.

~~2684~~

2685 - `NEAR_FIELD("near_field")`

~~2686~~

2687 - `FAR_FIELD("far_field")`

~~2688~~

2689 - `Optional<AudioTranscription> transcription`

~~2690~~

2691 - `Optional<Delay> delay`

~~2692~~

2693 Controls how long the model waits before emitting transcription text.

2694 Higher values can improve transcription accuracy at the cost of latency.

2695 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2696~~

2697 - `MINIMAL("minimal")`

~~2698~~

2699 - `LOW("low")`

~~2700~~

2701 - `MEDIUM("medium")`

~~2702~~

2703 - `HIGH("high")`

~~2704~~

2705 - `XHIGH("xhigh")`

~~2706~~

2707 - `Optional<String> language`

~~2708~~

2709 The language of the input audio. Supplying the input language in

2710 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

2711 will improve accuracy and latency.

~~2712~~

2713 - `Optional<Model> model`

~~2714~~

2715 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2716~~

2717 - `WHISPER_1("whisper-1")`

~~2718~~

2719 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`

~~2720~~

2721 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`

~~2722~~

2723 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`

~~2724~~

2725 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`

~~2726~~

2727 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`

~~2728~~

2729 - `Optional<String> prompt`

~~2730~~

2731 An optional text to guide the model's style or continue a previous audio

2732 segment.

2733 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

2734 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

2735 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2736~~

2737 - `Optional<RealtimeTranscriptionSessionTurnDetection> turnDetection`

~~2738~~

2739 Configuration for turn detection. Can be set to `null` to turn off. Server

2740 VAD means that the model will detect the start and end of speech based on

2741 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~2742~~

2743 - `Optional<Long> prefixPaddingMs`

~~2744~~

2745 Amount of audio to include before the VAD detected speech (in

2746 milliseconds). Defaults to 300ms.

~~2747~~

2748 - `Optional<Long> silenceDurationMs`

~~2749~~

2750 Duration of silence to detect speech stop (in milliseconds). Defaults

2751 to 500ms. With shorter values the model will respond more quickly,

2752 but may jump in on short pauses from the user.

~~2753~~

2754 - `Optional<Double> threshold`

~~2755~~

2756 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

2757 higher threshold will require louder audio to activate the model, and

2758 thus might perform better in noisy environments.

~~2759~~

2760 - `Optional<String> type`

~~2761~~

2762 Type of turn detection. Only `server_vad` is currently supported.

~~2763~~

2764 - `Optional<Long> expiresAt`

~~2765~~

2766 Expiration timestamp for the session, in seconds since epoch.

~~2767~~

2768 - `Optional<List<Include>> include`

~~2769~~

2770 Additional fields to include in server outputs.

~~2771~~

2772 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~2773~~

2774 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
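
Because `expiresAt` is an optional timestamp in seconds since epoch, a caller may want to convert it before comparing it with the current time. The sketch below uses only `java.time` and `java.util.Optional`; the class `SessionExpirySketch` and the method `isExpired` are hypothetical names, and treating a missing `expiresAt` as "not expired" is an assumption of this example rather than documented behavior.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper, not part of the SDK.
final class SessionExpirySketch {

    /** Returns true when {@code now} is at or past the expiration timestamp. */
    static boolean isExpired(Optional<Long> expiresAt, Instant now) {
        return expiresAt
                .map(seconds -> Instant.ofEpochSecond(seconds)) // seconds since epoch, per the field description
                .map(expiry -> !now.isBefore(expiry))
                .orElse(false); // assumption: an absent value means no known expiry
    }

    public static void main(String[] args) {
        Optional<Long> expiresAt = Optional.of(1_700_000_000L);
        System.out.println(isExpired(expiresAt, Instant.now()));
    }
}
```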

~~2775~~

2776### Realtime Transcription Session Turn Detection

~~2777~~

2778- `class RealtimeTranscriptionSessionTurnDetection:`

~~2779~~

2780 Configuration for turn detection. Can be set to `null` to turn off. Server

2781 VAD means that the model will detect the start and end of speech based on

2782 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~2783~~

2784 - `Optional<Long> prefixPaddingMs`

~~2785~~

2786 Amount of audio to include before the VAD detected speech (in

2787 milliseconds). Defaults to 300ms.

~~2788~~

2789 - `Optional<Long> silenceDurationMs`

~~2790~~

2791 Duration of silence to detect speech stop (in milliseconds). Defaults

2792 to 500ms. With shorter values the model will respond more quickly,

2793 but may jump in on short pauses from the user.

~~2794~~

2795 - `Optional<Double> threshold`

~~2796~~

2797 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

2798 higher threshold will require louder audio to activate the model, and

2799 thus might perform better in noisy environments.

~~2800~~

2801 - `Optional<String> type`

~~2802~~

2803 Type of turn detection. Only `server_vad` is currently supported.
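
To show how `threshold`, `prefixPaddingMs`, and `silenceDurationMs` interact, here is a toy volume-based detector in plain Java. It is only a sketch under stated assumptions: the server's actual VAD algorithm is not described in this reference, and the class `ServerVadSketch`, the method `detectTurns`, the fixed frame length, and the normalized per-frame volume levels are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of volume-based turn detection; not the Realtime API's algorithm.
final class ServerVadSketch {

    /**
     * @param levels  normalized volume per frame, 0.0 to 1.0 (assumed input shape)
     * @param frameMs duration of each frame in milliseconds
     * @return completed turns as {startMs, endMs} pairs
     */
    static List<long[]> detectTurns(double[] levels, long frameMs, double threshold,
                                    long prefixPaddingMs, long silenceDurationMs) {
        List<long[]> turns = new ArrayList<>();
        boolean speaking = false;
        long startMs = 0;
        long silenceMs = 0;
        for (int i = 0; i < levels.length; i++) {
            long frameStart = i * frameMs;
            boolean loud = levels[i] >= threshold;
            if (!speaking) {
                if (loud) {
                    speaking = true;
                    silenceMs = 0;
                    // Include some audio from before the detected speech.
                    startMs = Math.max(0, frameStart - prefixPaddingMs);
                }
            } else if (loud) {
                silenceMs = 0; // speech resumed, reset the silence counter
            } else {
                silenceMs += frameMs;
                if (silenceMs >= silenceDurationMs) {
                    // Enough continuous silence: the turn ends with this frame.
                    turns.add(new long[] {startMs, frameStart + frameMs});
                    speaking = false;
                }
            }
        }
        return turns; // a turn still in progress at the end of input is not emitted
    }

    public static void main(String[] args) {
        // 500 ms quiet, 300 ms speech, 500 ms quiet, in 100 ms frames.
        double[] levels = {0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1};
        for (long[] turn : detectTurns(levels, 100, 0.5, 300, 500)) {
            System.out.println(turn[0] + " ms to " + turn[1] + " ms");
        }
    }
}
```

With the documented defaults (threshold 0.5, 300 ms prefix padding, 500 ms silence), the sample input yields one turn from 200 ms to 1300 ms. Lowering `silenceDurationMs` to 200 ends the same turn at 1000 ms, which mirrors the trade-off noted above: a quicker response at the risk of cutting in on short pauses.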