Reference Changes, 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC
ruby/resources/realtime/subresources/client_secrets/index.md

0 added, 3784 removed.

This document has no rendered page for this history range.

ruby/resources/realtime/subresources/client_secrets/index.md +0 −3784 deleted

~~1# Client Secrets~~

~~3## Create client secret~~

~~5`realtime.client_secrets.create(**kwargs) -> ClientSecretCreateResponse`~~

~~7**post** `/realtime/client_secrets`~~

~~9Create a Realtime client secret with an associated session configuration.~~

~~11Client secrets are short-lived tokens that can be passed to a client app,~~

~~12such as a web frontend or mobile client, which grants access to the Realtime API without~~

~~13leaking your main API key. You can configure a custom TTL for each client secret.~~

~~15You can also attach session configuration options to the client secret, which will be~~

~~16applied to any sessions created using that client secret, but these can also be overridden~~

~~17by the client connection.~~

~~19[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).~~

~~21Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.~~

~~23### Parameters~~

~~25- `expires_after: ExpiresAfter{ anchor, seconds}`~~

~~27 Configuration for the client secret expiration. Expiration refers to the time after which~~

~~28 a client secret will no longer be valid for creating sessions. The session itself may~~

~~29 continue after that time once started. A secret can be used to create multiple sessions~~

~~30 until it expires.~~

~~32 - `anchor: :created_at`~~

34 The anchor point for the client secret expiration, meaning that `seconds` will be added to the `created_at` time of the client secret to produce an expiration timestamp. Only `created_at` is currently supported.

~~36 - `:created_at`~~

~~38 - `seconds: Integer`~~

~~40 The number of seconds from the anchor point to the expiration. Select a value between `10` and `7200` (2 hours). This defaults to 600 seconds (10 minutes) if not specified.~~
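
The expiration rule described above (`seconds` added to `created_at`, bounded between `10` and `7200`, defaulting to `600`) can be sketched in plain Ruby. This is a minimal illustration, not SDK code: `build_expires_after` and `expiration_time` are hypothetical helper names, and the range check is performed locally only as an assumption about how a caller might guard its input.

```ruby
require "time"

# Bounds and default quoted in the `seconds` description above.
MIN_TTL_SECONDS = 10
MAX_TTL_SECONDS = 7200
DEFAULT_TTL_SECONDS = 600

# Hypothetical helper: builds the `expires_after` hash and rejects
# values outside the documented range before any request is made.
def build_expires_after(seconds = DEFAULT_TTL_SECONDS)
  unless seconds.is_a?(Integer) && seconds.between?(MIN_TTL_SECONDS, MAX_TTL_SECONDS)
    raise ArgumentError, "seconds must be an Integer between #{MIN_TTL_SECONDS} and #{MAX_TTL_SECONDS}"
  end

  { anchor: :created_at, seconds: seconds }
end

# Hypothetical helper: expiration timestamp = created_at + seconds.
def expiration_time(created_at, expires_after)
  created_at + expires_after[:seconds]
end

expires_after = build_expires_after(900)
created_at = Time.utc(2026, 5, 18, 22, 0, 0) # illustrative timestamp
puts expiration_time(created_at, expires_after).iso8601
```

The resulting hash could then be passed as the `expires_after` keyword argument to `realtime.client_secrets.create`.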

~~42- `session: RealtimeSessionCreateRequest | RealtimeTranscriptionSessionCreateRequest`~~

~~44 Session configuration to use for the client secret. Choose either a realtime~~

~~45 session or a transcription session.~~

~~47 - `class RealtimeSessionCreateRequest`~~

~~49 Realtime session object configuration.~~

~~51 - `type: :realtime`~~

~~53 The type of session to create. Always `realtime` for the Realtime API.~~

~~55 - `:realtime`~~

~~57 - `audio: RealtimeAudioConfig`~~

~~59 Configuration for input and output audio.~~

~~61 - `input: RealtimeAudioConfigInput`~~

~~63 - `format_: RealtimeAudioFormats`~~

~~65 The format of the input audio.~~

~~67 - `class AudioPCM`~~

~~69 The PCM audio format. Only a 24kHz sample rate is supported.~~

~~71 - `rate: 24000`~~

~~73 The sample rate of the audio. Always `24000`.~~

~~75 - `24000`~~

~~77 - `type: :"audio/pcm"`~~

~~79 The audio format. Always `audio/pcm`.~~

~~81 - `:"audio/pcm"`~~

~~83 - `class AudioPCMU`~~

~~85 The G.711 μ-law format.~~

~~87 - `type: :"audio/pcmu"`~~

~~89 The audio format. Always `audio/pcmu`.~~

~~91 - `:"audio/pcmu"`~~

~~93 - `class AudioPCMA`~~

~~95 The G.711 A-law format.~~

~~97 - `type: :"audio/pcma"`~~

~~99 The audio format. Always `audio/pcma`.~~

~~100~~

101 - `:"audio/pcma"`

~~102~~

103 - `noise_reduction: NoiseReduction{ type}`

~~104~~

105 Configuration for input audio noise reduction. This can be set to `null` to turn off.

106 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

107 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~108~~

109 - `type: NoiseReductionType`

~~110~~

111 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~112~~

113 - `:near_field`

~~114~~

115 - `:far_field`

~~116~~

117 - `transcription: AudioTranscription`

~~118~~

119 Configuration for input audio transcription. It defaults to off and can be set to `null` to turn off once on. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance of input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

~~120~~

121 - `delay: :minimal | :low | :medium | 2 more`

~~122~~

123 Controls how long the model waits before emitting transcription text.

124 Higher values can improve transcription accuracy at the cost of latency.

125 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~126~~

127 - `:minimal`

~~128~~

129 - `:low`

~~130~~

131 - `:medium`

~~132~~

133 - `:high`

~~134~~

135 - `:xhigh`

~~136~~

137 - `language: String`

~~138~~

139 The language of the input audio. Supplying the input language in

140 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

141 will improve accuracy and latency.

~~142~~

143 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~144~~

145 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~146~~

147 - `String = String`

~~148~~

149 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~150~~

151 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~152~~

153 - `:"whisper-1"`

~~154~~

155 - `:"gpt-4o-mini-transcribe"`

~~156~~

157 - `:"gpt-4o-mini-transcribe-2025-12-15"`

~~158~~

159 - `:"gpt-4o-transcribe"`

~~160~~

161 - `:"gpt-4o-transcribe-diarize"`

~~162~~

163 - `:"gpt-realtime-whisper"`

~~164~~

165 - `prompt: String`

~~166~~

167 An optional text to guide the model's style or continue a previous audio

168 segment.

169 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

170 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

171 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
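
The model-specific constraints on `delay` and `prompt` stated above can be checked before a request is built. The sketch below is an assumption about client-side validation, not SDK behavior: `validate_transcription` is a hypothetical helper, and it encodes only the two rules quoted for `gpt-realtime-whisper` in GA Realtime sessions.

```ruby
WHISPER_REALTIME = :"gpt-realtime-whisper"

# Hypothetical helper: returns the config unchanged when it satisfies the
# two documented rules, and raises ArgumentError otherwise.
def validate_transcription(config)
  model = config[:model]&.to_sym

  if model == WHISPER_REALTIME
    # `prompt` is documented as unsupported with gpt-realtime-whisper.
    raise ArgumentError, "prompt is not supported with gpt-realtime-whisper" if config.key?(:prompt)
  elsif config.key?(:delay)
    # `delay` is documented as supported only with gpt-realtime-whisper.
    raise ArgumentError, "delay is only supported with gpt-realtime-whisper"
  end

  config
end

whisper = validate_transcription(model: :"gpt-realtime-whisper", delay: :low, language: "en")
prompted = validate_transcription(model: :"gpt-4o-transcribe", prompt: "expect words related to technology")
puts whisper.inspect
puts prompted.inspect
```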

~~172~~

173 - `turn_detection: RealtimeAudioInputTurnDetection`

~~174~~

175 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~176~~

177 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~178~~

179 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~180~~

181 For `gpt-realtime-whisper` transcription sessions, turn detection must be

182 set to `null`; VAD is not supported.

~~183~~

184 - `class ServerVad`

~~185~~

186 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~187~~

188 - `type: :server_vad`

~~189~~

190 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~191~~

192 - `:server_vad`

~~193~~

194 - `create_response: bool`

~~195~~

196 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~197~~

198 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~199~~

200 - `idle_timeout_ms: Integer`

~~201~~

202 Optional timeout after which a model response will be triggered automatically. This is

203 useful for situations in which a long pause from the user is unexpected, such as a phone

204 call. The model will effectively prompt the user to continue the conversation based

205 on the current context.

~~206~~

207 The timeout value will be applied after the last model response's audio has finished playing,

208 i.e. it's set to the `response.done` time plus audio playback duration.

~~209~~

210 An `input_audio_buffer.timeout_triggered` event (plus events

211 associated with the Response) will be emitted when the timeout is reached.

212 Idle timeout is currently only supported for `server_vad` mode.

~~213~~

214 - `interrupt_response: bool`

~~215~~

216 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

217 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~218~~

219 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~220~~

221 - `prefix_padding_ms: Integer`

~~222~~

223 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

224 milliseconds). Defaults to 300ms.

~~225~~

226 - `silence_duration_ms: Integer`

~~227~~

228 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

229 to 500ms. With shorter values the model will respond more quickly,

230 but may jump in on short pauses from the user.

~~231~~

232 - `threshold: Float`

~~233~~

234 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

235 higher threshold will require louder audio to activate the model, and

236 thus might perform better in noisy environments.
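
As a rough illustration of the `ServerVad` fields above, the following sketch builds a `turn_detection` hash from the documented defaults (300 ms prefix padding, 500 ms silence duration, 0.5 threshold). `build_server_vad` and `never_responds_automatically?` are hypothetical helpers; the defaults for `create_response` and `interrupt_response` are not stated in the text, so the second helper only reports the case where both are explicitly `false`.

```ruby
# Defaults quoted in the field descriptions above.
SERVER_VAD_DEFAULTS = {
  type: :server_vad,
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  threshold: 0.5
}.freeze

# Hypothetical helper: merges overrides onto the documented defaults and
# checks that the activation threshold stays within 0.0..1.0.
def build_server_vad(overrides = {})
  config = SERVER_VAD_DEFAULTS.merge(overrides)
  unless config[:threshold].between?(0.0, 1.0)
    raise ArgumentError, "threshold must be within 0.0..1.0"
  end

  config
end

# True only when both flags are explicitly false, the case in which the
# text says the model never responds automatically.
def never_responds_automatically?(config)
  config[:create_response] == false && config[:interrupt_response] == false
end

noisy_room = build_server_vad(threshold: 0.7, silence_duration_ms: 700)
manual = build_server_vad(create_response: false, interrupt_response: false)
puts noisy_room.inspect
puts never_responds_automatically?(manual)
```

A higher threshold and a longer silence window trade responsiveness for fewer false triggers, which matches the guidance for noisy environments.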

~~237~~

238 - `class SemanticVad`

~~239~~

240 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~241~~

242 - `type: :semantic_vad`

~~243~~

244 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~245~~

246 - `:semantic_vad`

~~247~~

248 - `create_response: bool`

~~249~~

250 Whether or not to automatically generate a response when a VAD stop event occurs.

~~251~~

252 - `eagerness: :low | :medium | :high | :auto`

~~253~~

254 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~255~~

256 - `:low`

~~257~~

258 - `:medium`

~~259~~

260 - `:high`

~~261~~

262 - `:auto`

~~263~~

264 - `interrupt_response: bool`

~~265~~

266 Whether or not to automatically interrupt any ongoing response with output to the default

267 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
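
The `eagerness` description above maps each level to a maximum timeout (8s, 4s, and 2s for `low`, `medium`, and `high`), with `auto` equivalent to `medium`. A small lookup makes that mapping explicit. `max_timeout_for` is a hypothetical helper written for illustration; it is not part of the SDK.

```ruby
# Maximum timeouts in seconds, as listed in the `eagerness` description.
MAX_TIMEOUT_SECONDS = { low: 8, medium: 4, high: 2 }.freeze

# Hypothetical helper: resolves `:auto` to `:medium`, then looks up the
# documented maximum timeout for that level.
def max_timeout_for(eagerness = :auto)
  level = eagerness == :auto ? :medium : eagerness
  MAX_TIMEOUT_SECONDS.fetch(level) do
    raise ArgumentError, "unknown eagerness: #{eagerness.inspect}"
  end
end

semantic_vad = { type: :semantic_vad, eagerness: :low, create_response: true }
puts max_timeout_for(semantic_vad[:eagerness])
puts max_timeout_for(:auto)
```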

~~268~~

269 - `output: RealtimeAudioConfigOutput`

~~270~~

271 - `format_: RealtimeAudioFormats`

~~272~~

273 The format of the output audio.

~~274~~

275 - `speed: Float`

~~276~~

277 The speed of the model's spoken response as a multiple of the original speed.

278 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~279~~

280 This parameter is a post-processing adjustment to the audio after it is generated; it's

281 also possible to prompt the model to speak faster or slower.

~~282~~

283 - `voice: String | :alloy | :ash | :ballad | 7 more | ID{ id}`

~~284~~

285 The voice the model uses to respond. Supported built-in voices are

286 `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`,

287 `marin`, and `cedar`. You may also provide a custom voice object with

288 an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed

289 during the session once the model has responded with audio at least once.

290 We recommend `marin` and `cedar` for best quality.

~~291~~

292 - `String = String`

~~293~~

294 - `Voice = :alloy | :ash | :ballad | 7 more`

~~295~~

296 - `:alloy`

~~297~~

298 - `:ash`

~~299~~

300 - `:ballad`

~~301~~

302 - `:coral`

~~303~~

304 - `:echo`

~~305~~

306 - `:sage`

~~307~~

308 - `:shimmer`

~~309~~

310 - `:verse`

~~311~~

312 - `:marin`

~~313~~

314 - `:cedar`

~~315~~

316 - `class ID`

~~317~~

318 Custom voice reference.

~~319~~

320 - `id: String`

~~321~~

322 The custom voice ID, e.g. `voice_1234`.
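
The output settings above combine a bounded `speed` (0.25 to 1.5) with a `voice` that is either a built-in name or a custom voice object carrying an `id`. The sketch below assembles such a hash under those rules. `build_audio_output` and `built_in_voice?` are hypothetical helpers, and `voice_1234` is the placeholder ID used in the text, not a real voice.

```ruby
# Built-in voice names listed in the `voice` description.
BUILT_IN_VOICES = %i[alloy ash ballad coral echo sage shimmer verse marin cedar].freeze
SPEED_RANGE = (0.25..1.5)

def built_in_voice?(voice)
  voice.respond_to?(:to_sym) && BUILT_IN_VOICES.include?(voice.to_sym)
end

# Hypothetical helper: validates speed and normalizes the voice value.
# A Hash is treated as a custom voice reference and must carry an :id.
def build_audio_output(voice:, speed: 1.0)
  raise ArgumentError, "speed must be within 0.25..1.5" unless SPEED_RANGE.cover?(speed)

  voice_value = voice.is_a?(Hash) ? { id: voice.fetch(:id) } : voice
  { voice: voice_value, speed: speed }
end

puts build_audio_output(voice: :marin).inspect
puts build_audio_output(voice: { id: "voice_1234" }, speed: 1.25).inspect
puts built_in_voice?("cedar")
```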

~~323~~

324 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~325~~

326 Additional fields to include in server outputs.

~~327~~

328 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~329~~

330 - `:"item.input_audio_transcription.logprobs"`

~~331~~

332 - `instructions: String`

~~333~~

334 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format, (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~335~~

336 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~337~~

338 - `max_output_tokens: Integer | :inf`

~~339~~

340 Maximum number of output tokens for a single assistant response,

341 inclusive of tool calls. Provide an integer between 1 and 4096 to

342 limit output tokens, or `inf` for the maximum available tokens for a

343 given model. Defaults to `inf`.

~~344~~

345 - `Integer = Integer`

~~346~~

347 - `MaxOutputTokens = :inf`

~~348~~

349 - `:inf`
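
Since `max_output_tokens` accepts either an integer between 1 and 4096 or `:inf`, a caller may want to check the value before sending it. The following is a minimal sketch of such a check; `validate_max_output_tokens` is a hypothetical name, and the local validation is an assumption rather than documented SDK behavior.

```ruby
# Hypothetical helper: accepts :inf (the documented default) or an
# Integer in 1..4096, and raises ArgumentError for anything else.
def validate_max_output_tokens(value = :inf)
  return value if value == :inf
  return value if value.is_a?(Integer) && value.between?(1, 4096)

  raise ArgumentError, "max_output_tokens must be :inf or an Integer in 1..4096"
end

puts validate_max_output_tokens.inspect
puts validate_max_output_tokens(1024)
```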

~~350~~

351 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~352~~

353 The Realtime model used for this session.

~~354~~

355 - `String = String`

~~356~~

357 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~358~~

359 The Realtime model used for this session.

~~360~~

361 - `:"gpt-realtime"`

~~362~~

363 - `:"gpt-realtime-1.5"`

~~364~~

365 - `:"gpt-realtime-2"`

~~366~~

367 - `:"gpt-realtime-2025-08-28"`

~~368~~

369 - `:"gpt-4o-realtime-preview"`

~~370~~

371 - `:"gpt-4o-realtime-preview-2024-10-01"`

~~372~~

373 - `:"gpt-4o-realtime-preview-2024-12-17"`

~~374~~

375 - `:"gpt-4o-realtime-preview-2025-06-03"`

~~376~~

377 - `:"gpt-4o-mini-realtime-preview"`

~~378~~

379 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`

~~380~~

381 - `:"gpt-realtime-mini"`

~~382~~

383 - `:"gpt-realtime-mini-2025-10-06"`

~~384~~

385 - `:"gpt-realtime-mini-2025-12-15"`

~~386~~

387 - `:"gpt-audio-1.5"`

~~388~~

389 - `:"gpt-audio-mini"`

~~390~~

391 - `:"gpt-audio-mini-2025-10-06"`

~~392~~

393 - `:"gpt-audio-mini-2025-12-15"`

~~394~~

395 - `output_modalities: Array[:text | :audio]`

~~396~~

397 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

398 that the model will respond with audio plus a transcript. `["text"]` can be used to make

399 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~400~~

401 - `:text`

~~402~~

403 - `:audio`
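
Because `text` and `audio` cannot be requested together, the only valid values for `output_modalities` are a single-element array of one or the other. A hedged sketch of that check follows; `validate_output_modalities` is a hypothetical helper, not an SDK method.

```ruby
# The two combinations the description allows.
ALLOWED_OUTPUT_MODALITIES = [[:audio], [:text]].freeze

# Hypothetical helper: defaults to [:audio] as documented and rejects
# any other combination, including requesting both modalities.
def validate_output_modalities(modalities = [:audio])
  normalized = Array(modalities).map(&:to_sym)
  unless ALLOWED_OUTPUT_MODALITIES.include?(normalized)
    raise ArgumentError, "output_modalities must be [:audio] or [:text]"
  end

  normalized
end

puts validate_output_modalities.inspect
puts validate_output_modalities(["text"]).inspect
```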

~~404~~

405 - `parallel_tool_calls: bool`

~~406~~

407 Whether the model may call multiple tools in parallel. Only supported by

408 reasoning Realtime models such as `gpt-realtime-2`.

~~409~~

410 - `prompt: ResponsePrompt`

~~411~~

412 Reference to a prompt template and its variables.

413 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~414~~

415 - `id: String`

~~416~~

417 The unique identifier of the prompt template to use.

~~418~~

419 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`

~~420~~

421 Optional map of values to substitute in for variables in your

422 prompt. The substitution values can either be strings, or other

423 Response input types like images or files.

~~424~~

425 - `String = String`

~~426~~

427 - `class ResponseInputText`

~~428~~

429 A text input to the model.

~~430~~

431 - `text: String`

~~432~~

433 The text input to the model.

~~434~~

435 - `type: :input_text`

~~436~~

437 The type of the input item. Always `input_text`.

~~438~~

439 - `:input_text`

~~440~~

441 - `class ResponseInputImage`

~~442~~

443 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~444~~

445 - `detail: :low | :high | :auto | :original`

~~446~~

447 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~448~~

449 - `:low`

~~450~~

451 - `:high`

~~452~~

453 - `:auto`

~~454~~

455 - `:original`

~~456~~

457 - `type: :input_image`

~~458~~

459 The type of the input item. Always `input_image`.

~~460~~

461 - `:input_image`

~~462~~

463 - `file_id: String`

~~464~~

465 The ID of the file to be sent to the model.

~~466~~

467 - `image_url: String`

~~468~~

469 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~470~~

471 - `class ResponseInputFile`

~~472~~

473 A file input to the model.

~~474~~

475 - `type: :input_file`

~~476~~

477 The type of the input item. Always `input_file`.

~~478~~

479 - `:input_file`

~~480~~

481 - `detail: :low | :high`

~~482~~

483 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~484~~

485 - `:low`

~~486~~

487 - `:high`

~~488~~

489 - `file_data: String`

~~490~~

491 The content of the file to be sent to the model.

~~492~~

493 - `file_id: String`

~~494~~

495 The ID of the file to be sent to the model.

~~496~~

497 - `file_url: String`

~~498~~

499 The URL of the file to be sent to the model.

~~500~~

501 - `filename: String`

~~502~~

503 The name of the file to be sent to the model.

~~504~~

505 - `version: String`

~~506~~

507 Optional version of the prompt template.

~~508~~

509 - `reasoning: RealtimeReasoning`

~~510~~

511 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~512~~

513 - `effort: RealtimeReasoningEffort`

~~514~~

515 Constrains effort on reasoning for reasoning-capable Realtime models such as

516 `gpt-realtime-2`.

~~517~~

518 - `:minimal`

~~519~~

520 - `:low`

~~521~~

522 - `:medium`

~~523~~

524 - `:high`

~~525~~

526 - `:xhigh`

~~527~~

528 - `tool_choice: RealtimeToolChoiceConfig`

~~529~~

530 How the model chooses tools. Provide one of the string modes or force a specific

531 function/MCP tool.

~~532~~

533 - `ToolChoiceOptions = :none | :auto | :required`

~~534~~

535 Controls which (if any) tool is called by the model.

~~536~~

537 `none` means the model will not call any tool and instead generates a message.

~~538~~

539 `auto` means the model can pick between generating a message or calling one or

540 more tools.

~~541~~

542 `required` means the model must call one or more tools.

~~543~~

544 - `:none`

~~545~~

546 - `:auto`

~~547~~

548 - `:required`

~~549~~

550 - `class ToolChoiceFunction`

~~551~~

552 Use this option to force the model to call a specific function.

~~553~~

554 - `name: String`

~~555~~

556 The name of the function to call.

~~557~~

558 - `type: :function`

~~559~~

560 For function calling, the type is always `function`.

~~561~~

562 - `:function`

~~563~~

564 - `class ToolChoiceMcp`

~~565~~

566 Use this option to force the model to call a specific tool on a remote MCP server.

~~567~~

568 - `server_label: String`

~~569~~

570 The label of the MCP server to use.

~~571~~

572 - `type: :mcp`

~~573~~

574 For MCP tools, the type is always `mcp`.

~~575~~

576 - `:mcp`

~~577~~

578 - `name: String`

~~579~~

580 The name of the tool to call on the server.

~~581~~

582 - `tools: RealtimeToolsConfig`

~~583~~

584 Tools available to the model.

~~585~~

586 - `class RealtimeFunctionTool`

~~587~~

588 - `description: String`

~~589~~

590 The description of the function, including guidance on when and how

591 to call it, and guidance about what to tell the user when calling

592 (if anything).

~~593~~

594 - `name: String`

~~595~~

596 The name of the function.

~~597~~

598 - `parameters: untyped`

~~599~~

600 Parameters of the function in JSON Schema.

~~601~~

602 - `type: :function`

~~603~~

604 The type of the tool, i.e. `function`.

~~605~~

606 - `:function`

~~607~~

608 - `class Mcp`

~~609~~

610 Give the model access to additional tools via remote Model Context Protocol

611 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~612~~

613 - `server_label: String`

~~614~~

615 A label for this MCP server, used to identify it in tool calls.

~~616~~

617 - `type: :mcp`

~~618~~

619 The type of the MCP tool. Always `mcp`.

~~620~~

621 - `:mcp`

~~622~~

623 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`

~~624~~

625 List of allowed tool names or a filter object.

~~626~~

627 - `McpAllowedTools = Array[String]`

~~628~~

629 A string array of allowed tool names

~~630~~

631 - `class McpToolFilter`

~~632~~

633 A filter object to specify which tools are allowed.

~~634~~

635 - `read_only: bool`

~~636~~

637 Indicates whether or not a tool modifies data or is read-only. If an

638 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

639 it will match this filter.

~~640~~

641 - `tool_names: Array[String]`

~~642~~

643 List of allowed tool names.

~~644~~

645 - `authorization: String`

~~646~~

647 An OAuth access token that can be used with a remote MCP server, either

648 with a custom MCP server URL or a service connector. Your application

649 must handle the OAuth authorization flow and provide the token here.

~~650~~

651 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`

~~652~~

653 Identifier for service connectors, like those available in ChatGPT. One of

654 `server_url` or `connector_id` must be provided. Learn more about service

655 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~656~~

657 Currently supported `connector_id` values are:

~~658~~

659 - Dropbox: `connector_dropbox`

660 - Gmail: `connector_gmail`

661 - Google Calendar: `connector_googlecalendar`

662 - Google Drive: `connector_googledrive`

663 - Microsoft Teams: `connector_microsoftteams`

664 - Outlook Calendar: `connector_outlookcalendar`

665 - Outlook Email: `connector_outlookemail`

666 - SharePoint: `connector_sharepoint`

~~667~~

668 - `:connector_dropbox`

~~669~~

670 - `:connector_gmail`

~~671~~

672 - `:connector_googlecalendar`

~~673~~

674 - `:connector_googledrive`

~~675~~

676 - `:connector_microsoftteams`

~~677~~

678 - `:connector_outlookcalendar`

~~679~~

680 - `:connector_outlookemail`

~~681~~

682 - `:connector_sharepoint`

~~683~~

684 - `defer_loading: bool`

~~685~~

686 Whether this MCP tool is deferred and discovered via tool search.

~~687~~

688 - `headers: Hash[Symbol, String]`

~~689~~

690 Optional HTTP headers to send to the MCP server. Use for authentication

691 or other purposes.

~~692~~

693 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`

~~694~~

695 Specify which of the MCP server's tools require approval.

~~696~~

697 - `class McpToolApprovalFilter`

~~698~~

699 Specify which of the MCP server's tools require approval. Can be

700 `always`, `never`, or a filter object associated with tools

701 that require approval.

~~702~~

703 - `always: Always{ read_only, tool_names}`

~~704~~

705 A filter object to specify which tools are allowed.

~~706~~

707 - `read_only: bool`

~~708~~

709 Indicates whether or not a tool modifies data or is read-only. If an

710 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

711 it will match this filter.

~~712~~

713 - `tool_names: Array[String]`

~~714~~

715 List of allowed tool names.

~~716~~

717 - `never: Never{ read_only, tool_names}`

~~718~~

719 A filter object to specify which tools are allowed.

~~720~~

721 - `read_only: bool`

~~722~~

723 Indicates whether or not a tool modifies data or is read-only. If an

724 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

725 it will match this filter.

~~726~~

727 - `tool_names: Array[String]`

~~728~~

729 List of allowed tool names.

~~730~~

731 - `McpToolApprovalSetting = :always | :never`

~~732~~

733 Specify a single approval policy for all tools. One of `always` or

734 `never`. When set to `always`, all tools will require approval. When

735 set to `never`, no tools will require approval.

~~736~~

737 - `:always`

~~738~~

739 - `:never`

~~740~~

741 - `server_description: String`

~~742~~

743 Optional description of the MCP server, used to provide more context.

~~744~~

745 - `server_url: String`

~~746~~

747 The URL for the MCP server. One of `server_url` or `connector_id` must be

748 provided.
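
The `Mcp` tool fields above can be assembled into a single hash, with the one stated requirement that `server_url` or `connector_id` be present. The sketch below assumes a hypothetical helper, `build_mcp_tool`, and a placeholder server URL (`https://example.com/mcp`). The text does not say whether supplying both fields is an error, so the helper only rejects the case where neither is given.

```ruby
# Hypothetical helper: builds an MCP tool entry for the `tools` array.
# Extra keyword arguments (allowed_tools, require_approval, headers, ...)
# are passed through unchanged.
def build_mcp_tool(server_label:, server_url: nil, connector_id: nil, **options)
  if server_url.nil? && connector_id.nil?
    raise ArgumentError, "one of server_url or connector_id must be provided"
  end

  tool = { type: :mcp, server_label: server_label }
  tool[:server_url] = server_url if server_url
  tool[:connector_id] = connector_id if connector_id
  tool.merge(options)
end

docs_tool = build_mcp_tool(
  server_label: "docs",
  server_url: "https://example.com/mcp", # placeholder URL
  allowed_tools: { read_only: true },
  require_approval: :never
)
calendar_tool = build_mcp_tool(
  server_label: "calendar",
  connector_id: :connector_googlecalendar,
  require_approval: { always: { tool_names: ["create_event"] } } # illustrative tool name
)
puts docs_tool.inspect
puts calendar_tool.inspect
```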

~~749~~

750 - `tracing: RealtimeTracingConfig`

~~751~~

752 The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

753 tracing is enabled for a session, the configuration cannot be modified.

~~754~~

755 `auto` will create a trace for the session with default values for the

756 workflow name, group id, and metadata.

~~757~~

758 - `RealtimeTracingConfig = :auto`

~~759~~

760 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~761~~

762 - `:auto`

~~763~~

764 - `class TracingConfiguration`

~~765~~

766 Granular configuration for tracing.

~~767~~

768 - `group_id: String`

~~769~~

770 The group id to attach to this trace to enable filtering and

771 grouping in the Traces Dashboard.

~~772~~

773 - `metadata: untyped`

~~774~~

775 The arbitrary metadata to attach to this trace to enable

776 filtering in the Traces Dashboard.

~~777~~

778 - `workflow_name: String`

~~779~~

780 The name of the workflow to attach to this trace. This is used to

781 name the trace in the Traces Dashboard.

~~782~~

783 - `truncation: RealtimeTruncation`

~~784~~

785 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~786~~

787 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~788~~

789 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~790~~

791 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~792~~

793 - `RealtimeTruncationStrategy = :auto | :disabled`

~~794~~

795 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~796~~

797 - `:auto`

~~798~~

799 - `:disabled`

~~800~~

801 - `class RealtimeTruncationRetentionRatio`

~~802~~

803 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~804~~

805 - `retention_ratio: Float`

~~806~~

807 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~808~~

809 - `type: :retention_ratio`

~~810~~

811 Use retention ratio truncation.

~~812~~

813 - `:retention_ratio`

~~814~~

815 - `token_limits: TokenLimits{ post_instructions}`

~~816~~

817 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~818~~

819 - `post_instructions: Integer`

~~820~~

821 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
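
To make the retention-ratio arithmetic concrete: with a `post_instructions` limit of 5,000 and a `retention_ratio` of `0.8`, messages are dropped from the oldest end until at most 4,000 tokens remain. The sketch below simulates that locally. It is only an approximation under stated assumptions: `retention_target` and `truncate_oldest` are hypothetical helpers, the per-message token counts are made up, and the server's actual token accounting is not modeled.

```ruby
# Hypothetical helper: number of tokens to retain after a truncation.
def retention_target(token_limit, retention_ratio)
  raise ArgumentError, "retention_ratio must be within 0.0..1.0" unless (0.0..1.0).cover?(retention_ratio)

  (token_limit * retention_ratio).floor
end

# Hypothetical helper: given per-message token counts (oldest first),
# drops the oldest messages once the limit is exceeded, until the
# remaining total fits within the retention target.
def truncate_oldest(token_counts, token_limit, retention_ratio)
  return token_counts if token_counts.sum <= token_limit

  target = retention_target(token_limit, retention_ratio)
  kept = token_counts.dup
  kept.shift while !kept.empty? && kept.sum > target
  kept
end

truncation = {
  type: :retention_ratio,
  retention_ratio: 0.8,
  token_limits: { post_instructions: 5_000 }
}

limit = truncation[:token_limits][:post_instructions]
history = [2_000, 1_500, 1_000, 900] # made-up token counts, oldest first
puts retention_target(limit, truncation[:retention_ratio])
puts truncate_oldest(history, limit, truncation[:retention_ratio]).inspect
```

Dropping below the limit rather than just to it leaves headroom, so the next few turns can proceed without another truncation and the cached prefix stays stable for longer.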

~~822~~

823 - `class RealtimeTranscriptionSessionCreateRequest`

~~824~~

825 Realtime transcription session object configuration.

~~826~~

827 - `type: :transcription`

~~828~~

829 The type of session to create. Always `transcription` for transcription sessions.

~~830~~

831 - `:transcription`

~~832~~

833 - `audio: RealtimeTranscriptionSessionAudio`

~~834~~

835 Configuration for input and output audio.

~~836~~

837 - `input: RealtimeTranscriptionSessionAudioInput`

~~838~~

839 - `format_: RealtimeAudioFormats`

~~840~~

841 The PCM audio format. Only a 24kHz sample rate is supported.

~~842~~

843 - `noise_reduction: NoiseReduction{ type}`

~~844~~

845 Configuration for input audio noise reduction. This can be set to `null` to turn off.

846 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

847 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~848~~

849 - `type: NoiseReductionType`

~~850~~

851 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~852~~

853 - `transcription: AudioTranscription`

~~854~~

855 Configuration for input audio transcription. Defaults to off and can be set to `null` to turn off once on. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance of input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

~~856~~

857 - `turn_detection: RealtimeTranscriptionSessionAudioInputTurnDetection`

~~858~~

859 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.

~~860~~

861 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~862~~

863 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~864~~

865 For `gpt-realtime-whisper` transcription sessions, turn detection must be

866 set to `null`; VAD is not supported.

~~867~~

868 - `class ServerVad`

~~869~~

870 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~871~~

872 - `type: :server_vad`

~~873~~

874 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~875~~

876 - `:server_vad`

~~877~~

878 - `create_response: bool`

~~879~~

880 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~881~~

882 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~883~~

884 - `idle_timeout_ms: Integer`

~~885~~

886 Optional timeout after which a model response will be triggered automatically. This is

887 useful for situations in which a long pause from the user is unexpected, such as a phone

888 call. The model will effectively prompt the user to continue the conversation based

889 on the current context.

~~890~~

891 The timeout value will be applied after the last model response's audio has finished playing,

892 i.e. it's set to the `response.done` time plus audio playback duration.

~~893~~

894 An `input_audio_buffer.timeout_triggered` event (plus events

895 associated with the Response) will be emitted when the timeout is reached.

896 Idle timeout is currently only supported for `server_vad` mode.

~~897~~

898 - `interrupt_response: bool`

~~899~~

900 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

901 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~902~~

903 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~904~~

905 - `prefix_padding_ms: Integer`

~~906~~

907 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

908 milliseconds). Defaults to 300ms.

~~909~~

910 - `silence_duration_ms: Integer`

~~911~~

912 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

913 to 500ms. With shorter values the model will respond more quickly,

914 but may jump in on short pauses from the user.

~~915~~

916 - `threshold: Float`

~~917~~

918 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

919 higher threshold will require louder audio to activate the model, and

920 thus might perform better in noisy environments.
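
As a rough sketch, a Server VAD configuration can be assembled as a plain Ruby hash. The defaults come from the field descriptions above; the constant `SERVER_VAD_DEFAULTS` and the helper `server_vad` are hypothetical names, and the range check is a client-side assumption rather than documented SDK behavior.

```ruby
# Defaults taken from the field descriptions above.
SERVER_VAD_DEFAULTS = {
  type: :server_vad,
  threshold: 0.5,
  prefix_padding_ms: 300,
  silence_duration_ms: 500
}.freeze

# Hypothetical helper: merge overrides onto the defaults and
# reject thresholds outside the documented 0.0 to 1.0 range.
def server_vad(overrides = {})
  config = SERVER_VAD_DEFAULTS.merge(overrides)
  unless (0.0..1.0).cover?(config[:threshold])
    raise ArgumentError, "threshold must be between 0.0 and 1.0"
  end
  config
end

# A higher threshold and longer silence window for a noisy environment.
noisy_room = server_vad(threshold: 0.7, silence_duration_ms: 800)
```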

~~921~~

922 - `class SemanticVad`

~~923~~

924 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~925~~

926 - `type: :semantic_vad`

~~927~~

928 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~929~~

930 - `:semantic_vad`

~~931~~

932 - `create_response: bool`

~~933~~

934 Whether or not to automatically generate a response when a VAD stop event occurs.

~~935~~

936 - `eagerness: :low | :medium | :high | :auto`

~~937~~

938 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~939~~

940 - `:low`

~~941~~

942 - `:medium`

~~943~~

944 - `:high`

~~945~~

946 - `:auto`

~~947~~

948 - `interrupt_response: bool`

~~949~~

950 Whether or not to automatically interrupt any ongoing response with output to the default

951 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
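
The eagerness-to-timeout mapping described above can be written out as a small lookup. This is only an illustration of the documented values; the constant and method names are hypothetical.

```ruby
# Maximum timeouts in seconds, as listed in the `eagerness` description.
EAGERNESS_MAX_TIMEOUT_S = { low: 8, medium: 4, high: 2 }.freeze

# Hypothetical helper: `auto` is the default and is equivalent to `medium`.
def max_timeout_seconds(eagerness = :auto)
  key = eagerness == :auto ? :medium : eagerness
  EAGERNESS_MAX_TIMEOUT_S.fetch(key) do
    raise ArgumentError, "unknown eagerness: #{eagerness.inspect}"
  end
end

semantic_vad = { type: :semantic_vad, eagerness: :low }
wait_limit = max_timeout_seconds(semantic_vad[:eagerness])
```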

~~952~~

953 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~954~~

955 Additional fields to include in server outputs.

~~956~~

957 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~958~~

959 - `:"item.input_audio_transcription.logprobs"`

~~960~~

961### Returns

~~962~~

963- `class ClientSecretCreateResponse`

~~964~~

965 Response from creating a session and client secret for the Realtime API.

~~966~~

967 - `expires_at: Integer`

~~968~~

969 Expiration timestamp for the client secret, in seconds since epoch.
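
Because `expires_at` is expressed in seconds since the epoch, it converts directly to a Ruby `Time`. The sketch below assumes the secret is treated as expired once the current time reaches `expires_at`; the helper name and the sample timestamp are hypothetical.

```ruby
# Hypothetical helper: true once the current time reaches the expiry timestamp.
def client_secret_expired?(expires_at, now: Time.now)
  now.to_i >= expires_at
end

expires_at = 1_700_000_600          # sample value, seconds since epoch
expiry_time = Time.at(expires_at).utc # human-readable UTC time
expired = client_secret_expired?(expires_at, now: Time.at(expires_at - 60))
```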

~~970~~

971 - `session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse`

~~972~~

973 The session configuration for either a realtime or transcription session.

~~974~~

975 - `class RealtimeSessionCreateResponse`

~~976~~

977 A Realtime session configuration object.

~~978~~

979 - `id: String`

~~980~~

981 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~982~~

983 - `object: :"realtime.session"`

~~984~~

985 The object type. Always `realtime.session`.

~~986~~

987 - `:"realtime.session"`

~~988~~

989 - `type: :realtime`

~~990~~

991 The type of session to create. Always `realtime` for the Realtime API.

~~992~~

993 - `:realtime`

~~994~~

995 - `audio: Audio{ input, output}`

~~996~~

997 Configuration for input and output audio.

~~998~~

999 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`

~~1000~~

1001 - `format_: RealtimeAudioFormats`

~~1002~~

1003 The format of the input audio.

~~1004~~

1005 - `class AudioPCM`

~~1006~~

1007 The PCM audio format. Only a 24kHz sample rate is supported.

~~1008~~

1009 - `rate: 24000`

~~1010~~

1011 The sample rate of the audio. Always `24000`.

~~1012~~

1013 - `24000`

~~1014~~

1015 - `type: :"audio/pcm"`

~~1016~~

1017 The audio format. Always `audio/pcm`.

~~1018~~

1019 - `:"audio/pcm"`

~~1020~~

1021 - `class AudioPCMU`

~~1022~~

1023 The G.711 μ-law format.

~~1024~~

1025 - `type: :"audio/pcmu"`

~~1026~~

1027 The audio format. Always `audio/pcmu`.

~~1028~~

1029 - `:"audio/pcmu"`

~~1030~~

1031 - `class AudioPCMA`

~~1032~~

1033 The G.711 A-law format.

~~1034~~

1035 - `type: :"audio/pcma"`

~~1036~~

1037 The audio format. Always `audio/pcma`.

~~1038~~

1039 - `:"audio/pcma"`

~~1040~~

1041 - `noise_reduction: NoiseReduction{ type}`

~~1042~~

1043 Configuration for input audio noise reduction. This can be set to `null` to turn off.

1044 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

1045 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~1046~~

1047 - `type: NoiseReductionType`

~~1048~~

1049 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1050~~

1051 - `:near_field`

~~1052~~

1053 - `:far_field`

~~1054~~

1055 - `transcription: AudioTranscription`

~~1056~~

1057 - `delay: :minimal | :low | :medium | 2 more`

~~1058~~

1059 Controls how long the model waits before emitting transcription text.

1060 Higher values can improve transcription accuracy at the cost of latency.

1061 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1062~~

1063 - `:minimal`

~~1064~~

1065 - `:low`

~~1066~~

1067 - `:medium`

~~1068~~

1069 - `:high`

~~1070~~

1071 - `:xhigh`

~~1072~~

1073 - `language: String`

~~1074~~

1075 The language of the input audio. Supplying the input language in

1076 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

1077 will improve accuracy and latency.

~~1078~~

1079 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~1080~~

1081 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1082~~

1083 - `String = String`

~~1084~~

1085 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~1086~~

1087 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~1088~~

1089 - `:"whisper-1"`

~~1090~~

1091 - `:"gpt-4o-mini-transcribe"`

~~1092~~

1093 - `:"gpt-4o-mini-transcribe-2025-12-15"`

~~1094~~

1095 - `:"gpt-4o-transcribe"`

~~1096~~

1097 - `:"gpt-4o-transcribe-diarize"`

~~1098~~

1099 - `:"gpt-realtime-whisper"`

~~1100~~

1101 - `prompt: String`

~~1102~~

1103 An optional text to guide the model's style or continue a previous audio

1104 segment.

1105 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

1106 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

1107 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~1108~~

1109 - `turn_detection: ServerVad{ type, create_response, idle_timeout_ms, 4 more} | SemanticVad{ type, create_response, eagerness, interrupt_response}`

~~1110~~

1111 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.

~~1112~~

1113 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~1114~~

1115 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~1116~~

1117 For `gpt-realtime-whisper` transcription sessions, turn detection must be

1118 set to `null`; VAD is not supported.

~~1119~~

1120 - `class ServerVad`

~~1121~~

1122 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~1123~~

1124 - `type: :server_vad`

~~1125~~

1126 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~1127~~

1128 - `:server_vad`

~~1129~~

1130 - `create_response: bool`

~~1131~~

1132 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~1133~~

1134 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1135~~

1136 - `idle_timeout_ms: Integer`

~~1137~~

1138 Optional timeout after which a model response will be triggered automatically. This is

1139 useful for situations in which a long pause from the user is unexpected, such as a phone

1140 call. The model will effectively prompt the user to continue the conversation based

1141 on the current context.

~~1142~~

1143 The timeout value will be applied after the last model response's audio has finished playing,

1144 i.e. it's set to the `response.done` time plus audio playback duration.

~~1145~~

1146 An `input_audio_buffer.timeout_triggered` event (plus events

1147 associated with the Response) will be emitted when the timeout is reached.

1148 Idle timeout is currently only supported for `server_vad` mode.

~~1149~~

1150 - `interrupt_response: bool`

~~1151~~

1152 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

1153 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~1154~~

1155 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~1156~~

1157 - `prefix_padding_ms: Integer`

~~1158~~

1159 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

1160 milliseconds). Defaults to 300ms.

~~1161~~

1162 - `silence_duration_ms: Integer`

~~1163~~

1164 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

1165 to 500ms. With shorter values the model will respond more quickly,

1166 but may jump in on short pauses from the user.

~~1167~~

1168 - `threshold: Float`

~~1169~~

1170 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

1171 higher threshold will require louder audio to activate the model, and

1172 thus might perform better in noisy environments.

~~1173~~

1174 - `class SemanticVad`

~~1175~~

1176 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~1177~~

1178 - `type: :semantic_vad`

~~1179~~

1180 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~1181~~

1182 - `:semantic_vad`

~~1183~~

1184 - `create_response: bool`

~~1185~~

1186 Whether or not to automatically generate a response when a VAD stop event occurs.

~~1187~~

1188 - `eagerness: :low | :medium | :high | :auto`

~~1189~~

1190 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~1191~~

1192 - `:low`

~~1193~~

1194 - `:medium`

~~1195~~

1196 - `:high`

~~1197~~

1198 - `:auto`

~~1199~~

1200 - `interrupt_response: bool`

~~1201~~

1202 Whether or not to automatically interrupt any ongoing response with output to the default

1203 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~1204~~

1205 - `output: Output{ format_, speed, voice}`

~~1206~~

1207 - `format_: RealtimeAudioFormats`

~~1208~~

1209 The format of the output audio.

~~1210~~

1211 - `speed: Float`

~~1212~~

1213 The speed of the model's spoken response as a multiple of the original speed.

1214 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~1215~~

1216 This parameter is a post-processing adjustment to the audio after it is generated; it's

1217 also possible to prompt the model to speak faster or slower.
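
One way a client might keep `speed` inside the documented bounds is to clamp it before sending. This is a sketch of a client-side convention, not something the SDK is known to do; `clamp_speed` is a hypothetical helper.

```ruby
MIN_SPEED = 0.25 # documented minimum
MAX_SPEED = 1.5  # documented maximum

# Hypothetical helper: pull an out-of-range speed back into the allowed range.
def clamp_speed(speed)
  speed.to_f.clamp(MIN_SPEED, MAX_SPEED)
end

output = { speed: clamp_speed(2.0) }
```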

~~1218~~

1219 - `voice: String | :alloy | :ash | :ballad | 7 more`

~~1220~~

1221 The voice the model uses to respond. Voice cannot be changed during the

1222 session once the model has responded with audio at least once. Current

1223 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

1224 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

1225 best quality.

~~1226~~

1227 - `String = String`

~~1228~~

1229 - `Voice = :alloy | :ash | :ballad | 7 more`

~~1230~~

1231 The voice the model uses to respond. Voice cannot be changed during the

1232 session once the model has responded with audio at least once. Current

1233 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

1234 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

1235 best quality.

~~1236~~

1237 - `:alloy`

~~1238~~

1239 - `:ash`

~~1240~~

1241 - `:ballad`

~~1242~~

1243 - `:coral`

~~1244~~

1245 - `:echo`

~~1246~~

1247 - `:sage`

~~1248~~

1249 - `:shimmer`

~~1250~~

1251 - `:verse`

~~1252~~

1253 - `:marin`

~~1254~~

1255 - `:cedar`

~~1256~~

1257 - `expires_at: Integer`

~~1258~~

1259 Expiration timestamp for the session, in seconds since epoch.

~~1260~~

1261 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~1262~~

1263 Additional fields to include in server outputs.

~~1264~~

1265 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1266~~

1267 - `:"item.input_audio_transcription.logprobs"`

~~1268~~

1269 - `instructions: String`

~~1270~~

1271 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~1272~~

1273 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~1274~~

1275 - `max_output_tokens: Integer | :inf`

~~1276~~

1277 Maximum number of output tokens for a single assistant response,

1278 inclusive of tool calls. Provide an integer between 1 and 4096 to

1279 limit output tokens, or `inf` for the maximum available tokens for a

1280 given model. Defaults to `inf`.

~~1281~~

1282 - `Integer = Integer`

~~1283~~

1284 - `MaxOutputTokens = :inf`

~~1285~~

1286 - `:inf`
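
The accepted values for `max_output_tokens` can be checked with a short predicate. This is a sketch based only on the description above (an integer between 1 and 4096, or `inf`); the method name is hypothetical.

```ruby
# Hypothetical check: either the symbol :inf or an Integer in 1..4096.
def valid_max_output_tokens?(value)
  value == :inf || (value.is_a?(Integer) && value.between?(1, 4096))
end

ok_limit = valid_max_output_tokens?(1024)
ok_inf = valid_max_output_tokens?(:inf)
too_big = valid_max_output_tokens?(5000)
```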

~~1287~~

1288 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~1289~~

1290 The Realtime model used for this session.

~~1291~~

1292 - `String = String`

~~1293~~

1294 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~1295~~

1296 The Realtime model used for this session.

~~1297~~

1298 - `:"gpt-realtime"`

~~1299~~

1300 - `:"gpt-realtime-1.5"`

~~1301~~

1302 - `:"gpt-realtime-2"`

~~1303~~

1304 - `:"gpt-realtime-2025-08-28"`

~~1305~~

1306 - `:"gpt-4o-realtime-preview"`

~~1307~~

1308 - `:"gpt-4o-realtime-preview-2024-10-01"`

~~1309~~

1310 - `:"gpt-4o-realtime-preview-2024-12-17"`

~~1311~~

1312 - `:"gpt-4o-realtime-preview-2025-06-03"`

~~1313~~

1314 - `:"gpt-4o-mini-realtime-preview"`

~~1315~~

1316 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`

~~1317~~

1318 - `:"gpt-realtime-mini"`

~~1319~~

1320 - `:"gpt-realtime-mini-2025-10-06"`

~~1321~~

1322 - `:"gpt-realtime-mini-2025-12-15"`

~~1323~~

1324 - `:"gpt-audio-1.5"`

~~1325~~

1326 - `:"gpt-audio-mini"`

~~1327~~

1328 - `:"gpt-audio-mini-2025-10-06"`

~~1329~~

1330 - `:"gpt-audio-mini-2025-12-15"`

~~1331~~

1332 - `output_modalities: Array[:text | :audio]`

~~1333~~

1334 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

1335 that the model will respond with audio plus a transcript. `["text"]` can be used to make

1336 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~1337~~

1338 - `:text`

~~1339~~

1340 - `:audio`
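
Since `text` and `audio` cannot be requested together, a client could validate the array before sending it. The sketch below assumes exactly one modality is expected; that is an interpretation of the description above, and the helper name is hypothetical.

```ruby
ALLOWED_MODALITIES = %i[text audio].freeze

# Hypothetical validation: accept exactly one of :text or :audio.
def validate_output_modalities(modalities = [:audio])
  unless modalities.length == 1 && ALLOWED_MODALITIES.include?(modalities.first)
    raise ArgumentError, "output_modalities must be [:text] or [:audio]"
  end
  modalities
end

text_only = validate_output_modalities([:text])
```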

~~1341~~

1342 - `prompt: ResponsePrompt`

~~1343~~

1344 Reference to a prompt template and its variables.

1345 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~1346~~

1347 - `id: String`

~~1348~~

1349 The unique identifier of the prompt template to use.

~~1350~~

1351 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`

~~1352~~

1353 Optional map of values to substitute in for variables in your

1354 prompt. The substitution values can either be strings, or other

1355 Response input types like images or files.

~~1356~~

1357 - `String = String`

~~1358~~

1359 - `class ResponseInputText`

~~1360~~

1361 A text input to the model.

~~1362~~

1363 - `text: String`

~~1364~~

1365 The text input to the model.

~~1366~~

1367 - `type: :input_text`

~~1368~~

1369 The type of the input item. Always `input_text`.

~~1370~~

1371 - `:input_text`

~~1372~~

1373 - `class ResponseInputImage`

~~1374~~

1375 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~1376~~

1377 - `detail: :low | :high | :auto | :original`

~~1378~~

1379 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~1380~~

1381 - `:low`

~~1382~~

1383 - `:high`

~~1384~~

1385 - `:auto`

~~1386~~

1387 - `:original`

~~1388~~

1389 - `type: :input_image`

~~1390~~

1391 The type of the input item. Always `input_image`.

~~1392~~

1393 - `:input_image`

~~1394~~

1395 - `file_id: String`

~~1396~~

1397 The ID of the file to be sent to the model.

~~1398~~

1399 - `image_url: String`

~~1400~~

1401 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~1402~~

1403 - `class ResponseInputFile`

~~1404~~

1405 A file input to the model.

~~1406~~

1407 - `type: :input_file`

~~1408~~

1409 The type of the input item. Always `input_file`.

~~1410~~

1411 - `:input_file`

~~1412~~

1413 - `detail: :low | :high`

~~1414~~

1415 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~1416~~

1417 - `:low`

~~1418~~

1419 - `:high`

~~1420~~

1421 - `file_data: String`

~~1422~~

1423 The content of the file to be sent to the model.

~~1424~~

1425 - `file_id: String`

~~1426~~

1427 The ID of the file to be sent to the model.

~~1428~~

1429 - `file_url: String`

~~1430~~

1431 The URL of the file to be sent to the model.

~~1432~~

1433 - `filename: String`

~~1434~~

1435 The name of the file to be sent to the model.

~~1436~~

1437 - `version: String`

~~1438~~

1439 Optional version of the prompt template.

~~1440~~

1441 - `reasoning: RealtimeReasoning`

~~1442~~

1443 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~1444~~

1445 - `effort: RealtimeReasoningEffort`

~~1446~~

1447 Constrains effort on reasoning for reasoning-capable Realtime models such as

1448 `gpt-realtime-2`.

~~1449~~

1450 - `:minimal`

~~1451~~

1452 - `:low`

~~1453~~

1454 - `:medium`

~~1455~~

1456 - `:high`

~~1457~~

1458 - `:xhigh`

~~1459~~

1460 - `tool_choice: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp`

~~1461~~

1462 How the model chooses tools. Provide one of the string modes or force a specific

1463 function/MCP tool.

~~1464~~

1465 - `ToolChoiceOptions = :none | :auto | :required`

~~1466~~

1467 Controls which (if any) tool is called by the model.

~~1468~~

1469 `none` means the model will not call any tool and instead generates a message.

~~1470~~

1471 `auto` means the model can pick between generating a message or calling one or

1472 more tools.

~~1473~~

1474 `required` means the model must call one or more tools.

~~1475~~

1476 - `:none`

~~1477~~

1478 - `:auto`

~~1479~~

1480 - `:required`

~~1481~~

1482 - `class ToolChoiceFunction`

~~1483~~

1484 Use this option to force the model to call a specific function.

~~1485~~

1486 - `name: String`

~~1487~~

1488 The name of the function to call.

~~1489~~

1490 - `type: :function`

~~1491~~

1492 For function calling, the type is always `function`.

~~1493~~

1494 - `:function`

~~1495~~

1496 - `class ToolChoiceMcp`

~~1497~~

1498 Use this option to force the model to call a specific tool on a remote MCP server.

~~1499~~

1500 - `server_label: String`

~~1501~~

1502 The label of the MCP server to use.

~~1503~~

1504 - `type: :mcp`

~~1505~~

1506 For MCP tools, the type is always `mcp`.

~~1507~~

1508 - `:mcp`

~~1509~~

1510 - `name: String`

~~1511~~

1512 The name of the tool to call on the server.

~~1513~~

1514 - `tools: Array[RealtimeFunctionTool | McpTool{ server_label, type, allowed_tools, 7 more}]`

~~1515~~

1516 Tools available to the model.

~~1517~~

1518 - `class RealtimeFunctionTool`

~~1519~~

1520 - `description: String`

~~1521~~

1522 The description of the function, including guidance on when and how

1523 to call it, and guidance about what to tell the user when calling

1524 (if anything).

~~1525~~

1526 - `name: String`

~~1527~~

1528 The name of the function.

~~1529~~

1530 - `parameters: untyped`

~~1531~~

1532 Parameters of the function in JSON Schema.

~~1533~~

1534 - `type: :function`

~~1535~~

1536 The type of the tool, i.e. `function`.

~~1537~~

1538 - `:function`

~~1539~~

1540 - `class McpTool`

~~1541~~

1542 Give the model access to additional tools via remote Model Context Protocol

1543 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~1544~~

1545 - `server_label: String`

~~1546~~

1547 A label for this MCP server, used to identify it in tool calls.

~~1548~~

1549 - `type: :mcp`

~~1550~~

1551 The type of the MCP tool. Always `mcp`.

~~1552~~

1553 - `:mcp`

~~1554~~

1555 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`

~~1556~~

1557 List of allowed tool names or a filter object.

~~1558~~

1559 - `McpAllowedTools = Array[String]`

~~1560~~

1561 A string array of allowed tool names

~~1562~~

1563 - `class McpToolFilter`

~~1564~~

1565 A filter object to specify which tools are allowed.

~~1566~~

1567 - `read_only: bool`

~~1568~~

1569 Indicates whether or not a tool modifies data or is read-only. If an

1570 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1571 it will match this filter.

~~1572~~

1573 - `tool_names: Array[String]`

~~1574~~

1575 List of allowed tool names.

~~1576~~

1577 - `authorization: String`

~~1578~~

1579 An OAuth access token that can be used with a remote MCP server, either

1580 with a custom MCP server URL or a service connector. Your application

1581 must handle the OAuth authorization flow and provide the token here.

~~1582~~

1583 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`

~~1584~~

1585 Identifier for service connectors, like those available in ChatGPT. One of

1586 `server_url` or `connector_id` must be provided. Learn more about service

1587 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~1588~~

1589 Currently supported `connector_id` values are:

~~1590~~

1591 - Dropbox: `connector_dropbox`

1592 - Gmail: `connector_gmail`

1593 - Google Calendar: `connector_googlecalendar`

1594 - Google Drive: `connector_googledrive`

1595 - Microsoft Teams: `connector_microsoftteams`

1596 - Outlook Calendar: `connector_outlookcalendar`

1597 - Outlook Email: `connector_outlookemail`

1598 - SharePoint: `connector_sharepoint`

~~1599~~

1600 - `:connector_dropbox`

~~1601~~

1602 - `:connector_gmail`

~~1603~~

1604 - `:connector_googlecalendar`

~~1605~~

1606 - `:connector_googledrive`

~~1607~~

1608 - `:connector_microsoftteams`

~~1609~~

1610 - `:connector_outlookcalendar`

~~1611~~

1612 - `:connector_outlookemail`

~~1613~~

1614 - `:connector_sharepoint`

~~1615~~

1616 - `defer_loading: bool`

~~1617~~

1618 Whether this MCP tool is deferred and discovered via tool search.

~~1619~~

1620 - `headers: Hash[Symbol, String]`

~~1621~~

1622 Optional HTTP headers to send to the MCP server. Use for authentication

1623 or other purposes.

~~1624~~

1625 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`

~~1626~~

1627 Specify which of the MCP server's tools require approval.

~~1628~~

1629 - `class McpToolApprovalFilter`

~~1630~~

1631 Specify which of the MCP server's tools require approval. Can be

1632 `always`, `never`, or a filter object associated with tools

1633 that require approval.

~~1634~~

1635 - `always: Always{ read_only, tool_names}`

~~1636~~

1637 A filter object to specify which tools are allowed.

~~1638~~

1639 - `read_only: bool`

~~1640~~

1641 Indicates whether or not a tool modifies data or is read-only. If an

1642 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1643 it will match this filter.

~~1644~~

1645 - `tool_names: Array[String]`

~~1646~~

1647 List of allowed tool names.

~~1648~~

1649 - `never: Never{ read_only, tool_names}`

~~1650~~

1651 A filter object to specify which tools are allowed.

~~1652~~

1653 - `read_only: bool`

~~1654~~

1655 Indicates whether or not a tool modifies data or is read-only. If an

1656 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

1657 it will match this filter.

~~1658~~

1659 - `tool_names: Array[String]`

~~1660~~

1661 List of allowed tool names.

~~1662~~

1663 - `McpToolApprovalSetting = :always | :never`

~~1664~~

1665 Specify a single approval policy for all tools. One of `always` or

1666 `never`. When set to `always`, all tools will require approval. When

1667 set to `never`, no tools will require approval.

~~1668~~

1669 - `:always`

~~1670~~

1671 - `:never`

~~1672~~

1673 - `server_description: String`

~~1674~~

1675 Optional description of the MCP server, used to provide more context.

~~1676~~

1677 - `server_url: String`

~~1678~~

1679 The URL for the MCP server. One of `server_url` or `connector_id` must be

1680 provided.
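
Putting the `McpTool` fields together, a tool entry might be built like this. It is a sketch using plain hashes: the builder `mcp_tool`, the server label, and the tool name `list_events` are hypothetical, and the check only enforces that at least one of `server_url` or `connector_id` is present.

```ruby
# Hypothetical builder for an MCP tool entry shaped like the fields above.
def mcp_tool(server_label:, server_url: nil, connector_id: nil, **options)
  if server_url.nil? && connector_id.nil?
    raise ArgumentError, "one of server_url or connector_id must be provided"
  end

  {
    type: :mcp,
    server_label: server_label,
    server_url: server_url,
    connector_id: connector_id
  }.compact.merge(options) # drop whichever of the two was not supplied
end

tool = mcp_tool(
  server_label: "calendar",                         # hypothetical label
  connector_id: :connector_googlecalendar,
  allowed_tools: ["list_events"],                   # hypothetical tool name
  require_approval: { never: { read_only: true } }  # skip approval for read-only tools
)
```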

~~1681~~

1682 - `tracing: :auto | TracingConfiguration{ group_id, metadata, workflow_name}`

~~1683~~

1684 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

1685 tracing is enabled for a session, the configuration cannot be modified.

~~1686~~

1687 `auto` will create a trace for the session with default values for the

1688 workflow name, group id, and metadata.

~~1689~~

1690 - `Tracing = :auto`

~~1691~~

1692 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~1693~~

1694 - `:auto`

~~1695~~

1696 - `class TracingConfiguration`

~~1697~~

1698 Granular configuration for tracing.

~~1699~~

1700 - `group_id: String`

~~1701~~

1702 The group id to attach to this trace to enable filtering and

1703 grouping in the Traces Dashboard.

~~1704~~

1705 - `metadata: untyped`

~~1706~~

1707 The arbitrary metadata to attach to this trace to enable

1708 filtering in the Traces Dashboard.

~~1709~~

1710 - `workflow_name: String`

~~1711~~

1712 The name of the workflow to attach to this trace. This is used to

1713 name the trace in the Traces Dashboard.
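As a rough illustration, the two tracing forms might be written as follows. The workflow name, group id, and metadata values are placeholders, and `session_with_tracing` is a hypothetical helper used only to show where the value sits in a session hash.

```ruby
# Shorthand form: enable tracing with default values.
tracing_auto = :auto

# Granular form. All values here are placeholders.
tracing_custom = {
  workflow_name: "support-call",
  group_id: "customer-1234",
  metadata: { region: "eu", build: "2026.05" }
}

# Hypothetical helper: attach a tracing setting to a session hash.
# Passing nil corresponds to disabling tracing, as described above.
def session_with_tracing(tracing)
  { type: :realtime, tracing: tracing }
end

puts session_with_tracing(tracing_auto)[:tracing]
puts session_with_tracing(tracing_custom)[:tracing][:workflow_name]
```

Because the configuration cannot be modified once tracing is enabled, it is probably simplest to decide on one of these forms before the session starts.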

~~1714~~

1715 - `truncation: RealtimeTruncation`

~~1716~~

1717 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~1718~~

1719 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~1720~~

1721 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~1722~~

1723 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~1724~~

1725 - `RealtimeTruncationStrategy = :auto | :disabled`

~~1726~~

1727 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~1728~~

1729 - `:auto`

~~1730~~

1731 - `:disabled`

~~1732~~

1733 - `class RealtimeTruncationRetentionRatio`

~~1734~~

1735 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~1736~~

1737 - `retention_ratio: Float`

~~1738~~

1739 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~1740~~

1741 - `type: :retention_ratio`

~~1742~~

1743 Use retention ratio truncation.

~~1744~~

1745 - `:retention_ratio`

~~1746~~

1747 - `token_limits: TokenLimits{ post_instructions}`

~~1748~~

1749 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~1750~~

1751 - `post_instructions: Integer`

~~1752~~

1753 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
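To make the retention-ratio arithmetic concrete, the sketch below simulates the behavior locally. It is an illustration under stated assumptions, not the server's implementation: the `simulate_truncation` helper and the per-message token counts are hypothetical, and only the `type`, `retention_ratio`, and `token_limits.post_instructions` fields come from the reference above.

```ruby
# Truncation settings in the shape described above.
truncation = {
  type: :retention_ratio,
  retention_ratio: 0.8,
  token_limits: { post_instructions: 5_000 }
}

# Hypothetical local simulation: once the conversation exceeds the limit,
# drop the oldest messages until the total is at or below ratio * limit.
def simulate_truncation(message_tokens, truncation)
  limit = truncation[:token_limits][:post_instructions]
  return message_tokens if message_tokens.sum <= limit

  target = (limit * truncation[:retention_ratio]).floor
  kept = message_tokens.dup
  kept.shift while kept.sum > target && !kept.empty?
  kept
end

history = [1_200, 900, 1_500, 800, 1_100] # tokens per message, oldest first
kept = simulate_truncation(history, truncation)
puts "kept #{kept.length} messages, #{kept.sum} tokens"
```

With a 5,000-token limit and a ratio of 0.8, the target after truncation is 4,000 tokens. Dropping below the limit by that margin leaves headroom, so the next few turns do not each trigger another truncation.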

~~1754~~

1755 - `class RealtimeTranscriptionSessionCreateResponse`

~~1756~~

1757 A Realtime transcription session configuration object.

~~1758~~

1759 - `id: String`

~~1760~~

1761 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1762~~

1763 - `object: String`

~~1764~~

1765 The object type. Always `realtime.transcription_session`.

~~1766~~

1767 - `type: :transcription`

~~1768~~

1769 The type of session. Always `transcription` for transcription sessions.

~~1770~~

1771 - `:transcription`

~~1772~~

1773 - `audio: Audio{ input}`

~~1774~~

1775 Configuration for input audio for the session.

~~1776~~

1777 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`

~~1778~~

1779 - `format_: RealtimeAudioFormats`

~~1780~~

1781 The PCM audio format. Only a 24kHz sample rate is supported.

~~1782~~

1783 - `noise_reduction: NoiseReduction{ type}`

~~1784~~

1785 Configuration for input audio noise reduction.

~~1786~~

1787 - `type: NoiseReductionType`

~~1788~~

1789 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~1790~~

1791 - `transcription: AudioTranscription`

~~1792~~

1793 - `turn_detection: RealtimeTranscriptionSessionTurnDetection`

~~1794~~

1795 Configuration for turn detection. Can be set to `null` to turn off. Server

1796 VAD means that the model will detect the start and end of speech based on

1797 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~1798~~

1799 - `prefix_padding_ms: Integer`

~~1800~~

1801 Amount of audio to include before the VAD detected speech (in

1802 milliseconds). Defaults to 300ms.

~~1803~~

1804 - `silence_duration_ms: Integer`

~~1805~~

1806 Duration of silence to detect speech stop (in milliseconds). Defaults

1807 to 500ms. With shorter values the model will respond more quickly,

1808 but may jump in on short pauses from the user.

~~1809~~

1810 - `threshold: Float`

~~1811~~

1812 Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

1813 higher threshold will require louder audio to activate the model, and

1814 thus might perform better in noisy environments.

~~1815~~

1816 - `type: String`

~~1817~~

1818 Type of turn detection, only `server_vad` is currently supported.

~~1819~~

1820 - `expires_at: Integer`

~~1821~~

1822 Expiration timestamp for the session, in seconds since epoch.

~~1823~~

1824 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~1825~~

1826 Additional fields to include in server outputs.

~~1827~~

1828 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~1829~~

1830 - `:"item.input_audio_transcription.logprobs"`

~~1831~~

1832 - `value: String`

~~1833~~

1834 The generated client secret value.

~~1835~~

1836### Example

~~1837~~

```ruby
require "openai"

openai = OpenAI::Client.new(api_key: "My API Key")

client_secret = openai.realtime.client_secrets.create

puts(client_secret)
```

~~1847~~

1848#### Response

~~1849~~

```json
{
  "expires_at": 0,
  "session": {
    "id": "id",
    "object": "realtime.session",
    "type": "realtime",
    "audio": {
      "input": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "noise_reduction": {
          "type": "near_field"
        },
        "transcription": {
          "delay": "minimal",
          "language": "language",
          "model": "string",
          "prompt": "prompt"
        },
        "turn_detection": {
          "type": "server_vad",
          "create_response": true,
          "idle_timeout_ms": 5000,
          "interrupt_response": true,
          "prefix_padding_ms": 0,
          "silence_duration_ms": 0,
          "threshold": 0
        }
      },
      "output": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "speed": 0.25,
        "voice": "ash"
      }
    },
    "expires_at": 0,
    "include": [
      "item.input_audio_transcription.logprobs"
    ],
    "instructions": "instructions",
    "max_output_tokens": 0,
    "model": "string",
    "output_modalities": [
      "text"
    ],
    "prompt": {
      "id": "id",
      "variables": {
        "foo": "string"
      },
      "version": "version"
    },
    "reasoning": {
      "effort": "minimal"
    },
    "tool_choice": "none",
    "tools": [
      {
        "description": "description",
        "name": "name",
        "parameters": {},
        "type": "function"
      }
    ],
    "tracing": "auto",
    "truncation": "auto"
  },
  "value": "value"
}
```
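A response of this shape could be handled along the following lines. This is a minimal sketch that works on a hard-coded JSON string instead of a live API call; the `seconds_remaining` helper is hypothetical, the values are placeholders, and it assumes the top-level `expires_at` is a Unix timestamp in seconds.

```ruby
require "json"

# A trimmed response in the shape shown above; values are placeholders.
raw = <<~JSON
  {
    "expires_at": 1760000600,
    "session": { "id": "sess_1234567890abcdef", "type": "realtime" },
    "value": "ek_1234"
  }
JSON

secret = JSON.parse(raw, symbolize_names: true)

# Hypothetical helper: seconds left before the secret can no longer start
# a session. Assumes `expires_at` is a Unix timestamp in seconds.
def seconds_remaining(secret, now: Time.now.to_i)
  [secret[:expires_at] - now, 0].max
end

puts secret[:value]
puts seconds_remaining(secret, now: 1_760_000_000)
```

Only the `value` string needs to reach the client app; the remaining time can help a backend decide whether to mint a fresh secret before handing one out.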

~~1926~~

1927## Domain Types

~~1928~~

1929### Realtime Session Create Response

~~1930~~

1931- `class RealtimeSessionCreateResponse`

~~1932~~

1933 A Realtime session configuration object.

~~1934~~

1935 - `id: String`

~~1936~~

1937 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~1938~~

1939 - `object: :"realtime.session"`

~~1940~~

1941 The object type. Always `realtime.session`.

~~1942~~

1943 - `:"realtime.session"`

~~1944~~

1945 - `type: :realtime`

~~1946~~

1947 The type of session to create. Always `realtime` for the Realtime API.

~~1948~~

1949 - `:realtime`

~~1950~~

1951 - `audio: Audio{ input, output}`

~~1952~~

1953 Configuration for input and output audio.

~~1954~~

1955 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`

~~1956~~

1957 - `format_: RealtimeAudioFormats`

~~1958~~

1959 The format of the input audio.

~~1960~~

1961 - `class AudioPCM`

~~1962~~

1963 The PCM audio format. Only a 24kHz sample rate is supported.

~~1964~~

1965 - `rate: 24000`

~~1966~~

1967 The sample rate of the audio. Always `24000`.

~~1968~~

1969 - `24000`

~~1970~~

1971 - `type: :"audio/pcm"`

~~1972~~

1973 The audio format. Always `audio/pcm`.

~~1974~~

1975 - `:"audio/pcm"`

~~1976~~

1977 - `class AudioPCMU`

~~1978~~

1979 The G.711 μ-law format.

~~1980~~

1981 - `type: :"audio/pcmu"`

~~1982~~

1983 The audio format. Always `audio/pcmu`.

~~1984~~

1985 - `:"audio/pcmu"`

~~1986~~

1987 - `class AudioPCMA`

~~1988~~

1989 The G.711 A-law format.

~~1990~~

1991 - `type: :"audio/pcma"`

~~1992~~

1993 The audio format. Always `audio/pcma`.

~~1994~~

1995 - `:"audio/pcma"`

~~1996~~

1997 - `noise_reduction: NoiseReduction{ type}`

~~1998~~

1999 Configuration for input audio noise reduction. This can be set to `null` to turn off.

2000 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

2001 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~2002~~

2003 - `type: NoiseReductionType`

~~2004~~

2005 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~2006~~

2007 - `:near_field`

~~2008~~

2009 - `:far_field`

~~2010~~

2011 - `transcription: AudioTranscription`

~~2012~~

2013 - `delay: :minimal | :low | :medium | 2 more`

~~2014~~

2015 Controls how long the model waits before emitting transcription text.

2016 Higher values can improve transcription accuracy at the cost of latency.

2017 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2018~~

2019 - `:minimal`

~~2020~~

2021 - `:low`

~~2022~~

2023 - `:medium`

~~2024~~

2025 - `:high`

~~2026~~

2027 - `:xhigh`

~~2028~~

2029 - `language: String`

~~2030~~

2031 The language of the input audio. Supplying the input language in

2032 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

2033 will improve accuracy and latency.

~~2034~~

2035 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~2036~~

2037 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2038~~

2039 - `String = String`

~~2040~~

2041 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~2042~~

2043 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2044~~

2045 - `:"whisper-1"`

~~2046~~

2047 - `:"gpt-4o-mini-transcribe"`

~~2048~~

2049 - `:"gpt-4o-mini-transcribe-2025-12-15"`

~~2050~~

2051 - `:"gpt-4o-transcribe"`

~~2052~~

2053 - `:"gpt-4o-transcribe-diarize"`

~~2054~~

2055 - `:"gpt-realtime-whisper"`

~~2056~~

2057 - `prompt: String`

~~2058~~

2059 An optional text to guide the model's style or continue a previous audio

2060 segment.

2061 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

2062 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

2063 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
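The model-specific rules for `delay` and `prompt` can be checked before a request is sent. The sketch below is a hypothetical client-side check, not SDK behavior; `transcription_problems` and the sample configurations are illustrative.

```ruby
# Hypothetical client-side check of the model-specific rules listed above.
# Returns a list of human-readable problems; an empty list means none found.
def transcription_problems(config)
  problems = []
  whisper_rt = config[:model].to_s == "gpt-realtime-whisper"

  if config.key?(:delay) && !whisper_rt
    problems << "delay is only supported with gpt-realtime-whisper"
  end
  if config.key?(:prompt) && whisper_rt
    problems << "prompt is not supported with gpt-realtime-whisper"
  end
  problems
end

ok = { model: :"gpt-4o-transcribe", language: "en", prompt: "expect words related to technology" }
bad = { model: :"gpt-realtime-whisper", delay: :low, prompt: "keywords" }

puts transcription_problems(ok).length
puts transcription_problems(bad).length
```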

~~2064~~

2065 - `turn_detection: ServerVad{ type, create_response, idle_timeout_ms, 4 more} | SemanticVad{ type, create_response, eagerness, interrupt_response}`

~~2066~~

2067 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~2068~~

2069 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~2070~~

2071 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~2072~~

2073 For `gpt-realtime-whisper` transcription sessions, turn detection must be

2074 set to `null`; VAD is not supported.

~~2075~~

2076 - `class ServerVad`

~~2077~~

2078 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~2079~~

2080 - `type: :server_vad`

~~2081~~

2082 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~2083~~

2084 - `:server_vad`

~~2085~~

2086 - `create_response: bool`

~~2087~~

2088 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~2089~~

2090 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~2091~~

2092 - `idle_timeout_ms: Integer`

~~2093~~

2094 Optional timeout after which a model response will be triggered automatically. This is

2095 useful for situations in which a long pause from the user is unexpected, such as a phone

2096 call. The model will effectively prompt the user to continue the conversation based

2097 on the current context.

~~2098~~

2099 The timeout value will be applied after the last model response's audio has finished playing,

2100 i.e. it's set to the `response.done` time plus audio playback duration.

~~2101~~

2102 An `input_audio_buffer.timeout_triggered` event (plus events

2103 associated with the Response) will be emitted when the timeout is reached.

2104 Idle timeout is currently only supported for `server_vad` mode.

~~2105~~

2106 - `interrupt_response: bool`

~~2107~~

2108 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

2109 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~2110~~

2111 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~2112~~

2113 - `prefix_padding_ms: Integer`

~~2114~~

2115 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

2116 milliseconds). Defaults to 300ms.

~~2117~~

2118 - `silence_duration_ms: Integer`

~~2119~~

2120 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

2121 to 500ms. With shorter values the model will respond more quickly,

2122 but may jump in on short pauses from the user.

~~2123~~

2124 - `threshold: Float`

~~2125~~

2126 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A

2127 higher threshold will require louder audio to activate the model, and

2128 thus might perform better in noisy environments.

~~2129~~

2130 - `class SemanticVad`

~~2131~~

2132 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~2133~~

2134 - `type: :semantic_vad`

~~2135~~

2136 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~2137~~

2138 - `:semantic_vad`

~~2139~~

2140 - `create_response: bool`

~~2141~~

2142 Whether or not to automatically generate a response when a VAD stop event occurs.

~~2143~~

2144 - `eagerness: :low | :medium | :high | :auto`

~~2145~~

2146 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~2147~~

2148 - `:low`

~~2149~~

2150 - `:medium`

~~2151~~

2152 - `:high`

~~2153~~

2154 - `:auto`

~~2155~~

2156 - `interrupt_response: bool`

~~2157~~

2158 Whether or not to automatically interrupt any ongoing response with output to the default

2159 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
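The two turn-detection variants might be configured as shown below. This is a sketch: `max_timeout_seconds` is a hypothetical helper, and the two boolean values in the Server VAD hash are illustrative choices, while the threshold, padding, silence duration, and eagerness timeouts are the values stated above.

```ruby
# Server VAD: volume-based detection. `threshold`, `prefix_padding_ms`, and
# `silence_duration_ms` use the documented defaults; the booleans are
# illustrative choices.
server_vad = {
  type: :server_vad,
  threshold: 0.5,
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  create_response: true,
  interrupt_response: true
}

# Semantic VAD: model-based detection tuned by `eagerness`.
semantic_vad = { type: :semantic_vad, eagerness: :low }

# Maximum wait per eagerness level, taken from the description above.
MAX_TIMEOUT_SECONDS = { low: 8, medium: 4, high: 2 }.freeze

# Hypothetical helper: `auto` is documented as equivalent to `medium`.
# Returns nil for configurations that are not Semantic VAD.
def max_timeout_seconds(turn_detection)
  return nil unless turn_detection[:type] == :semantic_vad

  level = turn_detection.fetch(:eagerness, :auto)
  level = :medium if level == :auto
  MAX_TIMEOUT_SECONDS.fetch(level)
end

puts max_timeout_seconds(semantic_vad)
puts max_timeout_seconds({ type: :semantic_vad })
puts max_timeout_seconds(server_vad).inspect
```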

~~2160~~

2161 - `output: Output{ format_, speed, voice}`

~~2162~~

2163 - `format_: RealtimeAudioFormats`

~~2164~~

2165 The format of the output audio.

~~2166~~

2167 - `speed: Float`

~~2168~~

2169 The speed of the model's spoken response as a multiple of the original speed.

2170 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~2171~~

2172 This parameter is a post-processing adjustment to the audio after it is generated; it's

2173 also possible to prompt the model to speak faster or slower.

~~2174~~

2175 - `voice: String | :alloy | :ash | :ballad | 7 more`

~~2176~~

2177 The voice the model uses to respond. Voice cannot be changed during the

2178 session once the model has responded with audio at least once. Current

2179 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

2180 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

2181 best quality.

~~2182~~

2183 - `String = String`

~~2184~~

2185 - `Voice = :alloy | :ash | :ballad | 7 more`

~~2186~~

2187 The voice the model uses to respond. Voice cannot be changed during the

2188 session once the model has responded with audio at least once. Current

2189 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

2190 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

2191 best quality.

~~2192~~

2193 - `:alloy`

~~2194~~

2195 - `:ash`

~~2196~~

2197 - `:ballad`

~~2198~~

2199 - `:coral`

~~2200~~

2201 - `:echo`

~~2202~~

2203 - `:sage`

~~2204~~

2205 - `:shimmer`

~~2206~~

2207 - `:verse`

~~2208~~

2209 - `:marin`

~~2210~~

2211 - `:cedar`
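Since `speed` is bounded between 0.25 and 1.5, a client may want to clamp user-supplied values before sending them. The `output_audio` helper below is hypothetical and shows one way to do so; it builds only the `speed` and `voice` fields.

```ruby
MIN_SPEED = 0.25
MAX_SPEED = 1.5

# Hypothetical helper: build output audio settings, clamping the requested
# speed into the documented range. 1.0 is the documented default speed.
def output_audio(voice:, speed: 1.0)
  { speed: speed.clamp(MIN_SPEED, MAX_SPEED), voice: voice }
end

puts output_audio(voice: :marin, speed: 2.0)[:speed]
puts output_audio(voice: :cedar)[:speed]
```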

~~2212~~

2213 - `expires_at: Integer`

~~2214~~

2215 Expiration timestamp for the session, in seconds since epoch.

~~2216~~

2217 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~2218~~

2219 Additional fields to include in server outputs.

~~2220~~

2221 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~2222~~

2223 - `:"item.input_audio_transcription.logprobs"`

~~2224~~

2225 - `instructions: String`

~~2226~~

2227 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~2228~~

2229 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~2230~~

2231 - `max_output_tokens: Integer | :inf`

~~2232~~

2233 Maximum number of output tokens for a single assistant response,

2234 inclusive of tool calls. Provide an integer between 1 and 4096 to

2235 limit output tokens, or `inf` for the maximum available tokens for a

2236 given model. Defaults to `inf`.

~~2237~~

2238 - `Integer = Integer`

~~2239~~

2240 - `MaxOutputTokens = :inf`

~~2241~~

2242 - `:inf`
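The accepted values for `max_output_tokens` can be expressed as a small predicate. This is a hypothetical client-side check based only on the range stated above.

```ruby
# Hypothetical validator: an Integer from 1 to 4096, or the symbol :inf.
def valid_max_output_tokens?(value)
  value == :inf || (value.is_a?(Integer) && value.between?(1, 4096))
end

puts valid_max_output_tokens?(:inf)
puts valid_max_output_tokens?(1024)
puts valid_max_output_tokens?(0)
```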

~~2243~~

2244 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~2245~~

2246 The Realtime model used for this session.

~~2247~~

2248 - `String = String`

~~2249~~

2250 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~2251~~

2252 The Realtime model used for this session.

~~2253~~

2254 - `:"gpt-realtime"`

~~2255~~

2256 - `:"gpt-realtime-1.5"`

~~2257~~

2258 - `:"gpt-realtime-2"`

~~2259~~

2260 - `:"gpt-realtime-2025-08-28"`

~~2261~~

2262 - `:"gpt-4o-realtime-preview"`

~~2263~~

2264 - `:"gpt-4o-realtime-preview-2024-10-01"`

~~2265~~

2266 - `:"gpt-4o-realtime-preview-2024-12-17"`

~~2267~~

2268 - `:"gpt-4o-realtime-preview-2025-06-03"`

~~2269~~

2270 - `:"gpt-4o-mini-realtime-preview"`

~~2271~~

2272 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`

~~2273~~

2274 - `:"gpt-realtime-mini"`

~~2275~~

2276 - `:"gpt-realtime-mini-2025-10-06"`

~~2277~~

2278 - `:"gpt-realtime-mini-2025-12-15"`

~~2279~~

2280 - `:"gpt-audio-1.5"`

~~2281~~

2282 - `:"gpt-audio-mini"`

~~2283~~

2284 - `:"gpt-audio-mini-2025-10-06"`

~~2285~~

2286 - `:"gpt-audio-mini-2025-12-15"`

~~2287~~

2288 - `output_modalities: Array[:text | :audio]`

~~2289~~

2290 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

2291 that the model will respond with audio plus a transcript. `["text"]` can be used to make

2292 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~2293~~

2294 - `:text`

~~2295~~

2296 - `:audio`
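Because `text` and `audio` cannot be requested together, a client could normalize the value before building a session. The helper below is a hypothetical sketch that applies the documented default of `["audio"]`.

```ruby
# Hypothetical check: exactly one of :text or :audio; defaults to [:audio].
def normalize_output_modalities(modalities = nil)
  modalities = [:audio] if modalities.nil? || modalities.empty?
  modalities = modalities.map(&:to_sym).uniq
  unless modalities.length == 1 && %i[text audio].include?(modalities.first)
    raise ArgumentError, "output_modalities must be [:text] or [:audio]"
  end

  modalities
end

puts normalize_output_modalities.first
puts normalize_output_modalities(["text"]).first

begin
  normalize_output_modalities(%i[text audio])
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```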

~~2297~~

2298 - `prompt: ResponsePrompt`

~~2299~~

2300 Reference to a prompt template and its variables.

2301 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~2302~~

2303 - `id: String`

~~2304~~

2305 The unique identifier of the prompt template to use.

~~2306~~

2307 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`

~~2308~~

2309 Optional map of values to substitute in for variables in your

2310 prompt. The substitution values can either be strings, or other

2311 Response input types like images or files.

~~2312~~

2313 - `String = String`

~~2314~~

2315 - `class ResponseInputText`

~~2316~~

2317 A text input to the model.

~~2318~~

2319 - `text: String`

~~2320~~

2321 The text input to the model.

~~2322~~

2323 - `type: :input_text`

~~2324~~

2325 The type of the input item. Always `input_text`.

~~2326~~

2327 - `:input_text`

~~2328~~

2329 - `class ResponseInputImage`

~~2330~~

2331 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~2332~~

2333 - `detail: :low | :high | :auto | :original`

~~2334~~

2335 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~2336~~

2337 - `:low`

~~2338~~

2339 - `:high`

~~2340~~

2341 - `:auto`

~~2342~~

2343 - `:original`

~~2344~~

2345 - `type: :input_image`

~~2346~~

2347 The type of the input item. Always `input_image`.

~~2348~~

2349 - `:input_image`

~~2350~~

2351 - `file_id: String`

~~2352~~

2353 The ID of the file to be sent to the model.

~~2354~~

2355 - `image_url: String`

~~2356~~

2357 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~2358~~

2359 - `class ResponseInputFile`

~~2360~~

2361 A file input to the model.

~~2362~~

2363 - `type: :input_file`

~~2364~~

2365 The type of the input item. Always `input_file`.

~~2366~~

2367 - `:input_file`

~~2368~~

2369 - `detail: :low | :high`

~~2370~~

2371 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~2372~~

2373 - `:low`

~~2374~~

2375 - `:high`

~~2376~~

2377 - `file_data: String`

~~2378~~

2379 The content of the file to be sent to the model.

~~2380~~

2381 - `file_id: String`

~~2382~~

2383 The ID of the file to be sent to the model.

~~2384~~

2385 - `file_url: String`

~~2386~~

2387 The URL of the file to be sent to the model.

~~2388~~

2389 - `filename: String`

~~2390~~

2391 The name of the file to be sent to the model.

~~2392~~

2393 - `version: String`

~~2394~~

2395 Optional version of the prompt template.
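A prompt reference with mixed variable types might look like the following. The template ID, version, and variable names are placeholders, and `variable_kinds` is a hypothetical helper that only summarizes which input type each variable uses.

```ruby
# A prompt reference with one plain-string variable and one input_text
# variable. The ID, version, and variable names are placeholders.
prompt = {
  id: "pmpt_123",
  version: "2",
  variables: {
    customer_name: "Ada",
    order_note: { type: :input_text, text: "Order 1234 shipped yesterday." }
  }
}

# Hypothetical helper: report the kind of each substitution value.
def variable_kinds(prompt)
  prompt.fetch(:variables, {}).transform_values do |value|
    value.is_a?(Hash) ? value.fetch(:type) : :string
  end
end

variable_kinds(prompt).each { |name, kind| puts "#{name}: #{kind}" }
```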

~~2396~~

2397 - `reasoning: RealtimeReasoning`

~~2398~~

2399 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~2400~~

2401 - `effort: RealtimeReasoningEffort`

~~2402~~

2403 Constrains effort on reasoning for reasoning-capable Realtime models such as

2404 `gpt-realtime-2`.

~~2405~~

2406 - `:minimal`

~~2407~~

2408 - `:low`

~~2409~~

2410 - `:medium`

~~2411~~

2412 - `:high`

~~2413~~

2414 - `:xhigh`

~~2415~~

2416 - `tool_choice: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp`

~~2417~~

2418 How the model chooses tools. Provide one of the string modes or force a specific

2419 function/MCP tool.

~~2420~~

2421 - `ToolChoiceOptions = :none | :auto | :required`

~~2422~~

2423 Controls which (if any) tool is called by the model.

~~2424~~

2425 `none` means the model will not call any tool and instead generates a message.

~~2426~~

2427 `auto` means the model can pick between generating a message or calling one or

2428 more tools.

~~2429~~

2430 `required` means the model must call one or more tools.

~~2431~~

2432 - `:none`

~~2433~~

2434 - `:auto`

~~2435~~

2436 - `:required`

~~2437~~

2438 - `class ToolChoiceFunction`

~~2439~~

2440 Use this option to force the model to call a specific function.

~~2441~~

2442 - `name: String`

~~2443~~

2444 The name of the function to call.

~~2445~~

2446 - `type: :function`

~~2447~~

2448 For function calling, the type is always `function`.

~~2449~~

2450 - `:function`

~~2451~~

2452 - `class ToolChoiceMcp`

~~2453~~

2454 Use this option to force the model to call a specific tool on a remote MCP server.

~~2455~~

2456 - `server_label: String`

~~2457~~

2458 The label of the MCP server to use.

~~2459~~

2460 - `type: :mcp`

~~2461~~

2462 For MCP tools, the type is always `mcp`.

~~2463~~

2464 - `:mcp`

~~2465~~

2466 - `name: String`

~~2467~~

2468 The name of the tool to call on the server.
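The three forms of `tool_choice` can be told apart as sketched below. The function name, server label, and tool name are placeholders, and `describe_tool_choice` is a hypothetical helper, not an SDK method.

```ruby
# One value of each documented form. Names and labels are placeholders.
choices = [
  :auto,
  { type: :function, name: "lookup_order" },
  { type: :mcp, server_label: "docs", name: "search" }
]

# Hypothetical helper: summarize a tool_choice value.
def describe_tool_choice(choice)
  case choice
  when :none, :auto, :required
    "mode #{choice}"
  when Hash
    case choice[:type]
    when :function then "function #{choice[:name]}"
    when :mcp then "mcp #{choice[:server_label]}/#{choice[:name]}"
    else raise ArgumentError, "unknown tool_choice type"
    end
  else
    raise ArgumentError, "unsupported tool_choice"
  end
end

choices.each { |choice| puts describe_tool_choice(choice) }
```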

~~2469~~

2470 - `tools: Array[RealtimeFunctionTool | McpTool{ server_label, type, allowed_tools, 7 more}]`

~~2471~~

2472 Tools available to the model.

~~2473~~

2474 - `class RealtimeFunctionTool`

~~2475~~

2476 - `description: String`

~~2477~~

2478 The description of the function, including guidance on when and how

2479 to call it, and guidance about what to tell the user when calling

2480 (if anything).

~~2481~~

2482 - `name: String`

~~2483~~

2484 The name of the function.

~~2485~~

2486 - `parameters: untyped`

~~2487~~

2488 Parameters of the function in JSON Schema.

~~2489~~

2490 - `type: :function`

~~2491~~

2492 The type of the tool, i.e. `function`.

~~2493~~

2494 - `:function`
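A function tool definition with JSON Schema parameters might be written as follows. The function name, description, and schema are placeholders chosen for illustration.

```ruby
require "json"

# A function tool whose `parameters` value is a JSON Schema object.
# The name, description, and schema are placeholders.
function_tool = {
  type: :function,
  name: "lookup_order",
  description: "Look up an order by ID. Call this when the user asks about " \
               "an order's status, and tell the user you are checking.",
  parameters: {
    type: "object",
    properties: {
      order_id: { type: "string", description: "The order identifier." }
    },
    required: ["order_id"]
  }
}

# Print the tool as it would appear once serialized to JSON.
puts JSON.pretty_generate(function_tool)
```

The description carries the guidance on when to call the function and what to tell the user, since that is the only place the model sees it.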

~~2495~~

2496 - `class McpTool`

~~2497~~

2498 Give the model access to additional tools via remote Model Context Protocol

2499 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~2500~~

2501 - `server_label: String`

~~2502~~

2503 A label for this MCP server, used to identify it in tool calls.

~~2504~~

2505 - `type: :mcp`

~~2506~~

2507 The type of the MCP tool. Always `mcp`.

~~2508~~

2509 - `:mcp`

~~2510~~

2511 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`

~~2512~~

2513 List of allowed tool names or a filter object.

~~2514~~

2515 - `McpAllowedTools = Array[String]`

~~2516~~

2517 A string array of allowed tool names

~~2518~~

2519 - `class McpToolFilter`

~~2520~~

2521 A filter object to specify which tools are allowed.

~~2522~~

2523 - `read_only: bool`

~~2524~~

2525 Indicates whether or not a tool modifies data or is read-only. If an

2526 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2527 it will match this filter.

~~2528~~

2529 - `tool_names: Array[String]`

~~2530~~

2531 List of allowed tool names.

~~2532~~

2533 - `authorization: String`

~~2534~~

2535 An OAuth access token that can be used with a remote MCP server, either

2536 with a custom MCP server URL or a service connector. Your application

2537 must handle the OAuth authorization flow and provide the token here.

~~2538~~

2539 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`

~~2540~~

2541 Identifier for service connectors, like those available in ChatGPT. One of

2542 `server_url` or `connector_id` must be provided. Learn more about service

2543 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~2544~~

2545 Currently supported `connector_id` values are:

~~2546~~

2547 - Dropbox: `connector_dropbox`

2548 - Gmail: `connector_gmail`

2549 - Google Calendar: `connector_googlecalendar`

2550 - Google Drive: `connector_googledrive`

2551 - Microsoft Teams: `connector_microsoftteams`

2552 - Outlook Calendar: `connector_outlookcalendar`

2553 - Outlook Email: `connector_outlookemail`

2554 - SharePoint: `connector_sharepoint`

~~2555~~

2556 - `:connector_dropbox`

~~2557~~

2558 - `:connector_gmail`

~~2559~~

2560 - `:connector_googlecalendar`

~~2561~~

2562 - `:connector_googledrive`

~~2563~~

2564 - `:connector_microsoftteams`

~~2565~~

2566 - `:connector_outlookcalendar`

~~2567~~

2568 - `:connector_outlookemail`

~~2569~~

2570 - `:connector_sharepoint`

~~2571~~

2572 - `defer_loading: bool`

~~2573~~

2574 Whether this MCP tool is deferred and discovered via tool search.

~~2575~~

2576 - `headers: Hash[Symbol, String]`

~~2577~~

2578 Optional HTTP headers to send to the MCP server. Use for authentication

2579 or other purposes.

~~2580~~

2581 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`

~~2582~~

2583 Specify which of the MCP server's tools require approval.

~~2584~~

2585 - `class McpToolApprovalFilter`

~~2586~~

2587 Specify which of the MCP server's tools require approval. Can be

2588 `always`, `never`, or a filter object associated with tools

2589 that require approval.

~~2590~~

2591 - `always: Always{ read_only, tool_names}`

~~2592~~

2593 A filter object to specify which tools are allowed.

~~2594~~

2595 - `read_only: bool`

~~2596~~

2597 Indicates whether or not a tool modifies data or is read-only. If an

2598 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2599 it will match this filter.

~~2600~~

2601 - `tool_names: Array[String]`

~~2602~~

2603 List of allowed tool names.

~~2604~~

2605 - `never: Never{ read_only, tool_names}`

~~2606~~

2607 A filter object to specify which tools are allowed.

~~2608~~

2609 - `read_only: bool`

~~2610~~

2611 Indicates whether or not a tool modifies data or is read-only. If an

2612 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

2613 it will match this filter.

~~2614~~

2615 - `tool_names: Array[String]`

~~2616~~

2617 List of allowed tool names.

~~2618~~

2619 - `McpToolApprovalSetting = :always | :never`

~~2620~~

2621 Specify a single approval policy for all tools. One of `always` or

2622 `never`. When set to `always`, all tools will require approval. When

2623 set to `never`, no tools will require approval.

~~2624~~

2625 - `:always`

~~2626~~

2627 - `:never`

~~2628~~

2629 - `server_description: String`

~~2630~~

2631 Optional description of the MCP server, used to provide more context.

~~2632~~

2633 - `server_url: String`

~~2634~~

2635 The URL for the MCP server. One of `server_url` or `connector_id` must be

2636 provided.

~~2637~~

2638 - `tracing: :auto | TracingConfiguration{ group_id, metadata, workflow_name}`

~~2639~~

2640 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

2641 tracing is enabled for a session, the configuration cannot be modified.

~~2642~~

2643 `auto` will create a trace for the session with default values for the

2644 workflow name, group id, and metadata.

~~2645~~

2646 - `Tracing = :auto`

~~2647~~

2648 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~2649~~

2650 - `:auto`

~~2651~~

2652 - `class TracingConfiguration`

~~2653~~

2654 Granular configuration for tracing.

~~2655~~

2656 - `group_id: String`

~~2657~~

2658 The group id to attach to this trace to enable filtering and

2659 grouping in the Traces Dashboard.

~~2660~~

2661 - `metadata: untyped`

~~2662~~

2663 The arbitrary metadata to attach to this trace to enable

2664 filtering in the Traces Dashboard.

~~2665~~

2666 - `workflow_name: String`

~~2667~~

2668 The name of the workflow to attach to this trace. This is used to

2669 name the trace in the Traces Dashboard.

~~2670~~

2671 - `truncation: RealtimeTruncation`

~~2672~~

2673 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~2674~~

2675 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~2676~~

2677 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~2678~~

2679 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~2680~~

2681 - `RealtimeTruncationStrategy = :auto | :disabled`

~~2682~~

2683 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~2684~~

2685 - `:auto`

~~2686~~

2687 - `:disabled`

~~2688~~

2689 - `class RealtimeTruncationRetentionRatio`

~~2690~~

2691 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~2692~~

2693 - `retention_ratio: Float`

~~2694~~

2695 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~2696~~

2697 - `type: :retention_ratio`

~~2698~~

2699 Use retention ratio truncation.

~~2700~~

2701 - `:retention_ratio`

~~2702~~

2703 - `token_limits: TokenLimits{ post_instructions}`

~~2704~~

2705 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~2706~~

2707 - `post_instructions: Integer`

~~2708~~

2709 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.

~~2710~~
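
The interaction between `token_limits.post_instructions` and `retention_ratio` described above can be sketched as a small local simulation. This is only an illustration under stated assumptions: `truncate_conversation` and the per-message token counts are hypothetical, and the server's actual truncation logic may differ in detail.

```ruby
# Toy model of retention-ratio truncation. Hypothetical helper, not part of the SDK.
# message_tokens:  token counts of post-instruction messages, oldest first.
# limit:           plays the role of token_limits.post_instructions.
# retention_ratio: fraction of the limit to keep once truncation is triggered.
def truncate_conversation(message_tokens, limit:, retention_ratio:)
  unless (0.0..1.0).cover?(retention_ratio)
    raise ArgumentError, "retention_ratio must be within 0.0..1.0"
  end

  kept = message_tokens.dup
  return kept if kept.sum <= limit # under the limit: nothing is dropped

  target = (limit * retention_ratio).floor
  # Drop the oldest messages first until the target budget is met.
  kept.shift while kept.sum > target && !kept.empty?
  kept
end

history = [1_200, 900, 1_500, 800, 1_100] # 5,500 tokens in total
remaining = truncate_conversation(history, limit: 5_000, retention_ratio: 0.8)
puts remaining.inspect
puts remaining.sum
```

With these inputs the two oldest messages are dropped and 3,400 tokens remain, which leaves headroom before the 5,000-token limit is reached again. That headroom is what lets several later turns reuse the same cached prefix.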

2711### Realtime Transcription Session Create Response

~~2712~~

2713- `class RealtimeTranscriptionSessionCreateResponse`

~~2714~~

2715 A Realtime transcription session configuration object.

~~2716~~

2717 - `id: String`

~~2718~~

2719 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~2720~~

2721 - `object: String`

~~2722~~

2723 The object type. Always `realtime.transcription_session`.

~~2724~~

2725 - `type: :transcription`

~~2726~~

2727 The type of session. Always `transcription` for transcription sessions.

~~2728~~

2729 - `:transcription`

~~2730~~

2731 - `audio: Audio{ input}`

~~2732~~

2733 Configuration for input audio for the session.

~~2734~~

2735 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`

~~2736~~

2737 - `format_: RealtimeAudioFormats`

~~2738~~

2739 The PCM audio format. Only a 24kHz sample rate is supported.

~~2740~~

2741 - `class AudioPCM`

~~2742~~

2743 The PCM audio format. Only a 24kHz sample rate is supported.

~~2744~~

2745 - `rate: 24000`

~~2746~~

2747 The sample rate of the audio. Always `24000`.

~~2748~~

2749 - `24000`

~~2750~~

2751 - `type: :"audio/pcm"`

~~2752~~

2753 The audio format. Always `audio/pcm`.

~~2754~~

2755 - `:"audio/pcm"`

~~2756~~

2757 - `class AudioPCMU`

~~2758~~

2759 The G.711 μ-law format.

~~2760~~

2761 - `type: :"audio/pcmu"`

~~2762~~

2763 The audio format. Always `audio/pcmu`.

~~2764~~

2765 - `:"audio/pcmu"`

~~2766~~

2767 - `class AudioPCMA`

~~2768~~

2769 The G.711 A-law format.

~~2770~~

2771 - `type: :"audio/pcma"`

~~2772~~

2773 The audio format. Always `audio/pcma`.

~~2774~~

2775 - `:"audio/pcma"`

~~2776~~

2777 - `noise_reduction: NoiseReduction{ type}`

~~2778~~

2779 Configuration for input audio noise reduction.

~~2780~~

2781 - `type: NoiseReductionType`

~~2782~~

2783 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~2784~~

2785 - `:near_field`

~~2786~~

2787 - `:far_field`

~~2788~~

2789 - `transcription: AudioTranscription`

~~2790~~

2791 - `delay: :minimal | :low | :medium | 2 more`

~~2792~~

2793 Controls how long the model waits before emitting transcription text.

2794 Higher values can improve transcription accuracy at the cost of latency.

2795 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2796~~

2797 - `:minimal`

~~2798~~

2799 - `:low`

~~2800~~

2801 - `:medium`

~~2802~~

2803 - `:high`

~~2804~~

2805 - `:xhigh`

~~2806~~

2807 - `language: String`

~~2808~~

2809 The language of the input audio. Supplying the input language in

2810 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

2811 will improve accuracy and latency.

~~2812~~

2813 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~2814~~

2815 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2816~~

2817 - `String = String`

~~2818~~

2819 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~2820~~

2821 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~2822~~

2823 - `:"whisper-1"`

~~2824~~

2825 - `:"gpt-4o-mini-transcribe"`

~~2826~~

2827 - `:"gpt-4o-mini-transcribe-2025-12-15"`

~~2828~~

2829 - `:"gpt-4o-transcribe"`

~~2830~~

2831 - `:"gpt-4o-transcribe-diarize"`

~~2832~~

2833 - `:"gpt-realtime-whisper"`

~~2834~~

2835 - `prompt: String`

~~2836~~

2837 An optional text to guide the model's style or continue a previous audio

2838 segment.

2839 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

2840 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

2841 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~2842~~

2843 - `turn_detection: RealtimeTranscriptionSessionTurnDetection`

~~2844~~

2845 Configuration for turn detection. Can be set to `null` to turn off. Server

2846 VAD means that the model will detect the start and end of speech based on

2847 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~2848~~

2849 - `prefix_padding_ms: Integer`

~~2850~~

2851 Amount of audio to include before the VAD detected speech (in

2852 milliseconds). Defaults to 300ms.

~~2853~~

2854 - `silence_duration_ms: Integer`

~~2855~~

2856 Duration of silence to detect speech stop (in milliseconds). Defaults

2857 to 500ms. With shorter values the model will respond more quickly,

2858 but may jump in on short pauses from the user.

~~2859~~

2860 - `threshold: Float`

~~2861~~

2862 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

2863 higher threshold will require louder audio to activate the model, and

2864 thus might perform better in noisy environments.

~~2865~~

2866 - `type: String`

~~2867~~

2868 Type of turn detection. Only `server_vad` is currently supported.

~~2869~~

2870 - `expires_at: Integer`

~~2871~~

2872 Expiration timestamp for the session, in seconds since epoch.

~~2873~~

2874 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~2875~~

2876 Additional fields to include in server outputs.

~~2877~~

2878 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~2879~~

2880 - `:"item.input_audio_transcription.logprobs"`

~~2881~~
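
Because `expires_at` is documented as seconds since epoch, it can be converted with Ruby's standard `Time.at`. The sketch below shows one way to check how long a session has left. The helper names and the sample timestamp are hypothetical placeholders, not values returned by the API.

```ruby
# Hypothetical helpers for working with an `expires_at` value (epoch seconds).
def session_expired?(expires_at, now: Time.now)
  Time.at(expires_at) <= now
end

def seconds_remaining(expires_at, now: Time.now)
  [expires_at - now.to_i, 0].max # never negative once the session has expired
end

expires_at = 1_779_000_000      # placeholder value, not real API output
now = Time.at(1_778_999_400)    # fixed clock so the example is deterministic

puts Time.at(expires_at).utc
puts session_expired?(expires_at, now: now)
puts seconds_remaining(expires_at, now: now)
```

Passing `now:` explicitly keeps the helpers easy to test. In application code the default of `Time.now` would normally be used.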

2882### Realtime Transcription Session Turn Detection

~~2883~~

2884- `class RealtimeTranscriptionSessionTurnDetection`

~~2885~~

2886 Configuration for turn detection. Can be set to `null` to turn off. Server

2887 VAD means that the model will detect the start and end of speech based on

2888 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~2889~~

2890 - `prefix_padding_ms: Integer`

~~2891~~

2892 Amount of audio to include before the VAD detected speech (in

2893 milliseconds). Defaults to 300ms.

~~2894~~

2895 - `silence_duration_ms: Integer`

~~2896~~

2897 Duration of silence to detect speech stop (in milliseconds). Defaults

2898 to 500ms. With shorter values the model will respond more quickly,

2899 but may jump in on short pauses from the user.

~~2900~~

2901 - `threshold: Float`

~~2902~~

2903 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

2904 higher threshold will require louder audio to activate the model, and

2905 thus might perform better in noisy environments.

~~2906~~

2907 - `type: String`

~~2908~~

2909 Type of turn detection. Only `server_vad` is currently supported.

~~2910~~
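
The documented defaults and constraints for transcription turn detection can be collected in a plain Ruby hash before it is passed to a request. The following is a minimal sketch: `build_turn_detection` and `SERVER_VAD_DEFAULTS` are hypothetical names, and only the default values, the threshold range, and the `gpt-realtime-whisper` restriction come from the text above.

```ruby
# Defaults as documented above: 300 ms prefix padding, 500 ms silence, 0.5 threshold.
SERVER_VAD_DEFAULTS = {
  type: "server_vad",
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  threshold: 0.5
}.freeze

# Hypothetical helper: merges overrides onto the documented defaults and checks
# the documented constraints. Returns nil for gpt-realtime-whisper, where turn
# detection must be null because VAD is not supported.
def build_turn_detection(model:, **overrides)
  return nil if model == "gpt-realtime-whisper"

  unknown = overrides.keys - SERVER_VAD_DEFAULTS.keys
  raise ArgumentError, "unknown keys: #{unknown.inspect}" unless unknown.empty?

  config = SERVER_VAD_DEFAULTS.merge(overrides)
  unless (0.0..1.0).cover?(config[:threshold])
    raise ArgumentError, "threshold must be within 0.0..1.0"
  end
  raise ArgumentError, "only server_vad is supported" unless config[:type] == "server_vad"

  config
end

p build_turn_detection(model: "gpt-4o-transcribe", silence_duration_ms: 300)
p build_turn_detection(model: "gpt-realtime-whisper")
```

Validating locally is a design choice rather than a requirement, since the API performs its own validation. It mainly gives earlier and more readable errors.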

2911### Client Secret Create Response

~~2912~~

2913- `class ClientSecretCreateResponse`

~~2914~~

2915 Response from creating a session and client secret for the Realtime API.

~~2916~~

2917 - `expires_at: Integer`

~~2918~~

2919 Expiration timestamp for the client secret, in seconds since epoch.

~~2920~~

2921 - `session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse`

~~2922~~

2923 The session configuration for either a realtime or transcription session.

~~2924~~

2925 - `class RealtimeSessionCreateResponse`

~~2926~~

2927 A Realtime session configuration object.

~~2928~~

2929 - `id: String`

~~2930~~

2931 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~2932~~

2933 - `object: :"realtime.session"`

~~2934~~

2935 The object type. Always `realtime.session`.

~~2936~~

2937 - `:"realtime.session"`

~~2938~~

2939 - `type: :realtime`

~~2940~~

2941 The type of session to create. Always `realtime` for the Realtime API.

~~2942~~

2943 - `:realtime`

~~2944~~

2945 - `audio: Audio{ input, output}`

~~2946~~

2947 Configuration for input and output audio.

~~2948~~

2949 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`

~~2950~~

2951 - `format_: RealtimeAudioFormats`

~~2952~~

2953 The format of the input audio.

~~2954~~

2955 - `class AudioPCM`

~~2956~~

2957 The PCM audio format. Only a 24kHz sample rate is supported.

~~2958~~

2959 - `rate: 24000`

~~2960~~

2961 The sample rate of the audio. Always `24000`.

~~2962~~

2963 - `24000`

~~2964~~

2965 - `type: :"audio/pcm"`

~~2966~~

2967 The audio format. Always `audio/pcm`.

~~2968~~

2969 - `:"audio/pcm"`

~~2970~~

2971 - `class AudioPCMU`

~~2972~~

2973 The G.711 μ-law format.

~~2974~~

2975 - `type: :"audio/pcmu"`

~~2976~~

2977 The audio format. Always `audio/pcmu`.

~~2978~~

2979 - `:"audio/pcmu"`

~~2980~~

2981 - `class AudioPCMA`

~~2982~~

2983 The G.711 A-law format.

~~2984~~

2985 - `type: :"audio/pcma"`

~~2986~~

2987 The audio format. Always `audio/pcma`.

~~2988~~

2989 - `:"audio/pcma"`

~~2990~~

2991 - `noise_reduction: NoiseReduction{ type}`

~~2992~~

2993 Configuration for input audio noise reduction. This can be set to `null` to turn off.

2994 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.

2995 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

~~2996~~

2997 - `type: NoiseReductionType`

~~2998~~

2999 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~3000~~

3001 - `:near_field`

~~3002~~

3003 - `:far_field`

~~3004~~

3005 - `transcription: AudioTranscription`

~~3006~~

3007 - `delay: :minimal | :low | :medium | 2 more`

~~3008~~

3009 Controls how long the model waits before emitting transcription text.

3010 Higher values can improve transcription accuracy at the cost of latency.

3011 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~3012~~

3013 - `:minimal`

~~3014~~

3015 - `:low`

~~3016~~

3017 - `:medium`

~~3018~~

3019 - `:high`

~~3020~~

3021 - `:xhigh`

~~3022~~

3023 - `language: String`

~~3024~~

3025 The language of the input audio. Supplying the input language in

3026 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format

3027 will improve accuracy and latency.

~~3028~~

3029 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~3030~~

3031 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~3032~~

3033 - `String = String`

~~3034~~

3035 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`

~~3036~~

3037 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.

~~3038~~

3039 - `:"whisper-1"`

~~3040~~

3041 - `:"gpt-4o-mini-transcribe"`

~~3042~~

3043 - `:"gpt-4o-mini-transcribe-2025-12-15"`

~~3044~~

3045 - `:"gpt-4o-transcribe"`

~~3046~~

3047 - `:"gpt-4o-transcribe-diarize"`

~~3048~~

3049 - `:"gpt-realtime-whisper"`

~~3050~~

3051 - `prompt: String`

~~3052~~

3053 An optional text to guide the model's style or continue a previous audio

3054 segment.

3055 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).

3056 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".

3057 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.

~~3058~~

3059 - `turn_detection: ServerVad{ type, create_response, idle_timeout_ms, 4 more} | SemanticVad{ type, create_response, eagerness, interrupt_response}`

~~3060~~

3061 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.

~~3062~~

3063 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.

~~3064~~

3065 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.

~~3066~~

3067 For `gpt-realtime-whisper` transcription sessions, turn detection must be

3068 set to `null`; VAD is not supported.

~~3069~~

3070 - `class ServerVad`

~~3071~~

3072 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

~~3073~~

3074 - `type: :server_vad`

~~3075~~

3076 Type of turn detection, `server_vad` to turn on simple Server VAD.

~~3077~~

3078 - `:server_vad`

~~3079~~

3080 - `create_response: bool`

~~3081~~

3082 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.

~~3083~~

3084 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~3085~~

3086 - `idle_timeout_ms: Integer`

~~3087~~

3088 Optional timeout after which a model response will be triggered automatically. This is

3089 useful for situations in which a long pause from the user is unexpected, such as a phone

3090 call. The model will effectively prompt the user to continue the conversation based

3091 on the current context.

~~3092~~

3093 The timeout value will be applied after the last model response's audio has finished playing,

3094 i.e. it's set to the `response.done` time plus audio playback duration.

~~3095~~

3096 An `input_audio_buffer.timeout_triggered` event (plus events

3097 associated with the Response) will be emitted when the timeout is reached.

3098 Idle timeout is currently only supported for `server_vad` mode.

~~3099~~

3100 - `interrupt_response: bool`

~~3101~~

3102 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default

3103 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.

~~3104~~

3105 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.

~~3106~~

3107 - `prefix_padding_ms: Integer`

~~3108~~

3109 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in

3110 milliseconds). Defaults to 300ms.

~~3111~~

3112 - `silence_duration_ms: Integer`

~~3113~~

3114 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults

3115 to 500ms. With shorter values the model will respond more quickly,

3116 but may jump in on short pauses from the user.

~~3117~~

3118 - `threshold: Float`

~~3119~~

3120 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A

3121 higher threshold will require louder audio to activate the model, and

3122 thus might perform better in noisy environments.

~~3123~~

3124 - `class SemanticVad`

~~3125~~

3126 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.

~~3127~~

3128 - `type: :semantic_vad`

~~3129~~

3130 Type of turn detection, `semantic_vad` to turn on Semantic VAD.

~~3131~~

3132 - `:semantic_vad`

~~3133~~

3134 - `create_response: bool`

~~3135~~

3136 Whether or not to automatically generate a response when a VAD stop event occurs.

~~3137~~

3138 - `eagerness: :low | :medium | :high | :auto`

~~3139~~

3140 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.

~~3141~~

3142 - `:low`

~~3143~~

3144 - `:medium`

~~3145~~

3146 - `:high`

~~3147~~

3148 - `:auto`

~~3149~~

3150 - `interrupt_response: bool`

~~3151~~

3152 Whether or not to automatically interrupt any ongoing response with output to the default

3153 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.

~~3154~~

3155 - `output: Output{ format_, speed, voice}`

~~3156~~

3157 - `format_: RealtimeAudioFormats`

~~3158~~

3159 The format of the output audio.

~~3160~~

3161 - `speed: Float`

~~3162~~

3163 The speed of the model's spoken response as a multiple of the original speed.

3164 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

~~3165~~

3166 This parameter is a post-processing adjustment to the audio after it is generated; it's

3167 also possible to prompt the model to speak faster or slower.

~~3168~~

3169 - `voice: String | :alloy | :ash | :ballad | 7 more`

~~3170~~

3171 The voice the model uses to respond. Voice cannot be changed during the

3172 session once the model has responded with audio at least once. Current

3173 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

3174 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

3175 best quality.

~~3176~~

3177 - `String = String`

~~3178~~

3179 - `Voice = :alloy | :ash | :ballad | 7 more`

~~3180~~

3181 The voice the model uses to respond. Voice cannot be changed during the

3182 session once the model has responded with audio at least once. Current

3183 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,

3184 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for

3185 best quality.

~~3186~~

3187 - `:alloy`

~~3188~~

3189 - `:ash`

~~3190~~

3191 - `:ballad`

~~3192~~

3193 - `:coral`

~~3194~~

3195 - `:echo`

~~3196~~

3197 - `:sage`

~~3198~~

3199 - `:shimmer`

~~3200~~

3201 - `:verse`

~~3202~~

3203 - `:marin`

~~3204~~

3205 - `:cedar`

~~3206~~

3207 - `expires_at: Integer`

~~3208~~

3209 Expiration timestamp for the session, in seconds since epoch.

~~3210~~

3211 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~3212~~

3213 Additional fields to include in server outputs.

~~3214~~

3215 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~3216~~

3217 - `:"item.input_audio_transcription.logprobs"`

~~3218~~

3219 - `instructions: String`

~~3220~~

3221 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

~~3222~~

3223 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.

~~3224~~

3225 - `max_output_tokens: Integer | :inf`

~~3226~~

3227 Maximum number of output tokens for a single assistant response,

3228 inclusive of tool calls. Provide an integer between 1 and 4096 to

3229 limit output tokens, or `inf` for the maximum available tokens for a

3230 given model. Defaults to `inf`.

~~3231~~

3232 - `Integer = Integer`

~~3233~~

3234 - `MaxOutputTokens = :inf`

~~3235~~

3236 - `:inf`

~~3237~~

3238 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~3239~~

3240 The Realtime model used for this session.

~~3241~~

3242 - `String = String`

~~3243~~

3244 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`

~~3245~~

3246 The Realtime model used for this session.

~~3247~~

3248 - `:"gpt-realtime"`

~~3249~~

3250 - `:"gpt-realtime-1.5"`

~~3251~~

3252 - `:"gpt-realtime-2"`

~~3253~~

3254 - `:"gpt-realtime-2025-08-28"`

~~3255~~

3256 - `:"gpt-4o-realtime-preview"`

~~3257~~

3258 - `:"gpt-4o-realtime-preview-2024-10-01"`

~~3259~~

3260 - `:"gpt-4o-realtime-preview-2024-12-17"`

~~3261~~

3262 - `:"gpt-4o-realtime-preview-2025-06-03"`

~~3263~~

3264 - `:"gpt-4o-mini-realtime-preview"`

~~3265~~

3266 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`

~~3267~~

3268 - `:"gpt-realtime-mini"`

~~3269~~

3270 - `:"gpt-realtime-mini-2025-10-06"`

~~3271~~

3272 - `:"gpt-realtime-mini-2025-12-15"`

~~3273~~

3274 - `:"gpt-audio-1.5"`

~~3275~~

3276 - `:"gpt-audio-mini"`

~~3277~~

3278 - `:"gpt-audio-mini-2025-10-06"`

~~3279~~

3280 - `:"gpt-audio-mini-2025-12-15"`

~~3281~~

3282 - `output_modalities: Array[:text | :audio]`

~~3283~~

3284 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating

3285 that the model will respond with audio plus a transcript. `["text"]` can be used to make

3286 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.

~~3287~~

3288 - `:text`

~~3289~~

3290 - `:audio`

~~3291~~

3292 - `prompt: ResponsePrompt`

~~3293~~

3294 Reference to a prompt template and its variables.

3295 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).

~~3296~~

3297 - `id: String`

~~3298~~

3299 The unique identifier of the prompt template to use.

~~3300~~

3301 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`

~~3302~~

3303 Optional map of values to substitute in for variables in your

3304 prompt. The substitution values can either be strings, or other

3305 Response input types like images or files.

~~3306~~

3307 - `String = String`

~~3308~~

3309 - `class ResponseInputText`

~~3310~~

3311 A text input to the model.

~~3312~~

3313 - `text: String`

~~3314~~

3315 The text input to the model.

~~3316~~

3317 - `type: :input_text`

~~3318~~

3319 The type of the input item. Always `input_text`.

~~3320~~

3321 - `:input_text`

~~3322~~

3323 - `class ResponseInputImage`

~~3324~~

3325 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).

~~3326~~

3327 - `detail: :low | :high | :auto | :original`

~~3328~~

3329 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.

~~3330~~

3331 - `:low`

~~3332~~

3333 - `:high`

~~3334~~

3335 - `:auto`

~~3336~~

3337 - `:original`

~~3338~~

3339 - `type: :input_image`

~~3340~~

3341 The type of the input item. Always `input_image`.

~~3342~~

3343 - `:input_image`

~~3344~~

3345 - `file_id: String`

~~3346~~

3347 The ID of the file to be sent to the model.

~~3348~~

3349 - `image_url: String`

~~3350~~

3351 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.

~~3352~~

3353 - `class ResponseInputFile`

~~3354~~

3355 A file input to the model.

~~3356~~

3357 - `type: :input_file`

~~3358~~

3359 The type of the input item. Always `input_file`.

~~3360~~

3361 - `:input_file`

~~3362~~

3363 - `detail: :low | :high`

~~3364~~

3365 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.

~~3366~~

3367 - `:low`

~~3368~~

3369 - `:high`

~~3370~~

3371 - `file_data: String`

~~3372~~

3373 The content of the file to be sent to the model.

~~3374~~

3375 - `file_id: String`

~~3376~~

3377 The ID of the file to be sent to the model.

~~3378~~

3379 - `file_url: String`

~~3380~~

3381 The URL of the file to be sent to the model.

~~3382~~

3383 - `filename: String`

~~3384~~

3385 The name of the file to be sent to the model.

~~3386~~

3387 - `version: String`

~~3388~~

3389 Optional version of the prompt template.

~~3390~~

3391 - `reasoning: RealtimeReasoning`

~~3392~~

3393 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.

~~3394~~

3395 - `effort: RealtimeReasoningEffort`

~~3396~~

3397 Constrains effort on reasoning for reasoning-capable Realtime models such as

3398 `gpt-realtime-2`.

~~3399~~

3400 - `:minimal`

~~3401~~

3402 - `:low`

~~3403~~

3404 - `:medium`

~~3405~~

3406 - `:high`

~~3407~~

3408 - `:xhigh`

~~3409~~

3410 - `tool_choice: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp`

~~3411~~

3412 How the model chooses tools. Provide one of the string modes or force a specific

3413 function/MCP tool.

~~3414~~

3415 - `ToolChoiceOptions = :none | :auto | :required`

~~3416~~

3417 Controls which (if any) tool is called by the model.

~~3418~~

3419 `none` means the model will not call any tool and instead generates a message.

~~3420~~

3421 `auto` means the model can pick between generating a message or calling one or

3422 more tools.

~~3423~~

3424 `required` means the model must call one or more tools.

~~3425~~

3426 - `:none`

~~3427~~

3428 - `:auto`

~~3429~~

3430 - `:required`

~~3431~~

3432 - `class ToolChoiceFunction`

~~3433~~

3434 Use this option to force the model to call a specific function.

~~3435~~

3436 - `name: String`

~~3437~~

3438 The name of the function to call.

~~3439~~

3440 - `type: :function`

~~3441~~

3442 For function calling, the type is always `function`.

~~3443~~

3444 - `:function`

~~3445~~

3446 - `class ToolChoiceMcp`

~~3447~~

3448 Use this option to force the model to call a specific tool on a remote MCP server.

~~3449~~

3450 - `server_label: String`

~~3451~~

3452 The label of the MCP server to use.

~~3453~~

3454 - `type: :mcp`

~~3455~~

3456 For MCP tools, the type is always `mcp`.

~~3457~~

3458 - `:mcp`

~~3459~~

3460 - `name: String`

~~3461~~

3462 The name of the tool to call on the server.
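The three accepted shapes of `tool_choice` can be illustrated with a small validator. This is a sketch under assumptions: `normalize_tool_choice` is a hypothetical helper, the function and server names are placeholders, and the checks only cover the fields described above.

```ruby
TOOL_CHOICE_MODES = %i[none auto required].freeze

# Hypothetical validator: accepts a mode symbol, or a Hash that forces a
# specific function or MCP tool, and returns the value unchanged if valid.
def normalize_tool_choice(choice)
  case choice
  when Symbol
    raise ArgumentError, "unknown mode #{choice.inspect}" unless TOOL_CHOICE_MODES.include?(choice)
  when Hash
    case choice[:type]
    when :function
      raise ArgumentError, "name is required" if choice[:name].to_s.empty?
    when :mcp
      raise ArgumentError, "server_label is required" if choice[:server_label].to_s.empty?
    else
      raise ArgumentError, "type must be :function or :mcp"
    end
  else
    raise ArgumentError, "tool_choice must be a Symbol or a Hash"
  end
  choice
end

# Placeholder names, for illustration only.
p normalize_tool_choice(:auto)
p normalize_tool_choice({ type: :function, name: "get_weather" })
p normalize_tool_choice({ type: :mcp, server_label: "docs", name: "search" })
```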

~~3463~~

3464 - `tools: Array[RealtimeFunctionTool | McpTool{ server_label, type, allowed_tools, 7 more}]`

~~3465~~

3466 Tools available to the model.

~~3467~~

3468 - `class RealtimeFunctionTool`

~~3469~~

3470 - `description: String`

~~3471~~

3472 The description of the function, including guidance on when and how

3473 to call it, and guidance about what to tell the user when calling

3474 (if anything).

~~3475~~

3476 - `name: String`

~~3477~~

3478 The name of the function.

~~3479~~

3480 - `parameters: untyped`

~~3481~~

3482 Parameters of the function in JSON Schema.

~~3483~~

3484 - `type: :function`

~~3485~~

3486 The type of the tool, i.e. `function`.

~~3487~~

3488 - `:function`
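A function tool pairs a name and description with a JSON Schema for its parameters. The sketch below builds one as a plain Hash and serializes it with Ruby's standard `json` library; the `get_weather` tool and its `city` parameter are invented for illustration.

```ruby
require "json"

# Hypothetical function tool definition using the fields documented above.
weather_tool = {
  type: :function,
  name: "get_weather",
  description: "Look up the current weather for a city. Call this when the " \
               "user asks about weather; tell the user you are checking.",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name, e.g. Lisbon" }
    },
    required: ["city"]
  }
}

# Symbols serialize as JSON strings, so `:function` becomes "function".
puts JSON.pretty_generate(weather_tool)
```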

~~3489~~

3490 - `class McpTool`

~~3491~~

3492 Give the model access to additional tools via remote Model Context Protocol

3493 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).

~~3494~~

3495 - `server_label: String`

~~3496~~

3497 A label for this MCP server, used to identify it in tool calls.

~~3498~~

3499 - `type: :mcp`

~~3500~~

3501 The type of the MCP tool. Always `mcp`.

~~3502~~

3503 - `:mcp`

~~3504~~

3505 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`

~~3506~~

3507 List of allowed tool names or a filter object.

~~3508~~

3509 - `McpAllowedTools = Array[String]`

~~3510~~

3511 A string array of allowed tool names.

~~3512~~

3513 - `class McpToolFilter`

~~3514~~

3515 A filter object to specify which tools are allowed.

~~3516~~

3517 - `read_only: bool`

~~3518~~

3519 Indicates whether or not a tool modifies data or is read-only. If an

3520 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

3521 it will match this filter.

~~3522~~

3523 - `tool_names: Array[String]`

~~3524~~

3525 List of allowed tool names.

~~3526~~

3527 - `authorization: String`

~~3528~~

3529 An OAuth access token that can be used with a remote MCP server, either

3530 with a custom MCP server URL or a service connector. Your application

3531 must handle the OAuth authorization flow and provide the token here.

~~3532~~

3533 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`

~~3534~~

3535 Identifier for service connectors, like those available in ChatGPT. One of

3536 `server_url` or `connector_id` must be provided. Learn more about service

3537 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).

~~3538~~

3539 Currently supported `connector_id` values are:

~~3540~~

3541 - Dropbox: `connector_dropbox`

3542 - Gmail: `connector_gmail`

3543 - Google Calendar: `connector_googlecalendar`

3544 - Google Drive: `connector_googledrive`

3545 - Microsoft Teams: `connector_microsoftteams`

3546 - Outlook Calendar: `connector_outlookcalendar`

3547 - Outlook Email: `connector_outlookemail`

3548 - SharePoint: `connector_sharepoint`

~~3549~~

3550 - `:connector_dropbox`

~~3551~~

3552 - `:connector_gmail`

~~3553~~

3554 - `:connector_googlecalendar`

~~3555~~

3556 - `:connector_googledrive`

~~3557~~

3558 - `:connector_microsoftteams`

~~3559~~

3560 - `:connector_outlookcalendar`

~~3561~~

3562 - `:connector_outlookemail`

~~3563~~

3564 - `:connector_sharepoint`

~~3565~~

3566 - `defer_loading: bool`

~~3567~~

3568 Whether this MCP tool is deferred and discovered via tool search.

~~3569~~

3570 - `headers: Hash[Symbol, String]`

~~3571~~

3572 Optional HTTP headers to send to the MCP server. Use for authentication

3573 or other purposes.

~~3574~~

3575 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`

~~3576~~

3577 Specify which of the MCP server's tools require approval.

~~3578~~

3579 - `class McpToolApprovalFilter`

~~3580~~

3581 Specify which of the MCP server's tools require approval. Can be

3582 `always`, `never`, or a filter object associated with tools

3583 that require approval.

~~3584~~

3585 - `always: Always{ read_only, tool_names}`

~~3586~~

3587 A filter object to specify which tools always require approval.

~~3588~~

3589 - `read_only: bool`

~~3590~~

3591 Indicates whether or not a tool modifies data or is read-only. If an

3592 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

3593 it will match this filter.

~~3594~~

3595 - `tool_names: Array[String]`

~~3596~~

3597 List of allowed tool names.

~~3598~~

3599 - `never: Never{ read_only, tool_names}`

~~3600~~

3601 A filter object to specify which tools never require approval.

~~3602~~

3603 - `read_only: bool`

~~3604~~

3605 Indicates whether or not a tool modifies data or is read-only. If an

3606 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),

3607 it will match this filter.

~~3608~~

3609 - `tool_names: Array[String]`

~~3610~~

3611 List of allowed tool names.

~~3612~~

3613 - `McpToolApprovalSetting = :always | :never`

~~3614~~

3615 Specify a single approval policy for all tools. One of `always` or

3616 `never`. When set to `always`, all tools will require approval. When

3617 set to `never`, no tools will require approval.

~~3618~~

3619 - `:always`

~~3620~~

3621 - `:never`

~~3622~~

3623 - `server_description: String`

~~3624~~

3625 Optional description of the MCP server, used to provide more context.

~~3626~~

3627 - `server_url: String`

~~3628~~

3629 The URL for the MCP server. One of `server_url` or `connector_id` must be

3630 provided.
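The MCP tool fields above can be combined as in the following sketch. It enforces the documented rule that one of `server_url` or `connector_id` must be provided and checks `connector_id` against the listed values. `mcp_tool` is a hypothetical helper, the URL and labels are placeholders, and the sketch does not decide what should happen when both are supplied, since the text above does not say.

```ruby
CONNECTOR_IDS = %i[
  connector_dropbox connector_gmail connector_googlecalendar
  connector_googledrive connector_microsoftteams connector_outlookcalendar
  connector_outlookemail connector_sharepoint
].freeze

# Hypothetical builder for an MCP tool Hash. Extra documented fields
# (allowed_tools, require_approval, headers, ...) pass through via **options.
def mcp_tool(server_label:, server_url: nil, connector_id: nil, **options)
  if server_url.nil? && connector_id.nil?
    raise ArgumentError, "one of server_url or connector_id must be provided"
  end
  if connector_id && !CONNECTOR_IDS.include?(connector_id)
    raise ArgumentError, "unsupported connector_id: #{connector_id.inspect}"
  end

  { type: :mcp, server_label: server_label, server_url: server_url, connector_id: connector_id }
    .compact
    .merge(options)
end

docs_tool = mcp_tool(
  server_label: "docs",
  server_url: "https://mcp.example.com",            # placeholder URL
  allowed_tools: { read_only: true },               # McpToolFilter form
  require_approval: { never: { tool_names: ["search"] } }
)
p docs_tool
```

Passing `require_approval: :always` or `:never` instead of the filter Hash would apply a single policy to every tool on the server.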

~~3631~~

3632 - `tracing: :auto | TracingConfiguration{ group_id, metadata, workflow_name}`

~~3633~~

3634 The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once

3635 tracing is enabled for a session, the configuration cannot be modified.

~~3636~~

3637 `auto` will create a trace for the session with default values for the

3638 workflow name, group id, and metadata.

~~3639~~

3640 - `Tracing = :auto`

~~3641~~

3642 Enables tracing and sets default values for tracing configuration options. Always `auto`.

~~3643~~

3644 - `:auto`

~~3645~~

3646 - `class TracingConfiguration`

~~3647~~

3648 Granular configuration for tracing.

~~3649~~

3650 - `group_id: String`

~~3651~~

3652 The group id to attach to this trace to enable filtering and

3653 grouping in the Traces Dashboard.

~~3654~~

3655 - `metadata: untyped`

~~3656~~

3657 The arbitrary metadata to attach to this trace to enable

3658 filtering in the Traces Dashboard.

~~3659~~

3660 - `workflow_name: String`

~~3661~~

3662 The name of the workflow to attach to this trace. This is used to

3663 name the trace in the Traces Dashboard.
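To illustrate the two enabled forms of `tracing`, the sketch below returns `:auto` when no details are given and a granular configuration Hash otherwise. `tracing_option` is a hypothetical helper and all values are placeholders; falling back to `:auto` is a choice made by this sketch, not documented behavior.

```ruby
# Hypothetical helper: builds the `tracing` value from optional details.
def tracing_option(workflow_name: nil, group_id: nil, metadata: nil)
  config = { workflow_name: workflow_name, group_id: group_id, metadata: metadata }.compact
  config.empty? ? :auto : config
end

p tracing_option
p tracing_option(workflow_name: "support-call", group_id: "customer-42",
                 metadata: { region: "eu" })
```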

~~3664~~

3665 - `truncation: RealtimeTruncation`

~~3666~~

3667 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.

~~3668~~

3669 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.

~~3670~~

3671 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.

~~3672~~

3673 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.

~~3674~~

3675 - `RealtimeTruncationStrategy = :auto | :disabled`

~~3676~~

3677 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.

~~3678~~

3679 - `:auto`

~~3680~~

3681 - `:disabled`

~~3682~~

3683 - `class RealtimeTruncationRetentionRatio`

~~3684~~

3685 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

~~3686~~

3687 - `retention_ratio: Float`

~~3688~~

3689 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.

~~3690~~

3691 - `type: :retention_ratio`

~~3692~~

3693 Use retention ratio truncation.

~~3694~~

3695 - `:retention_ratio`

~~3696~~

3697 - `token_limits: TokenLimits{ post_instructions}`

~~3698~~

3699 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.

~~3700~~

3701 - `post_instructions: Integer`

~~3702~~

3703 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
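The retention-ratio arithmetic can be sketched with a small simulation. It assumes, for illustration only, that truncation drops whole messages from the oldest end and that per-message token counts are already known; the server's actual accounting may differ. `truncate_with_retention` is a hypothetical name.

```ruby
# Simulates retention-ratio truncation. `token_counts` holds one token count
# per message, oldest first. If the total exceeds `limit`, the oldest messages
# are dropped until the total is at or below `limit * retention_ratio`.
def truncate_with_retention(token_counts, limit:, retention_ratio:)
  unless (0.0..1.0).cover?(retention_ratio)
    raise ArgumentError, "retention_ratio must be between 0.0 and 1.0"
  end

  kept = token_counts.dup
  return kept if kept.sum <= limit # under the limit: nothing to truncate

  target = (limit * retention_ratio).floor
  kept.shift while !kept.empty? && kept.sum > target
  kept
end

history = [1_200, 900, 1_500, 800, 1_000] # 5,400 tokens in total
p truncate_with_retention(history, limit: 5_000, retention_ratio: 0.8)
```

With a 5,000-token `post_instructions` limit and a ratio of `0.8`, the target is 4,000 tokens, so the two oldest messages are dropped and 3,300 tokens remain. That leaves headroom for several more turns before the next truncation, which is the cache-rate benefit described above.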

~~3704~~

3705 - `class RealtimeTranscriptionSessionCreateResponse`

~~3706~~

3707 A Realtime transcription session configuration object.

~~3708~~

3709 - `id: String`

~~3710~~

3711 Unique identifier for the session that looks like `sess_1234567890abcdef`.

~~3712~~

3713 - `object: String`

~~3714~~

3715 The object type. Always `realtime.transcription_session`.

~~3716~~

3717 - `type: :transcription`

~~3718~~

3719 The type of session. Always `transcription` for transcription sessions.

~~3720~~

3721 - `:transcription`

~~3722~~

3723 - `audio: Audio{ input}`

~~3724~~

3725 Configuration for input audio for the session.

~~3726~~

3727 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`

~~3728~~

3729 - `format_: RealtimeAudioFormats`

~~3730~~

3731 The PCM audio format. Only a 24kHz sample rate is supported.

~~3732~~

3733 - `noise_reduction: NoiseReduction{ type}`

~~3734~~

3735 Configuration for input audio noise reduction.

~~3736~~

3737 - `type: NoiseReductionType`

~~3738~~

3739 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.

~~3740~~

3741 - `transcription: AudioTranscription`

~~3742~~

3743 - `turn_detection: RealtimeTranscriptionSessionTurnDetection`

~~3744~~

3745 Configuration for turn detection. Can be set to `null` to turn off. Server

3746 VAD means that the model will detect the start and end of speech based on

3747 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

~~3748~~

3749 - `prefix_padding_ms: Integer`

~~3750~~

3751 Amount of audio to include before the VAD detected speech (in

3752 milliseconds). Defaults to 300ms.

~~3753~~

3754 - `silence_duration_ms: Integer`

~~3755~~

3756 Duration of silence to detect speech stop (in milliseconds). Defaults

3757 to 500ms. With shorter values the model will respond more quickly,

3758 but may jump in on short pauses from the user.

~~3759~~

3760 - `threshold: Float`

~~3761~~

3762 Activation threshold for VAD (0.0 to 1.0); this defaults to 0.5. A

3763 higher threshold will require louder audio to activate the model, and

3764 thus might perform better in noisy environments.

~~3765~~

3766 - `type: String`

~~3767~~

3768 Type of turn detection. Only `server_vad` is currently supported.
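The turn-detection fields and their documented defaults can be gathered into one Hash, as in this sketch. `server_vad` is a hypothetical helper; the range check on `threshold` follows the 0.0 to 1.0 bounds stated above, and the example values are arbitrary.

```ruby
# Hypothetical helper: builds a `turn_detection` Hash for server VAD,
# using the defaults documented above (300 ms, 500 ms, 0.5).
def server_vad(prefix_padding_ms: 300, silence_duration_ms: 500, threshold: 0.5)
  unless (0.0..1.0).cover?(threshold)
    raise ArgumentError, "threshold must be between 0.0 and 1.0"
  end

  {
    type: "server_vad",
    prefix_padding_ms: prefix_padding_ms,
    silence_duration_ms: silence_duration_ms,
    threshold: threshold
  }
end

# A higher threshold and longer silence window for a noisy environment.
p server_vad(threshold: 0.7, silence_duration_ms: 700)
```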

~~3769~~

3770 - `expires_at: Integer`

~~3771~~

3772 Expiration timestamp for the session, in seconds since epoch.
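Because `expires_at` is expressed in seconds since the Unix epoch, it can be converted and compared with Ruby's `Time` class. The timestamp below is a made-up value and `expired?` is a hypothetical helper.

```ruby
require "time" # provides Time#iso8601

# Hypothetical helper: true once the current time has reached `expires_at`.
def expired?(expires_at, now: Time.now)
  now.to_i >= expires_at
end

expires_at = 1_700_000_000 # made-up epoch seconds, for illustration
puts Time.at(expires_at).utc.iso8601
puts expired?(expires_at)
```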

~~3773~~

3774 - `include: Array[:"item.input_audio_transcription.logprobs"]`

~~3775~~

3776 Additional fields to include in server outputs.

~~3777~~

3778 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.

~~3779~~

3780 - `:"item.input_audio_transcription.logprobs"`

~~3781~~

3782 - `value: String`

~~3783~~

3784 The generated client secret value.