ruby/resources/realtime/subresources/client_secrets/index.md +0 −3784 deleted
1# Client Secrets
2
3## Create client secret
4
5`realtime.client_secrets.create(**kwargs) -> ClientSecretCreateResponse`
6
7**post** `/realtime/client_secrets`
8
9Create a Realtime client secret with an associated session configuration.
10
11Client secrets are short-lived tokens that can be passed to a client app,
12such as a web frontend or mobile client, to grant access to the Realtime API without
13leaking your main API key. You can configure a custom TTL for each client secret.
14
15You can also attach session configuration options to the client secret. They are
16applied to any session created with that client secret, but the client connection
17can override them.
18
19[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).
20
21Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.
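A minimal sketch of minting a client secret on your server, assuming a client object that exposes the `realtime.client_secrets.create(**kwargs)` method shown above (for example an `OpenAI::Client` from the `openai` gem). The helper names `client_secret_params` and `create_client_secret`, and the model chosen in the usage comment, are illustrative and not part of the SDK.

```ruby
# Hypothetical helper: builds the keyword arguments for
# realtime.client_secrets.create from the parameters documented below.
def client_secret_params(model:, ttl_seconds: 600)
  {
    expires_after: { anchor: :created_at, seconds: ttl_seconds },
    session: { type: :realtime, model: model }
  }
end

# Hypothetical helper: `client` is assumed to respond to
# realtime.client_secrets.create(**kwargs).
def create_client_secret(client, model:, ttl_seconds: 600)
  params = client_secret_params(model: model, ttl_seconds: ttl_seconds)
  client.realtime.client_secrets.create(**params)
end

# Usage (needs the openai gem and network access, so it is left commented out):
#   require "openai"
#   client = OpenAI::Client.new(api_key: ENV.fetch("OPENAI_API_KEY"))
#   response = create_client_secret(client, model: :"gpt-realtime")
```

Keeping the parameter construction separate from the network call makes it easy to check the request shape without contacting the API.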
22
23### Parameters
24
25- `expires_after: ExpiresAfter{ anchor, seconds}`
26
27 Configuration for the client secret expiration. Expiration refers to the time after which
28 a client secret will no longer be valid for creating sessions. The session itself may
29 continue after that time once started. A secret can be used to create multiple sessions
30 until it expires.
31
32 - `anchor: :created_at`
33
34 The anchor point for the client secret expiration, meaning that `seconds` will be added to the `created_at` time of the client secret to produce an expiration timestamp. Only `created_at` is currently supported.
35
36 - `:created_at`
37
38 - `seconds: Integer`
39
40 The number of seconds from the anchor point to the expiration. Select a value between `10` and `7200` (2 hours). Defaults to 600 seconds (10 minutes) if not specified.
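The expiration rule above can be sketched as follows. The bounds and default come from the `seconds` description; the helper names and the assumption that `created_at` is a Unix timestamp in seconds are illustrative only.

```ruby
# Bounds and default taken from the `seconds` description above.
TTL_RANGE = (10..7200).freeze
DEFAULT_TTL = 600

# Hypothetical helper: returns the expires_after hash, rejecting out-of-range TTLs.
def expires_after(seconds = DEFAULT_TTL)
  unless seconds.is_a?(Integer) && TTL_RANGE.cover?(seconds)
    raise ArgumentError, "seconds must be an Integer in #{TTL_RANGE}"
  end
  { anchor: :created_at, seconds: seconds }
end

# Expiration timestamp = created_at + seconds.
# Assumes created_at is expressed in Unix seconds.
def expiration_time(created_at, seconds = DEFAULT_TTL)
  created_at + expires_after(seconds)[:seconds]
end
```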
41
42- `session: RealtimeSessionCreateRequest | RealtimeTranscriptionSessionCreateRequest`
43
44 Session configuration to use for the client secret. Choose either a realtime
45 session or a transcription session.
46
47 - `class RealtimeSessionCreateRequest`
48
49 Realtime session object configuration.
50
51 - `type: :realtime`
52
53 The type of session to create. Always `realtime` for the Realtime API.
54
55 - `:realtime`
56
57 - `audio: RealtimeAudioConfig`
58
59 Configuration for input and output audio.
60
61 - `input: RealtimeAudioConfigInput`
62
63 - `format_: RealtimeAudioFormats`
64
65 The format of the input audio.
66
67 - `class AudioPCM`
68
69 The PCM audio format. Only a 24kHz sample rate is supported.
70
71 - `rate: 24000`
72
73 The sample rate of the audio. Always `24000`.
74
75 - `24000`
76
77 - `type: :"audio/pcm"`
78
79 The audio format. Always `audio/pcm`.
80
81 - `:"audio/pcm"`
82
83 - `class AudioPCMU`
84
85 The G.711 μ-law format.
86
87 - `type: :"audio/pcmu"`
88
89 The audio format. Always `audio/pcmu`.
90
91 - `:"audio/pcmu"`
92
93 - `class AudioPCMA`
94
95 The G.711 A-law format.
96
97 - `type: :"audio/pcma"`
98
99 The audio format. Always `audio/pcma`.
100
101 - `:"audio/pcma"`
102
103 - `noise_reduction: NoiseReduction{ type}`
104
105 Configuration for input audio noise reduction. This can be set to `null` to turn off.
106 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
107 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
108
109 - `type: NoiseReductionType`
110
111 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
112
113 - `:near_field`
114
115 - `:far_field`
116
117 - `transcription: AudioTranscription`
118
119 Configuration for input audio transcription. Defaults to off, and can be set to `null` to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on the input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
120
121 - `delay: :minimal | :low | :medium | 2 more`
122
123 Controls how long the model waits before emitting transcription text.
124 Higher values can improve transcription accuracy at the cost of latency.
125 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
126
127 - `:minimal`
128
129 - `:low`
130
131 - `:medium`
132
133 - `:high`
134
135 - `:xhigh`
136
137 - `language: String`
138
139 The language of the input audio. Supplying the input language in
140 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
141 will improve accuracy and latency.
142
143 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
144
145 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
146
147 - `String = String`
148
149 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
150
151 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
152
153 - `:"whisper-1"`
154
155 - `:"gpt-4o-mini-transcribe"`
156
157 - `:"gpt-4o-mini-transcribe-2025-12-15"`
158
159 - `:"gpt-4o-transcribe"`
160
161 - `:"gpt-4o-transcribe-diarize"`
162
163 - `:"gpt-realtime-whisper"`
164
165 - `prompt: String`
166
167 An optional text to guide the model's style or continue a previous audio
168 segment.
169 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
170 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
171 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
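The two model-specific rules above (`delay` only with `gpt-realtime-whisper`, `prompt` not with `gpt-realtime-whisper`) can be checked before sending a request. This is a sketch; `transcription_config` is a hypothetical helper, and client-side validation is an added convenience rather than something the SDK is known to do.

```ruby
# Hypothetical helper: builds the `transcription` hash and enforces the
# model-specific rules stated in the field descriptions above.
def transcription_config(model:, language: nil, prompt: nil, delay: nil)
  realtime_whisper = model.to_s == "gpt-realtime-whisper"
  if delay && !realtime_whisper
    raise ArgumentError, "delay is only supported with gpt-realtime-whisper"
  end
  if prompt && realtime_whisper
    raise ArgumentError, "prompt is not supported with gpt-realtime-whisper"
  end

  # Drop options that were not provided.
  { model: model, language: language, prompt: prompt, delay: delay }.compact
end
```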
172
173 - `turn_detection: RealtimeAudioInputTurnDetection`
174
175 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
176
177 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
178
179 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
180
181 For `gpt-realtime-whisper` transcription sessions, turn detection must be
182 set to `null`; VAD is not supported.
183
184 - `class ServerVad`
185
186 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
187
188 - `type: :server_vad`
189
190 Type of turn detection, `server_vad` to turn on simple Server VAD.
191
192 - `:server_vad`
193
194 - `create_response: bool`
195
196 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
197
198 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
199
200 - `idle_timeout_ms: Integer`
201
202 Optional timeout after which a model response will be triggered automatically. This is
203 useful for situations in which a long pause from the user is unexpected, such as a phone
204 call. The model will effectively prompt the user to continue the conversation based
205 on the current context.
206
207 The timeout value will be applied after the last model response's audio has finished playing,
208 i.e. it's set to the `response.done` time plus audio playback duration.
209
210 An `input_audio_buffer.timeout_triggered` event (plus events
211 associated with the Response) will be emitted when the timeout is reached.
212 Idle timeout is currently only supported for `server_vad` mode.
213
214 - `interrupt_response: bool`
215
216 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
217 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
218
219 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
220
221 - `prefix_padding_ms: Integer`
222
223 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
224 milliseconds). Defaults to 300ms.
225
226 - `silence_duration_ms: Integer`
227
228 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
229 to 500ms. With shorter values the model will respond more quickly,
230 but may jump in on short pauses from the user.
231
232 - `threshold: Float`
233
234 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
235 higher threshold will require louder audio to activate the model, and
236 thus might perform better in noisy environments.
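As an illustration of the Server VAD fields above, the sketch below merges overrides onto the documented defaults (threshold 0.5, 300 ms prefix padding, 500 ms silence). The `server_vad` helper and the example values are hypothetical.

```ruby
# Defaults taken from the field descriptions above.
SERVER_VAD_DEFAULTS = {
  threshold: 0.5,
  prefix_padding_ms: 300,
  silence_duration_ms: 500
}.freeze

# Hypothetical helper: merges overrides onto the documented defaults.
def server_vad(**overrides)
  config = { type: :server_vad, **SERVER_VAD_DEFAULTS, **overrides }
  unless (0.0..1.0).cover?(config[:threshold])
    raise ArgumentError, "threshold must be between 0.0 and 1.0"
  end
  config
end

# A noisier environment: require louder audio and wait longer before ending the turn.
noisy_room = server_vad(threshold: 0.7, silence_duration_ms: 800)
```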
237
238 - `class SemanticVad`
239
240 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
241
242 - `type: :semantic_vad`
243
244 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
245
246 - `:semantic_vad`
247
248 - `create_response: bool`
249
250 Whether or not to automatically generate a response when a VAD stop event occurs.
251
252 - `eagerness: :low | :medium | :high | :auto`
253
254 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
255
256 - `:low`
257
258 - `:medium`
259
260 - `:high`
261
262 - `:auto`
263
264 - `interrupt_response: bool`
265
266 Whether or not to automatically interrupt any ongoing response with output to the default
267 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
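The eagerness levels above map to maximum timeouts, which can be captured in a small lookup. This is a sketch: the helper names are hypothetical, and only the timeout values and the `auto` = `medium` equivalence come from the description.

```ruby
# Max timeouts (seconds) per eagerness level, as listed above.
EAGERNESS_MAX_TIMEOUT_S = { low: 8, medium: 4, high: 2 }.freeze

# `auto` is documented as equivalent to `medium`.
def max_timeout_seconds(eagerness = :auto)
  level = eagerness == :auto ? :medium : eagerness
  EAGERNESS_MAX_TIMEOUT_S.fetch(level) # raises KeyError for unknown levels
end

# Hypothetical helper: builds a semantic_vad turn detection hash.
# Extra flags such as create_response are passed through unchanged.
def semantic_vad(eagerness: :auto, **flags)
  max_timeout_seconds(eagerness) # validates the eagerness level
  { type: :semantic_vad, eagerness: eagerness, **flags }
end
```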
268
269 - `output: RealtimeAudioConfigOutput`
270
271 - `format_: RealtimeAudioFormats`
272
273 The format of the output audio.
274
275 - `speed: Float`
276
277 The speed of the model's spoken response as a multiple of the original speed.
278 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
279
280 This parameter is a post-processing adjustment to the audio after it is generated; it's
281 also possible to prompt the model to speak faster or slower.
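One way to keep a user-supplied speed inside the supported range is to clamp it. This is a sketch; `output_speed` is a hypothetical helper, and clamping (rather than rejecting) out-of-range values is a design choice made here, not documented behavior.

```ruby
# Range and default taken from the `speed` description above.
SPEED_RANGE = (0.25..1.5).freeze

# Hypothetical helper: clamps a requested speed into the supported range.
def output_speed(requested = 1.0)
  requested.clamp(SPEED_RANGE)
end
```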
282
283 - `voice: String | :alloy | :ash | :ballad | 7 more | ID{ id}`
284
285 The voice the model uses to respond. Supported built-in voices are
286 `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`,
287 `marin`, and `cedar`. You may also provide a custom voice object with
288 an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed
289 during the session once the model has responded with audio at least once.
290 We recommend `marin` and `cedar` for best quality.
291
292 - `String = String`
293
294 - `Voice = :alloy | :ash | :ballad | 7 more`
295
296 - `:alloy`
297
298 - `:ash`
299
300 - `:ballad`
301
302 - `:coral`
303
304 - `:echo`
305
306 - `:sage`
307
308 - `:shimmer`
309
310 - `:verse`
311
312 - `:marin`
313
314 - `:cedar`
315
316 - `class ID`
317
318 Custom voice reference.
319
320 - `id: String`
321
322 The custom voice ID, e.g. `voice_1234`.
323
324 - `include: Array[:"item.input_audio_transcription.logprobs"]`
325
326 Additional fields to include in server outputs.
327
328 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
329
330 - `:"item.input_audio_transcription.logprobs"`
331
332 - `instructions: String`
333
334 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
335
336 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
337
338 - `max_output_tokens: Integer | :inf`
339
340 Maximum number of output tokens for a single assistant response,
341 inclusive of tool calls. Provide an integer between 1 and 4096 to
342 limit output tokens, or `inf` for the maximum available tokens for a
343 given model. Defaults to `inf`.
344
345 - `Integer = Integer`
346
347 - `MaxOutputTokens = :inf`
348
349 - `:inf`
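The accepted values for `max_output_tokens` can be validated up front. A minimal sketch, assuming you want to fail fast on the client; the helper name is hypothetical.

```ruby
# Hypothetical helper: accepts :inf or an Integer in the documented 1..4096 range.
def max_output_tokens(value = :inf)
  return :inf if value == :inf
  return value if value.is_a?(Integer) && (1..4096).cover?(value)

  raise ArgumentError, "max_output_tokens must be an Integer in 1..4096 or :inf"
end
```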
350
351 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
352
353 The Realtime model used for this session.
354
355 - `String = String`
356
357 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
358
359 The Realtime model used for this session.
360
361 - `:"gpt-realtime"`
362
363 - `:"gpt-realtime-1.5"`
364
365 - `:"gpt-realtime-2"`
366
367 - `:"gpt-realtime-2025-08-28"`
368
369 - `:"gpt-4o-realtime-preview"`
370
371 - `:"gpt-4o-realtime-preview-2024-10-01"`
372
373 - `:"gpt-4o-realtime-preview-2024-12-17"`
374
375 - `:"gpt-4o-realtime-preview-2025-06-03"`
376
377 - `:"gpt-4o-mini-realtime-preview"`
378
379 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`
380
381 - `:"gpt-realtime-mini"`
382
383 - `:"gpt-realtime-mini-2025-10-06"`
384
385 - `:"gpt-realtime-mini-2025-12-15"`
386
387 - `:"gpt-audio-1.5"`
388
389 - `:"gpt-audio-mini"`
390
391 - `:"gpt-audio-mini-2025-10-06"`
392
393 - `:"gpt-audio-mini-2025-12-15"`
394
395 - `output_modalities: Array[:text | :audio]`
396
397 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
398 that the model will respond with audio plus a transcript. `["text"]` can be used to make
399 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
400
401 - `:text`
402
403 - `:audio`
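Because `text` and `audio` cannot be requested together, a small guard can catch the mistake before a request is sent. This is a sketch with a hypothetical helper name.

```ruby
# Hypothetical helper: returns [:audio] by default and rejects any
# combination other than [:audio] or [:text].
def output_modalities(*modalities)
  modalities = [:audio] if modalities.empty?
  unless [[:audio], [:text]].include?(modalities)
    raise ArgumentError, "choose either [:audio] or [:text], not both"
  end
  modalities
end
```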
404
405 - `parallel_tool_calls: bool`
406
407 Whether the model may call multiple tools in parallel. Only supported by
408 reasoning Realtime models such as `gpt-realtime-2`.
409
410 - `prompt: ResponsePrompt`
411
412 Reference to a prompt template and its variables.
413 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
414
415 - `id: String`
416
417 The unique identifier of the prompt template to use.
418
419 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`
420
421 Optional map of values to substitute in for variables in your
422 prompt. The substitution values can either be strings, or other
423 Response input types like images or files.
424
425 - `String = String`
426
427 - `class ResponseInputText`
428
429 A text input to the model.
430
431 - `text: String`
432
433 The text input to the model.
434
435 - `type: :input_text`
436
437 The type of the input item. Always `input_text`.
438
439 - `:input_text`
440
441 - `class ResponseInputImage`
442
443 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
444
445 - `detail: :low | :high | :auto | :original`
446
447 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
448
449 - `:low`
450
451 - `:high`
452
453 - `:auto`
454
455 - `:original`
456
457 - `type: :input_image`
458
459 The type of the input item. Always `input_image`.
460
461 - `:input_image`
462
463 - `file_id: String`
464
465 The ID of the file to be sent to the model.
466
467 - `image_url: String`
468
469 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
470
471 - `class ResponseInputFile`
472
473 A file input to the model.
474
475 - `type: :input_file`
476
477 The type of the input item. Always `input_file`.
478
479 - `:input_file`
480
481 - `detail: :low | :high`
482
483 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
484
485 - `:low`
486
487 - `:high`
488
489 - `file_data: String`
490
491 The content of the file to be sent to the model.
492
493 - `file_id: String`
494
495 The ID of the file to be sent to the model.
496
497 - `file_url: String`
498
499 The URL of the file to be sent to the model.
500
501 - `filename: String`
502
503 The name of the file to be sent to the model.
504
505 - `version: String`
506
507 Optional version of the prompt template.
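A prompt reference with variables might be assembled as below. This is a sketch: `prompt_reference` is a hypothetical helper, and the prompt ID and variable names are placeholders.

```ruby
# Hypothetical helper: builds a `prompt` reference, omitting unset options.
def prompt_reference(id:, version: nil, variables: {})
  ref = { id: id }
  ref[:version] = version if version
  ref[:variables] = variables unless variables.empty?
  ref
end

prompt = prompt_reference(
  id: "pmpt_example", # placeholder prompt template ID
  variables: {
    customer_name: "Ada",                                  # plain String value
    greeting: { type: :input_text, text: "Welcome back!" } # ResponseInputText shape
  }
)
```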
508
509 - `reasoning: RealtimeReasoning`
510
511 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
512
513 - `effort: RealtimeReasoningEffort`
514
515 Constrains effort on reasoning for reasoning-capable Realtime models such as
516 `gpt-realtime-2`.
517
518 - `:minimal`
519
520 - `:low`
521
522 - `:medium`
523
524 - `:high`
525
526 - `:xhigh`
527
528 - `tool_choice: RealtimeToolChoiceConfig`
529
530 How the model chooses tools. Provide one of the string modes or force a specific
531 function/MCP tool.
532
533 - `ToolChoiceOptions = :none | :auto | :required`
534
535 Controls which (if any) tool is called by the model.
536
537 `none` means the model will not call any tool and instead generates a message.
538
539 `auto` means the model can pick between generating a message or calling one or
540 more tools.
541
542 `required` means the model must call one or more tools.
543
544 - `:none`
545
546 - `:auto`
547
548 - `:required`
549
550 - `class ToolChoiceFunction`
551
552 Use this option to force the model to call a specific function.
553
554 - `name: String`
555
556 The name of the function to call.
557
558 - `type: :function`
559
560 For function calling, the type is always `function`.
561
562 - `:function`
563
564 - `class ToolChoiceMcp`
565
566 Use this option to force the model to call a specific tool on a remote MCP server.
567
568 - `server_label: String`
569
570 The label of the MCP server to use.
571
572 - `type: :mcp`
573
574 For MCP tools, the type is always `mcp`.
575
576 - `:mcp`
577
578 - `name: String`
579
580 The name of the tool to call on the server.
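The three forms of `tool_choice` described above (a mode, a forced function, or a forced MCP tool) can be sketched as plain hashes and symbols. The helper names and the example tool names are hypothetical; only the shapes come from the reference.

```ruby
TOOL_CHOICE_MODES = %i[none auto required].freeze

# Hypothetical helper: validates one of the string modes.
def tool_choice_mode(mode)
  unless TOOL_CHOICE_MODES.include?(mode)
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
  mode
end

# Hypothetical helper: forces a specific function.
def force_function(name)
  { type: :function, name: name }
end

# Hypothetical helper: forces a tool on a remote MCP server (tool name optional).
def force_mcp_tool(server_label, name = nil)
  { type: :mcp, server_label: server_label, name: name }.compact
end
```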
581
582 - `tools: RealtimeToolsConfig`
583
584 Tools available to the model.
585
586 - `class RealtimeFunctionTool`
587
588 - `description: String`
589
590 The description of the function, including guidance on when and how
591 to call it, and guidance about what to tell the user when calling
592 (if anything).
593
594 - `name: String`
595
596 The name of the function.
597
598 - `parameters: untyped`
599
600 Parameters of the function in JSON Schema.
601
602 - `type: :function`
603
604 The type of the tool, i.e. `function`.
605
606 - `:function`
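For illustration, a function tool definition with JSON Schema parameters could look like the following. The `get_weather` function, its description, and its schema are made up for this sketch.

```ruby
require "json"

# Hypothetical function tool; `parameters` is a JSON Schema object.
get_weather = {
  type: :function,
  name: "get_weather",
  description: "Look up the current weather for a city. " \
               "Tell the user you are checking before calling.",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name, e.g. Paris" }
    },
    required: ["city"]
  }
}

# The hash serializes to the JSON shape the API expects for a tool entry.
tool_json = JSON.generate(get_weather)
```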
607
608 - `class Mcp`
609
610 Give the model access to additional tools via remote Model Context Protocol
611 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
612
613 - `server_label: String`
614
615 A label for this MCP server, used to identify it in tool calls.
616
617 - `type: :mcp`
618
619 The type of the MCP tool. Always `mcp`.
620
621 - `:mcp`
622
623 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`
624
625 List of allowed tool names or a filter object.
626
627 - `McpAllowedTools = Array[String]`
628
629 A string array of allowed tool names.
630
631 - `class McpToolFilter`
632
633 A filter object to specify which tools are allowed.
634
635 - `read_only: bool`
636
637 Indicates whether or not a tool modifies data or is read-only. If an
638 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
639 it will match this filter.
640
641 - `tool_names: Array[String]`
642
643 List of allowed tool names.
644
645 - `authorization: String`
646
647 An OAuth access token that can be used with a remote MCP server, either
648 with a custom MCP server URL or a service connector. Your application
649 must handle the OAuth authorization flow and provide the token here.
650
651 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`
652
653 Identifier for service connectors, like those available in ChatGPT. One of
654 `server_url` or `connector_id` must be provided. Learn more about service
655 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
656
657 Currently supported `connector_id` values are:
658
659 - Dropbox: `connector_dropbox`
660 - Gmail: `connector_gmail`
661 - Google Calendar: `connector_googlecalendar`
662 - Google Drive: `connector_googledrive`
663 - Microsoft Teams: `connector_microsoftteams`
664 - Outlook Calendar: `connector_outlookcalendar`
665 - Outlook Email: `connector_outlookemail`
666 - SharePoint: `connector_sharepoint`
667
668 - `:connector_dropbox`
669
670 - `:connector_gmail`
671
672 - `:connector_googlecalendar`
673
674 - `:connector_googledrive`
675
676 - `:connector_microsoftteams`
677
678 - `:connector_outlookcalendar`
679
680 - `:connector_outlookemail`
681
682 - `:connector_sharepoint`
683
684 - `defer_loading: bool`
685
686 Whether this MCP tool is deferred and discovered via tool search.
687
688 - `headers: Hash[Symbol, String]`
689
690 Optional HTTP headers to send to the MCP server. Use for authentication
691 or other purposes.
692
693 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`
694
695 Specify which of the MCP server's tools require approval.
696
697 - `class McpToolApprovalFilter`
698
699 Specify which of the MCP server's tools require approval. Can be
700 `always`, `never`, or a filter object associated with tools
701 that require approval.
702
703 - `always: Always{ read_only, tool_names}`
704
705 A filter object to specify which tools are allowed.
706
707 - `read_only: bool`
708
709 Indicates whether or not a tool modifies data or is read-only. If an
710 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
711 it will match this filter.
712
713 - `tool_names: Array[String]`
714
715 List of allowed tool names.
716
717 - `never: Never{ read_only, tool_names}`
718
719 A filter object to specify which tools are allowed.
720
721 - `read_only: bool`
722
723 Indicates whether or not a tool modifies data or is read-only. If an
724 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
725 it will match this filter.
726
727 - `tool_names: Array[String]`
728
729 List of allowed tool names.
730
731 - `McpToolApprovalSetting = :always | :never`
732
733 Specify a single approval policy for all tools. One of `always` or
734 `never`. When set to `always`, all tools will require approval. When
735 set to `never`, no tools will require approval.
736
737 - `:always`
738
739 - `:never`
740
741 - `server_description: String`
742
743 Optional description of the MCP server, used to provide more context.
744
745 - `server_url: String`
746
747 The URL for the MCP server. One of `server_url` or `connector_id` must be
748 provided.
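An MCP tool entry combining the fields above might be built like this. It is a sketch: `mcp_tool` is a hypothetical helper, the server URL and tool names are placeholders, and the check only enforces that at least one of `server_url` or `connector_id` is present.

```ruby
# Hypothetical helper: builds an MCP tool hash and checks that a server
# is identified by server_url or connector_id.
def mcp_tool(server_label:, server_url: nil, connector_id: nil, **options)
  if server_url.nil? && connector_id.nil?
    raise ArgumentError, "one of server_url or connector_id must be provided"
  end

  {
    type: :mcp,
    server_label: server_label,
    server_url: server_url,
    connector_id: connector_id,
    **options
  }.compact
end

docs_server = mcp_tool(
  server_label: "docs",
  server_url: "https://mcp.example.com",                  # placeholder URL
  allowed_tools: { read_only: true },                     # McpToolFilter shape
  require_approval: { never: { tool_names: ["search"] } } # McpToolApprovalFilter shape
)

calendar = mcp_tool(
  server_label: "calendar",
  connector_id: :connector_googlecalendar,
  require_approval: :always
)
```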
749
750 - `tracing: RealtimeTracingConfig`
751
752 The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to `null` to disable tracing. Once
753 tracing is enabled for a session, the configuration cannot be modified.
754
755 `auto` will create a trace for the session with default values for the
756 workflow name, group id, and metadata.
757
758 - `RealtimeTracingConfig = :auto`
759
760 Enables tracing and sets default values for tracing configuration options. Always `auto`.
761
762 - `:auto`
763
764 - `class TracingConfiguration`
765
766 Granular configuration for tracing.
767
768 - `group_id: String`
769
770 The group id to attach to this trace to enable filtering and
771 grouping in the Traces Dashboard.
772
773 - `metadata: untyped`
774
775 The arbitrary metadata to attach to this trace to enable
776 filtering in the Traces Dashboard.
777
778 - `workflow_name: String`
779
780 The name of the workflow to attach to this trace. This is used to
781 name the trace in the Traces Dashboard.
782
783 - `truncation: RealtimeTruncation`
784
785 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
786
787 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
788
789 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
790
791 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
792
793 - `RealtimeTruncationStrategy = :auto | :disabled`
794
795 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.
796
797 - `:auto`
798
799 - `:disabled`
800
801 - `class RealtimeTruncationRetentionRatio`
802
803 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
804
805 - `retention_ratio: Float`
806
807 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
808
809 - `type: :retention_ratio`
810
811 Use retention ratio truncation.
812
813 - `:retention_ratio`
814
815 - `token_limits: TokenLimits{ post_instructions}`
816
817 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
818
819 - `post_instructions: Integer`
820
821 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
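The retention ratio arithmetic can be sketched as follows. The helper names are hypothetical, and `retained_tokens` assumes the ratio is applied to the post-instruction token limit, which is one reading of the description above.

```ruby
# Hypothetical helper: builds a retention_ratio truncation hash.
def retention_truncation(retention_ratio:, post_instructions: nil)
  unless (0.0..1.0).cover?(retention_ratio)
    raise ArgumentError, "retention_ratio must be between 0.0 and 1.0"
  end

  config = { type: :retention_ratio, retention_ratio: retention_ratio }
  config[:token_limits] = { post_instructions: post_instructions } if post_instructions
  config
end

# Approximate number of tokens kept after a truncation.
# Assumption: the ratio is applied to the post-instruction limit.
def retained_tokens(limit, retention_ratio)
  (limit * retention_ratio).round
end

truncation = retention_truncation(retention_ratio: 0.8, post_instructions: 5_000)
kept = retained_tokens(5_000, 0.8) # 80% of a 5,000-token limit
```

With these values, truncation would start once the conversation exceeds 5,000 post-instruction tokens, and messages would be dropped from the oldest end until about 4,000 tokens remain.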
822
823 - `class RealtimeTranscriptionSessionCreateRequest`
824
825 Realtime transcription session object configuration.
826
827 - `type: :transcription`
828
829 The type of session to create. Always `transcription` for transcription sessions.
830
831 - `:transcription`
832
833 - `audio: RealtimeTranscriptionSessionAudio`
834
835 Configuration for input and output audio.
836
837 - `input: RealtimeTranscriptionSessionAudioInput`
838
839 - `format_: RealtimeAudioFormats`
840
841 The PCM audio format. Only a 24kHz sample rate is supported.
842
843 - `noise_reduction: NoiseReduction{ type}`
844
845 Configuration for input audio noise reduction. This can be set to `null` to turn off.
846 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
847 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
848
849 - `type: NoiseReductionType`
850
851 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
852
853 - `transcription: AudioTranscription`
854
855 Configuration for input audio transcription. Defaults to off, and can be set to `null` to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on the input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
856
857 - `turn_detection: RealtimeTranscriptionSessionAudioInputTurnDetection`
858
859 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
860
861 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
862
863 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
864
865 For `gpt-realtime-whisper` transcription sessions, turn detection must be
866 set to `null`; VAD is not supported.
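A transcription session that follows the `gpt-realtime-whisper` rule above could be described like this. It is a sketch of the request shape only: the noise reduction and delay values are arbitrary choices, and the example assumes a Ruby `nil` is what ends up as JSON `null` in the request.

```ruby
require "json"

# Session hash for a transcription-only session.
transcription_session = {
  type: :transcription,
  audio: {
    input: {
      noise_reduction: { type: :near_field },
      transcription: { model: :"gpt-realtime-whisper", delay: :low },
      turn_detection: nil # VAD is not supported with gpt-realtime-whisper
    }
  }
}

# Ruby nil serializes to JSON null, so the key is sent explicitly.
session_json = JSON.generate(transcription_session)
```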
867
868 - `class ServerVad`
869
870 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
871
872 - `type: :server_vad`
873
874 Type of turn detection, `server_vad` to turn on simple Server VAD.
875
876 - `:server_vad`
877
878 - `create_response: bool`
879
880 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
881
882 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
883
884 - `idle_timeout_ms: Integer`
885
886 Optional timeout after which a model response will be triggered automatically. This is
887 useful for situations in which a long pause from the user is unexpected, such as a phone
888 call. The model will effectively prompt the user to continue the conversation based
889 on the current context.
890
891 The timeout value will be applied after the last model response's audio has finished playing,
892 i.e. it's set to the `response.done` time plus audio playback duration.
893
894 An `input_audio_buffer.timeout_triggered` event (plus events
895 associated with the Response) will be emitted when the timeout is reached.
896 Idle timeout is currently only supported for `server_vad` mode.
897
898 - `interrupt_response: bool`
899
900 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
901 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
902
903 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
904
905 - `prefix_padding_ms: Integer`
906
907 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
908 milliseconds). Defaults to 300ms.
909
910 - `silence_duration_ms: Integer`
911
912 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
913 to 500ms. With shorter values the model will respond more quickly,
914 but may jump in on short pauses from the user.
915
916 - `threshold: Float`
917
 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A
 higher threshold will require louder audio to activate the model, and
 thus might perform better in noisy environments.
921
922 - `class SemanticVad`
923
924 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
925
926 - `type: :semantic_vad`
927
928 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
929
930 - `:semantic_vad`
931
932 - `create_response: bool`
933
934 Whether or not to automatically generate a response when a VAD stop event occurs.
935
936 - `eagerness: :low | :medium | :high | :auto`
937
938 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
939
940 - `:low`
941
942 - `:medium`
943
944 - `:high`
945
946 - `:auto`
947
948 - `interrupt_response: bool`
949
950 Whether or not to automatically interrupt any ongoing response with output to the default
951 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
952
953 - `include: Array[:"item.input_audio_transcription.logprobs"]`
954
955 Additional fields to include in server outputs.
956
957 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
958
959 - `:"item.input_audio_transcription.logprobs"`
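The Server VAD options described above can be assembled into a `turn_detection` hash before it is passed in a session configuration. The following is a minimal sketch, not SDK code: `server_vad_turn_detection` is a hypothetical helper, and the range check on `threshold` and the non-negative check on durations are assumptions layered on the documented defaults (300ms, 500ms, 0.5).

```ruby
# Hypothetical helper (not part of the SDK): builds a Server VAD
# `turn_detection` hash using the defaults documented above.
def server_vad_turn_detection(threshold: 0.5, prefix_padding_ms: 300,
                              silence_duration_ms: 500)
  # The documented activation threshold range is 0.0 to 1.0.
  unless (0.0..1.0).cover?(threshold)
    raise ArgumentError, "threshold must be between 0.0 and 1.0"
  end
  # Assumption: negative durations are never meaningful.
  if prefix_padding_ms.negative? || silence_duration_ms.negative?
    raise ArgumentError, "durations must be non-negative"
  end

  {
    type: :server_vad,
    threshold: threshold,
    prefix_padding_ms: prefix_padding_ms,
    silence_duration_ms: silence_duration_ms
  }
end

# A noisier room: require louder audio and wait longer before ending a turn.
noisy_room = server_vad_turn_detection(threshold: 0.7, silence_duration_ms: 800)
puts(noisy_room)
```

Optional keys such as `create_response`, `interrupt_response`, and `idle_timeout_ms` are left out here because their defaults are not stated in this section.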
960
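For Semantic VAD, the `eagerness` levels described above map to maximum timeouts of 8s, 4s, and 2s, with `auto` equivalent to `medium`. A small lookup makes that mapping explicit. This is an illustrative sketch only; both method names are hypothetical and not part of the SDK.

```ruby
# Hypothetical helper: returns the documented maximum timeout (in seconds)
# for a Semantic VAD eagerness level. `:auto` is treated as `:medium`.
def max_turn_timeout_seconds(eagerness = :auto)
  case eagerness
  when :low then 8
  when :medium, :auto then 4
  when :high then 2
  else raise ArgumentError, "unknown eagerness: #{eagerness.inspect}"
  end
end

# Hypothetical helper: builds a Semantic VAD `turn_detection` hash after
# checking that the eagerness level is one of the documented values.
def semantic_vad_turn_detection(eagerness: :auto)
  max_turn_timeout_seconds(eagerness) # raises on unknown levels
  { type: :semantic_vad, eagerness: eagerness }
end

patient = semantic_vad_turn_detection(eagerness: :low)
puts(patient)
puts(max_turn_timeout_seconds(:low))
```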
### Returns
962
963- `class ClientSecretCreateResponse`
964
965 Response from creating a session and client secret for the Realtime API.
966
967 - `expires_at: Integer`
968
969 Expiration timestamp for the client secret, in seconds since epoch.
970
971 - `session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse`
972
973 The session configuration for either a realtime or transcription session.
974
975 - `class RealtimeSessionCreateResponse`
976
977 A Realtime session configuration object.
978
979 - `id: String`
980
981 Unique identifier for the session that looks like `sess_1234567890abcdef`.
982
983 - `object: :"realtime.session"`
984
985 The object type. Always `realtime.session`.
986
987 - `:"realtime.session"`
988
989 - `type: :realtime`
990
991 The type of session to create. Always `realtime` for the Realtime API.
992
993 - `:realtime`
994
995 - `audio: Audio{ input, output}`
996
997 Configuration for input and output audio.
998
999 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`
1000
1001 - `format_: RealtimeAudioFormats`
1002
1003 The format of the input audio.
1004
1005 - `class AudioPCM`
1006
1007 The PCM audio format. Only a 24kHz sample rate is supported.
1008
1009 - `rate: 24000`
1010
1011 The sample rate of the audio. Always `24000`.
1012
1013 - `24000`
1014
1015 - `type: :"audio/pcm"`
1016
1017 The audio format. Always `audio/pcm`.
1018
1019 - `:"audio/pcm"`
1020
1021 - `class AudioPCMU`
1022
1023 The G.711 μ-law format.
1024
1025 - `type: :"audio/pcmu"`
1026
1027 The audio format. Always `audio/pcmu`.
1028
1029 - `:"audio/pcmu"`
1030
1031 - `class AudioPCMA`
1032
1033 The G.711 A-law format.
1034
1035 - `type: :"audio/pcma"`
1036
1037 The audio format. Always `audio/pcma`.
1038
1039 - `:"audio/pcma"`
1040
1041 - `noise_reduction: NoiseReduction{ type}`
1042
1043 Configuration for input audio noise reduction. This can be set to `null` to turn off.
1044 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
1045 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
1046
1047 - `type: NoiseReductionType`
1048
1049 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
1050
1051 - `:near_field`
1052
1053 - `:far_field`
1054
1055 - `transcription: AudioTranscription`
1056
1057 - `delay: :minimal | :low | :medium | 2 more`
1058
1059 Controls how long the model waits before emitting transcription text.
1060 Higher values can improve transcription accuracy at the cost of latency.
1061 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
1062
1063 - `:minimal`
1064
1065 - `:low`
1066
1067 - `:medium`
1068
1069 - `:high`
1070
1071 - `:xhigh`
1072
1073 - `language: String`
1074
1075 The language of the input audio. Supplying the input language in
1076 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
1077 will improve accuracy and latency.
1078
1079 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
1080
1081 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
1082
1083 - `String = String`
1084
1085 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
1086
1087 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
1088
1089 - `:"whisper-1"`
1090
1091 - `:"gpt-4o-mini-transcribe"`
1092
1093 - `:"gpt-4o-mini-transcribe-2025-12-15"`
1094
1095 - `:"gpt-4o-transcribe"`
1096
1097 - `:"gpt-4o-transcribe-diarize"`
1098
1099 - `:"gpt-realtime-whisper"`
1100
1101 - `prompt: String`
1102
1103 An optional text to guide the model's style or continue a previous audio
1104 segment.
1105 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
1106 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
1107 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
1108
1109 - `turn_detection: ServerVad{ type, create_response, idle_timeout_ms, 4 more} | SemanticVad{ type, create_response, eagerness, interrupt_response}`
1110
 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
1112
1113 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
1114
1115 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
1116
1117 For `gpt-realtime-whisper` transcription sessions, turn detection must be
1118 set to `null`; VAD is not supported.
1119
1120 - `class ServerVad`
1121
1122 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
1123
1124 - `type: :server_vad`
1125
1126 Type of turn detection, `server_vad` to turn on simple Server VAD.
1127
1128 - `:server_vad`
1129
1130 - `create_response: bool`
1131
1132 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
1133
1134 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
1135
1136 - `idle_timeout_ms: Integer`
1137
1138 Optional timeout after which a model response will be triggered automatically. This is
1139 useful for situations in which a long pause from the user is unexpected, such as a phone
1140 call. The model will effectively prompt the user to continue the conversation based
1141 on the current context.
1142
1143 The timeout value will be applied after the last model response's audio has finished playing,
1144 i.e. it's set to the `response.done` time plus audio playback duration.
1145
1146 An `input_audio_buffer.timeout_triggered` event (plus events
1147 associated with the Response) will be emitted when the timeout is reached.
1148 Idle timeout is currently only supported for `server_vad` mode.
1149
1150 - `interrupt_response: bool`
1151
1152 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
1153 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
1154
1155 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
1156
1157 - `prefix_padding_ms: Integer`
1158
1159 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
1160 milliseconds). Defaults to 300ms.
1161
1162 - `silence_duration_ms: Integer`
1163
1164 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
1165 to 500ms. With shorter values the model will respond more quickly,
1166 but may jump in on short pauses from the user.
1167
1168 - `threshold: Float`
1169
 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A
 higher threshold will require louder audio to activate the model, and
 thus might perform better in noisy environments.
1173
1174 - `class SemanticVad`
1175
1176 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
1177
1178 - `type: :semantic_vad`
1179
1180 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
1181
1182 - `:semantic_vad`
1183
1184 - `create_response: bool`
1185
1186 Whether or not to automatically generate a response when a VAD stop event occurs.
1187
1188 - `eagerness: :low | :medium | :high | :auto`
1189
1190 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
1191
1192 - `:low`
1193
1194 - `:medium`
1195
1196 - `:high`
1197
1198 - `:auto`
1199
1200 - `interrupt_response: bool`
1201
1202 Whether or not to automatically interrupt any ongoing response with output to the default
1203 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
1204
1205 - `output: Output{ format_, speed, voice}`
1206
1207 - `format_: RealtimeAudioFormats`
1208
1209 The format of the output audio.
1210
1211 - `speed: Float`
1212
1213 The speed of the model's spoken response as a multiple of the original speed.
1214 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
1215
 This parameter is a post-processing adjustment to the audio after it is generated; it's
 also possible to prompt the model to speak faster or slower.
1218
1219 - `voice: String | :alloy | :ash | :ballad | 7 more`
1220
1221 The voice the model uses to respond. Voice cannot be changed during the
1222 session once the model has responded with audio at least once. Current
1223 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
1224 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
1225 best quality.
1226
1227 - `String = String`
1228
1229 - `Voice = :alloy | :ash | :ballad | 7 more`
1230
1231 The voice the model uses to respond. Voice cannot be changed during the
1232 session once the model has responded with audio at least once. Current
1233 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
1234 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
1235 best quality.
1236
1237 - `:alloy`
1238
1239 - `:ash`
1240
1241 - `:ballad`
1242
1243 - `:coral`
1244
1245 - `:echo`
1246
1247 - `:sage`
1248
1249 - `:shimmer`
1250
1251 - `:verse`
1252
1253 - `:marin`
1254
1255 - `:cedar`
1256
1257 - `expires_at: Integer`
1258
1259 Expiration timestamp for the session, in seconds since epoch.
1260
1261 - `include: Array[:"item.input_audio_transcription.logprobs"]`
1262
1263 Additional fields to include in server outputs.
1264
1265 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
1266
1267 - `:"item.input_audio_transcription.logprobs"`
1268
1269 - `instructions: String`
1270
 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
1272
1273 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
1274
1275 - `max_output_tokens: Integer | :inf`
1276
1277 Maximum number of output tokens for a single assistant response,
1278 inclusive of tool calls. Provide an integer between 1 and 4096 to
1279 limit output tokens, or `inf` for the maximum available tokens for a
1280 given model. Defaults to `inf`.
1281
1282 - `Integer = Integer`
1283
1284 - `MaxOutputTokens = :inf`
1285
1286 - `:inf`
1287
1288 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
1289
1290 The Realtime model used for this session.
1291
1292 - `String = String`
1293
1294 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
1295
1296 The Realtime model used for this session.
1297
1298 - `:"gpt-realtime"`
1299
1300 - `:"gpt-realtime-1.5"`
1301
1302 - `:"gpt-realtime-2"`
1303
1304 - `:"gpt-realtime-2025-08-28"`
1305
1306 - `:"gpt-4o-realtime-preview"`
1307
1308 - `:"gpt-4o-realtime-preview-2024-10-01"`
1309
1310 - `:"gpt-4o-realtime-preview-2024-12-17"`
1311
1312 - `:"gpt-4o-realtime-preview-2025-06-03"`
1313
1314 - `:"gpt-4o-mini-realtime-preview"`
1315
1316 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`
1317
1318 - `:"gpt-realtime-mini"`
1319
1320 - `:"gpt-realtime-mini-2025-10-06"`
1321
1322 - `:"gpt-realtime-mini-2025-12-15"`
1323
1324 - `:"gpt-audio-1.5"`
1325
1326 - `:"gpt-audio-mini"`
1327
1328 - `:"gpt-audio-mini-2025-10-06"`
1329
1330 - `:"gpt-audio-mini-2025-12-15"`
1331
1332 - `output_modalities: Array[:text | :audio]`
1333
1334 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
1335 that the model will respond with audio plus a transcript. `["text"]` can be used to make
1336 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
1337
1338 - `:text`
1339
1340 - `:audio`
1341
1342 - `prompt: ResponsePrompt`
1343
1344 Reference to a prompt template and its variables.
1345 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
1346
1347 - `id: String`
1348
1349 The unique identifier of the prompt template to use.
1350
1351 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`
1352
1353 Optional map of values to substitute in for variables in your
1354 prompt. The substitution values can either be strings, or other
1355 Response input types like images or files.
1356
1357 - `String = String`
1358
1359 - `class ResponseInputText`
1360
1361 A text input to the model.
1362
1363 - `text: String`
1364
1365 The text input to the model.
1366
1367 - `type: :input_text`
1368
1369 The type of the input item. Always `input_text`.
1370
1371 - `:input_text`
1372
1373 - `class ResponseInputImage`
1374
1375 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
1376
1377 - `detail: :low | :high | :auto | :original`
1378
1379 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
1380
1381 - `:low`
1382
1383 - `:high`
1384
1385 - `:auto`
1386
1387 - `:original`
1388
1389 - `type: :input_image`
1390
1391 The type of the input item. Always `input_image`.
1392
1393 - `:input_image`
1394
1395 - `file_id: String`
1396
1397 The ID of the file to be sent to the model.
1398
1399 - `image_url: String`
1400
1401 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
1402
1403 - `class ResponseInputFile`
1404
1405 A file input to the model.
1406
1407 - `type: :input_file`
1408
1409 The type of the input item. Always `input_file`.
1410
1411 - `:input_file`
1412
1413 - `detail: :low | :high`
1414
1415 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
1416
1417 - `:low`
1418
1419 - `:high`
1420
1421 - `file_data: String`
1422
1423 The content of the file to be sent to the model.
1424
1425 - `file_id: String`
1426
1427 The ID of the file to be sent to the model.
1428
1429 - `file_url: String`
1430
1431 The URL of the file to be sent to the model.
1432
1433 - `filename: String`
1434
1435 The name of the file to be sent to the model.
1436
1437 - `version: String`
1438
1439 Optional version of the prompt template.
1440
1441 - `reasoning: RealtimeReasoning`
1442
1443 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
1444
1445 - `effort: RealtimeReasoningEffort`
1446
1447 Constrains effort on reasoning for reasoning-capable Realtime models such as
1448 `gpt-realtime-2`.
1449
1450 - `:minimal`
1451
1452 - `:low`
1453
1454 - `:medium`
1455
1456 - `:high`
1457
1458 - `:xhigh`
1459
1460 - `tool_choice: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp`
1461
1462 How the model chooses tools. Provide one of the string modes or force a specific
1463 function/MCP tool.
1464
1465 - `ToolChoiceOptions = :none | :auto | :required`
1466
1467 Controls which (if any) tool is called by the model.
1468
1469 `none` means the model will not call any tool and instead generates a message.
1470
1471 `auto` means the model can pick between generating a message or calling one or
1472 more tools.
1473
1474 `required` means the model must call one or more tools.
1475
1476 - `:none`
1477
1478 - `:auto`
1479
1480 - `:required`
1481
1482 - `class ToolChoiceFunction`
1483
1484 Use this option to force the model to call a specific function.
1485
1486 - `name: String`
1487
1488 The name of the function to call.
1489
1490 - `type: :function`
1491
1492 For function calling, the type is always `function`.
1493
1494 - `:function`
1495
1496 - `class ToolChoiceMcp`
1497
1498 Use this option to force the model to call a specific tool on a remote MCP server.
1499
1500 - `server_label: String`
1501
1502 The label of the MCP server to use.
1503
1504 - `type: :mcp`
1505
1506 For MCP tools, the type is always `mcp`.
1507
1508 - `:mcp`
1509
1510 - `name: String`
1511
1512 The name of the tool to call on the server.
1513
1514 - `tools: Array[RealtimeFunctionTool | McpTool{ server_label, type, allowed_tools, 7 more}]`
1515
1516 Tools available to the model.
1517
1518 - `class RealtimeFunctionTool`
1519
1520 - `description: String`
1521
1522 The description of the function, including guidance on when and how
1523 to call it, and guidance about what to tell the user when calling
1524 (if anything).
1525
1526 - `name: String`
1527
1528 The name of the function.
1529
1530 - `parameters: untyped`
1531
1532 Parameters of the function in JSON Schema.
1533
1534 - `type: :function`
1535
1536 The type of the tool, i.e. `function`.
1537
1538 - `:function`
1539
1540 - `class McpTool`
1541
1542 Give the model access to additional tools via remote Model Context Protocol
1543 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
1544
1545 - `server_label: String`
1546
1547 A label for this MCP server, used to identify it in tool calls.
1548
1549 - `type: :mcp`
1550
1551 The type of the MCP tool. Always `mcp`.
1552
1553 - `:mcp`
1554
1555 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`
1556
1557 List of allowed tool names or a filter object.
1558
1559 - `McpAllowedTools = Array[String]`
1560
1561 A string array of allowed tool names
1562
1563 - `class McpToolFilter`
1564
1565 A filter object to specify which tools are allowed.
1566
1567 - `read_only: bool`
1568
1569 Indicates whether or not a tool modifies data or is read-only. If an
1570 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1571 it will match this filter.
1572
1573 - `tool_names: Array[String]`
1574
1575 List of allowed tool names.
1576
1577 - `authorization: String`
1578
1579 An OAuth access token that can be used with a remote MCP server, either
1580 with a custom MCP server URL or a service connector. Your application
1581 must handle the OAuth authorization flow and provide the token here.
1582
1583 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`
1584
1585 Identifier for service connectors, like those available in ChatGPT. One of
1586 `server_url` or `connector_id` must be provided. Learn more about service
1587 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
1588
1589 Currently supported `connector_id` values are:
1590
1591 - Dropbox: `connector_dropbox`
1592 - Gmail: `connector_gmail`
1593 - Google Calendar: `connector_googlecalendar`
1594 - Google Drive: `connector_googledrive`
1595 - Microsoft Teams: `connector_microsoftteams`
1596 - Outlook Calendar: `connector_outlookcalendar`
1597 - Outlook Email: `connector_outlookemail`
1598 - SharePoint: `connector_sharepoint`
1599
1600 - `:connector_dropbox`
1601
1602 - `:connector_gmail`
1603
1604 - `:connector_googlecalendar`
1605
1606 - `:connector_googledrive`
1607
1608 - `:connector_microsoftteams`
1609
1610 - `:connector_outlookcalendar`
1611
1612 - `:connector_outlookemail`
1613
1614 - `:connector_sharepoint`
1615
1616 - `defer_loading: bool`
1617
1618 Whether this MCP tool is deferred and discovered via tool search.
1619
1620 - `headers: Hash[Symbol, String]`
1621
1622 Optional HTTP headers to send to the MCP server. Use for authentication
1623 or other purposes.
1624
1625 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`
1626
1627 Specify which of the MCP server's tools require approval.
1628
1629 - `class McpToolApprovalFilter`
1630
1631 Specify which of the MCP server's tools require approval. Can be
1632 `always`, `never`, or a filter object associated with tools
1633 that require approval.
1634
1635 - `always: Always{ read_only, tool_names}`
1636
1637 A filter object to specify which tools are allowed.
1638
1639 - `read_only: bool`
1640
1641 Indicates whether or not a tool modifies data or is read-only. If an
1642 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1643 it will match this filter.
1644
1645 - `tool_names: Array[String]`
1646
1647 List of allowed tool names.
1648
1649 - `never: Never{ read_only, tool_names}`
1650
1651 A filter object to specify which tools are allowed.
1652
1653 - `read_only: bool`
1654
1655 Indicates whether or not a tool modifies data or is read-only. If an
1656 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1657 it will match this filter.
1658
1659 - `tool_names: Array[String]`
1660
1661 List of allowed tool names.
1662
1663 - `McpToolApprovalSetting = :always | :never`
1664
 Specify a single approval policy for all tools. One of `always` or
 `never`. When set to `always`, all tools will require approval. When
 set to `never`, no tools will require approval.
1668
1669 - `:always`
1670
1671 - `:never`
1672
1673 - `server_description: String`
1674
1675 Optional description of the MCP server, used to provide more context.
1676
1677 - `server_url: String`
1678
1679 The URL for the MCP server. One of `server_url` or `connector_id` must be
1680 provided.
1681
1682 - `tracing: :auto | TracingConfiguration{ group_id, metadata, workflow_name}`
1683
 The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to `null` to disable tracing. Once
 tracing is enabled for a session, the configuration cannot be modified.
1686
1687 `auto` will create a trace for the session with default values for the
1688 workflow name, group id, and metadata.
1689
1690 - `Tracing = :auto`
1691
1692 Enables tracing and sets default values for tracing configuration options. Always `auto`.
1693
1694 - `:auto`
1695
1696 - `class TracingConfiguration`
1697
1698 Granular configuration for tracing.
1699
1700 - `group_id: String`
1701
1702 The group id to attach to this trace to enable filtering and
1703 grouping in the Traces Dashboard.
1704
1705 - `metadata: untyped`
1706
1707 The arbitrary metadata to attach to this trace to enable
1708 filtering in the Traces Dashboard.
1709
1710 - `workflow_name: String`
1711
1712 The name of the workflow to attach to this trace. This is used to
1713 name the trace in the Traces Dashboard.
1714
1715 - `truncation: RealtimeTruncation`
1716
 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
1718
1719 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
1720
1721 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
1722
1723 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
1724
1725 - `RealtimeTruncationStrategy = :auto | :disabled`
1726
1727 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.
1728
1729 - `:auto`
1730
1731 - `:disabled`
1732
1733 - `class RealtimeTruncationRetentionRatio`
1734
1735 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
1736
1737 - `retention_ratio: Float`
1738
1739 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
1740
1741 - `type: :retention_ratio`
1742
1743 Use retention ratio truncation.
1744
1745 - `:retention_ratio`
1746
1747 - `token_limits: TokenLimits{ post_instructions}`
1748
1749 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
1750
1751 - `post_instructions: Integer`
1752
 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
1754
1755 - `class RealtimeTranscriptionSessionCreateResponse`
1756
1757 A Realtime transcription session configuration object.
1758
1759 - `id: String`
1760
1761 Unique identifier for the session that looks like `sess_1234567890abcdef`.
1762
1763 - `object: String`
1764
1765 The object type. Always `realtime.transcription_session`.
1766
1767 - `type: :transcription`
1768
1769 The type of session. Always `transcription` for transcription sessions.
1770
1771 - `:transcription`
1772
1773 - `audio: Audio{ input}`
1774
1775 Configuration for input audio for the session.
1776
1777 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`
1778
1779 - `format_: RealtimeAudioFormats`
1780
1781 The PCM audio format. Only a 24kHz sample rate is supported.
1782
1783 - `noise_reduction: NoiseReduction{ type}`
1784
1785 Configuration for input audio noise reduction.
1786
1787 - `type: NoiseReductionType`
1788
1789 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
1790
1791 - `transcription: AudioTranscription`
1792
1793 - `turn_detection: RealtimeTranscriptionSessionTurnDetection`
1794
1795 Configuration for turn detection. Can be set to `null` to turn off. Server
1796 VAD means that the model will detect the start and end of speech based on
1797 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
1798
1799 - `prefix_padding_ms: Integer`
1800
1801 Amount of audio to include before the VAD detected speech (in
1802 milliseconds). Defaults to 300ms.
1803
1804 - `silence_duration_ms: Integer`
1805
1806 Duration of silence to detect speech stop (in milliseconds). Defaults
1807 to 500ms. With shorter values the model will respond more quickly,
1808 but may jump in on short pauses from the user.
1809
1810 - `threshold: Float`
1811
1812 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
1813 higher threshold will require louder audio to activate the model, and
1814 thus might perform better in noisy environments.
1815
1816 - `type: String`
1817
1818 Type of turn detection, only `server_vad` is currently supported.
1819
1820 - `expires_at: Integer`
1821
1822 Expiration timestamp for the session, in seconds since epoch.
1823
1824 - `include: Array[:"item.input_audio_transcription.logprobs"]`
1825
1826 Additional fields to include in server outputs.
1827
1828 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
1829
1830 - `:"item.input_audio_transcription.logprobs"`
1831
1832 - `value: String`
1833
1834 The generated client secret value.
1835
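The turn-detection defaults for transcription sessions described above can be collected into a small helper. This is a minimal sketch, not part of the SDK: the method name `transcription_turn_detection` and the constant `SERVER_VAD_DEFAULTS` are hypothetical. The default values (300 ms, 500 ms, 0.5), the 0.0 to 1.0 threshold range, and the `gpt-realtime-whisper` restriction are taken from the field descriptions above.

```ruby
# Hypothetical helper: builds a `turn_detection` hash for a transcription session.
# Defaults mirror the documented values; nothing here calls the API.
SERVER_VAD_DEFAULTS = {
  type: "server_vad",
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  threshold: 0.5
}.freeze

def transcription_turn_detection(model:, **overrides)
  # The field description says VAD is not supported for gpt-realtime-whisper,
  # so turn detection must be null (nil in Ruby) for that model.
  return nil if model == "gpt-realtime-whisper"

  config = SERVER_VAD_DEFAULTS.merge(overrides)
  unless (0.0..1.0).cover?(config[:threshold])
    raise ArgumentError, "threshold must be between 0.0 and 1.0"
  end
  unless config[:type] == "server_vad"
    raise ArgumentError, "only server_vad is currently supported"
  end
  config
end

p transcription_turn_detection(model: "gpt-4o-transcribe", silence_duration_ms: 300)
p transcription_turn_detection(model: "gpt-realtime-whisper") # prints nil
```

Validating locally like this only catches obvious mistakes early; the server remains the authority on which values are accepted.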
### Example
1837
```ruby
require "openai"

# Replace the placeholder with your real API key.
openai = OpenAI::Client.new(api_key: "My API Key")

# Create a client secret using the default expiration and session configuration.
client_secret = openai.realtime.client_secrets.create

puts(client_secret)
```
1847
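The call above relies on server defaults. A minimal sketch of assembling explicit arguments follows. `client_secret_params` is a hypothetical helper, not an SDK method. The 10 to 7200 second range, the 600 second default, and the `created_at` anchor come from the `expires_after` parameter description; the session fields are assumed to mirror the response fields documented on this page. The final call is left as a comment because it needs a real API key.

```ruby
# Hypothetical helper: assembles keyword arguments for
# `realtime.client_secrets.create`. It performs no network calls.
def client_secret_params(ttl_seconds: 600, instructions: nil)
  # The documented range for `seconds` is 10..7200; 600 is the documented default.
  unless ttl_seconds.is_a?(Integer) && (10..7200).cover?(ttl_seconds)
    raise ArgumentError, "ttl_seconds must be an Integer between 10 and 7200"
  end

  # Assumption: the request accepts the same `type`, `output_modalities`,
  # and `instructions` fields that appear in the response.
  session = { type: :realtime, output_modalities: [:audio] }
  session[:instructions] = instructions if instructions

  {
    expires_after: { anchor: :created_at, seconds: ttl_seconds },
    session: session
  }
end

params = client_secret_params(ttl_seconds: 120, instructions: "Be succinct.")
p params

# With a configured client, the hash could then be splatted into the call:
#   client_secret = openai.realtime.client_secrets.create(**params)
```

A short TTL limits how long a leaked secret stays useful, at the cost of minting secrets more often.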
#### Response
1849
```json
{
  "expires_at": 0,
  "session": {
    "id": "id",
    "object": "realtime.session",
    "type": "realtime",
    "audio": {
      "input": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "noise_reduction": {
          "type": "near_field"
        },
        "transcription": {
          "delay": "minimal",
          "language": "language",
          "model": "string",
          "prompt": "prompt"
        },
        "turn_detection": {
          "type": "server_vad",
          "create_response": true,
          "idle_timeout_ms": 5000,
          "interrupt_response": true,
          "prefix_padding_ms": 0,
          "silence_duration_ms": 0,
          "threshold": 0
        }
      },
      "output": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "speed": 0.25,
        "voice": "ash"
      }
    },
    "expires_at": 0,
    "include": [
      "item.input_audio_transcription.logprobs"
    ],
    "instructions": "instructions",
    "max_output_tokens": 0,
    "model": "string",
    "output_modalities": [
      "text"
    ],
    "prompt": {
      "id": "id",
      "variables": {
        "foo": "string"
      },
      "version": "version"
    },
    "reasoning": {
      "effort": "minimal"
    },
    "tool_choice": "none",
    "tools": [
      {
        "description": "description",
        "name": "name",
        "parameters": {},
        "type": "function"
      }
    ],
    "tracing": "auto",
    "truncation": "auto"
  },
  "value": "value"
}
```
1926
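Because `expires_at` is expressed in seconds since the epoch, a server can check how long a secret remains usable before handing it to a client. The following is a minimal sketch: the `seconds_until_expiry` helper and the sample payload are illustrative, and only the field names are taken from the response above.

```ruby
require "json"

# Hypothetical helper: seconds remaining before the client secret expires.
# `now` is injectable so the calculation can be checked deterministically.
def seconds_until_expiry(response, now: Time.now)
  remaining = response.fetch("expires_at") - now.to_i
  [remaining, 0].max
end

# Illustrative payload shaped like the response above (all values are made up).
payload = JSON.parse(<<~JSON)
  {
    "expires_at": 1700000600,
    "session": { "id": "sess_example", "type": "realtime" },
    "value": "ek_example"
  }
JSON

issued_at = Time.at(1_700_000_000)
puts seconds_until_expiry(payload, now: issued_at)       # 600 seconds left
puts seconds_until_expiry(payload, now: issued_at + 900) # already expired, so 0
```

Clamping at zero keeps callers from having to handle negative durations when a secret has already expired.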
## Domain Types
1928
### Realtime Session Create Response
1930
1931- `class RealtimeSessionCreateResponse`
1932
1933 A Realtime session configuration object.
1934
1935 - `id: String`
1936
1937 Unique identifier for the session that looks like `sess_1234567890abcdef`.
1938
1939 - `object: :"realtime.session"`
1940
1941 The object type. Always `realtime.session`.
1942
1943 - `:"realtime.session"`
1944
1945 - `type: :realtime`
1946
1947 The type of session to create. Always `realtime` for the Realtime API.
1948
1949 - `:realtime`
1950
1951 - `audio: Audio{ input, output}`
1952
1953 Configuration for input and output audio.
1954
1955 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`
1956
1957 - `format_: RealtimeAudioFormats`
1958
1959 The format of the input audio.
1960
1961 - `class AudioPCM`
1962
1963 The PCM audio format. Only a 24kHz sample rate is supported.
1964
1965 - `rate: 24000`
1966
1967 The sample rate of the audio. Always `24000`.
1968
1969 - `24000`
1970
1971 - `type: :"audio/pcm"`
1972
1973 The audio format. Always `audio/pcm`.
1974
1975 - `:"audio/pcm"`
1976
1977 - `class AudioPCMU`
1978
1979 The G.711 μ-law format.
1980
1981 - `type: :"audio/pcmu"`
1982
1983 The audio format. Always `audio/pcmu`.
1984
1985 - `:"audio/pcmu"`
1986
1987 - `class AudioPCMA`
1988
1989 The G.711 A-law format.
1990
1991 - `type: :"audio/pcma"`
1992
1993 The audio format. Always `audio/pcma`.
1994
1995 - `:"audio/pcma"`
1996
1997 - `noise_reduction: NoiseReduction{ type}`
1998
1999 Configuration for input audio noise reduction. This can be set to `null` to turn off.
2000 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
2001 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
2002
2003 - `type: NoiseReductionType`
2004
2005 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
2006
2007 - `:near_field`
2008
2009 - `:far_field`
2010
2011 - `transcription: AudioTranscription`
2012
2013 - `delay: :minimal | :low | :medium | 2 more`
2014
2015 Controls how long the model waits before emitting transcription text.
2016 Higher values can improve transcription accuracy at the cost of latency.
2017 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
2018
2019 - `:minimal`
2020
2021 - `:low`
2022
2023 - `:medium`
2024
2025 - `:high`
2026
2027 - `:xhigh`
2028
2029 - `language: String`
2030
2031 The language of the input audio. Supplying the input language in
2032 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
2033 will improve accuracy and latency.
2034
2035 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
2036
2037 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
2038
2039 - `String = String`
2040
2041 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
2042
2043 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
2044
2045 - `:"whisper-1"`
2046
2047 - `:"gpt-4o-mini-transcribe"`
2048
2049 - `:"gpt-4o-mini-transcribe-2025-12-15"`
2050
2051 - `:"gpt-4o-transcribe"`
2052
2053 - `:"gpt-4o-transcribe-diarize"`
2054
2055 - `:"gpt-realtime-whisper"`
2056
2057 - `prompt: String`
2058
2059 An optional text to guide the model's style or continue a previous audio
2060 segment.
2061 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
2062 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
2063 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
2064
2065 - `turn_detection: ServerVad{ type, create_response, idle_timeout_ms, 4 more} | SemanticVad{ type, create_response, eagerness, interrupt_response}`
2066
2067 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
2068
2069 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
2070
2071 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
2072
2073 For `gpt-realtime-whisper` transcription sessions, turn detection must be
2074 set to `null`; VAD is not supported.
2075
2076 - `class ServerVad`
2077
2078 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
2079
2080 - `type: :server_vad`
2081
2082 Type of turn detection, `server_vad` to turn on simple Server VAD.
2083
2084 - `:server_vad`
2085
2086 - `create_response: bool`
2087
2088 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
2089
2090 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
2091
2092 - `idle_timeout_ms: Integer`
2093
2094 Optional timeout after which a model response will be triggered automatically. This is
2095 useful for situations in which a long pause from the user is unexpected, such as a phone
2096 call. The model will effectively prompt the user to continue the conversation based
2097 on the current context.
2098
2099 The timeout value will be applied after the last model response's audio has finished playing,
2100 i.e. it's set to the `response.done` time plus audio playback duration.
2101
2102 An `input_audio_buffer.timeout_triggered` event (plus events
2103 associated with the Response) will be emitted when the timeout is reached.
2104 Idle timeout is currently only supported for `server_vad` mode.
2105
2106 - `interrupt_response: bool`
2107
2108 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
2109 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
2110
2111 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
2112
2113 - `prefix_padding_ms: Integer`
2114
2115 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
2116 milliseconds). Defaults to 300ms.
2117
2118 - `silence_duration_ms: Integer`
2119
2120 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
2121 to 500ms. With shorter values the model will respond more quickly,
2122 but may jump in on short pauses from the user.
2123
2124 - `threshold: Float`
2125
2126 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
2127 higher threshold will require louder audio to activate the model, and
2128 thus might perform better in noisy environments.
2129
2130 - `class SemanticVad`
2131
2132 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
2133
2134 - `type: :semantic_vad`
2135
2136 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
2137
2138 - `:semantic_vad`
2139
2140 - `create_response: bool`
2141
2142 Whether or not to automatically generate a response when a VAD stop event occurs.
2143
2144 - `eagerness: :low | :medium | :high | :auto`
2145
2146 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
2147
2148 - `:low`
2149
2150 - `:medium`
2151
2152 - `:high`
2153
2154 - `:auto`
2155
2156 - `interrupt_response: bool`
2157
2158 Whether or not to automatically interrupt any ongoing response with output to the default
2159 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
2160
2161 - `output: Output{ format_, speed, voice}`
2162
2163 - `format_: RealtimeAudioFormats`
2164
2165 The format of the output audio.
2166
2167 - `speed: Float`
2168
2169 The speed of the model's spoken response as a multiple of the original speed.
2170 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
2171
2172 This parameter is a post-processing adjustment to the audio after it is generated; it's
2173 also possible to prompt the model to speak faster or slower.
2174
2175 - `voice: String | :alloy | :ash | :ballad | 7 more`
2176
2177 The voice the model uses to respond. Voice cannot be changed during the
2178 session once the model has responded with audio at least once. Current
2179 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
2180 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
2181 best quality.
2182
2183 - `String = String`
2184
2185 - `Voice = :alloy | :ash | :ballad | 7 more`
2186
2187 The voice the model uses to respond. Voice cannot be changed during the
2188 session once the model has responded with audio at least once. Current
2189 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
2190 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
2191 best quality.
2192
2193 - `:alloy`
2194
2195 - `:ash`
2196
2197 - `:ballad`
2198
2199 - `:coral`
2200
2201 - `:echo`
2202
2203 - `:sage`
2204
2205 - `:shimmer`
2206
2207 - `:verse`
2208
2209 - `:marin`
2210
2211 - `:cedar`
2212
2213 - `expires_at: Integer`
2214
2215 Expiration timestamp for the session, in seconds since epoch.
2216
2217 - `include: Array[:"item.input_audio_transcription.logprobs"]`
2218
2219 Additional fields to include in server outputs.
2220
2221 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
2222
2223 - `:"item.input_audio_transcription.logprobs"`
2224
2225 - `instructions: String`
2226
2227 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format, (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
2228
2229 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
2230
2231 - `max_output_tokens: Integer | :inf`
2232
2233 Maximum number of output tokens for a single assistant response,
2234 inclusive of tool calls. Provide an integer between 1 and 4096 to
2235 limit output tokens, or `inf` for the maximum available tokens for a
2236 given model. Defaults to `inf`.
2237
2238 - `Integer = Integer`
2239
2240 - `MaxOutputTokens = :inf`
2241
2242 - `:inf`
2243
2244 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
2245
2246 The Realtime model used for this session.
2247
2248 - `String = String`
2249
2250 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
2251
2252 The Realtime model used for this session.
2253
2254 - `:"gpt-realtime"`
2255
2256 - `:"gpt-realtime-1.5"`
2257
2258 - `:"gpt-realtime-2"`
2259
2260 - `:"gpt-realtime-2025-08-28"`
2261
2262 - `:"gpt-4o-realtime-preview"`
2263
2264 - `:"gpt-4o-realtime-preview-2024-10-01"`
2265
2266 - `:"gpt-4o-realtime-preview-2024-12-17"`
2267
2268 - `:"gpt-4o-realtime-preview-2025-06-03"`
2269
2270 - `:"gpt-4o-mini-realtime-preview"`
2271
2272 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`
2273
2274 - `:"gpt-realtime-mini"`
2275
2276 - `:"gpt-realtime-mini-2025-10-06"`
2277
2278 - `:"gpt-realtime-mini-2025-12-15"`
2279
2280 - `:"gpt-audio-1.5"`
2281
2282 - `:"gpt-audio-mini"`
2283
2284 - `:"gpt-audio-mini-2025-10-06"`
2285
2286 - `:"gpt-audio-mini-2025-12-15"`
2287
2288 - `output_modalities: Array[:text | :audio]`
2289
2290 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
2291 that the model will respond with audio plus a transcript. `["text"]` can be used to make
2292 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
2293
2294 - `:text`
2295
2296 - `:audio`
2297
2298 - `prompt: ResponsePrompt`
2299
2300 Reference to a prompt template and its variables.
2301 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
2302
2303 - `id: String`
2304
2305 The unique identifier of the prompt template to use.
2306
2307 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`
2308
2309 Optional map of values to substitute in for variables in your
2310 prompt. The substitution values can either be strings, or other
2311 Response input types like images or files.
2312
2313 - `String = String`
2314
2315 - `class ResponseInputText`
2316
2317 A text input to the model.
2318
2319 - `text: String`
2320
2321 The text input to the model.
2322
2323 - `type: :input_text`
2324
2325 The type of the input item. Always `input_text`.
2326
2327 - `:input_text`
2328
2329 - `class ResponseInputImage`
2330
2331 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
2332
2333 - `detail: :low | :high | :auto | :original`
2334
2335 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
2336
2337 - `:low`
2338
2339 - `:high`
2340
2341 - `:auto`
2342
2343 - `:original`
2344
2345 - `type: :input_image`
2346
2347 The type of the input item. Always `input_image`.
2348
2349 - `:input_image`
2350
2351 - `file_id: String`
2352
2353 The ID of the file to be sent to the model.
2354
2355 - `image_url: String`
2356
2357 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
2358
2359 - `class ResponseInputFile`
2360
2361 A file input to the model.
2362
2363 - `type: :input_file`
2364
2365 The type of the input item. Always `input_file`.
2366
2367 - `:input_file`
2368
2369 - `detail: :low | :high`
2370
2371 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
2372
2373 - `:low`
2374
2375 - `:high`
2376
2377 - `file_data: String`
2378
2379 The content of the file to be sent to the model.
2380
2381 - `file_id: String`
2382
2383 The ID of the file to be sent to the model.
2384
2385 - `file_url: String`
2386
2387 The URL of the file to be sent to the model.
2388
2389 - `filename: String`
2390
2391 The name of the file to be sent to the model.
2392
2393 - `version: String`
2394
2395 Optional version of the prompt template.
2396
2397 - `reasoning: RealtimeReasoning`
2398
2399 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
2400
2401 - `effort: RealtimeReasoningEffort`
2402
2403 Constrains effort on reasoning for reasoning-capable Realtime models such as
2404 `gpt-realtime-2`.
2405
2406 - `:minimal`
2407
2408 - `:low`
2409
2410 - `:medium`
2411
2412 - `:high`
2413
2414 - `:xhigh`
2415
2416 - `tool_choice: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp`
2417
2418 How the model chooses tools. Provide one of the string modes or force a specific
2419 function/MCP tool.
2420
2421 - `ToolChoiceOptions = :none | :auto | :required`
2422
2423 Controls which (if any) tool is called by the model.
2424
2425 `none` means the model will not call any tool and instead generates a message.
2426
2427 `auto` means the model can pick between generating a message or calling one or
2428 more tools.
2429
2430 `required` means the model must call one or more tools.
2431
2432 - `:none`
2433
2434 - `:auto`
2435
2436 - `:required`
2437
2438 - `class ToolChoiceFunction`
2439
2440 Use this option to force the model to call a specific function.
2441
2442 - `name: String`
2443
2444 The name of the function to call.
2445
2446 - `type: :function`
2447
2448 For function calling, the type is always `function`.
2449
2450 - `:function`
2451
2452 - `class ToolChoiceMcp`
2453
2454 Use this option to force the model to call a specific tool on a remote MCP server.
2455
2456 - `server_label: String`
2457
2458 The label of the MCP server to use.
2459
2460 - `type: :mcp`
2461
2462 For MCP tools, the type is always `mcp`.
2463
2464 - `:mcp`
2465
2466 - `name: String`
2467
2468 The name of the tool to call on the server.
2469
2470 - `tools: Array[RealtimeFunctionTool | McpTool{ server_label, type, allowed_tools, 7 more}]`
2471
2472 Tools available to the model.
2473
2474 - `class RealtimeFunctionTool`
2475
2476 - `description: String`
2477
2478 The description of the function, including guidance on when and how
2479 to call it, and guidance about what to tell the user when calling
2480 (if anything).
2481
2482 - `name: String`
2483
2484 The name of the function.
2485
2486 - `parameters: untyped`
2487
2488 Parameters of the function in JSON Schema.
2489
2490 - `type: :function`
2491
2492 The type of the tool, i.e. `function`.
2493
2494 - `:function`
2495
2496 - `class McpTool`
2497
2498 Give the model access to additional tools via remote Model Context Protocol
2499 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
2500
2501 - `server_label: String`
2502
2503 A label for this MCP server, used to identify it in tool calls.
2504
2505 - `type: :mcp`
2506
2507 The type of the MCP tool. Always `mcp`.
2508
2509 - `:mcp`
2510
2511 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`
2512
2513 List of allowed tool names or a filter object.
2514
2515 - `McpAllowedTools = Array[String]`
2516
2517 A string array of allowed tool names.
2518
2519 - `class McpToolFilter`
2520
2521 A filter object to specify which tools are allowed.
2522
2523 - `read_only: bool`
2524
2525 Indicates whether or not a tool modifies data or is read-only. If an
2526 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
2527 it will match this filter.
2528
2529 - `tool_names: Array[String]`
2530
2531 List of allowed tool names.
2532
2533 - `authorization: String`
2534
2535 An OAuth access token that can be used with a remote MCP server, either
2536 with a custom MCP server URL or a service connector. Your application
2537 must handle the OAuth authorization flow and provide the token here.
2538
2539 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`
2540
2541 Identifier for service connectors, like those available in ChatGPT. One of
2542 `server_url` or `connector_id` must be provided. Learn more about service
2543 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
2544
2545 Currently supported `connector_id` values are:
2546
2547 - Dropbox: `connector_dropbox`
2548 - Gmail: `connector_gmail`
2549 - Google Calendar: `connector_googlecalendar`
2550 - Google Drive: `connector_googledrive`
2551 - Microsoft Teams: `connector_microsoftteams`
2552 - Outlook Calendar: `connector_outlookcalendar`
2553 - Outlook Email: `connector_outlookemail`
2554 - SharePoint: `connector_sharepoint`
2555
2556 - `:connector_dropbox`
2557
2558 - `:connector_gmail`
2559
2560 - `:connector_googlecalendar`
2561
2562 - `:connector_googledrive`
2563
2564 - `:connector_microsoftteams`
2565
2566 - `:connector_outlookcalendar`
2567
2568 - `:connector_outlookemail`
2569
2570 - `:connector_sharepoint`
2571
2572 - `defer_loading: bool`
2573
2574 Whether this MCP tool is deferred and discovered via tool search.
2575
2576 - `headers: Hash[Symbol, String]`
2577
2578 Optional HTTP headers to send to the MCP server. Use for authentication
2579 or other purposes.
2580
2581 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`
2582
2583 Specify which of the MCP server's tools require approval.
2584
2585 - `class McpToolApprovalFilter`
2586
2587 Specify which of the MCP server's tools require approval. Can be
2588 `always`, `never`, or a filter object associated with tools
2589 that require approval.
2590
2591 - `always: Always{ read_only, tool_names}`
2592
2593 A filter object to specify which tools are allowed.
2594
2595 - `read_only: bool`
2596
2597 Indicates whether or not a tool modifies data or is read-only. If an
2598 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
2599 it will match this filter.
2600
2601 - `tool_names: Array[String]`
2602
2603 List of allowed tool names.
2604
2605 - `never: Never{ read_only, tool_names}`
2606
2607 A filter object to specify which tools are allowed.
2608
2609 - `read_only: bool`
2610
2611 Indicates whether or not a tool modifies data or is read-only. If an
2612 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
2613 it will match this filter.
2614
2615 - `tool_names: Array[String]`
2616
2617 List of allowed tool names.
2618
2619 - `McpToolApprovalSetting = :always | :never`
2620
2621 Specify a single approval policy for all tools. One of `always` or
2622 `never`. When set to `always`, all tools will require approval. When
2623 set to `never`, no tools will require approval.
2624
2625 - `:always`
2626
2627 - `:never`
2628
2629 - `server_description: String`
2630
2631 Optional description of the MCP server, used to provide more context.
2632
2633 - `server_url: String`
2634
2635 The URL for the MCP server. One of `server_url` or `connector_id` must be
2636 provided.
2637
2638 - `tracing: :auto | TracingConfiguration{ group_id, metadata, workflow_name}`
2639
2640 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once
2641 tracing is enabled for a session, the configuration cannot be modified.
2642
2643 `auto` will create a trace for the session with default values for the
2644 workflow name, group id, and metadata.
2645
2646 - `Tracing = :auto`
2647
2648 Enables tracing and sets default values for tracing configuration options. Always `auto`.
2649
2650 - `:auto`
2651
2652 - `class TracingConfiguration`
2653
2654 Granular configuration for tracing.
2655
2656 - `group_id: String`
2657
2658 The group id to attach to this trace to enable filtering and
2659 grouping in the Traces Dashboard.
2660
2661 - `metadata: untyped`
2662
2663 The arbitrary metadata to attach to this trace to enable
2664 filtering in the Traces Dashboard.
2665
2666 - `workflow_name: String`
2667
2668 The name of the workflow to attach to this trace. This is used to
2669 name the trace in the Traces Dashboard.
2670
2671 - `truncation: RealtimeTruncation`
2672
2673 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
2674
2675 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
2676
2677 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
2678
2679 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
2680
2681 - `RealtimeTruncationStrategy = :auto | :disabled`
2682
2683 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.
2684
2685 - `:auto`
2686
2687 - `:disabled`
2688
2689 - `class RealtimeTruncationRetentionRatio`
2690
2691 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
2692
2693 - `retention_ratio: Float`
2694
2695 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
2696
2697 - `type: :retention_ratio`
2698
2699 Use retention ratio truncation.
2700
2701 - `:retention_ratio`
2702
2703 - `token_limits: TokenLimits{ post_instructions}`
2704
2705 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
2706
2707 - `post_instructions: Integer`
2708
2709 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
2710
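The retention-ratio arithmetic described above can be modeled in a few lines. This is only an illustrative sketch of the documented behavior, not the server's implementation: the `truncate_with_retention` method is hypothetical, each message is reduced to a made-up token count, and `limit` stands in for `token_limits.post_instructions`.

```ruby
# Illustrative model of retention-ratio truncation. Each message is
# represented only by its token count; the oldest message comes first.
def truncate_with_retention(message_tokens, limit:, retention_ratio:)
  unless (0.0..1.0).cover?(retention_ratio)
    raise ArgumentError, "retention_ratio must be between 0.0 and 1.0"
  end
  # Nothing is dropped while the conversation fits within the limit.
  return message_tokens.dup if message_tokens.sum <= limit

  # Once the limit is exceeded, drop the oldest messages until the total
  # fits within retention_ratio * limit.
  target = (limit * retention_ratio).floor
  kept = message_tokens.dup
  kept.shift while kept.any? && kept.sum > target
  kept
end

history = [1200, 900, 1500, 800, 1100] # 5,500 tokens in total
p truncate_with_retention(history, limit: 5000, retention_ratio: 0.8)
# => [1500, 800, 1100]
```

With a 5,000-token limit and a ratio of `0.8`, the two oldest messages are dropped so that 3,400 tokens remain, which leaves headroom before the next truncation is needed.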
### Realtime Transcription Session Create Response
2712
2713- `class RealtimeTranscriptionSessionCreateResponse`
2714
2715 A Realtime transcription session configuration object.
2716
2717 - `id: String`
2718
2719 Unique identifier for the session that looks like `sess_1234567890abcdef`.
2720
2721 - `object: String`
2722
2723 The object type. Always `realtime.transcription_session`.
2724
2725 - `type: :transcription`
2726
2727 The type of session. Always `transcription` for transcription sessions.
2728
2729 - `:transcription`
2730
2731 - `audio: Audio{ input}`
2732
2733 Configuration for input audio for the session.
2734
2735 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`
2736
2737 - `format_: RealtimeAudioFormats`
2738
2739 The PCM audio format. Only a 24kHz sample rate is supported.
2740
2741 - `class AudioPCM`
2742
2743 The PCM audio format. Only a 24kHz sample rate is supported.
2744
2745 - `rate: 24000`
2746
2747 The sample rate of the audio. Always `24000`.
2748
2749 - `24000`
2750
2751 - `type: :"audio/pcm"`
2752
2753 The audio format. Always `audio/pcm`.
2754
2755 - `:"audio/pcm"`
2756
2757 - `class AudioPCMU`
2758
2759 The G.711 μ-law format.
2760
2761 - `type: :"audio/pcmu"`
2762
2763 The audio format. Always `audio/pcmu`.
2764
2765 - `:"audio/pcmu"`
2766
2767 - `class AudioPCMA`
2768
2769 The G.711 A-law format.
2770
2771 - `type: :"audio/pcma"`
2772
2773 The audio format. Always `audio/pcma`.
2774
2775 - `:"audio/pcma"`
2776
2777 - `noise_reduction: NoiseReduction{ type}`
2778
2779 Configuration for input audio noise reduction.
2780
2781 - `type: NoiseReductionType`
2782
2783 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
2784
2785 - `:near_field`
2786
2787 - `:far_field`
2788
2789 - `transcription: AudioTranscription`
2790
2791 - `delay: :minimal | :low | :medium | 2 more`
2792
2793 Controls how long the model waits before emitting transcription text.
2794 Higher values can improve transcription accuracy at the cost of latency.
2795 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
2796
2797 - `:minimal`
2798
2799 - `:low`
2800
2801 - `:medium`
2802
2803 - `:high`
2804
2805 - `:xhigh`
2806
2807 - `language: String`
2808
2809 The language of the input audio. Supplying the input language in
2810 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
2811 will improve accuracy and latency.
2812
2813 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
2814
2815 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
2816
2817 - `String = String`
2818
2819 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
2820
2821 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
2822
2823 - `:"whisper-1"`
2824
2825 - `:"gpt-4o-mini-transcribe"`
2826
2827 - `:"gpt-4o-mini-transcribe-2025-12-15"`
2828
2829 - `:"gpt-4o-transcribe"`
2830
2831 - `:"gpt-4o-transcribe-diarize"`
2832
2833 - `:"gpt-realtime-whisper"`
2834
2835 - `prompt: String`
2836
2837 An optional text to guide the model's style or continue a previous audio
2838 segment.
2839 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
2840 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
2841 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
2842
2843 - `turn_detection: RealtimeTranscriptionSessionTurnDetection`
2844
2845 Configuration for turn detection. Can be set to `null` to turn off. Server
2846 VAD means that the model will detect the start and end of speech based on
2847 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
2848
2849 - `prefix_padding_ms: Integer`
2850
2851 Amount of audio to include before the VAD detected speech (in
2852 milliseconds). Defaults to 300ms.
2853
2854 - `silence_duration_ms: Integer`
2855
2856 Duration of silence to detect speech stop (in milliseconds). Defaults
2857 to 500ms. With shorter values the model will respond more quickly,
2858 but may jump in on short pauses from the user.
2859
2860 - `threshold: Float`
2861
Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
2863 higher threshold will require louder audio to activate the model, and
2864 thus might perform better in noisy environments.
2865
2866 - `type: String`
2867
Type of turn detection. Only `server_vad` is currently supported.
2869
2870 - `expires_at: Integer`
2871
2872 Expiration timestamp for the session, in seconds since epoch.
2873
2874 - `include: Array[:"item.input_audio_transcription.logprobs"]`
2875
2876 Additional fields to include in server outputs.
2877
2878 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
2879
2880 - `:"item.input_audio_transcription.logprobs"`
2881
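The model-specific constraints above (`delay` only with `gpt-realtime-whisper`, and no `prompt` or turn detection for that model) can be checked on a plain Hash. This is a minimal sketch, not SDK behavior: `transcription_config_problems` and `WHISPER_REALTIME` are hypothetical names, and the Hash layout is an assumption that mirrors the `input` fields documented here.

```ruby
# Hypothetical helper (not part of the SDK): checks a plain-Hash audio input
# configuration against the model-specific rules described above.
WHISPER_REALTIME = "gpt-realtime-whisper"

def transcription_config_problems(input)
  transcription = input[:transcription] || {}
  model = transcription[:model].to_s
  problems = []

  if model == WHISPER_REALTIME
    # Prompt is documented as unsupported with gpt-realtime-whisper.
    problems << "prompt is not supported with #{model}" if transcription[:prompt]
    # Turn detection must be null (nil in Ruby) for this model.
    problems << "turn_detection must be nil with #{model}" unless input[:turn_detection].nil?
  elsif transcription.key?(:delay)
    # delay is documented as only supported with gpt-realtime-whisper.
    problems << "delay is only supported with #{WHISPER_REALTIME}"
  end

  problems
end
```

An empty array means none of the documented constraints were violated; the server remains the final authority on what it accepts.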
### Realtime Transcription Session Turn Detection

- `class RealtimeTranscriptionSessionTurnDetection`

  Configuration for turn detection. Can be set to `null` to turn off. Server
  VAD means that the model will detect the start and end of speech based on
  audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.

  - `prefix_padding_ms: Integer`

    Amount of audio to include before the VAD detected speech (in
    milliseconds). Defaults to 300ms.

  - `silence_duration_ms: Integer`

    Duration of silence to detect speech stop (in milliseconds). Defaults
    to 500ms. With shorter values the model will respond more quickly,
    but may jump in on short pauses from the user.

  - `threshold: Float`

    Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
    higher threshold will require louder audio to activate the model, and
    thus might perform better in noisy environments.

  - `type: String`

    Type of turn detection. Only `server_vad` is currently supported.
2910
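As a rough illustration of the fields above, the following sketch builds a turn detection Hash using the documented defaults (300 ms prefix padding, 500 ms silence duration, threshold 0.5). `server_vad_config` is a hypothetical helper, not an SDK method, and the range check simply encodes the documented 0.0 to 1.0 bounds.

```ruby
# Hypothetical helper: builds a server_vad turn-detection Hash using the
# defaults documented above. Not an SDK method.
def server_vad_config(prefix_padding_ms: 300, silence_duration_ms: 500, threshold: 0.5)
  # The threshold is documented as a value between 0.0 and 1.0.
  unless threshold.is_a?(Numeric) && (0.0..1.0).cover?(threshold)
    raise ArgumentError, "threshold must be between 0.0 and 1.0"
  end

  {
    type: "server_vad",
    prefix_padding_ms: prefix_padding_ms,
    silence_duration_ms: silence_duration_ms,
    threshold: threshold
  }
end
```

Raising `threshold` (for example `server_vad_config(threshold: 0.8)`) corresponds to the documented advice for noisy environments. For `gpt-realtime-whisper`, turn detection would be left as `nil` instead.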
### Client Secret Create Response
2912
2913- `class ClientSecretCreateResponse`
2914
2915 Response from creating a session and client secret for the Realtime API.
2916
2917 - `expires_at: Integer`
2918
2919 Expiration timestamp for the client secret, in seconds since epoch.
2920
2921 - `session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse`
2922
2923 The session configuration for either a realtime or transcription session.
2924
2925 - `class RealtimeSessionCreateResponse`
2926
2927 A Realtime session configuration object.
2928
2929 - `id: String`
2930
2931 Unique identifier for the session that looks like `sess_1234567890abcdef`.
2932
2933 - `object: :"realtime.session"`
2934
2935 The object type. Always `realtime.session`.
2936
2937 - `:"realtime.session"`
2938
2939 - `type: :realtime`
2940
2941 The type of session to create. Always `realtime` for the Realtime API.
2942
2943 - `:realtime`
2944
2945 - `audio: Audio{ input, output}`
2946
2947 Configuration for input and output audio.
2948
2949 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`
2950
2951 - `format_: RealtimeAudioFormats`
2952
2953 The format of the input audio.
2954
2955 - `class AudioPCM`
2956
2957 The PCM audio format. Only a 24kHz sample rate is supported.
2958
2959 - `rate: 24000`
2960
2961 The sample rate of the audio. Always `24000`.
2962
2963 - `24000`
2964
2965 - `type: :"audio/pcm"`
2966
2967 The audio format. Always `audio/pcm`.
2968
2969 - `:"audio/pcm"`
2970
2971 - `class AudioPCMU`
2972
2973 The G.711 μ-law format.
2974
2975 - `type: :"audio/pcmu"`
2976
2977 The audio format. Always `audio/pcmu`.
2978
2979 - `:"audio/pcmu"`
2980
2981 - `class AudioPCMA`
2982
2983 The G.711 A-law format.
2984
2985 - `type: :"audio/pcma"`
2986
2987 The audio format. Always `audio/pcma`.
2988
2989 - `:"audio/pcma"`
2990
2991 - `noise_reduction: NoiseReduction{ type}`
2992
2993 Configuration for input audio noise reduction. This can be set to `null` to turn off.
2994 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
2995 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
2996
2997 - `type: NoiseReductionType`
2998
2999 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
3000
3001 - `:near_field`
3002
3003 - `:far_field`
3004
3005 - `transcription: AudioTranscription`
3006
3007 - `delay: :minimal | :low | :medium | 2 more`
3008
3009 Controls how long the model waits before emitting transcription text.
3010 Higher values can improve transcription accuracy at the cost of latency.
3011 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
3012
3013 - `:minimal`
3014
3015 - `:low`
3016
3017 - `:medium`
3018
3019 - `:high`
3020
3021 - `:xhigh`
3022
3023 - `language: String`
3024
3025 The language of the input audio. Supplying the input language in
3026 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
3027 will improve accuracy and latency.
3028
3029 - `model: String | :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
3030
3031 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
3032
3033 - `String = String`
3034
3035 - `Model = :"whisper-1" | :"gpt-4o-mini-transcribe" | :"gpt-4o-mini-transcribe-2025-12-15" | 3 more`
3036
3037 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
3038
3039 - `:"whisper-1"`
3040
3041 - `:"gpt-4o-mini-transcribe"`
3042
3043 - `:"gpt-4o-mini-transcribe-2025-12-15"`
3044
3045 - `:"gpt-4o-transcribe"`
3046
3047 - `:"gpt-4o-transcribe-diarize"`
3048
3049 - `:"gpt-realtime-whisper"`
3050
3051 - `prompt: String`
3052
3053 An optional text to guide the model's style or continue a previous audio
3054 segment.
3055 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
3056 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
3057 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
3058
3059 - `turn_detection: ServerVad{ type, create_response, idle_timeout_ms, 4 more} | SemanticVad{ type, create_response, eagerness, interrupt_response}`
3060
Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
3062
3063 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
3064
3065 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
3066
3067 For `gpt-realtime-whisper` transcription sessions, turn detection must be
3068 set to `null`; VAD is not supported.
3069
3070 - `class ServerVad`
3071
3072 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
3073
3074 - `type: :server_vad`
3075
3076 Type of turn detection, `server_vad` to turn on simple Server VAD.
3077
3078 - `:server_vad`
3079
3080 - `create_response: bool`
3081
3082 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
3083
3084 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
3085
3086 - `idle_timeout_ms: Integer`
3087
3088 Optional timeout after which a model response will be triggered automatically. This is
3089 useful for situations in which a long pause from the user is unexpected, such as a phone
3090 call. The model will effectively prompt the user to continue the conversation based
3091 on the current context.
3092
3093 The timeout value will be applied after the last model response's audio has finished playing,
3094 i.e. it's set to the `response.done` time plus audio playback duration.
3095
3096 An `input_audio_buffer.timeout_triggered` event (plus events
3097 associated with the Response) will be emitted when the timeout is reached.
3098 Idle timeout is currently only supported for `server_vad` mode.
3099
3100 - `interrupt_response: bool`
3101
3102 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
3103 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
3104
3105 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
3106
3107 - `prefix_padding_ms: Integer`
3108
3109 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
3110 milliseconds). Defaults to 300ms.
3111
3112 - `silence_duration_ms: Integer`
3113
3114 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
3115 to 500ms. With shorter values the model will respond more quickly,
3116 but may jump in on short pauses from the user.
3117
3118 - `threshold: Float`
3119
Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
3121 higher threshold will require louder audio to activate the model, and
3122 thus might perform better in noisy environments.
3123
3124 - `class SemanticVad`
3125
3126 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
3127
3128 - `type: :semantic_vad`
3129
3130 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
3131
3132 - `:semantic_vad`
3133
3134 - `create_response: bool`
3135
3136 Whether or not to automatically generate a response when a VAD stop event occurs.
3137
3138 - `eagerness: :low | :medium | :high | :auto`
3139
3140 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
3141
3142 - `:low`
3143
3144 - `:medium`
3145
3146 - `:high`
3147
3148 - `:auto`
3149
3150 - `interrupt_response: bool`
3151
3152 Whether or not to automatically interrupt any ongoing response with output to the default
3153 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
3154
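The eagerness levels and their documented maximum timeouts can be summarised in a small lookup. This is only a sketch: `EAGERNESS_MAX_TIMEOUT_S` and `semantic_vad_max_timeout` are hypothetical names, and the values are the 8s, 4s, and 2s maximums stated above, with `auto` treated as `medium`.

```ruby
# Maximum timeouts (in seconds) documented for each eagerness level.
EAGERNESS_MAX_TIMEOUT_S = { low: 8, medium: 4, high: 2 }.freeze

# Hypothetical helper: resolves the documented maximum timeout, treating
# :auto (the default) as equivalent to :medium.
def semantic_vad_max_timeout(eagerness = :auto)
  level = eagerness.to_sym
  level = :medium if level == :auto
  EAGERNESS_MAX_TIMEOUT_S.fetch(level) do
    raise ArgumentError, "unknown eagerness: #{eagerness.inspect}"
  end
end
```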
3155 - `output: Output{ format_, speed, voice}`
3156
3157 - `format_: RealtimeAudioFormats`
3158
3159 The format of the output audio.
3160
3161 - `speed: Float`
3162
3163 The speed of the model's spoken response as a multiple of the original speed.
3164 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
3165
This parameter is a post-processing adjustment to the audio after it is generated; it is
also possible to prompt the model to speak faster or slower.
3168
3169 - `voice: String | :alloy | :ash | :ballad | 7 more`
3170
3171 The voice the model uses to respond. Voice cannot be changed during the
3172 session once the model has responded with audio at least once. Current
3173 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
3174 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
3175 best quality.
3176
3177 - `String = String`
3178
3179 - `Voice = :alloy | :ash | :ballad | 7 more`
3180
3181 The voice the model uses to respond. Voice cannot be changed during the
3182 session once the model has responded with audio at least once. Current
3183 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
3184 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
3185 best quality.
3186
3187 - `:alloy`
3188
3189 - `:ash`
3190
3191 - `:ballad`
3192
3193 - `:coral`
3194
3195 - `:echo`
3196
3197 - `:sage`
3198
3199 - `:shimmer`
3200
3201 - `:verse`
3202
3203 - `:marin`
3204
3205 - `:cedar`
3206
3207 - `expires_at: Integer`
3208
3209 Expiration timestamp for the session, in seconds since epoch.
3210
3211 - `include: Array[:"item.input_audio_transcription.logprobs"]`
3212
3213 Additional fields to include in server outputs.
3214
3215 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
3216
3217 - `:"item.input_audio_transcription.logprobs"`
3218
3219 - `instructions: String`
3220
The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
3222
3223 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
3224
3225 - `max_output_tokens: Integer | :inf`
3226
3227 Maximum number of output tokens for a single assistant response,
3228 inclusive of tool calls. Provide an integer between 1 and 4096 to
3229 limit output tokens, or `inf` for the maximum available tokens for a
3230 given model. Defaults to `inf`.
3231
3232 - `Integer = Integer`
3233
3234 - `MaxOutputTokens = :inf`
3235
3236 - `:inf`
3237
3238 - `model: String | :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
3239
3240 The Realtime model used for this session.
3241
3242 - `String = String`
3243
3244 - `Model = :"gpt-realtime" | :"gpt-realtime-1.5" | :"gpt-realtime-2" | 14 more`
3245
3246 The Realtime model used for this session.
3247
3248 - `:"gpt-realtime"`
3249
3250 - `:"gpt-realtime-1.5"`
3251
3252 - `:"gpt-realtime-2"`
3253
3254 - `:"gpt-realtime-2025-08-28"`
3255
3256 - `:"gpt-4o-realtime-preview"`
3257
3258 - `:"gpt-4o-realtime-preview-2024-10-01"`
3259
3260 - `:"gpt-4o-realtime-preview-2024-12-17"`
3261
3262 - `:"gpt-4o-realtime-preview-2025-06-03"`
3263
3264 - `:"gpt-4o-mini-realtime-preview"`
3265
3266 - `:"gpt-4o-mini-realtime-preview-2024-12-17"`
3267
3268 - `:"gpt-realtime-mini"`
3269
3270 - `:"gpt-realtime-mini-2025-10-06"`
3271
3272 - `:"gpt-realtime-mini-2025-12-15"`
3273
3274 - `:"gpt-audio-1.5"`
3275
3276 - `:"gpt-audio-mini"`
3277
3278 - `:"gpt-audio-mini-2025-10-06"`
3279
3280 - `:"gpt-audio-mini-2025-12-15"`
3281
3282 - `output_modalities: Array[:text | :audio]`
3283
3284 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
3285 that the model will respond with audio plus a transcript. `["text"]` can be used to make
3286 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
3287
3288 - `:text`
3289
3290 - `:audio`
3291
3292 - `prompt: ResponsePrompt`
3293
3294 Reference to a prompt template and its variables.
3295 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
3296
3297 - `id: String`
3298
3299 The unique identifier of the prompt template to use.
3300
3301 - `variables: Hash[Symbol, String | ResponseInputText | ResponseInputImage | ResponseInputFile]`
3302
3303 Optional map of values to substitute in for variables in your
3304 prompt. The substitution values can either be strings, or other
3305 Response input types like images or files.
3306
3307 - `String = String`
3308
3309 - `class ResponseInputText`
3310
3311 A text input to the model.
3312
3313 - `text: String`
3314
3315 The text input to the model.
3316
3317 - `type: :input_text`
3318
3319 The type of the input item. Always `input_text`.
3320
3321 - `:input_text`
3322
3323 - `class ResponseInputImage`
3324
3325 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
3326
3327 - `detail: :low | :high | :auto | :original`
3328
3329 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
3330
3331 - `:low`
3332
3333 - `:high`
3334
3335 - `:auto`
3336
3337 - `:original`
3338
3339 - `type: :input_image`
3340
3341 The type of the input item. Always `input_image`.
3342
3343 - `:input_image`
3344
3345 - `file_id: String`
3346
3347 The ID of the file to be sent to the model.
3348
3349 - `image_url: String`
3350
3351 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
3352
3353 - `class ResponseInputFile`
3354
3355 A file input to the model.
3356
3357 - `type: :input_file`
3358
3359 The type of the input item. Always `input_file`.
3360
3361 - `:input_file`
3362
3363 - `detail: :low | :high`
3364
3365 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
3366
3367 - `:low`
3368
3369 - `:high`
3370
3371 - `file_data: String`
3372
3373 The content of the file to be sent to the model.
3374
3375 - `file_id: String`
3376
3377 The ID of the file to be sent to the model.
3378
3379 - `file_url: String`
3380
3381 The URL of the file to be sent to the model.
3382
3383 - `filename: String`
3384
3385 The name of the file to be sent to the model.
3386
3387 - `version: String`
3388
3389 Optional version of the prompt template.
3390
3391 - `reasoning: RealtimeReasoning`
3392
3393 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
3394
3395 - `effort: RealtimeReasoningEffort`
3396
3397 Constrains effort on reasoning for reasoning-capable Realtime models such as
3398 `gpt-realtime-2`.
3399
3400 - `:minimal`
3401
3402 - `:low`
3403
3404 - `:medium`
3405
3406 - `:high`
3407
3408 - `:xhigh`
3409
3410 - `tool_choice: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp`
3411
3412 How the model chooses tools. Provide one of the string modes or force a specific
3413 function/MCP tool.
3414
3415 - `ToolChoiceOptions = :none | :auto | :required`
3416
3417 Controls which (if any) tool is called by the model.
3418
3419 `none` means the model will not call any tool and instead generates a message.
3420
3421 `auto` means the model can pick between generating a message or calling one or
3422 more tools.
3423
3424 `required` means the model must call one or more tools.
3425
3426 - `:none`
3427
3428 - `:auto`
3429
3430 - `:required`
3431
3432 - `class ToolChoiceFunction`
3433
3434 Use this option to force the model to call a specific function.
3435
3436 - `name: String`
3437
3438 The name of the function to call.
3439
3440 - `type: :function`
3441
3442 For function calling, the type is always `function`.
3443
3444 - `:function`
3445
3446 - `class ToolChoiceMcp`
3447
3448 Use this option to force the model to call a specific tool on a remote MCP server.
3449
3450 - `server_label: String`
3451
3452 The label of the MCP server to use.
3453
3454 - `type: :mcp`
3455
3456 For MCP tools, the type is always `mcp`.
3457
3458 - `:mcp`
3459
3460 - `name: String`
3461
3462 The name of the tool to call on the server.
3463
3464 - `tools: Array[RealtimeFunctionTool | McpTool{ server_label, type, allowed_tools, 7 more}]`
3465
3466 Tools available to the model.
3467
3468 - `class RealtimeFunctionTool`
3469
3470 - `description: String`
3471
3472 The description of the function, including guidance on when and how
3473 to call it, and guidance about what to tell the user when calling
3474 (if anything).
3475
3476 - `name: String`
3477
3478 The name of the function.
3479
3480 - `parameters: untyped`
3481
3482 Parameters of the function in JSON Schema.
3483
3484 - `type: :function`
3485
3486 The type of the tool, i.e. `function`.
3487
3488 - `:function`
3489
3490 - `class McpTool`
3491
3492 Give the model access to additional tools via remote Model Context Protocol
3493 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
3494
3495 - `server_label: String`
3496
3497 A label for this MCP server, used to identify it in tool calls.
3498
3499 - `type: :mcp`
3500
3501 The type of the MCP tool. Always `mcp`.
3502
3503 - `:mcp`
3504
3505 - `allowed_tools: Array[String] | McpToolFilter{ read_only, tool_names}`
3506
3507 List of allowed tool names or a filter object.
3508
3509 - `McpAllowedTools = Array[String]`
3510
A string array of allowed tool names.
3512
3513 - `class McpToolFilter`
3514
3515 A filter object to specify which tools are allowed.
3516
3517 - `read_only: bool`
3518
3519 Indicates whether or not a tool modifies data or is read-only. If an
3520 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
3521 it will match this filter.
3522
3523 - `tool_names: Array[String]`
3524
3525 List of allowed tool names.
3526
3527 - `authorization: String`
3528
3529 An OAuth access token that can be used with a remote MCP server, either
3530 with a custom MCP server URL or a service connector. Your application
3531 must handle the OAuth authorization flow and provide the token here.
3532
3533 - `connector_id: :connector_dropbox | :connector_gmail | :connector_googlecalendar | 5 more`
3534
3535 Identifier for service connectors, like those available in ChatGPT. One of
3536 `server_url` or `connector_id` must be provided. Learn more about service
3537 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
3538
3539 Currently supported `connector_id` values are:
3540
3541 - Dropbox: `connector_dropbox`
3542 - Gmail: `connector_gmail`
3543 - Google Calendar: `connector_googlecalendar`
3544 - Google Drive: `connector_googledrive`
3545 - Microsoft Teams: `connector_microsoftteams`
3546 - Outlook Calendar: `connector_outlookcalendar`
3547 - Outlook Email: `connector_outlookemail`
3548 - SharePoint: `connector_sharepoint`
3549
3550 - `:connector_dropbox`
3551
3552 - `:connector_gmail`
3553
3554 - `:connector_googlecalendar`
3555
3556 - `:connector_googledrive`
3557
3558 - `:connector_microsoftteams`
3559
3560 - `:connector_outlookcalendar`
3561
3562 - `:connector_outlookemail`
3563
3564 - `:connector_sharepoint`
3565
3566 - `defer_loading: bool`
3567
3568 Whether this MCP tool is deferred and discovered via tool search.
3569
3570 - `headers: Hash[Symbol, String]`
3571
3572 Optional HTTP headers to send to the MCP server. Use for authentication
3573 or other purposes.
3574
3575 - `require_approval: McpToolApprovalFilter{ always, never} | :always | :never`
3576
3577 Specify which of the MCP server's tools require approval.
3578
3579 - `class McpToolApprovalFilter`
3580
3581 Specify which of the MCP server's tools require approval. Can be
3582 `always`, `never`, or a filter object associated with tools
3583 that require approval.
3584
3585 - `always: Always{ read_only, tool_names}`
3586
3587 A filter object to specify which tools are allowed.
3588
3589 - `read_only: bool`
3590
3591 Indicates whether or not a tool modifies data or is read-only. If an
3592 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
3593 it will match this filter.
3594
3595 - `tool_names: Array[String]`
3596
3597 List of allowed tool names.
3598
3599 - `never: Never{ read_only, tool_names}`
3600
3601 A filter object to specify which tools are allowed.
3602
3603 - `read_only: bool`
3604
3605 Indicates whether or not a tool modifies data or is read-only. If an
3606 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
3607 it will match this filter.
3608
3609 - `tool_names: Array[String]`
3610
3611 List of allowed tool names.
3612
3613 - `McpToolApprovalSetting = :always | :never`
3614
3615 Specify a single approval policy for all tools. One of `always` or
3616 `never`. When set to `always`, all tools will require approval. When
set to `never`, no tools will require approval.
3618
3619 - `:always`
3620
3621 - `:never`
3622
3623 - `server_description: String`
3624
3625 Optional description of the MCP server, used to provide more context.
3626
3627 - `server_url: String`
3628
3629 The URL for the MCP server. One of `server_url` or `connector_id` must be
3630 provided.
3631
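The rule that an MCP tool needs one of `server_url` or `connector_id`, together with the list of supported connectors, can be checked on a plain Hash. The sketch below is illustrative only: `mcp_tool_problems` and `SUPPORTED_CONNECTORS` are hypothetical names, and the Hash shape is an assumption that mirrors the fields documented above.

```ruby
# Connector identifiers listed as currently supported in the reference above.
SUPPORTED_CONNECTORS = %i[
  connector_dropbox connector_gmail connector_googlecalendar connector_googledrive
  connector_microsoftteams connector_outlookcalendar connector_outlookemail
  connector_sharepoint
].freeze

# Hypothetical helper (not part of the SDK): returns a list of problems found
# in an MCP tool Hash, or an empty array if none were detected.
def mcp_tool_problems(tool)
  problems = []

  # The reference states that one of server_url or connector_id must be provided.
  if tool[:server_url].nil? && tool[:connector_id].nil?
    problems << "one of server_url or connector_id must be provided"
  end

  connector = tool[:connector_id]
  if connector && !SUPPORTED_CONNECTORS.include?(connector.to_sym)
    problems << "unsupported connector_id: #{connector}"
  end

  problems
end
```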
3632 - `tracing: :auto | TracingConfiguration{ group_id, metadata, workflow_name}`
3633
The Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to `null` to disable tracing. Once
3635 tracing is enabled for a session, the configuration cannot be modified.
3636
3637 `auto` will create a trace for the session with default values for the
3638 workflow name, group id, and metadata.
3639
3640 - `Tracing = :auto`
3641
3642 Enables tracing and sets default values for tracing configuration options. Always `auto`.
3643
3644 - `:auto`
3645
3646 - `class TracingConfiguration`
3647
3648 Granular configuration for tracing.
3649
3650 - `group_id: String`
3651
3652 The group id to attach to this trace to enable filtering and
3653 grouping in the Traces Dashboard.
3654
3655 - `metadata: untyped`
3656
3657 The arbitrary metadata to attach to this trace to enable
3658 filtering in the Traces Dashboard.
3659
3660 - `workflow_name: String`
3661
3662 The name of the workflow to attach to this trace. This is used to
3663 name the trace in the Traces Dashboard.
3664
3665 - `truncation: RealtimeTruncation`
3666
When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
3668
3669 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
3670
3671 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
3672
3673 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
3674
3675 - `RealtimeTruncationStrategy = :auto | :disabled`
3676
3677 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.
3678
3679 - `:auto`
3680
3681 - `:disabled`
3682
3683 - `class RealtimeTruncationRetentionRatio`
3684
3685 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
3686
3687 - `retention_ratio: Float`
3688
3689 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
3690
3691 - `type: :retention_ratio`
3692
3693 Use retention ratio truncation.
3694
3695 - `:retention_ratio`
3696
3697 - `token_limits: TokenLimits{ post_instructions}`
3698
3699 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
3700
3701 - `post_instructions: Integer`
3702
Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
3704
3705 - `class RealtimeTranscriptionSessionCreateResponse`
3706
3707 A Realtime transcription session configuration object.
3708
3709 - `id: String`
3710
3711 Unique identifier for the session that looks like `sess_1234567890abcdef`.
3712
3713 - `object: String`
3714
3715 The object type. Always `realtime.transcription_session`.
3716
3717 - `type: :transcription`
3718
3719 The type of session. Always `transcription` for transcription sessions.
3720
3721 - `:transcription`
3722
3723 - `audio: Audio{ input}`
3724
3725 Configuration for input audio for the session.
3726
3727 - `input: Input{ format_, noise_reduction, transcription, turn_detection}`
3728
3729 - `format_: RealtimeAudioFormats`
3730
3731 The PCM audio format. Only a 24kHz sample rate is supported.
3732
3733 - `noise_reduction: NoiseReduction{ type}`
3734
3735 Configuration for input audio noise reduction.
3736
3737 - `type: NoiseReductionType`
3738
3739 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.
3740
3741 - `transcription: AudioTranscription`
3742
3743 - `turn_detection: RealtimeTranscriptionSessionTurnDetection`
3744
3745 Configuration for turn detection. Can be set to `null` to turn off. Server
3746 VAD means that the model will detect the start and end of speech based on
3747 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
3748
3749 - `prefix_padding_ms: Integer`
3750
3751 Amount of audio to include before the VAD-detected speech (in
3752 milliseconds). Defaults to 300ms.
3753
3754 - `silence_duration_ms: Integer`
3755
3756 Duration of silence to detect speech stop (in milliseconds). Defaults
3757 to 500ms. With shorter values the model will respond more quickly,
3758 but may jump in on short pauses from the user.
3759
3760 - `threshold: Float`
3761
3762 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
3763 higher threshold will require louder audio to activate the model, and
3764 thus might perform better in noisy environments.
3765
3766 - `type: String`
3767
3768 Type of turn detection. Only `server_vad` is currently supported.
3769
3770 - `expires_at: Integer`
3771
3772 Expiration timestamp for the session, in seconds since epoch.
3773
3774 - `include: Array[:"item.input_audio_transcription.logprobs"]`
3775
3776 Additional fields to include in server outputs.
3777
3778 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
3779
3780 - `:"item.input_audio_transcription.logprobs"`
3781
3782 - `value: String`
3783
3784 The generated client secret value.
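
The turn detection defaults and the `expires_at` field documented above can be exercised with a short plain-Ruby sketch. The helpers `turn_detection_config` and `session_expired?` are hypothetical, as is the `model:` keyword, and the session is represented as a plain Hash with symbol keys rather than the SDK's response object. The default values (300 ms padding, 500 ms silence, 0.5 threshold) and the `gpt-realtime-whisper` restriction are taken from the field descriptions.

```ruby
# Defaults as stated in the field descriptions above.
DEFAULT_TURN_DETECTION = {
  type: "server_vad",
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  threshold: 0.5
}.freeze

# Hypothetical helper: returns a turn detection Hash, or nil when VAD must
# be turned off for the given model.
def turn_detection_config(model:, **overrides)
  # The reference states that VAD is not supported for gpt-realtime-whisper.
  return nil if model == "gpt-realtime-whisper"

  config = DEFAULT_TURN_DETECTION.merge(overrides)
  unless config[:type] == "server_vad"
    raise ArgumentError, "only server_vad is currently supported"
  end
  unless (0.0..1.0).cover?(config[:threshold])
    raise ArgumentError, "threshold must be within 0.0..1.0"
  end
  config
end

# Hypothetical helper: expires_at is in seconds since epoch, so it can be
# compared directly with Time#to_i.
def session_expired?(session, now: Time.now)
  now.to_i >= session.fetch(:expires_at)
end

# A higher threshold requires louder audio, which may suit noisy rooms.
noisy_room = turn_detection_config(model: "gpt-4o-transcribe", threshold: 0.7)
session = { expires_at: Time.now.to_i + 600 }
puts "expired" if session_expired?(session)
```

Passing `now:` explicitly keeps the expiry check deterministic in tests. Shorter `silence_duration_ms` values trade responsiveness against the risk of cutting in on brief pauses, as the field description notes.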