java/resources/realtime/subresources/client_secrets/index.md +0 −2803 deleted
1# Client Secrets
2
3## Create client secret
4
5`ClientSecretCreateResponse realtime().clientSecrets().create(ClientSecretCreateParams params = ClientSecretCreateParams.none(), RequestOptions requestOptions = RequestOptions.none())`
6
7**post** `/realtime/client_secrets`
8
9Create a Realtime client secret with an associated session configuration.
10
11Client secrets are short-lived tokens that can be passed to a client app,
12such as a web frontend or mobile client, to grant access to the Realtime API without
13leaking your main API key. You can configure a custom TTL for each client secret.
14
15You can also attach session configuration options to the client secret, which will be
16applied to any sessions created using that client secret, but these can also be overridden
17by the client connection.
18
19[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).
20
21Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.
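As a rough illustration of the request described above, the sketch below builds (but does not send) the HTTP call with the JDK's `java.net.http` classes rather than the SDK. The base URL `https://api.openai.com/v1`, the snake_case body field names `expires_after` and `session`, and the class name `ClientSecretRequestSketch` are assumptions made for this example; only the path `/realtime/client_secrets`, the `created_at` anchor, and `seconds` come from this page.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ClientSecretRequestSketch {
    // Assumed base URL; this page only documents the path.
    static final String BASE_URL = "https://api.openai.com/v1";

    /** JSON body: expiry anchored at created_at plus a minimal realtime session. */
    static String body(long ttlSeconds) {
        return "{\"expires_after\":{\"anchor\":\"created_at\",\"seconds\":" + ttlSeconds + "},"
             + "\"session\":{\"type\":\"realtime\"}}";
    }

    /** Builds the POST request without sending it. */
    static HttpRequest build(String apiKey, long ttlSeconds) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/realtime/client_secrets"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(ttlSeconds)))
                .build();
    }
}
```

The request should be made from your server with the main API key; only the returned client secret is handed to the client app.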
22
23### Parameters
24
25- `ClientSecretCreateParams params`
26
27 - `Optional<ExpiresAfter> expiresAfter`
28
29 Configuration for the client secret expiration. Expiration refers to the time after which
30 a client secret will no longer be valid for creating sessions. The session itself may
31 continue after that time once started. A secret can be used to create multiple sessions
32 until it expires.
33
34 - `Optional<Anchor> anchor`
35
36 The anchor point for the client secret expiration, meaning that `seconds` will be added to the `created_at` time of the client secret to produce an expiration timestamp. Only `created_at` is currently supported.
37
38 - `CREATED_AT("created_at")`
39
40 - `Optional<Long> seconds`
41
42 The number of seconds from the anchor point to the expiration. Select a value between `10` and `7200` (2 hours). This defaults to 600 seconds (10 minutes) if not specified.
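The expiration arithmetic can be sketched as follows. `ClientSecretExpiry` is a hypothetical helper, not part of the SDK, and rejecting out-of-range values on the client is an assumption: this page does not say whether the server rejects or clamps them.

```java
final class ClientSecretExpiry {
    static final long MIN_SECONDS = 10;
    static final long MAX_SECONDS = 7200;   // 2 hours
    static final long DEFAULT_SECONDS = 600; // 10 minutes

    /**
     * Returns the expiration timestamp (epoch seconds) for a secret created at
     * createdAt. A null seconds value falls back to the documented default.
     */
    static long expiresAt(long createdAt, Long seconds) {
        long ttl = (seconds == null) ? DEFAULT_SECONDS : seconds;
        if (ttl < MIN_SECONDS || ttl > MAX_SECONDS) {
            // Assumption: fail fast locally instead of relying on server behavior.
            throw new IllegalArgumentException("seconds must be between 10 and 7200, got " + ttl);
        }
        return createdAt + ttl;
    }
}
```

For example, `ClientSecretExpiry.expiresAt(1000L, null)` yields `1600`, since the default of 600 seconds is added to the `created_at` time.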
43
44 - `Optional<Session> session`
45
46 Session configuration to use for the client secret. Choose either a realtime
47 session or a transcription session.
48
49 - `class RealtimeSessionCreateRequest:`
50
51 Realtime session object configuration.
52
53 - `JsonValue; type "realtime" constant`
54
55 The type of session to create. Always `realtime` for the Realtime API.
56
57 - `REALTIME("realtime")`
58
59 - `Optional<RealtimeAudioConfig> audio`
60
61 Configuration for input and output audio.
62
63 - `Optional<RealtimeAudioConfigInput> input`
64
65 - `Optional<RealtimeAudioFormats> format`
66
67 The format of the input audio.
68
69 - `AudioPcm`
70
71 - `Optional<Rate> rate`
72
73 The sample rate of the audio. Always `24000`.
74
75 - `_24000(24000)`
76
77 - `Optional<Type> type`
78
79 The audio format. Always `audio/pcm`.
80
81 - `AUDIO_PCM("audio/pcm")`
82
83 - `AudioPcmu`
84
85 - `Optional<Type> type`
86
87 The audio format. Always `audio/pcmu`.
88
89 - `AUDIO_PCMU("audio/pcmu")`
90
91 - `AudioPcma`
92
93 - `Optional<Type> type`
94
95 The audio format. Always `audio/pcma`.
96
97 - `AUDIO_PCMA("audio/pcma")`
98
99 - `Optional<NoiseReduction> noiseReduction`
100
101 Configuration for input audio noise reduction. This can be set to `null` to turn off.
102 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
103 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
104
105 - `Optional<NoiseReductionType> type`
106
107 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
108
109 - `NEAR_FIELD("near_field")`
110
111 - `FAR_FIELD("far_field")`
112
113 - `Optional<AudioTranscription> transcription`
114
115 Configuration for input audio transcription. Defaults to off, and can be set to `null` to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on the input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
116
117 - `Optional<Delay> delay`
118
119 Controls how long the model waits before emitting transcription text.
120 Higher values can improve transcription accuracy at the cost of latency.
121 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
122
123 - `MINIMAL("minimal")`
124
125 - `LOW("low")`
126
127 - `MEDIUM("medium")`
128
129 - `HIGH("high")`
130
131 - `XHIGH("xhigh")`
132
133 - `Optional<String> language`
134
135 The language of the input audio. Supplying the input language in
136 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
137 will improve accuracy and latency.
138
139 - `Optional<Model> model`
140
141 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
142
143 - `WHISPER_1("whisper-1")`
144
145 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`
146
147 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`
148
149 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`
150
151 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`
152
153 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`
154
155 - `Optional<String> prompt`
156
157 An optional text to guide the model's style or continue a previous audio
158 segment.
159 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
160 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
161 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
162
163 - `Optional<RealtimeAudioInputTurnDetection> turnDetection`
164
165 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
166
167 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
168
169 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
170
171 For `gpt-realtime-whisper` transcription sessions, turn detection must be
172 set to `null`; VAD is not supported.
173
174 - `ServerVad`
175
176 - `JsonValue; type "server_vad" constant`
177
178 Type of turn detection, `server_vad` to turn on simple Server VAD.
179
180 - `SERVER_VAD("server_vad")`
181
182 - `Optional<Boolean> createResponse`
183
184 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
185
186 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
187
188 - `Optional<Long> idleTimeoutMs`
189
190 Optional timeout after which a model response will be triggered automatically. This is
191 useful for situations in which a long pause from the user is unexpected, such as a phone
192 call. The model will effectively prompt the user to continue the conversation based
193 on the current context.
194
195 The timeout value will be applied after the last model response's audio has finished playing,
196 i.e. it's set to the `response.done` time plus audio playback duration.
197
198 An `input_audio_buffer.timeout_triggered` event (plus events
199 associated with the Response) will be emitted when the timeout is reached.
200 Idle timeout is currently only supported for `server_vad` mode.
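A minimal sketch of the timing described above, assuming all values are in milliseconds on a shared clock. `IdleTimeoutSketch` and its method names are hypothetical and only illustrate the arithmetic; they are not SDK calls.

```java
final class IdleTimeoutSketch {
    /** When the idle timer starts: the response.done time plus audio playback duration. */
    static long timerStartMs(long responseDoneMs, long playbackDurationMs) {
        return responseDoneMs + playbackDurationMs;
    }

    /** When the timeout would be reached if the user stays silent. */
    static long timeoutAtMs(long responseDoneMs, long playbackDurationMs, long idleTimeoutMs) {
        return timerStartMs(responseDoneMs, playbackDurationMs) + idleTimeoutMs;
    }
}
```

With `response.done` at 10,000 ms, 2,500 ms of playback, and a 5,000 ms idle timeout, the timeout would be reached at 17,500 ms.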
201
202 - `Optional<Boolean> interruptResponse`
203
204 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
205 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
206
207 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
208
209 - `Optional<Long> prefixPaddingMs`
210
211 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
212 milliseconds). Defaults to 300ms.
213
214 - `Optional<Long> silenceDurationMs`
215
216 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
217 to 500ms. With shorter values the model will respond more quickly,
218 but may jump in on short pauses from the user.
219
220 - `Optional<Double> threshold`
221
222 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
223 higher threshold will require louder audio to activate the model, and
224 thus might perform better in noisy environments.
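To build intuition for how `threshold`, `prefixPaddingMs`, and `silenceDurationMs` interact, here is a toy energy-threshold detector. It is not the server's actual VAD, whose implementation is not described on this page; the fixed-size frames, the normalized `levels` input, and the class name `ServerVadSketch` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ServerVadSketch {
    /**
     * Returns {startMs, endMs} pairs for detected speech segments.
     * levels holds one normalized loudness value (0.0 to 1.0) per frame.
     */
    static List<long[]> detect(double[] levels, long frameMs, double threshold,
                               long prefixPaddingMs, long silenceDurationMs) {
        List<long[]> segments = new ArrayList<>();
        boolean speaking = false;
        long startMs = 0;
        long silenceMs = 0;
        for (int i = 0; i < levels.length; i++) {
            long frameStart = i * frameMs;
            boolean loud = levels[i] >= threshold;
            if (!speaking) {
                if (loud) {
                    speaking = true;
                    silenceMs = 0;
                    // Include some audio from before the detected speech.
                    startMs = Math.max(0, frameStart - prefixPaddingMs);
                }
            } else if (loud) {
                silenceMs = 0; // a short pause ended before the stop condition
            } else {
                silenceMs += frameMs;
                if (silenceMs >= silenceDurationMs) {
                    // Speech stop: the segment ends where the silence began.
                    segments.add(new long[] {startMs, frameStart + frameMs - silenceMs});
                    speaking = false;
                }
            }
        }
        if (speaking) {
            // Input ended mid-speech: close the open segment at the end of input.
            segments.add(new long[] {startMs, levels.length * frameMs});
        }
        return segments;
    }
}
```

In this model, lowering `silenceDurationMs` ends a turn sooner but splits an utterance at short pauses, which mirrors the trade-off described above.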
225
226 - `SemanticVad`
227
228 - `JsonValue; type "semantic_vad" constant`
229
230 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
231
232 - `SEMANTIC_VAD("semantic_vad")`
233
234 - `Optional<Boolean> createResponse`
235
236 Whether or not to automatically generate a response when a VAD stop event occurs.
237
238 - `Optional<Eagerness> eagerness`
239
240 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
241
242 - `LOW("low")`
243
244 - `MEDIUM("medium")`
245
246 - `HIGH("high")`
247
248 - `AUTO("auto")`
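The eagerness levels and their maximum timeouts can be summarized in a small lookup. `EagernessSketch` is a hypothetical enum written for this example, not the SDK's `Eagerness` type; treating a missing value as `auto` is an assumption based on `auto` being the documented default.

```java
import java.util.Locale;

enum EagernessSketch {
    LOW(8_000), MEDIUM(4_000), HIGH(2_000),
    AUTO(4_000); // auto is documented as equivalent to medium

    private final long maxTimeoutMs;

    EagernessSketch(long maxTimeoutMs) {
        this.maxTimeoutMs = maxTimeoutMs;
    }

    /** Maximum time the turn detector waits before treating the turn as ended. */
    long maxTimeoutMs() {
        return maxTimeoutMs;
    }

    /** Parses the wire value ("low", "medium", "high", "auto"); null means the default. */
    static EagernessSketch fromWire(String value) {
        return value == null ? AUTO : valueOf(value.toUpperCase(Locale.ROOT));
    }
}
```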
249
250 - `Optional<Boolean> interruptResponse`
251
252 Whether or not to automatically interrupt any ongoing response with output to the default
253 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
254
255 - `Optional<RealtimeAudioConfigOutput> output`
256
257 - `Optional<RealtimeAudioFormats> format`
258
259 The format of the output audio.
260
261 - `Optional<Double> speed`
262
263 The speed of the model's spoken response as a multiple of the original speed.
264 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
265
266 This parameter is a post-processing adjustment to the audio after it is generated; it's
267 also possible to prompt the model to speak faster or slower.
268
269 - `Optional<Voice> voice`
270
271 The voice the model uses to respond. Supported built-in voices are
272 `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`,
273 `marin`, and `cedar`. You may also provide a custom voice object with
274 an `id`, for example `{ "id": "voice_1234" }`. Voice cannot be changed
275 during the session once the model has responded with audio at least once.
276 We recommend `marin` and `cedar` for best quality.
277
278 - `String`
279
280 - `enum UnionMember1:`
281
282 - `ALLOY("alloy")`
283
284 - `ASH("ash")`
285
286 - `BALLAD("ballad")`
287
288 - `CORAL("coral")`
289
290 - `ECHO("echo")`
291
292 - `SAGE("sage")`
293
294 - `SHIMMER("shimmer")`
295
296 - `VERSE("verse")`
297
298 - `MARIN("marin")`
299
300 - `CEDAR("cedar")`
301
302 - `class Id:`
303
304 Custom voice reference.
305
306 - `String id`
307
308 The custom voice ID, e.g. `voice_1234`.
309
310 - `Optional<List<Include>> include`
311
312 Additional fields to include in server outputs.
313
314 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
315
316 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
317
318 - `Optional<String> instructions`
319
320 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
321
322 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
323
324 - `Optional<MaxOutputTokens> maxOutputTokens`
325
326 Maximum number of output tokens for a single assistant response,
327 inclusive of tool calls. Provide an integer between 1 and 4096 to
328 limit output tokens, or `inf` for the maximum available tokens for a
329 given model. Defaults to `inf`.
330
331 - `long`
332
333 - `JsonValue;`
334
335 - `INF("inf")`
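Because `maxOutputTokens` is either an integer or the string `inf`, a caller serializing it by hand has to handle both shapes. The helper below is a hypothetical sketch (not SDK code) that uses `null` to stand for `inf`, which is an assumption chosen for brevity.

```java
final class MaxOutputTokensSketch {
    /**
     * Returns the JSON fragment for max_output_tokens.
     * A null limit is treated as "inf", the documented default.
     */
    static String toJson(Long limit) {
        if (limit == null) {
            return "\"inf\"";
        }
        if (limit < 1 || limit > 4096) {
            throw new IllegalArgumentException("limit must be between 1 and 4096, got " + limit);
        }
        return Long.toString(limit);
    }
}
```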
336
337 - `Optional<Model> model`
338
339 The Realtime model used for this session.
340
341 - `GPT_REALTIME("gpt-realtime")`
342
343 - `GPT_REALTIME_1_5("gpt-realtime-1.5")`
344
345 - `GPT_REALTIME_2("gpt-realtime-2")`
346
347 - `GPT_REALTIME_2025_08_28("gpt-realtime-2025-08-28")`
348
349 - `GPT_4O_REALTIME_PREVIEW("gpt-4o-realtime-preview")`
350
351 - `GPT_4O_REALTIME_PREVIEW_2024_10_01("gpt-4o-realtime-preview-2024-10-01")`
352
353 - `GPT_4O_REALTIME_PREVIEW_2024_12_17("gpt-4o-realtime-preview-2024-12-17")`
354
355 - `GPT_4O_REALTIME_PREVIEW_2025_06_03("gpt-4o-realtime-preview-2025-06-03")`
356
357 - `GPT_4O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview")`
358
359 - `GPT_4O_MINI_REALTIME_PREVIEW_2024_12_17("gpt-4o-mini-realtime-preview-2024-12-17")`
360
361 - `GPT_REALTIME_MINI("gpt-realtime-mini")`
362
363 - `GPT_REALTIME_MINI_2025_10_06("gpt-realtime-mini-2025-10-06")`
364
365 - `GPT_REALTIME_MINI_2025_12_15("gpt-realtime-mini-2025-12-15")`
366
367 - `GPT_AUDIO_1_5("gpt-audio-1.5")`
368
369 - `GPT_AUDIO_MINI("gpt-audio-mini")`
370
371 - `GPT_AUDIO_MINI_2025_10_06("gpt-audio-mini-2025-10-06")`
372
373 - `GPT_AUDIO_MINI_2025_12_15("gpt-audio-mini-2025-12-15")`
374
375 - `Optional<List<OutputModality>> outputModalities`
376
377 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
378 that the model will respond with audio plus a transcript. `["text"]` can be used to make
379 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
380
381 - `TEXT("text")`
382
383 - `AUDIO("audio")`
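The constraint that `text` and `audio` cannot be requested together could be checked on the client before sending, roughly as below. `OutputModalitiesSketch` is a hypothetical helper; mapping `null` to `["audio"]` follows the documented default, while validating locally at all is an assumption.

```java
import java.util.List;

final class OutputModalitiesSketch {
    /** Returns the effective modalities, or throws if the combination is not allowed. */
    static List<String> validate(List<String> modalities) {
        if (modalities == null) {
            return List.of("audio"); // documented default
        }
        for (String m : modalities) {
            if (!m.equals("text") && !m.equals("audio")) {
                throw new IllegalArgumentException("unknown modality: " + m);
            }
        }
        if (modalities.contains("text") && modalities.contains("audio")) {
            throw new IllegalArgumentException("text and audio cannot be requested together");
        }
        return List.copyOf(modalities);
    }
}
```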
384
385 - `Optional<Boolean> parallelToolCalls`
386
387 Whether the model may call multiple tools in parallel. Only supported by
388 reasoning Realtime models such as `gpt-realtime-2`.
389
390 - `Optional<ResponsePrompt> prompt`
391
392 Reference to a prompt template and its variables.
393 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
394
395 - `String id`
396
397 The unique identifier of the prompt template to use.
398
399 - `Optional<Variables> variables`
400
401 Optional map of values to substitute in for variables in your
402 prompt. The substitution values can either be strings, or other
403 Response input types like images or files.
404
405 - `String`
406
407 - `class ResponseInputText:`
408
409 A text input to the model.
410
411 - `String text`
412
413 The text input to the model.
414
415 - `JsonValue; type "input_text" constant`
416
417 The type of the input item. Always `input_text`.
418
419 - `INPUT_TEXT("input_text")`
420
421 - `class ResponseInputImage:`
422
423 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
424
425 - `Detail detail`
426
427 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
428
429 - `LOW("low")`
430
431 - `HIGH("high")`
432
433 - `AUTO("auto")`
434
435 - `ORIGINAL("original")`
436
437 - `JsonValue; type "input_image" constant`
438
439 The type of the input item. Always `input_image`.
440
441 - `INPUT_IMAGE("input_image")`
442
443 - `Optional<String> fileId`
444
445 The ID of the file to be sent to the model.
446
447 - `Optional<String> imageUrl`
448
449 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
450
451 - `class ResponseInputFile:`
452
453 A file input to the model.
454
455 - `JsonValue; type "input_file" constant`
456
457 The type of the input item. Always `input_file`.
458
459 - `INPUT_FILE("input_file")`
460
461 - `Optional<Detail> detail`
462
463 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
464
465 - `LOW("low")`
466
467 - `HIGH("high")`
468
469 - `Optional<String> fileData`
470
471 The content of the file to be sent to the model.
472
473 - `Optional<String> fileId`
474
475 The ID of the file to be sent to the model.
476
477 - `Optional<String> fileUrl`
478
479 The URL of the file to be sent to the model.
480
481 - `Optional<String> filename`
482
483 The name of the file to be sent to the model.
484
485 - `Optional<String> version`
486
487 Optional version of the prompt template.
488
489 - `Optional<RealtimeReasoning> reasoning`
490
491 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
492
493 - `Optional<RealtimeReasoningEffort> effort`
494
495 Constrains effort on reasoning for reasoning-capable Realtime models such as
496 `gpt-realtime-2`.
497
498 - `MINIMAL("minimal")`
499
500 - `LOW("low")`
501
502 - `MEDIUM("medium")`
503
504 - `HIGH("high")`
505
506 - `XHIGH("xhigh")`
507
508 - `Optional<RealtimeToolChoiceConfig> toolChoice`
509
510 How the model chooses tools. Provide one of the string modes or force a specific
511 function/MCP tool.
512
513 - `enum ToolChoiceOptions:`
514
515 Controls which (if any) tool is called by the model.
516
517 `none` means the model will not call any tool and instead generates a message.
518
519 `auto` means the model can pick between generating a message or calling one or
520 more tools.
521
522 `required` means the model must call one or more tools.
523
524 - `NONE("none")`
525
526 - `AUTO("auto")`
527
528 - `REQUIRED("required")`
529
530 - `class ToolChoiceFunction:`
531
532 Use this option to force the model to call a specific function.
533
534 - `String name`
535
536 The name of the function to call.
537
538 - `JsonValue; type "function" constant`
539
540 For function calling, the type is always `function`.
541
542 - `FUNCTION("function")`
543
544 - `class ToolChoiceMcp:`
545
546 Use this option to force the model to call a specific tool on a remote MCP server.
547
548 - `String serverLabel`
549
550 The label of the MCP server to use.
551
552 - `JsonValue; type "mcp" constant`
553
554 For MCP tools, the type is always `mcp`.
555
556 - `MCP("mcp")`
557
558 - `Optional<String> name`
559
560 The name of the tool to call on the server.
561
562 - `Optional<List<RealtimeToolsConfigUnion>> tools`
563
564 Tools available to the model.
565
566 - `class RealtimeFunctionTool:`
567
568 - `Optional<String> description`
569
570 The description of the function, including guidance on when and how
571 to call it, and guidance about what to tell the user when calling
572 (if anything).
573
574 - `Optional<String> name`
575
576 The name of the function.
577
578 - `Optional<JsonValue> parameters`
579
580 Parameters of the function in JSON Schema.
581
582 - `Optional<Type> type`
583
584 The type of the tool, i.e. `function`.
585
586 - `FUNCTION("function")`
587
588 - `Mcp`
589
590 - `String serverLabel`
591
592 A label for this MCP server, used to identify it in tool calls.
593
594 - `JsonValue; type "mcp" constant`
595
596 The type of the MCP tool. Always `mcp`.
597
598 - `MCP("mcp")`
599
600 - `Optional<AllowedTools> allowedTools`
601
602 List of allowed tool names or a filter object.
603
604 - `List<String>`
605
606 - `class McpToolFilter:`
607
608 A filter object to specify which tools are allowed.
609
610 - `Optional<Boolean> readOnly`
611
612 Indicates whether or not a tool modifies data or is read-only. If an
613 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
614 it will match this filter.
615
616 - `Optional<List<String>> toolNames`
617
618 List of allowed tool names.
619
620 - `Optional<String> authorization`
621
622 An OAuth access token that can be used with a remote MCP server, either
623 with a custom MCP server URL or a service connector. Your application
624 must handle the OAuth authorization flow and provide the token here.
625
626 - `Optional<ConnectorId> connectorId`
627
628 Identifier for service connectors, like those available in ChatGPT. One of
629 `server_url` or `connector_id` must be provided. Learn more about service
630 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
631
632 Currently supported `connector_id` values are:
633
634 - Dropbox: `connector_dropbox`
635 - Gmail: `connector_gmail`
636 - Google Calendar: `connector_googlecalendar`
637 - Google Drive: `connector_googledrive`
638 - Microsoft Teams: `connector_microsoftteams`
639 - Outlook Calendar: `connector_outlookcalendar`
640 - Outlook Email: `connector_outlookemail`
641 - SharePoint: `connector_sharepoint`
642
643 - `CONNECTOR_DROPBOX("connector_dropbox")`
644
645 - `CONNECTOR_GMAIL("connector_gmail")`
646
647 - `CONNECTOR_GOOGLECALENDAR("connector_googlecalendar")`
648
649 - `CONNECTOR_GOOGLEDRIVE("connector_googledrive")`
650
651 - `CONNECTOR_MICROSOFTTEAMS("connector_microsoftteams")`
652
653 - `CONNECTOR_OUTLOOKCALENDAR("connector_outlookcalendar")`
654
655 - `CONNECTOR_OUTLOOKEMAIL("connector_outlookemail")`
656
657 - `CONNECTOR_SHAREPOINT("connector_sharepoint")`
658
659 - `Optional<Boolean> deferLoading`
660
661 Whether this MCP tool is deferred and discovered via tool search.
662
663 - `Optional<Headers> headers`
664
665 Optional HTTP headers to send to the MCP server. Use for authentication
666 or other purposes.
667
668 - `Optional<RequireApproval> requireApproval`
669
670 Specify which of the MCP server's tools require approval.
671
672 - `class McpToolApprovalFilter:`
673
674 Specify which of the MCP server's tools require approval. Can be
675 `always`, `never`, or a filter object associated with tools
676 that require approval.
677
678 - `Optional<Always> always`
679
680 A filter object to specify which tools are allowed.
681
682 - `Optional<Boolean> readOnly`
683
684 Indicates whether or not a tool modifies data or is read-only. If an
685 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
686 it will match this filter.
687
688 - `Optional<List<String>> toolNames`
689
690 List of allowed tool names.
691
692 - `Optional<Never> never`
693
694 A filter object to specify which tools are allowed.
695
696 - `Optional<Boolean> readOnly`
697
698 Indicates whether or not a tool modifies data or is read-only. If an
699 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
700 it will match this filter.
701
702 - `Optional<List<String>> toolNames`
703
704 List of allowed tool names.
705
706 - `enum McpToolApprovalSetting:`
707
708 Specify a single approval policy for all tools. One of `always` or
709 `never`. When set to `always`, all tools will require approval. When
710 set to `never`, no tools will require approval.
711
712 - `ALWAYS("always")`
713
714 - `NEVER("never")`
715
716 - `Optional<String> serverDescription`
717
718 Optional description of the MCP server, used to provide more context.
719
720 - `Optional<String> serverUrl`
721
722 The URL for the MCP server. One of `server_url` or `connector_id` must be
723 provided.
724
725 - `Optional<RealtimeTracingConfig> tracing`
726
727 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once
728 tracing is enabled for a session, the configuration cannot be modified.
729
730 `auto` will create a trace for the session with default values for the
731 workflow name, group id, and metadata.
732
733 - `JsonValue;`
734
735 - `AUTO("auto")`
736
737 - `TracingConfiguration`
738
739 - `Optional<String> groupId`
740
741 The group id to attach to this trace to enable filtering and
742 grouping in the Traces Dashboard.
743
744 - `Optional<JsonValue> metadata`
745
746 The arbitrary metadata to attach to this trace to enable
747 filtering in the Traces Dashboard.
748
749 - `Optional<String> workflowName`
750
751 The name of the workflow to attach to this trace. This is used to
752 name the trace in the Traces Dashboard.
753
754 - `Optional<RealtimeTruncation> truncation`
755
756 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
757
758 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
759
760 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
761
762 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
763
764 - `RealtimeTruncationStrategy`
765
766 - `AUTO("auto")`
767
768 - `DISABLED("disabled")`
769
770 - `class RealtimeTruncationRetentionRatio:`
771
772 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
773
774 - `double retentionRatio`
775
776 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
777
778 - `JsonValue; type "retention_ratio" constant`
779
780 Use retention ratio truncation.
781
782 - `RETENTION_RATIO("retention_ratio")`
783
784 - `Optional<TokenLimits> tokenLimits`
785
786 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
787
788 - `Optional<Long> postInstructions`
789
790 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation occurs when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
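The retention-ratio behavior can be pictured with a small simulation. This is a sketch of the described policy, not the server's implementation: `RetentionRatioSketch`, the per-message token counts, and dropping whole messages oldest-first until the target is met are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RetentionRatioSketch {
    /**
     * messageTokens lists per-message token counts, oldest first.
     * If the total exceeds tokenLimit, the oldest messages are dropped until
     * the total is at most retentionRatio * tokenLimit.
     */
    static List<Integer> truncate(List<Integer> messageTokens, long tokenLimit,
                                  double retentionRatio) {
        long total = 0;
        for (int tokens : messageTokens) {
            total += tokens;
        }
        if (total <= tokenLimit) {
            return new ArrayList<>(messageTokens); // under the limit: nothing to drop
        }
        long target = (long) Math.floor(tokenLimit * retentionRatio);
        int first = 0;
        while (first < messageTokens.size() && total > target) {
            total -= messageTokens.get(first);
            first++;
        }
        return new ArrayList<>(messageTokens.subList(first, messageTokens.size()));
    }
}
```

With a limit of 5,000 tokens, a ratio of `0.8`, and messages of 2,000, 1,500, 1,000, and 800 tokens (5,300 in total), the oldest message is dropped and 3,300 tokens remain, leaving headroom for several more turns before the next truncation.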
791
792 - `class RealtimeTranscriptionSessionCreateRequest:`
793
794 Realtime transcription session object configuration.
795
796 - `JsonValue; type "transcription" constant`
797
798 The type of session to create. Always `transcription` for transcription sessions.
799
800 - `TRANSCRIPTION("transcription")`
801
802 - `Optional<RealtimeTranscriptionSessionAudio> audio`
803
804 Configuration for input and output audio.
805
806 - `Optional<RealtimeTranscriptionSessionAudioInput> input`
807
808 - `Optional<RealtimeAudioFormats> format`
809
810 The PCM audio format. Only a 24kHz sample rate is supported.
811
812 - `Optional<NoiseReduction> noiseReduction`
813
814 Configuration for input audio noise reduction. This can be set to `null` to turn off.
815 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
816 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
817
818 - `Optional<NoiseReductionType> type`
819
820 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
821
822 - `Optional<AudioTranscription> transcription`
823
824 Configuration for input audio transcription. Defaults to off, and can be set to `null` to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through [the /audio/transcriptions endpoint](https://platform.openai.com/docs/api-reference/audio/createTranscription) and should be treated as guidance on the input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
825
826 - `Optional<RealtimeTranscriptionSessionAudioInputTurnDetection> turnDetection`
827
828 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
829
830 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
831
832 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
833
834 For `gpt-realtime-whisper` transcription sessions, turn detection must be
835 set to `null`; VAD is not supported.
836
837 - `ServerVad`
838
839 - `JsonValue; type "server_vad" constant`
840
841 Type of turn detection, `server_vad` to turn on simple Server VAD.
842
843 - `SERVER_VAD("server_vad")`
844
845 - `Optional<Boolean> createResponse`
846
847 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
848
849 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
850
851 - `Optional<Long> idleTimeoutMs`
852
853 Optional timeout after which a model response will be triggered automatically. This is
854 useful for situations in which a long pause from the user is unexpected, such as a phone
855 call. The model will effectively prompt the user to continue the conversation based
856 on the current context.
857
858 The timeout value will be applied after the last model response's audio has finished playing,
859 i.e. it's set to the `response.done` time plus audio playback duration.
860
861 An `input_audio_buffer.timeout_triggered` event (plus events
862 associated with the Response) will be emitted when the timeout is reached.
863 Idle timeout is currently only supported for `server_vad` mode.
864
865 - `Optional<Boolean> interruptResponse`
866
867 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
868 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
869
870 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
871
872 - `Optional<Long> prefixPaddingMs`
873
874 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
875 milliseconds). Defaults to 300ms.
876
877 - `Optional<Long> silenceDurationMs`
878
879 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
880 to 500ms. With shorter values the model will respond more quickly,
881 but may jump in on short pauses from the user.
882
883 - `Optional<Double> threshold`
884
885 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A
886 higher threshold will require louder audio to activate the model, and
887 thus might perform better in noisy environments.
888
889 - `SemanticVad`
890
891 - `JsonValue; type "semantic_vad"constant`
892
893 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
894
895 - `SEMANTIC_VAD("semantic_vad")`
896
897 - `Optional<Boolean> createResponse`
898
899 Whether or not to automatically generate a response when a VAD stop event occurs.
900
901 - `Optional<Eagerness> eagerness`
902
903 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
904
905 - `LOW("low")`
906
907 - `MEDIUM("medium")`
908
909 - `HIGH("high")`
910
911 - `AUTO("auto")`
912
913 - `Optional<Boolean> interruptResponse`
914
915 Whether or not to automatically interrupt any ongoing response with output to the default
916 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
917
918 - `Optional<List<Include>> include`
919
920 Additional fields to include in server outputs.
921
922 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
923
924 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
925
926### Returns
927
928- `class ClientSecretCreateResponse:`
929
930 Response from creating a session and client secret for the Realtime API.
931
932 - `long expiresAt`
933
934 Expiration timestamp for the client secret, in seconds since epoch.
935
936 - `Session session`
937
938 The session configuration for either a realtime or transcription session.
939
940 - `class RealtimeSessionCreateResponse:`
941
942 A Realtime session configuration object.
943
944 - `String id`
945
946 Unique identifier for the session that looks like `sess_1234567890abcdef`.
947
948 - `JsonValue; object_ "realtime.session"constant`
949
950 The object type. Always `realtime.session`.
951
952 - `REALTIME_SESSION("realtime.session")`
953
954 - `JsonValue; type "realtime"constant`
955
956 The type of session to create. Always `realtime` for the Realtime API.
957
958 - `REALTIME("realtime")`
959
960 - `Optional<Audio> audio`
961
962 Configuration for input and output audio.
963
964 - `Optional<Input> input`
965
966 - `Optional<RealtimeAudioFormats> format`
967
968 The format of the input audio.
969
970 - `AudioPcm`
971
972 - `Optional<Rate> rate`
973
974 The sample rate of the audio. Always `24000`.
975
976 - `_24000(24000)`
977
978 - `Optional<Type> type`
979
980 The audio format. Always `audio/pcm`.
981
982 - `AUDIO_PCM("audio/pcm")`
983
984 - `AudioPcmu`
985
986 - `Optional<Type> type`
987
988 The audio format. Always `audio/pcmu`.
989
990 - `AUDIO_PCMU("audio/pcmu")`
991
992 - `AudioPcma`
993
994 - `Optional<Type> type`
995
996 The audio format. Always `audio/pcma`.
997
998 - `AUDIO_PCMA("audio/pcma")`
999
1000 - `Optional<NoiseReduction> noiseReduction`
1001
1002 Configuration for input audio noise reduction. This can be set to `null` to turn off.
1003 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
1004 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
1005
1006 - `Optional<NoiseReductionType> type`
1007
1008 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
1009
1010 - `NEAR_FIELD("near_field")`
1011
1012 - `FAR_FIELD("far_field")`
1013
1014 - `Optional<AudioTranscription> transcription`
1015
1016 - `Optional<Delay> delay`
1017
1018 Controls how long the model waits before emitting transcription text.
1019 Higher values can improve transcription accuracy at the cost of latency.
1020 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
1021
1022 - `MINIMAL("minimal")`
1023
1024 - `LOW("low")`
1025
1026 - `MEDIUM("medium")`
1027
1028 - `HIGH("high")`
1029
1030 - `XHIGH("xhigh")`
1031
1032 - `Optional<String> language`
1033
1034 The language of the input audio. Supplying the input language in
1035 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
1036 will improve accuracy and latency.
1037
1038 - `Optional<Model> model`
1039
1040 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
1041
1042 - `WHISPER_1("whisper-1")`
1043
1044 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`
1045
1046 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`
1047
1048 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`
1049
1050 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`
1051
1052 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`
1053
1054 - `Optional<String> prompt`
1055
1056 An optional text to guide the model's style or continue a previous audio
1057 segment.
1058 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
1059 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
1060 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
1061
1062 - `Optional<TurnDetection> turnDetection`
1063
1064 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
1065
1066 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
1067
1068 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
1069
1070 For `gpt-realtime-whisper` transcription sessions, turn detection must be
1071 set to `null`; VAD is not supported.
1072
1073 - `class ServerVad:`
1074
1075 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
1076
1077 - `JsonValue; type "server_vad"constant`
1078
1079 Type of turn detection, `server_vad` to turn on simple Server VAD.
1080
1081 - `SERVER_VAD("server_vad")`
1082
1083 - `Optional<Boolean> createResponse`
1084
1085 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
1086
1087 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
1088
1089 - `Optional<Long> idleTimeoutMs`
1090
1091 Optional timeout after which a model response will be triggered automatically. This is
1092 useful for situations in which a long pause from the user is unexpected, such as a phone
1093 call. The model will effectively prompt the user to continue the conversation based
1094 on the current context.
1095
1096 The timeout value will be applied after the last model response's audio has finished playing,
1097 i.e. it's set to the `response.done` time plus audio playback duration.
1098
1099 An `input_audio_buffer.timeout_triggered` event (plus events
1100 associated with the Response) will be emitted when the timeout is reached.
1101 Idle timeout is currently only supported for `server_vad` mode.
1102
1103 - `Optional<Boolean> interruptResponse`
1104
1105 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
1106 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
1107
1108 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
1109
1110 - `Optional<Long> prefixPaddingMs`
1111
1112 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
1113 milliseconds). Defaults to 300ms.
1114
1115 - `Optional<Long> silenceDurationMs`
1116
1117 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
1118 to 500ms. With shorter values the model will respond more quickly,
1119 but may jump in on short pauses from the user.
1120
1121 - `Optional<Double> threshold`
1122
1123 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A
1124 higher threshold will require louder audio to activate the model, and
1125 thus might perform better in noisy environments.
1126
1127 - `class SemanticVad:`
1128
1129 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
1130
1131 - `JsonValue; type "semantic_vad"constant`
1132
1133 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
1134
1135 - `SEMANTIC_VAD("semantic_vad")`
1136
1137 - `Optional<Boolean> createResponse`
1138
1139 Whether or not to automatically generate a response when a VAD stop event occurs.
1140
1141 - `Optional<Eagerness> eagerness`
1142
1143 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
1144
1145 - `LOW("low")`
1146
1147 - `MEDIUM("medium")`
1148
1149 - `HIGH("high")`
1150
1151 - `AUTO("auto")`
1152
1153 - `Optional<Boolean> interruptResponse`
1154
1155 Whether or not to automatically interrupt any ongoing response with output to the default
1156 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
1157
1158 - `Optional<Output> output`
1159
1160 - `Optional<RealtimeAudioFormats> format`
1161
1162 The format of the output audio.
1163
1164 - `Optional<Double> speed`
1165
1166 The speed of the model's spoken response as a multiple of the original speed.
1167 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
1168
1169 This parameter is a post-processing adjustment to the audio after it is generated; it's
1170 also possible to prompt the model to speak faster or slower.
1171
1172 - `Optional<Voice> voice`
1173
1174 The voice the model uses to respond. Voice cannot be changed during the
1175 session once the model has responded with audio at least once. Current
1176 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
1177 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
1178 best quality.
1179
1180 - `ALLOY("alloy")`
1181
1182 - `ASH("ash")`
1183
1184 - `BALLAD("ballad")`
1185
1186 - `CORAL("coral")`
1187
1188 - `ECHO("echo")`
1189
1190 - `SAGE("sage")`
1191
1192 - `SHIMMER("shimmer")`
1193
1194 - `VERSE("verse")`
1195
1196 - `MARIN("marin")`
1197
1198 - `CEDAR("cedar")`
1199
1200 - `Optional<Long> expiresAt`
1201
1202 Expiration timestamp for the session, in seconds since epoch.
1203
1204 - `Optional<List<Include>> include`
1205
1206 Additional fields to include in server outputs.
1207
1208 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
1209
1210 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
1211
1212 - `Optional<String> instructions`
1213
1214 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
1215
1216 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
1217
1218 - `Optional<MaxOutputTokens> maxOutputTokens`
1219
1220 Maximum number of output tokens for a single assistant response,
1221 inclusive of tool calls. Provide an integer between 1 and 4096 to
1222 limit output tokens, or `inf` for the maximum available tokens for a
1223 given model. Defaults to `inf`.
1224
1225 - `long`
1226
1227 - `JsonValue;`
1228
1229 - `INF("inf")`
1230
1231 - `Optional<Model> model`
1232
1233 The Realtime model used for this session.
1234
1235 - `GPT_REALTIME("gpt-realtime")`
1236
1237 - `GPT_REALTIME_1_5("gpt-realtime-1.5")`
1238
1239 - `GPT_REALTIME_2("gpt-realtime-2")`
1240
1241 - `GPT_REALTIME_2025_08_28("gpt-realtime-2025-08-28")`
1242
1243 - `GPT_4O_REALTIME_PREVIEW("gpt-4o-realtime-preview")`
1244
1245 - `GPT_4O_REALTIME_PREVIEW_2024_10_01("gpt-4o-realtime-preview-2024-10-01")`
1246
1247 - `GPT_4O_REALTIME_PREVIEW_2024_12_17("gpt-4o-realtime-preview-2024-12-17")`
1248
1249 - `GPT_4O_REALTIME_PREVIEW_2025_06_03("gpt-4o-realtime-preview-2025-06-03")`
1250
1251 - `GPT_4O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview")`
1252
1253 - `GPT_4O_MINI_REALTIME_PREVIEW_2024_12_17("gpt-4o-mini-realtime-preview-2024-12-17")`
1254
1255 - `GPT_REALTIME_MINI("gpt-realtime-mini")`
1256
1257 - `GPT_REALTIME_MINI_2025_10_06("gpt-realtime-mini-2025-10-06")`
1258
1259 - `GPT_REALTIME_MINI_2025_12_15("gpt-realtime-mini-2025-12-15")`
1260
1261 - `GPT_AUDIO_1_5("gpt-audio-1.5")`
1262
1263 - `GPT_AUDIO_MINI("gpt-audio-mini")`
1264
1265 - `GPT_AUDIO_MINI_2025_10_06("gpt-audio-mini-2025-10-06")`
1266
1267 - `GPT_AUDIO_MINI_2025_12_15("gpt-audio-mini-2025-12-15")`
1268
1269 - `Optional<List<OutputModality>> outputModalities`
1270
1271 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
1272 that the model will respond with audio plus a transcript. `["text"]` can be used to make
1273 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
1274
1275 - `TEXT("text")`
1276
1277 - `AUDIO("audio")`
1278
1279 - `Optional<ResponsePrompt> prompt`
1280
1281 Reference to a prompt template and its variables.
1282 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
1283
1284 - `String id`
1285
1286 The unique identifier of the prompt template to use.
1287
1288 - `Optional<Variables> variables`
1289
1290 Optional map of values to substitute in for variables in your
1291 prompt. The substitution values can either be strings, or other
1292 Response input types like images or files.
1293
1294 - `String`
1295
1296 - `class ResponseInputText:`
1297
1298 A text input to the model.
1299
1300 - `String text`
1301
1302 The text input to the model.
1303
1304 - `JsonValue; type "input_text"constant`
1305
1306 The type of the input item. Always `input_text`.
1307
1308 - `INPUT_TEXT("input_text")`
1309
1310 - `class ResponseInputImage:`
1311
1312 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
1313
1314 - `Detail detail`
1315
1316 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
1317
1318 - `LOW("low")`
1319
1320 - `HIGH("high")`
1321
1322 - `AUTO("auto")`
1323
1324 - `ORIGINAL("original")`
1325
1326 - `JsonValue; type "input_image"constant`
1327
1328 The type of the input item. Always `input_image`.
1329
1330 - `INPUT_IMAGE("input_image")`
1331
1332 - `Optional<String> fileId`
1333
1334 The ID of the file to be sent to the model.
1335
1336 - `Optional<String> imageUrl`
1337
1338 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
1339
1340 - `class ResponseInputFile:`
1341
1342 A file input to the model.
1343
1344 - `JsonValue; type "input_file"constant`
1345
1346 The type of the input item. Always `input_file`.
1347
1348 - `INPUT_FILE("input_file")`
1349
1350 - `Optional<Detail> detail`
1351
1352 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
1353
1354 - `LOW("low")`
1355
1356 - `HIGH("high")`
1357
1358 - `Optional<String> fileData`
1359
1360 The content of the file to be sent to the model.
1361
1362 - `Optional<String> fileId`
1363
1364 The ID of the file to be sent to the model.
1365
1366 - `Optional<String> fileUrl`
1367
1368 The URL of the file to be sent to the model.
1369
1370 - `Optional<String> filename`
1371
1372 The name of the file to be sent to the model.
1373
1374 - `Optional<String> version`
1375
1376 Optional version of the prompt template.
1377
1378 - `Optional<RealtimeReasoning> reasoning`
1379
1380 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
1381
1382 - `Optional<RealtimeReasoningEffort> effort`
1383
1384 Constrains effort on reasoning for reasoning-capable Realtime models such as
1385 `gpt-realtime-2`.
1386
1387 - `MINIMAL("minimal")`
1388
1389 - `LOW("low")`
1390
1391 - `MEDIUM("medium")`
1392
1393 - `HIGH("high")`
1394
1395 - `XHIGH("xhigh")`
1396
1397 - `Optional<ToolChoice> toolChoice`
1398
1399 How the model chooses tools. Provide one of the string modes or force a specific
1400 function/MCP tool.
1401
1402 - `enum ToolChoiceOptions:`
1403
1404 Controls which (if any) tool is called by the model.
1405
1406 `none` means the model will not call any tool and instead generates a message.
1407
1408 `auto` means the model can pick between generating a message or calling one or
1409 more tools.
1410
1411 `required` means the model must call one or more tools.
1412
1413 - `NONE("none")`
1414
1415 - `AUTO("auto")`
1416
1417 - `REQUIRED("required")`
1418
1419 - `class ToolChoiceFunction:`
1420
1421 Use this option to force the model to call a specific function.
1422
1423 - `String name`
1424
1425 The name of the function to call.
1426
1427 - `JsonValue; type "function"constant`
1428
1429 For function calling, the type is always `function`.
1430
1431 - `FUNCTION("function")`
1432
1433 - `class ToolChoiceMcp:`
1434
1435 Use this option to force the model to call a specific tool on a remote MCP server.
1436
1437 - `String serverLabel`
1438
1439 The label of the MCP server to use.
1440
1441 - `JsonValue; type "mcp"constant`
1442
1443 For MCP tools, the type is always `mcp`.
1444
1445 - `MCP("mcp")`
1446
1447 - `Optional<String> name`
1448
1449 The name of the tool to call on the server.
1450
1451 - `Optional<List<Tool>> tools`
1452
1453 Tools available to the model.
1454
1455 - `class RealtimeFunctionTool:`
1456
1457 - `Optional<String> description`
1458
1459 The description of the function, including guidance on when and how
1460 to call it, and guidance about what to tell the user when calling
1461 (if anything).
1462
1463 - `Optional<String> name`
1464
1465 The name of the function.
1466
1467 - `Optional<JsonValue> parameters`
1468
1469 Parameters of the function in JSON Schema.
1470
1471 - `Optional<Type> type`
1472
1473 The type of the tool, i.e. `function`.
1474
1475 - `FUNCTION("function")`
1476
1477 - `class McpTool:`
1478
1479 Give the model access to additional tools via remote Model Context Protocol
1480 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
1481
1482 - `String serverLabel`
1483
1484 A label for this MCP server, used to identify it in tool calls.
1485
1486 - `JsonValue; type "mcp"constant`
1487
1488 The type of the MCP tool. Always `mcp`.
1489
1490 - `MCP("mcp")`
1491
1492 - `Optional<AllowedTools> allowedTools`
1493
1494 List of allowed tool names or a filter object.
1495
1496 - `List<String>`
1497
1498 - `class McpToolFilter:`
1499
1500 A filter object to specify which tools are allowed.
1501
1502 - `Optional<Boolean> readOnly`
1503
1504 Indicates whether or not a tool modifies data or is read-only. If an
1505 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1506 it will match this filter.
1507
1508 - `Optional<List<String>> toolNames`
1509
1510 List of allowed tool names.
1511
1512 - `Optional<String> authorization`
1513
1514 An OAuth access token that can be used with a remote MCP server, either
1515 with a custom MCP server URL or a service connector. Your application
1516 must handle the OAuth authorization flow and provide the token here.
1517
1518 - `Optional<ConnectorId> connectorId`
1519
1520 Identifier for service connectors, like those available in ChatGPT. One of
1521 `server_url` or `connector_id` must be provided. Learn more about service
1522 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
1523
1524 Currently supported `connector_id` values are:
1525
1526 - Dropbox: `connector_dropbox`
1527 - Gmail: `connector_gmail`
1528 - Google Calendar: `connector_googlecalendar`
1529 - Google Drive: `connector_googledrive`
1530 - Microsoft Teams: `connector_microsoftteams`
1531 - Outlook Calendar: `connector_outlookcalendar`
1532 - Outlook Email: `connector_outlookemail`
1533 - SharePoint: `connector_sharepoint`
1534
1535 - `CONNECTOR_DROPBOX("connector_dropbox")`
1536
1537 - `CONNECTOR_GMAIL("connector_gmail")`
1538
1539 - `CONNECTOR_GOOGLECALENDAR("connector_googlecalendar")`
1540
1541 - `CONNECTOR_GOOGLEDRIVE("connector_googledrive")`
1542
1543 - `CONNECTOR_MICROSOFTTEAMS("connector_microsoftteams")`
1544
1545 - `CONNECTOR_OUTLOOKCALENDAR("connector_outlookcalendar")`
1546
1547 - `CONNECTOR_OUTLOOKEMAIL("connector_outlookemail")`
1548
1549 - `CONNECTOR_SHAREPOINT("connector_sharepoint")`
1550
1551 - `Optional<Boolean> deferLoading`
1552
1553 Whether this MCP tool is deferred and discovered via tool search.
1554
1555 - `Optional<Headers> headers`
1556
1557 Optional HTTP headers to send to the MCP server. Use for authentication
1558 or other purposes.
1559
1560 - `Optional<RequireApproval> requireApproval`
1561
1562 Specify which of the MCP server's tools require approval.
1563
1564 - `class McpToolApprovalFilter:`
1565
1566 Specify which of the MCP server's tools require approval. Can be
1567 `always`, `never`, or a filter object associated with tools
1568 that require approval.
1569
1570 - `Optional<Always> always`
1571
1572 A filter object to specify which tools are allowed.
1573
1574 - `Optional<Boolean> readOnly`
1575
1576 Indicates whether or not a tool modifies data or is read-only. If an
1577 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1578 it will match this filter.
1579
1580 - `Optional<List<String>> toolNames`
1581
1582 List of allowed tool names.
1583
1584 - `Optional<Never> never`
1585
1586 A filter object to specify which tools are allowed.
1587
1588 - `Optional<Boolean> readOnly`
1589
1590 Indicates whether or not a tool modifies data or is read-only. If an
1591 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1592 it will match this filter.
1593
1594 - `Optional<List<String>> toolNames`
1595
1596 List of allowed tool names.
1597
1598 - `enum McpToolApprovalSetting:`
1599
1600 Specify a single approval policy for all tools. One of `always` or
1601 `never`. When set to `always`, all tools will require approval. When
1602 set to `never`, no tools will require approval.
1603
1604 - `ALWAYS("always")`
1605
1606 - `NEVER("never")`
1607
1608 - `Optional<String> serverDescription`
1609
1610 Optional description of the MCP server, used to provide more context.
1611
1612 - `Optional<String> serverUrl`
1613
1614 The URL for the MCP server. One of `server_url` or `connector_id` must be
1615 provided.
1616
1617 - `Optional<Tracing> tracing`
1618
1619 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once
1620 tracing is enabled for a session, the configuration cannot be modified.
1621
1622 `auto` will create a trace for the session with default values for the
1623 workflow name, group id, and metadata.
1624
1625 - `JsonValue;`
1626
1627 - `AUTO("auto")`
1628
1629 - `class TracingConfiguration:`
1630
1631 Granular configuration for tracing.
1632
1633 - `Optional<String> groupId`
1634
1635 The group id to attach to this trace to enable filtering and
1636 grouping in the Traces Dashboard.
1637
1638 - `Optional<JsonValue> metadata`
1639
1640 The arbitrary metadata to attach to this trace to enable
1641 filtering in the Traces Dashboard.
1642
1643 - `Optional<String> workflowName`
1644
1645 The name of the workflow to attach to this trace. This is used to
1646 name the trace in the Traces Dashboard.
1647
1648 - `Optional<RealtimeTruncation> truncation`
1649
1650 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
1651
1652 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
1653
1654 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
1655
1656 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
1657
1658 - `RealtimeTruncationStrategy`
1659
1660 - `AUTO("auto")`
1661
1662 - `DISABLED("disabled")`
1663
1664 - `class RealtimeTruncationRetentionRatio:`
1665
1666 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
1667
1668 - `double retentionRatio`
1669
1670 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
1671
1672 - `JsonValue; type "retention_ratio"constant`
1673
1674 Use retention ratio truncation.
1675
1676 - `RETENTION_RATIO("retention_ratio")`
1677
1678 - `Optional<TokenLimits> tokenLimits`
1679
1680 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
1681
1682 - `Optional<Long> postInstructions`
1683
1684 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
1685
1686 - `class RealtimeTranscriptionSessionCreateResponse:`
1687
1688 A Realtime transcription session configuration object.
1689
1690 - `String id`
1691
1692 Unique identifier for the session that looks like `sess_1234567890abcdef`.
1693
1694 - `String object_`
1695
1696 The object type. Always `realtime.transcription_session`.
1697
1698 - `JsonValue; type "transcription"constant`
1699
1700 The type of session. Always `transcription` for transcription sessions.
1701
1702 - `TRANSCRIPTION("transcription")`
1703
1704 - `Optional<Audio> audio`
1705
1706 Configuration for input audio for the session.
1707
1708 - `Optional<Input> input`
1709
1710 - `Optional<RealtimeAudioFormats> format`
1711
1712 The PCM audio format. Only a 24kHz sample rate is supported.
1713
1714 - `Optional<NoiseReduction> noiseReduction`
1715
1716 Configuration for input audio noise reduction.
1717
1718 - `Optional<NoiseReductionType> type`
1719
1720 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
1721
1722 - `Optional<AudioTranscription> transcription`
1723
1724 - `Optional<RealtimeTranscriptionSessionTurnDetection> turnDetection`
1725
1726 Configuration for turn detection. Can be set to `null` to turn off. Server
1727 VAD means that the model will detect the start and end of speech based on
1728 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
1729
1730 - `Optional<Long> prefixPaddingMs`
1731
1732 Amount of audio to include before the VAD detected speech (in
1733 milliseconds). Defaults to 300ms.
1734
1735 - `Optional<Long> silenceDurationMs`
1736
1737 Duration of silence to detect speech stop (in milliseconds). Defaults
1738 to 500ms. With shorter values the model will respond more quickly,
1739 but may jump in on short pauses from the user.
1740
1741 - `Optional<Double> threshold`
1742
1743 Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A
1744 higher threshold will require louder audio to activate the model, and
1745 thus might perform better in noisy environments.
1746
1747 - `Optional<String> type`
1748
1749 Type of turn detection. Only `server_vad` is currently supported.
1750
1751 - `Optional<Long> expiresAt`
1752
1753 Expiration timestamp for the session, in seconds since epoch.
1754
1755 - `Optional<List<Include>> include`
1756
1757 Additional fields to include in server outputs.
1758
1759 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
1760
1761 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
1762
1763 - `String value`
1764
1765 The generated client secret value.
1766
1767### Example
1768
```java
package com.openai.example;

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.realtime.clientsecrets.ClientSecretCreateParams;
import com.openai.models.realtime.clientsecrets.ClientSecretCreateResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Build a client configured from environment variables.
        OpenAIClient client = OpenAIOkHttpClient.fromEnv();

        // Called with no arguments, create() uses ClientSecretCreateParams.none().
        ClientSecretCreateResponse clientSecret = client.realtime().clientSecrets().create();
    }
}
```
1787
1788#### Response
1789
```json
{
  "expires_at": 0,
  "session": {
    "id": "id",
    "object": "realtime.session",
    "type": "realtime",
    "audio": {
      "input": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "noise_reduction": {
          "type": "near_field"
        },
        "transcription": {
          "delay": "minimal",
          "language": "language",
          "model": "string",
          "prompt": "prompt"
        },
        "turn_detection": {
          "type": "server_vad",
          "create_response": true,
          "idle_timeout_ms": 5000,
          "interrupt_response": true,
          "prefix_padding_ms": 0,
          "silence_duration_ms": 0,
          "threshold": 0
        }
      },
      "output": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "speed": 0.25,
        "voice": "ash"
      }
    },
    "expires_at": 0,
    "include": [
      "item.input_audio_transcription.logprobs"
    ],
    "instructions": "instructions",
    "max_output_tokens": 0,
    "model": "string",
    "output_modalities": [
      "text"
    ],
    "prompt": {
      "id": "id",
      "variables": {
        "foo": "string"
      },
      "version": "version"
    },
    "reasoning": {
      "effort": "minimal"
    },
    "tool_choice": "none",
    "tools": [
      {
        "description": "description",
        "name": "name",
        "parameters": {},
        "type": "function"
      }
    ],
    "tracing": "auto",
    "truncation": "auto"
  },
  "value": "value"
}
```
1866
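The session's `expires_at` is documented as seconds since epoch. As a minimal sketch (JDK only, not part of the SDK), a backend might check how much lifetime remains before handing a secret to a client app. `ExpiryCheck`, `isExpired`, and `timeRemaining` are hypothetical names, and the sketch assumes whichever `expires_at` value you pass in is expressed in epoch seconds.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not an SDK class) for interpreting an epoch-seconds expires_at value. */
final class ExpiryCheck {
    private ExpiryCheck() {}

    /** True once `now` has reached or passed the expiration timestamp. */
    static boolean isExpired(long expiresAtEpochSeconds, Instant now) {
        return !now.isBefore(Instant.ofEpochSecond(expiresAtEpochSeconds));
    }

    /** Time left until expiration, clamped to zero if the timestamp has passed. */
    static Duration timeRemaining(long expiresAtEpochSeconds, Instant now) {
        Duration remaining = Duration.between(now, Instant.ofEpochSecond(expiresAtEpochSeconds));
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Assumption: stands in for an expires_at value read from a real response.
        long expiresAt = now.getEpochSecond() + 600;
        System.out.println("expired: " + isExpired(expiresAt, now));
        System.out.println("seconds left: " + timeRemaining(expiresAt, now).getSeconds());
    }
}
```

Passing `now` explicitly, rather than calling `Instant.now()` inside the helper, keeps the check deterministic and easy to test.
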
1867## Domain Types
1868
1869### Realtime Session Create Response
1870
1871- `class RealtimeSessionCreateResponse:`
1872
1873 A Realtime session configuration object.
1874
1875 - `String id`
1876
1877 Unique identifier for the session that looks like `sess_1234567890abcdef`.
1878
1879 - `JsonValue; object_ "realtime.session"constant`
1880
1881 The object type. Always `realtime.session`.
1882
1883 - `REALTIME_SESSION("realtime.session")`
1884
1885 - `JsonValue; type "realtime"constant`
1886
1887 The type of session to create. Always `realtime` for the Realtime API.
1888
1889 - `REALTIME("realtime")`
1890
1891 - `Optional<Audio> audio`
1892
1893 Configuration for input and output audio.
1894
1895 - `Optional<Input> input`
1896
1897 - `Optional<RealtimeAudioFormats> format`
1898
1899 The format of the input audio.
1900
1901 - `AudioPcm`
1902
1903 - `Optional<Rate> rate`
1904
1905 The sample rate of the audio. Always `24000`.
1906
1907 - `_24000(24000)`
1908
1909 - `Optional<Type> type`
1910
1911 The audio format. Always `audio/pcm`.
1912
1913 - `AUDIO_PCM("audio/pcm")`
1914
1915 - `AudioPcmu`
1916
1917 - `Optional<Type> type`
1918
1919 The audio format. Always `audio/pcmu`.
1920
1921 - `AUDIO_PCMU("audio/pcmu")`
1922
1923 - `AudioPcma`
1924
1925 - `Optional<Type> type`
1926
1927 The audio format. Always `audio/pcma`.
1928
1929 - `AUDIO_PCMA("audio/pcma")`
1930
1931 - `Optional<NoiseReduction> noiseReduction`
1932
1933 Configuration for input audio noise reduction. This can be set to `null` to turn off.
1934 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
1935 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
1936
1937 - `Optional<NoiseReductionType> type`
1938
1939 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.
1940
1941 - `NEAR_FIELD("near_field")`
1942
1943 - `FAR_FIELD("far_field")`
1944
1945 - `Optional<AudioTranscription> transcription`
1946
1947 - `Optional<Delay> delay`
1948
1949 Controls how long the model waits before emitting transcription text.
1950 Higher values can improve transcription accuracy at the cost of latency.
1951 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
1952
1953 - `MINIMAL("minimal")`
1954
1955 - `LOW("low")`
1956
1957 - `MEDIUM("medium")`
1958
1959 - `HIGH("high")`
1960
1961 - `XHIGH("xhigh")`
1962
1963 - `Optional<String> language`
1964
1965 The language of the input audio. Supplying the input language in
1966 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
1967 will improve accuracy and latency.
1968
1969 - `Optional<Model> model`
1970
1971 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
1972
1973 - `WHISPER_1("whisper-1")`
1974
1975 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`
1976
1977 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`
1978
1979 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`
1980
1981 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`
1982
1983 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`
1984
1985 - `Optional<String> prompt`
1986
1987 An optional text to guide the model's style or continue a previous audio
1988 segment.
1989 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
1990 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
1991 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
1992
1993 - `Optional<TurnDetection> turnDetection`
1994
1995 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
1996
1997 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
1998
1999 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
2000
2001 For `gpt-realtime-whisper` transcription sessions, turn detection must be
2002 set to `null`; VAD is not supported.
2003
2004 - `class ServerVad:`
2005
2006 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
2007
2008 - `JsonValue; type "server_vad"constant`
2009
2010 Type of turn detection, `server_vad` to turn on simple Server VAD.
2011
2012 - `SERVER_VAD("server_vad")`
2013
2014 - `Optional<Boolean> createResponse`
2015
2016 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
2017
2018 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
2019
2020 - `Optional<Long> idleTimeoutMs`
2021
2022 Optional timeout after which a model response will be triggered automatically. This is
2023 useful for situations in which a long pause from the user is unexpected, such as a phone
2024 call. The model will effectively prompt the user to continue the conversation based
2025 on the current context.
2026
2027 The timeout value will be applied after the last model response's audio has finished playing,
2028 i.e. it's set to the `response.done` time plus audio playback duration.
2029
2030 An `input_audio_buffer.timeout_triggered` event (plus events
2031 associated with the Response) will be emitted when the timeout is reached.
2032 Idle timeout is currently only supported for `server_vad` mode.
2033
2034 - `Optional<Boolean> interruptResponse`
2035
2036 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
2037 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
2038
2039 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
2040
2041 - `Optional<Long> prefixPaddingMs`
2042
2043 Used only for `server_vad` mode. Amount of audio to include before the VAD-detected speech (in
2044 milliseconds). Defaults to 300ms.
2045
2046 - `Optional<Long> silenceDurationMs`
2047
2048 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
2049 to 500ms. With shorter values the model will respond more quickly,
2050 but may jump in on short pauses from the user.
2051
2052 - `Optional<Double> threshold`
2053
2054 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
2055 higher threshold will require louder audio to activate the model, and
2056 thus might perform better in noisy environments.
2057
2058 - `class SemanticVad:`
2059
2060 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
2061
2062 - `JsonValue; type "semantic_vad"constant`
2063
2064 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
2065
2066 - `SEMANTIC_VAD("semantic_vad")`
2067
2068 - `Optional<Boolean> createResponse`
2069
2070 Whether or not to automatically generate a response when a VAD stop event occurs.
2071
2072 - `Optional<Eagerness> eagerness`
2073
2074 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
2075
2076 - `LOW("low")`
2077
2078 - `MEDIUM("medium")`
2079
2080 - `HIGH("high")`
2081
2082 - `AUTO("auto")`
2083
2084 - `Optional<Boolean> interruptResponse`
2085
2086 Whether or not to automatically interrupt any ongoing response with output to the default
2087 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
2088
2089 - `Optional<Output> output`
2090
2091 - `Optional<RealtimeAudioFormats> format`
2092
2093 The format of the output audio.
2094
2095 - `Optional<Double> speed`
2096
2097 The speed of the model's spoken response as a multiple of the original speed.
2098 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
2099
2100 This parameter is a post-processing adjustment to the audio after it is generated; it's
2101 also possible to prompt the model to speak faster or slower.
2102
2103 - `Optional<Voice> voice`
2104
2105 The voice the model uses to respond. Voice cannot be changed during the
2106 session once the model has responded with audio at least once. Current
2107 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
2108 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
2109 best quality.
2110
2111 - `ALLOY("alloy")`
2112
2113 - `ASH("ash")`
2114
2115 - `BALLAD("ballad")`
2116
2117 - `CORAL("coral")`
2118
2119 - `ECHO("echo")`
2120
2121 - `SAGE("sage")`
2122
2123 - `SHIMMER("shimmer")`
2124
2125 - `VERSE("verse")`
2126
2127 - `MARIN("marin")`
2128
2129 - `CEDAR("cedar")`
2130
2131 - `Optional<Long> expiresAt`
2132
2133 Expiration timestamp for the session, in seconds since epoch.
2134
2135 - `Optional<List<Include>> include`
2136
2137 Additional fields to include in server outputs.
2138
2139 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
2140
2141 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
2142
2143 - `Optional<String> instructions`
2144
2145 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
2146
2147 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
2148
2149 - `Optional<MaxOutputTokens> maxOutputTokens`
2150
2151 Maximum number of output tokens for a single assistant response,
2152 inclusive of tool calls. Provide an integer between 1 and 4096 to
2153 limit output tokens, or `inf` for the maximum available tokens for a
2154 given model. Defaults to `inf`.
2155
2156 - `long`
2157
2158 - `JsonValue;`
2159
2160 - `INF("inf")`
2161
2162 - `Optional<Model> model`
2163
2164 The Realtime model used for this session.
2165
2166 - `GPT_REALTIME("gpt-realtime")`
2167
2168 - `GPT_REALTIME_1_5("gpt-realtime-1.5")`
2169
2170 - `GPT_REALTIME_2("gpt-realtime-2")`
2171
2172 - `GPT_REALTIME_2025_08_28("gpt-realtime-2025-08-28")`
2173
2174 - `GPT_4O_REALTIME_PREVIEW("gpt-4o-realtime-preview")`
2175
2176 - `GPT_4O_REALTIME_PREVIEW_2024_10_01("gpt-4o-realtime-preview-2024-10-01")`
2177
2178 - `GPT_4O_REALTIME_PREVIEW_2024_12_17("gpt-4o-realtime-preview-2024-12-17")`
2179
2180 - `GPT_4O_REALTIME_PREVIEW_2025_06_03("gpt-4o-realtime-preview-2025-06-03")`
2181
2182 - `GPT_4O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview")`
2183
2184 - `GPT_4O_MINI_REALTIME_PREVIEW_2024_12_17("gpt-4o-mini-realtime-preview-2024-12-17")`
2185
2186 - `GPT_REALTIME_MINI("gpt-realtime-mini")`
2187
2188 - `GPT_REALTIME_MINI_2025_10_06("gpt-realtime-mini-2025-10-06")`
2189
2190 - `GPT_REALTIME_MINI_2025_12_15("gpt-realtime-mini-2025-12-15")`
2191
2192 - `GPT_AUDIO_1_5("gpt-audio-1.5")`
2193
2194 - `GPT_AUDIO_MINI("gpt-audio-mini")`
2195
2196 - `GPT_AUDIO_MINI_2025_10_06("gpt-audio-mini-2025-10-06")`
2197
2198 - `GPT_AUDIO_MINI_2025_12_15("gpt-audio-mini-2025-12-15")`
2199
2200 - `Optional<List<OutputModality>> outputModalities`
2201
2202 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
2203 that the model will respond with audio plus a transcript. `["text"]` can be used to make
2204 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
2205
2206 - `TEXT("text")`
2207
2208 - `AUDIO("audio")`
2209
2210 - `Optional<ResponsePrompt> prompt`
2211
2212 Reference to a prompt template and its variables.
2213 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
2214
2215 - `String id`
2216
2217 The unique identifier of the prompt template to use.
2218
2219 - `Optional<Variables> variables`
2220
2221 Optional map of values to substitute in for variables in your
2222 prompt. The substitution values can either be strings, or other
2223 Response input types like images or files.
2224
2225 - `String`
2226
2227 - `class ResponseInputText:`
2228
2229 A text input to the model.
2230
2231 - `String text`
2232
2233 The text input to the model.
2234
2235 - `JsonValue; type "input_text"constant`
2236
2237 The type of the input item. Always `input_text`.
2238
2239 - `INPUT_TEXT("input_text")`
2240
2241 - `class ResponseInputImage:`
2242
2243 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
2244
2245 - `Detail detail`
2246
2247 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
2248
2249 - `LOW("low")`
2250
2251 - `HIGH("high")`
2252
2253 - `AUTO("auto")`
2254
2255 - `ORIGINAL("original")`
2256
2257 - `JsonValue; type "input_image"constant`
2258
2259 The type of the input item. Always `input_image`.
2260
2261 - `INPUT_IMAGE("input_image")`
2262
2263 - `Optional<String> fileId`
2264
2265 The ID of the file to be sent to the model.
2266
2267 - `Optional<String> imageUrl`
2268
2269 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
2270
2271 - `class ResponseInputFile:`
2272
2273 A file input to the model.
2274
2275 - `JsonValue; type "input_file"constant`
2276
2277 The type of the input item. Always `input_file`.
2278
2279 - `INPUT_FILE("input_file")`
2280
2281 - `Optional<Detail> detail`
2282
2283 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
2284
2285 - `LOW("low")`
2286
2287 - `HIGH("high")`
2288
2289 - `Optional<String> fileData`
2290
2291 The content of the file to be sent to the model.
2292
2293 - `Optional<String> fileId`
2294
2295 The ID of the file to be sent to the model.
2296
2297 - `Optional<String> fileUrl`
2298
2299 The URL of the file to be sent to the model.
2300
2301 - `Optional<String> filename`
2302
2303 The name of the file to be sent to the model.
2304
2305 - `Optional<String> version`
2306
2307 Optional version of the prompt template.
2308
2309 - `Optional<RealtimeReasoning> reasoning`
2310
2311 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
2312
2313 - `Optional<RealtimeReasoningEffort> effort`
2314
2315 Constrains effort on reasoning for reasoning-capable Realtime models such as
2316 `gpt-realtime-2`.
2317
2318 - `MINIMAL("minimal")`
2319
2320 - `LOW("low")`
2321
2322 - `MEDIUM("medium")`
2323
2324 - `HIGH("high")`
2325
2326 - `XHIGH("xhigh")`
2327
2328 - `Optional<ToolChoice> toolChoice`
2329
2330 How the model chooses tools. Provide one of the string modes or force a specific
2331 function/MCP tool.
2332
2333 - `enum ToolChoiceOptions:`
2334
2335 Controls which (if any) tool is called by the model.
2336
2337 `none` means the model will not call any tool and instead generates a message.
2338
2339 `auto` means the model can pick between generating a message or calling one or
2340 more tools.
2341
2342 `required` means the model must call one or more tools.
2343
2344 - `NONE("none")`
2345
2346 - `AUTO("auto")`
2347
2348 - `REQUIRED("required")`
2349
2350 - `class ToolChoiceFunction:`
2351
2352 Use this option to force the model to call a specific function.
2353
2354 - `String name`
2355
2356 The name of the function to call.
2357
2358 - `JsonValue; type "function"constant`
2359
2360 For function calling, the type is always `function`.
2361
2362 - `FUNCTION("function")`
2363
2364 - `class ToolChoiceMcp:`
2365
2366 Use this option to force the model to call a specific tool on a remote MCP server.
2367
2368 - `String serverLabel`
2369
2370 The label of the MCP server to use.
2371
2372 - `JsonValue; type "mcp"constant`
2373
2374 For MCP tools, the type is always `mcp`.
2375
2376 - `MCP("mcp")`
2377
2378 - `Optional<String> name`
2379
2380 The name of the tool to call on the server.
2381
2382 - `Optional<List<Tool>> tools`
2383
2384 Tools available to the model.
2385
2386 - `class RealtimeFunctionTool:`
2387
2388 - `Optional<String> description`
2389
2390 The description of the function, including guidance on when and how
2391 to call it, and guidance about what to tell the user when calling
2392 (if anything).
2393
2394 - `Optional<String> name`
2395
2396 The name of the function.
2397
2398 - `Optional<JsonValue> parameters`
2399
2400 Parameters of the function in JSON Schema.
2401
2402 - `Optional<Type> type`
2403
2404 The type of the tool, i.e. `function`.
2405
2406 - `FUNCTION("function")`
2407
2408 - `class McpTool:`
2409
2410 Give the model access to additional tools via remote Model Context Protocol
2411 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
2412
2413 - `String serverLabel`
2414
2415 A label for this MCP server, used to identify it in tool calls.
2416
2417 - `JsonValue; type "mcp"constant`
2418
2419 The type of the MCP tool. Always `mcp`.
2420
2421 - `MCP("mcp")`
2422
2423 - `Optional<AllowedTools> allowedTools`
2424
2425 List of allowed tool names or a filter object.
2426
2427 - `List<String>`
2428
2429 - `class McpToolFilter:`
2430
2431 A filter object to specify which tools are allowed.
2432
2433 - `Optional<Boolean> readOnly`
2434
2435 Indicates whether or not a tool modifies data or is read-only. If an
2436 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
2437 it will match this filter.
2438
2439 - `Optional<List<String>> toolNames`
2440
2441 List of allowed tool names.
2442
2443 - `Optional<String> authorization`
2444
2445 An OAuth access token that can be used with a remote MCP server, either
2446 with a custom MCP server URL or a service connector. Your application
2447 must handle the OAuth authorization flow and provide the token here.
2448
2449 - `Optional<ConnectorId> connectorId`
2450
2451 Identifier for service connectors, like those available in ChatGPT. One of
2452 `server_url` or `connector_id` must be provided. Learn more about service
2453 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
2454
2455 Currently supported `connector_id` values are:
2456
2457 - Dropbox: `connector_dropbox`
2458 - Gmail: `connector_gmail`
2459 - Google Calendar: `connector_googlecalendar`
2460 - Google Drive: `connector_googledrive`
2461 - Microsoft Teams: `connector_microsoftteams`
2462 - Outlook Calendar: `connector_outlookcalendar`
2463 - Outlook Email: `connector_outlookemail`
2464 - SharePoint: `connector_sharepoint`
2465
2466 - `CONNECTOR_DROPBOX("connector_dropbox")`
2467
2468 - `CONNECTOR_GMAIL("connector_gmail")`
2469
2470 - `CONNECTOR_GOOGLECALENDAR("connector_googlecalendar")`
2471
2472 - `CONNECTOR_GOOGLEDRIVE("connector_googledrive")`
2473
2474 - `CONNECTOR_MICROSOFTTEAMS("connector_microsoftteams")`
2475
2476 - `CONNECTOR_OUTLOOKCALENDAR("connector_outlookcalendar")`
2477
2478 - `CONNECTOR_OUTLOOKEMAIL("connector_outlookemail")`
2479
2480 - `CONNECTOR_SHAREPOINT("connector_sharepoint")`
2481
2482 - `Optional<Boolean> deferLoading`
2483
2484 Whether this MCP tool is deferred and discovered via tool search.
2485
2486 - `Optional<Headers> headers`
2487
2488 Optional HTTP headers to send to the MCP server. Use for authentication
2489 or other purposes.
2490
2491 - `Optional<RequireApproval> requireApproval`
2492
2493 Specify which of the MCP server's tools require approval.
2494
2495 - `class McpToolApprovalFilter:`
2496
2497 Specify which of the MCP server's tools require approval. Can be
2498 `always`, `never`, or a filter object associated with tools
2499 that require approval.
2500
2501 - `Optional<Always> always`
2502
2503 A filter object to specify which tools are allowed.
2504
2505 - `Optional<Boolean> readOnly`
2506
2507 Indicates whether or not a tool modifies data or is read-only. If an
2508 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
2509 it will match this filter.
2510
2511 - `Optional<List<String>> toolNames`
2512
2513 List of allowed tool names.
2514
2515 - `Optional<Never> never`
2516
2517 A filter object to specify which tools are allowed.
2518
2519 - `Optional<Boolean> readOnly`
2520
2521 Indicates whether or not a tool modifies data or is read-only. If an
2522 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
2523 it will match this filter.
2524
2525 - `Optional<List<String>> toolNames`
2526
2527 List of allowed tool names.
2528
2529 - `enum McpToolApprovalSetting:`
2530
2531 Specify a single approval policy for all tools. One of `always` or
2532 `never`. When set to `always`, all tools will require approval. When
2533 set to `never`, no tools will require approval.
2534
2535 - `ALWAYS("always")`
2536
2537 - `NEVER("never")`
2538
2539 - `Optional<String> serverDescription`
2540
2541 Optional description of the MCP server, used to provide more context.
2542
2543 - `Optional<String> serverUrl`
2544
2545 The URL for the MCP server. One of `server_url` or `connector_id` must be
2546 provided.
2547
2548 - `Optional<Tracing> tracing`
2549
2550 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once
2551 tracing is enabled for a session, the configuration cannot be modified.
2552
2553 `auto` will create a trace for the session with default values for the
2554 workflow name, group id, and metadata.
2555
2556 - `JsonValue;`
2557
2558 - `AUTO("auto")`
2559
2560 - `class TracingConfiguration:`
2561
2562 Granular configuration for tracing.
2563
2564 - `Optional<String> groupId`
2565
2566 The group id to attach to this trace to enable filtering and
2567 grouping in the Traces Dashboard.
2568
2569 - `Optional<JsonValue> metadata`
2570
2571 The arbitrary metadata to attach to this trace to enable
2572 filtering in the Traces Dashboard.
2573
2574 - `Optional<String> workflowName`
2575
2576 The name of the workflow to attach to this trace. This is used to
2577 name the trace in the Traces Dashboard.
2578
2579 - `Optional<RealtimeTruncation> truncation`
2580
2581 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
2582
2583 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
2584
2585 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
2586
2587 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
2588
2589 - `RealtimeTruncationStrategy`
2590
2591 - `AUTO("auto")`
2592
2593 - `DISABLED("disabled")`
2594
2595 - `class RealtimeTruncationRetentionRatio:`
2596
2597 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
2598
2599 - `double retentionRatio`
2600
2601 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
2602
2603 - `JsonValue; type "retention_ratio"constant`
2604
2605 Use retention ratio truncation.
2606
2607 - `RETENTION_RATIO("retention_ratio")`
2608
2609 - `Optional<TokenLimits> tokenLimits`
2610
2611 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
2612
2613 - `Optional<Long> postInstructions`
2614
2615 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
2616
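To make the retention-ratio description above concrete, here is a rough sketch of the arithmetic (JDK only). It is not the server's implementation. `RetentionRatioSketch` and `firstRetainedIndex` are hypothetical names, and the sketch assumes that per-message token counts are known, that `tokenLimit` plays the role of the post-instruction limit, and that whole messages are dropped oldest-first until the total is at or below `retentionRatio * tokenLimit`.

```java
import java.util.List;

/** Illustrative only: models retention-ratio truncation over per-message token counts. */
final class RetentionRatioSketch {
    private RetentionRatioSketch() {}

    /**
     * Returns the index of the oldest message that would be kept.
     * messageTokens is ordered oldest first; 0 means nothing is dropped.
     */
    static int firstRetainedIndex(List<Integer> messageTokens, long tokenLimit, double retentionRatio) {
        if (retentionRatio < 0.0 || retentionRatio > 1.0) {
            throw new IllegalArgumentException("retentionRatio must be between 0.0 and 1.0");
        }
        long total = 0;
        for (int tokens : messageTokens) {
            total += tokens;
        }
        // Truncation only kicks in once the conversation exceeds the limit.
        if (total <= tokenLimit) {
            return 0;
        }
        // Drop oldest messages until the total fits within the retained fraction.
        long target = (long) Math.floor(tokenLimit * retentionRatio);
        int first = 0;
        while (first < messageTokens.size() && total > target) {
            total -= messageTokens.get(first);
            first++;
        }
        return first;
    }

    public static void main(String[] args) {
        // 5,300 tokens against a 5,000 limit with ratio 0.8: the target is 4,000 tokens.
        List<Integer> messages = List.of(2000, 1500, 1000, 800);
        System.out.println(firstRetainedIndex(messages, 5000, 0.8)); // prints 1
    }
}
```

In this example a single truncation leaves 3,300 tokens, so several more turns can pass before the limit is hit again, which is the amortization effect the description refers to.
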
2617### Realtime Transcription Session Create Response
2618
2619- `class RealtimeTranscriptionSessionCreateResponse:`
2620
2621 A Realtime transcription session configuration object.
2622
2623 - `String id`
2624
2625 Unique identifier for the session that looks like `sess_1234567890abcdef`.
2626
2627 - `String object_`
2628
2629 The object type. Always `realtime.transcription_session`.
2630
2631 - `JsonValue; type "transcription"constant`
2632
2633 The type of session. Always `transcription` for transcription sessions.
2634
2635 - `TRANSCRIPTION("transcription")`
2636
2637 - `Optional<Audio> audio`
2638
2639 Configuration for input audio for the session.
2640
2641 - `Optional<Input> input`
2642
2643 - `Optional<RealtimeAudioFormats> format`
2644
2645 The PCM audio format. Only a 24kHz sample rate is supported.
2646
2647 - `AudioPcm`
2648
2649 - `Optional<Rate> rate`
2650
2651 The sample rate of the audio. Always `24000`.
2652
2653 - `_24000(24000)`
2654
2655 - `Optional<Type> type`
2656
2657 The audio format. Always `audio/pcm`.
2658
2659 - `AUDIO_PCM("audio/pcm")`
2660
2661 - `AudioPcmu`
2662
2663 - `Optional<Type> type`
2664
2665 The audio format. Always `audio/pcmu`.
2666
2667 - `AUDIO_PCMU("audio/pcmu")`
2668
2669 - `AudioPcma`
2670
2671 - `Optional<Type> type`
2672
2673 The audio format. Always `audio/pcma`.
2674
2675 - `AUDIO_PCMA("audio/pcma")`
2676
2677 - `Optional<NoiseReduction> noiseReduction`
2678
2679 Configuration for input audio noise reduction.
2680
2681 - `Optional<NoiseReductionType> type`
2682
2683 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.
2684
2685 - `NEAR_FIELD("near_field")`
2686
2687 - `FAR_FIELD("far_field")`
2688
2689 - `Optional<AudioTranscription> transcription`
2690
2691 - `Optional<Delay> delay`
2692
2693 Controls how long the model waits before emitting transcription text.
2694 Higher values can improve transcription accuracy at the cost of latency.
2695 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
2696
2697 - `MINIMAL("minimal")`
2698
2699 - `LOW("low")`
2700
2701 - `MEDIUM("medium")`
2702
2703 - `HIGH("high")`
2704
2705 - `XHIGH("xhigh")`
2706
2707 - `Optional<String> language`
2708
2709 The language of the input audio. Supplying the input language in
2710 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
2711 will improve accuracy and latency.
2712
2713 - `Optional<Model> model`
2714
2715 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
2716
2717 - `WHISPER_1("whisper-1")`
2718
2719 - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`
2720
2721 - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`
2722
2723 - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`
2724
2725 - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`
2726
2727 - `GPT_REALTIME_WHISPER("gpt-realtime-whisper")`
2728
2729 - `Optional<String> prompt`
2730
2731 Optional text to guide the model's style or continue a previous audio
2732 segment.
2733 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
2734 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free-text string, for example "expect words related to technology".
2735 The prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
2736
2737 - `Optional<RealtimeTranscriptionSessionTurnDetection> turnDetection`
2738
2739 Configuration for turn detection. Can be set to `null` to turn off. Server
2740 VAD means that the model will detect the start and end of speech based on
2741 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
2742
2743 - `Optional<Long> prefixPaddingMs`
2744
2745 Amount of audio to include before the VAD-detected speech (in
2746 milliseconds). Defaults to 300ms.
2747
2748 - `Optional<Long> silenceDurationMs`
2749
2750 Duration of silence to detect speech stop (in milliseconds). Defaults
2751 to 500ms. With shorter values the model will respond more quickly,
2752 but may jump in on short pauses from the user.
2753
2754 - `Optional<Double> threshold`
2755
2756 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
2757 higher threshold will require louder audio to activate the model, and
2758 thus might perform better in noisy environments.
2759
2760 - `Optional<String> type`
2761
2762 Type of turn detection. Only `server_vad` is currently supported.
2763
2764 - `Optional<Long> expiresAt`
2765
2766 Expiration timestamp for the session, in seconds since epoch.
2767
2768 - `Optional<List<Include>> include`
2769
2770 Additional fields to include in server outputs.
2771
2772 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
2773
2774 - `ITEM_INPUT_AUDIO_TRANSCRIPTION_LOGPROBS("item.input_audio_transcription.logprobs")`
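
The transcription fields above carry several model-specific constraints: `delay` is only supported with `gpt-realtime-whisper`, while `prompt` and turn detection are not supported with that model. The sketch below collects those rules into a client-side pre-flight check. It is plain Java and does not use the SDK; the class `TranscriptionConfigCheck` and its `validate` method are hypothetical names introduced for illustration, and the language check assumes that a two-letter code from `Locale.getISOLanguages()` is an acceptable stand-in for ISO-639-1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the SDK: checks the documented
// model-specific constraints before a transcription config is sent.
final class TranscriptionConfigCheck {
    static final String REALTIME_WHISPER = "gpt-realtime-whisper";

    // Returns human-readable problems; an empty list means none were found.
    // A null argument means "field not set".
    static List<String> validate(String model, String delay, String prompt,
                                 String language, boolean turnDetectionSet) {
        List<String> problems = new ArrayList<>();
        boolean realtimeWhisper = REALTIME_WHISPER.equals(model);

        if (delay != null && !realtimeWhisper) {
            problems.add("delay is only supported with " + REALTIME_WHISPER);
        }
        if (prompt != null && realtimeWhisper) {
            problems.add("prompt is not supported with " + REALTIME_WHISPER);
        }
        if (turnDetectionSet && realtimeWhisper) {
            problems.add("turnDetection must be null for " + REALTIME_WHISPER);
        }
        // Assumption: two-letter codes known to the JDK approximate ISO-639-1.
        if (language != null
                && !Arrays.asList(Locale.getISOLanguages()).contains(language)) {
            problems.add("language is not a two-letter ISO 639 code: " + language);
        }
        return problems;
    }

    public static void main(String[] args) {
        // A realtime-whisper config that also sets a prompt and turn detection.
        List<String> problems =
                validate(REALTIME_WHISPER, "low", "some hint", "en", true);
        problems.forEach(System.out::println);
    }
}
```

The server remains the source of truth for validation; a check like this only surfaces the documented conflicts earlier, before a request is made.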
2775
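
Because `expiresAt` is expressed in seconds since the epoch, it maps directly onto `java.time.Instant`. The following is a minimal sketch of computing the time left before a session expires. `SessionExpiry` and `remaining` are hypothetical names, and the timestamps in `main` are made-up values for illustration only.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the SDK.
final class SessionExpiry {
    // Time left until the expiration timestamp; zero if it has already passed.
    static Duration remaining(long expiresAtEpochSeconds, Instant now) {
        Instant expiry = Instant.ofEpochSecond(expiresAtEpochSeconds);
        Duration left = Duration.between(now, expiry);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        // Made-up expiry ten minutes after a made-up "now".
        Instant now = Instant.ofEpochSecond(1_700_000_000L);
        System.out.println(remaining(1_700_000_600L, now).getSeconds());
    }
}
```

Passing `now` explicitly, rather than calling `Instant.now()` inside the method, keeps the helper deterministic and easy to test.
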
2776### Realtime Transcription Session Turn Detection
2777
2778- `class RealtimeTranscriptionSessionTurnDetection:`
2779
2780 Configuration for turn detection. Can be set to `null` to turn off. Server
2781 VAD means that the model will detect the start and end of speech based on
2782 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
2783
2784 - `Optional<Long> prefixPaddingMs`
2785
2786 Amount of audio to include before the VAD-detected speech (in
2787 milliseconds). Defaults to 300ms.
2788
2789 - `Optional<Long> silenceDurationMs`
2790
2791 Duration of silence to detect speech stop (in milliseconds). Defaults
2792 to 500ms. With shorter values the model will respond more quickly,
2793 but may jump in on short pauses from the user.
2794
2795 - `Optional<Double> threshold`
2796
2797 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
2798 higher threshold will require louder audio to activate the model, and
2799 thus might perform better in noisy environments.
2800
2801 - `Optional<String> type`
2802
2803 Type of turn detection. Only `server_vad` is currently supported.
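
To build intuition for how `threshold`, `prefixPaddingMs`, and `silenceDurationMs` interact, here is a simplified volume-based detector. It is an illustrative sketch only and is not the server's VAD implementation, which is not described here. It assumes the audio has already been reduced to one normalized loudness value (0.0 to 1.0) per fixed-length frame. `VadSketch`, `detect`, and the frame representation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy volume-threshold detector, not the server's VAD.
final class VadSketch {
    // Returns detected turns as {startMs, endMs} pairs.
    static List<long[]> detect(double[] levels, long frameMs, double threshold,
                               long prefixPaddingMs, long silenceDurationMs) {
        List<long[]> turns = new ArrayList<>();
        long start = -1;       // padded start of the open turn, -1 when idle
        long lastLoudEnd = 0;  // end of the most recent frame at/above threshold
        for (int i = 0; i < levels.length; i++) {
            long frameStart = i * frameMs;
            long frameEnd = frameStart + frameMs;
            if (levels[i] >= threshold) {
                if (start < 0) {
                    // Speech begins: reach back to include audio before it.
                    start = Math.max(0, frameStart - prefixPaddingMs);
                }
                lastLoudEnd = frameEnd;
            } else if (start >= 0 && frameEnd - lastLoudEnd >= silenceDurationMs) {
                // Enough continuous silence: close the turn at the last loud frame.
                turns.add(new long[] {start, lastLoudEnd});
                start = -1;
            }
        }
        if (start >= 0) {
            // Stream ended mid-speech; this sketch still reports the open turn.
            turns.add(new long[] {start, lastLoudEnd});
        }
        return turns;
    }

    public static void main(String[] args) {
        // 100 ms frames: speech at 1000-1500 ms and 1700-2000 ms, quiet elsewhere.
        double[] levels = new double[26];
        Arrays.fill(levels, 0.1);
        Arrays.fill(levels, 10, 15, 0.9);
        Arrays.fill(levels, 17, 20, 0.9);
        for (long silenceMs : new long[] {500, 200}) {
            List<long[]> turns = detect(levels, 100, 0.5, 300, silenceMs);
            System.out.println("silenceDurationMs=" + silenceMs
                    + " -> " + turns.size() + " turn(s)");
        }
    }
}
```

With the sample input, a 500 ms silence window treats the 200 ms pause as part of a single turn, while a 200 ms window splits it in two, which mirrors the trade-off described for `silenceDurationMs`: shorter values respond sooner but may cut in on brief pauses.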