cli/resources/realtime/subresources/client_secrets/index.md +0 −1938 deleted
1# Client Secrets
2
3## Create client secret
4
5`$ openai realtime:client-secrets create`
6
7**post** `/realtime/client_secrets`
8
9Create a Realtime client secret with an associated session configuration.
10
11Client secrets are short-lived tokens that can be passed to a client app,
12such as a web frontend or mobile client, granting it access to the Realtime API without
13leaking your main API key. You can configure a custom TTL for each client secret.
14
15You can also attach session configuration options to the client secret. They are
16applied to any session created with that client secret, although the client connection
17can override them.
18
19[Learn more about authentication with client secrets over WebRTC](https://platform.openai.com/docs/guides/realtime-webrtc).
20
21Returns the created client secret and the effective session object. The client secret is a string that looks like `ek_1234`.
22
23### Parameters
24
25- `--expires-after: optional object { anchor, seconds }`
26
27 Configuration for the client secret expiration. Expiration refers to the time after which
28 a client secret will no longer be valid for creating sessions. The session itself may
29 continue after that time once started. A secret can be used to create multiple sessions
30 until it expires.
31
32- `--session: optional RealtimeSessionCreateRequest or RealtimeTranscriptionSessionCreateRequest`
33
34 Session configuration to use for the client secret. Choose either a realtime
35 session or a transcription session.
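
As a rough illustration of how the two parameters fit together, the sketch below assembles a request body for `POST /realtime/client_secrets` in Python. The helper name `build_client_secret_request`, the `created_at` anchor value, and the TTL values are assumptions made for this example; only the field names `expires_after`, `anchor`, `seconds`, and `session`, and the two session types, come from the parameter list above.

```python
import json


def build_client_secret_request(ttl_seconds=600, session_type="realtime"):
    """Assemble a request body (hypothetical helper, not part of the CLI)."""
    if session_type not in ("realtime", "transcription"):
        raise ValueError("session_type must be 'realtime' or 'transcription'")
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return {
        # "created_at" as the anchor value is an assumption for this sketch.
        "expires_after": {"anchor": "created_at", "seconds": ttl_seconds},
        # Only the session type is set; every other session field is optional.
        "session": {"type": session_type},
    }


body = build_client_secret_request(ttl_seconds=120)
print(json.dumps(body, indent=2))
```

Keeping the TTL short limits how long a leaked secret can be used to start new sessions; sessions that have already started may continue past the expiration.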
36
37### Returns
38
39- `ClientSecretNewResponse: object { expires_at, session, value }`
40
41 Response from creating a session and client secret for the Realtime API.
42
43 - `expires_at: number`
44
45 Expiration timestamp for the client secret, in seconds since epoch.
46
47 - `session: RealtimeSessionCreateResponse or RealtimeTranscriptionSessionCreateResponse`
48
49 The session configuration for either a realtime or transcription session.
50
51 - `realtime_session_create_response: object { id, object, type, 13 more }`
52
53 A Realtime session configuration object.
54
55 - `id: string`
56
57 Unique identifier for the session that looks like `sess_1234567890abcdef`.
58
59 - `object: "realtime.session"`
60
61 The object type. Always `realtime.session`.
62
63 - `type: "realtime"`
64
65 The type of session to create. Always `realtime` for the Realtime API.
66
67 - `audio: optional object { input, output }`
68
69 Configuration for input and output audio.
70
71 - `input: optional object { format, noise_reduction, transcription, turn_detection }`
72
73 - `format: optional object { rate, type } or object { type } or object { type }`
74
75 The format of the input audio.
76
77 - `audio/pcm: object { rate, type }`
78
79 The PCM audio format. Only a 24kHz sample rate is supported.
80
81 - `rate: optional 24000`
82
83 The sample rate of the audio. Always `24000`.
84
85 - `24000`
86
87 - `type: optional "audio/pcm"`
88
89 The audio format. Always `audio/pcm`.
90
91 - `"audio/pcm"`
92
93 - `audio/pcmu: object { type }`
94
95 The G.711 μ-law format.
96
97 - `type: optional "audio/pcmu"`
98
99 The audio format. Always `audio/pcmu`.
100
101 - `"audio/pcmu"`
102
103 - `audio/pcma: object { type }`
104
105 The G.711 A-law format.
106
107 - `type: optional "audio/pcma"`
108
109 The audio format. Always `audio/pcma`.
110
111 - `"audio/pcma"`
112
113 - `noise_reduction: optional object { type }`
114
115 Configuration for input audio noise reduction. This can be set to `null` to turn off.
116 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
117 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
118
119 - `type: optional "near_field" or "far_field"`
120
121 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.
122
123 - `"near_field"`
124
125 - `"far_field"`
126
127 - `transcription: optional object { delay, language, model, prompt }`
128
129 - `delay: optional "minimal" or "low" or "medium" or 2 more`
130
131 Controls how long the model waits before emitting transcription text.
132 Higher values can improve transcription accuracy at the cost of latency.
133 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
134
135 - `"minimal"`
136
137 - `"low"`
138
139 - `"medium"`
140
141 - `"high"`
142
143 - `"xhigh"`
144
145 - `language: optional string`
146
147 The language of the input audio. Supplying the input language in
148 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
149 will improve accuracy and latency.
150
151 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`
152
153 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
154
155 - `"whisper-1"`
156
157 - `"gpt-4o-mini-transcribe"`
158
159 - `"gpt-4o-mini-transcribe-2025-12-15"`
160
161 - `"gpt-4o-transcribe"`
162
163 - `"gpt-4o-transcribe-diarize"`
164
165 - `"gpt-realtime-whisper"`
166
167 - `prompt: optional string`
168
169 An optional text to guide the model's style or continue a previous audio
170 segment.
171 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
172 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
173 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
174
175 - `turn_detection: optional object { type, create_response, idle_timeout_ms, 4 more } or object { type, create_response, eagerness, interrupt_response }`
176
177 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.
178
179 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
180
181 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
182
183 For `gpt-realtime-whisper` transcription sessions, turn detection must be
184 set to `null`; VAD is not supported.
185
186 - `server_vad: object { type, create_response, idle_timeout_ms, 4 more }`
187
188 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
189
190 - `type: "server_vad"`
191
192 Type of turn detection, `server_vad` to turn on simple Server VAD.
193
194 - `create_response: optional boolean`
195
196 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
197
198 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
199
200 - `idle_timeout_ms: optional number`
201
202 Optional timeout after which a model response will be triggered automatically. This is
203 useful for situations in which a long pause from the user is unexpected, such as a phone
204 call. The model will effectively prompt the user to continue the conversation based
205 on the current context.
206
207 The timeout value will be applied after the last model response's audio has finished playing,
208 i.e. it's set to the `response.done` time plus audio playback duration.
209
210 An `input_audio_buffer.timeout_triggered` event (plus events
211 associated with the Response) will be emitted when the timeout is reached.
212 Idle timeout is currently only supported for `server_vad` mode.
213
214 - `interrupt_response: optional boolean`
215
216 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
217 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
218
219 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
220
221 - `prefix_padding_ms: optional number`
222
223 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
224 milliseconds). Defaults to 300ms.
225
226 - `silence_duration_ms: optional number`
227
228 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
229 to 500ms. With shorter values the model will respond more quickly,
230 but may jump in on short pauses from the user.
231
232 - `threshold: optional number`
233
234 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A
235 higher threshold will require louder audio to activate the model, and
236 thus might perform better in noisy environments.
237
238 - `semantic_vad: object { type, create_response, eagerness, interrupt_response }`
239
240 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
241
242 - `type: "semantic_vad"`
243
244 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
245
246 - `create_response: optional boolean`
247
248 Whether or not to automatically generate a response when a VAD stop event occurs.
249
250 - `eagerness: optional "low" or "medium" or "high" or "auto"`
251
252 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
253
254 - `"low"`
255
256 - `"medium"`
257
258 - `"high"`
259
260 - `"auto"`
261
262 - `interrupt_response: optional boolean`
263
264 Whether or not to automatically interrupt any ongoing response with output to the default
265 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
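
The two turn detection modes described above can be summarized in a small Python sketch. The helper names `semantic_vad_max_timeout` and `server_vad_config` are hypothetical; the defaults (threshold 0.5, 300 ms prefix padding, 500 ms silence duration) and the 8s/4s/2s maximum timeouts are taken from the field descriptions above. This only mirrors the documented values and is not how the server implements VAD.

```python
# Maximum timeouts per eagerness level, in seconds, as listed above.
SEMANTIC_VAD_MAX_TIMEOUT_S = {"low": 8, "medium": 4, "high": 2}


def semantic_vad_max_timeout(eagerness="auto"):
    """Return the max timeout for an eagerness level; `auto` maps to `medium`."""
    level = "medium" if eagerness == "auto" else eagerness
    if level not in SEMANTIC_VAD_MAX_TIMEOUT_S:
        raise ValueError(f"unknown eagerness: {eagerness!r}")
    return SEMANTIC_VAD_MAX_TIMEOUT_S[level]


def server_vad_config(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500):
    """Build a `server_vad` turn detection object using the documented defaults."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
    }


print(semantic_vad_max_timeout("auto"))
print(server_vad_config(silence_duration_ms=300))
```

A shorter `silence_duration_ms` makes the model respond sooner but raises the chance that it jumps in during a short pause.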
266
267 - `output: optional object { format, speed, voice }`
268
269 - `format: optional object { rate, type } or object { type } or object { type }`
270
271 The format of the output audio.
272
273 - `audio/pcm: object { rate, type }`
274
275 The PCM audio format. Only a 24kHz sample rate is supported.
276
277 - `audio/pcmu: object { type }`
278
279 The G.711 μ-law format.
280
281 - `audio/pcma: object { type }`
282
283 The G.711 A-law format.
284
285 - `speed: optional number`
286
287 The speed of the model's spoken response as a multiple of the original speed.
288 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
289
290 This parameter is a post-processing adjustment to the audio after it is generated; it's
291 also possible to prompt the model to speak faster or slower.
292
293 - `voice: optional string or "alloy" or "ash" or "ballad" or 7 more`
294
295 The voice the model uses to respond. Voice cannot be changed during the
296 session once the model has responded with audio at least once. Current
297 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
298 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
299 best quality.
300
301 - `"alloy"`
302
303 - `"ash"`
304
305 - `"ballad"`
306
307 - `"coral"`
308
309 - `"echo"`
310
311 - `"sage"`
312
313 - `"shimmer"`
314
315 - `"verse"`
316
317 - `"marin"`
318
319 - `"cedar"`
320
321 - `expires_at: optional number`
322
323 Expiration timestamp for the session, in seconds since epoch.
324
325 - `include: optional array of "item.input_audio_transcription.logprobs"`
326
327 Additional fields to include in server outputs.
328
329 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
330
331 - `"item.input_audio_transcription.logprobs"`
332
333 - `instructions: optional string`
334
335 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
336
337 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
338
339 - `max_output_tokens: optional number or "inf"`
340
341 Maximum number of output tokens for a single assistant response,
342 inclusive of tool calls. Provide an integer between 1 and 4096 to
343 limit output tokens, or `inf` for the maximum available tokens for a
344 given model. Defaults to `inf`.
345
346 - `union_member_0: number`
347
348 - `union_member_1: "inf"`
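
Because `max_output_tokens` is a union of a number and the string `inf`, a client may want to check a value before sending it. The following Python sketch shows one way to do that; `validate_max_output_tokens` is a hypothetical helper, and the 1 to 4096 range is the one stated above.

```python
def validate_max_output_tokens(value):
    """Accept the string "inf" or an integer between 1 and 4096 (inclusive)."""
    if value == "inf":
        return value
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 4096:
        return value
    raise ValueError('max_output_tokens must be "inf" or an integer from 1 to 4096')


print(validate_max_output_tokens(1024))
print(validate_max_output_tokens("inf"))
```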
349
350 - `model: optional string or "gpt-realtime" or "gpt-realtime-1.5" or "gpt-realtime-2" or 14 more`
351
352 The Realtime model used for this session.
353
354 - `"gpt-realtime"`
355
356 - `"gpt-realtime-1.5"`
357
358 - `"gpt-realtime-2"`
359
360 - `"gpt-realtime-2025-08-28"`
361
362 - `"gpt-4o-realtime-preview"`
363
364 - `"gpt-4o-realtime-preview-2024-10-01"`
365
366 - `"gpt-4o-realtime-preview-2024-12-17"`
367
368 - `"gpt-4o-realtime-preview-2025-06-03"`
369
370 - `"gpt-4o-mini-realtime-preview"`
371
372 - `"gpt-4o-mini-realtime-preview-2024-12-17"`
373
374 - `"gpt-realtime-mini"`
375
376 - `"gpt-realtime-mini-2025-10-06"`
377
378 - `"gpt-realtime-mini-2025-12-15"`
379
380 - `"gpt-audio-1.5"`
381
382 - `"gpt-audio-mini"`
383
384 - `"gpt-audio-mini-2025-10-06"`
385
386 - `"gpt-audio-mini-2025-12-15"`
387
388 - `output_modalities: optional array of "text" or "audio"`
389
390 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
391 that the model will respond with audio plus a transcript. `["text"]` can be used to make
392 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
393
394 - `"text"`
395
396 - `"audio"`
397
398 - `prompt: optional object { id, variables, version }`
399
400 Reference to a prompt template and its variables.
401 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
402
403 - `id: string`
404
405 The unique identifier of the prompt template to use.
406
407 - `variables: optional map[string or ResponseInputText or ResponseInputImage or ResponseInputFile]`
408
409 Optional map of values to substitute in for variables in your
410 prompt. The substitution values can either be strings, or other
411 Response input types like images or files.
412
413 - `union_member_0: string`
414
415 - `response_input_text: object { text, type }`
416
417 A text input to the model.
418
419 - `text: string`
420
421 The text input to the model.
422
423 - `type: "input_text"`
424
425 The type of the input item. Always `input_text`.
426
427 - `response_input_image: object { detail, type, file_id, image_url }`
428
429 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
430
431 - `detail: "low" or "high" or "auto" or "original"`
432
433 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
434
435 - `"low"`
436
437 - `"high"`
438
439 - `"auto"`
440
441 - `"original"`
442
443 - `type: "input_image"`
444
445 The type of the input item. Always `input_image`.
446
447 - `file_id: optional string`
448
449 The ID of the file to be sent to the model.
450
451 - `image_url: optional string`
452
453 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
454
455 - `response_input_file: object { type, detail, file_data, 3 more }`
456
457 A file input to the model.
458
459 - `type: "input_file"`
460
461 The type of the input item. Always `input_file`.
462
463 - `detail: optional "low" or "high"`
464
465 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
466
467 - `"low"`
468
469 - `"high"`
470
471 - `file_data: optional string`
472
473 The content of the file to be sent to the model.
474
475 - `file_id: optional string`
476
477 The ID of the file to be sent to the model.
478
479 - `file_url: optional string`
480
481 The URL of the file to be sent to the model.
482
483 - `filename: optional string`
484
485 The name of the file to be sent to the model.
486
487 - `version: optional string`
488
489 Optional version of the prompt template.
490
491 - `reasoning: optional object { effort }`
492
493 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
494
495 - `effort: optional "minimal" or "low" or "medium" or 2 more`
496
497 Constrains effort on reasoning for reasoning-capable Realtime models such as
498 `gpt-realtime-2`.
499
500 - `"minimal"`
501
502 - `"low"`
503
504 - `"medium"`
505
506 - `"high"`
507
508 - `"xhigh"`
509
510 - `tool_choice: optional ToolChoiceOptions or ToolChoiceFunction or ToolChoiceMcp`
511
512 How the model chooses tools. Provide one of the string modes or force a specific
513 function/MCP tool.
514
515 - `tool_choice_options: "none" or "auto" or "required"`
516
517 Controls which (if any) tool is called by the model.
518
519 `none` means the model will not call any tool and instead generates a message.
520
521 `auto` means the model can pick between generating a message or calling one or
522 more tools.
523
524 `required` means the model must call one or more tools.
525
526 - `"none"`
527
528 - `"auto"`
529
530 - `"required"`
531
532 - `tool_choice_function: object { name, type }`
533
534 Use this option to force the model to call a specific function.
535
536 - `name: string`
537
538 The name of the function to call.
539
540 - `type: "function"`
541
542 For function calling, the type is always `function`.
543
544 - `tool_choice_mcp: object { server_label, type, name }`
545
546 Use this option to force the model to call a specific tool on a remote MCP server.
547
548 - `server_label: string`
549
550 The label of the MCP server to use.
551
552 - `type: "mcp"`
553
554 For MCP tools, the type is always `mcp`.
555
556 - `name: optional string`
557
558 The name of the tool to call on the server.
559
560 - `tools: optional array of RealtimeFunctionTool or object { server_label, type, allowed_tools, 7 more }`
561
562 Tools available to the model.
563
564 - `realtime_function_tool: object { description, name, parameters, type }`
565
566 - `description: optional string`
567
568 The description of the function, including guidance on when and how
569 to call it, and guidance about what to tell the user when calling
570 (if anything).
571
572 - `name: optional string`
573
574 The name of the function.
575
576 - `parameters: optional unknown`
577
578 Parameters of the function in JSON Schema.
579
580 - `type: optional "function"`
581
582 The type of the tool, i.e. `function`.
583
584 - `"function"`
585
586 - `MCP tool: object { server_label, type, allowed_tools, 7 more }`
587
588 Give the model access to additional tools via remote Model Context Protocol
589 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
590
591 - `server_label: string`
592
593 A label for this MCP server, used to identify it in tool calls.
594
595 - `type: "mcp"`
596
597 The type of the MCP tool. Always `mcp`.
598
599 - `allowed_tools: optional array of string or object { read_only, tool_names }`
600
601 List of allowed tool names or a filter object.
602
603 - `MCP allowed tools: array of string`
604
605 A string array of allowed tool names.
606
607 - `MCP tool filter: object { read_only, tool_names }`
608
609 A filter object to specify which tools are allowed.
610
611 - `read_only: optional boolean`
612
613 Indicates whether or not a tool modifies data or is read-only. If an
614 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
615 it will match this filter.
616
617 - `tool_names: optional array of string`
618
619 List of allowed tool names.
620
621 - `authorization: optional string`
622
623 An OAuth access token that can be used with a remote MCP server, either
624 with a custom MCP server URL or a service connector. Your application
625 must handle the OAuth authorization flow and provide the token here.
626
627 - `connector_id: optional "connector_dropbox" or "connector_gmail" or "connector_googlecalendar" or 5 more`
628
629 Identifier for service connectors, like those available in ChatGPT. One of
630 `server_url` or `connector_id` must be provided. Learn more about service
631 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
632
633 Currently supported `connector_id` values are:
634
635 - Dropbox: `connector_dropbox`
636 - Gmail: `connector_gmail`
637 - Google Calendar: `connector_googlecalendar`
638 - Google Drive: `connector_googledrive`
639 - Microsoft Teams: `connector_microsoftteams`
640 - Outlook Calendar: `connector_outlookcalendar`
641 - Outlook Email: `connector_outlookemail`
642 - SharePoint: `connector_sharepoint`
643
644 - `"connector_dropbox"`
645
646 - `"connector_gmail"`
647
648 - `"connector_googlecalendar"`
649
650 - `"connector_googledrive"`
651
652 - `"connector_microsoftteams"`
653
654 - `"connector_outlookcalendar"`
655
656 - `"connector_outlookemail"`
657
658 - `"connector_sharepoint"`
659
660 - `defer_loading: optional boolean`
661
662 Whether this MCP tool is deferred and discovered via tool search.
663
664 - `headers: optional map[string]`
665
666 Optional HTTP headers to send to the MCP server. Use for authentication
667 or other purposes.
668
669 - `require_approval: optional object { always, never } or "always" or "never"`
670
671 Specify which of the MCP server's tools require approval.
672
673 - `MCP tool approval filter: object { always, never }`
674
675 Specify which of the MCP server's tools require approval. Can be
676 `always`, `never`, or a filter object associated with tools
677 that require approval.
678
679 - `always: optional object { read_only, tool_names }`
680
681 A filter object to specify which tools are allowed.
682
683 - `read_only: optional boolean`
684
685 Indicates whether or not a tool modifies data or is read-only. If an
686 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
687 it will match this filter.
688
689 - `tool_names: optional array of string`
690
691 List of allowed tool names.
692
693 - `never: optional object { read_only, tool_names }`
694
695 A filter object to specify which tools are allowed.
696
697 - `read_only: optional boolean`
698
699 Indicates whether or not a tool modifies data or is read-only. If an
700 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
701 it will match this filter.
702
703 - `tool_names: optional array of string`
704
705 List of allowed tool names.
706
707 - `MCP tool approval setting: "always" or "never"`
708
709 Specify a single approval policy for all tools. One of `always` or
710 `never`. When set to `always`, all tools will require approval. When
711 set to `never`, no tools will require approval.
712
713 - `"always"`
714
715 - `"never"`
716
717 - `server_description: optional string`
718
719 Optional description of the MCP server, used to provide more context.
720
721 - `server_url: optional string`
722
723 The URL for the MCP server. One of `server_url` or `connector_id` must be
724 provided.
725
726 - `tracing: optional "auto" or object { group_id, metadata, workflow_name }`
727
728 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once
729 tracing is enabled for a session, the configuration cannot be modified.
730
731 `auto` will create a trace for the session with default values for the
732 workflow name, group id, and metadata.
733
734 - `auto: "auto"`
735
736 Enables tracing and sets default values for tracing configuration options. Always `auto`.
737
738 - `Tracing Configuration: object { group_id, metadata, workflow_name }`
739
740 Granular configuration for tracing.
741
742 - `group_id: optional string`
743
744 The group id to attach to this trace to enable filtering and
745 grouping in the Traces Dashboard.
746
747 - `metadata: optional unknown`
748
749 The arbitrary metadata to attach to this trace to enable
750 filtering in the Traces Dashboard.
751
752 - `workflow_name: optional string`
753
754 The name of the workflow to attach to this trace. This is used to
755 name the trace in the Traces Dashboard.
756
757 - `truncation: optional "auto" or "disabled" or RealtimeTruncationRetentionRatio`
758
759 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
760
761 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
762
763 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
764
765 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
766
767 - `RealtimeTruncationStrategy: "auto" or "disabled"`
768
769 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.
770
771 - `"auto"`
772
773 - `"disabled"`
774
775 - `realtime_truncation_retention_ratio: object { retention_ratio, type, token_limits }`
776
777 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
778
779 - `retention_ratio: number`
780
781 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
782
783 - `type: "retention_ratio"`
784
785 Use retention ratio truncation.
786
787 - `token_limits: optional object { post_instructions }`
788
789 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
790
791 - `post_instructions: optional number`
792
793 Maximum tokens allowed in the conversation after instructions (including tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
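
The arithmetic behind retention-ratio truncation can be sketched in Python as follows. This is only an illustration of the description above, under the assumption that whole messages are dropped oldest-first until the total fits within `retention_ratio` times the limit; the server's actual algorithm is not specified here, and `truncate_oldest` and the token counts are made up for the example.

```python
def truncate_oldest(message_tokens, token_limit, retention_ratio=0.8):
    """Return the token counts that remain after truncation.

    message_tokens lists per-message token counts, oldest first.
    """
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    if sum(message_tokens) <= token_limit:
        return list(message_tokens)  # under the limit: nothing is dropped
    target = retention_ratio * token_limit
    kept = list(message_tokens)
    # Drop the oldest messages until the total is at or below the target.
    while kept and sum(kept) > target:
        kept.pop(0)
    return kept


# 5,300 tokens exceed a 5,000-token limit, so the target becomes 4,000 tokens.
print(truncate_oldest([2000, 1500, 1000, 800], token_limit=5000))
```

Dropping below the limit, rather than exactly to it, leaves headroom so that the next few turns do not each trigger another truncation.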
794
795 - `realtime_transcription_session_create_response: object { id, object, type, 3 more }`
796
797 A Realtime transcription session configuration object.
798
799 - `id: string`
800
801 Unique identifier for the session that looks like `sess_1234567890abcdef`.
802
803 - `object: string`
804
805 The object type. Always `realtime.transcription_session`.
806
807 - `type: "transcription"`
808
809 The type of session. Always `transcription` for transcription sessions.
810
811 - `audio: optional object { input }`
812
813 Configuration for input audio for the session.
814
815 - `input: optional object { format, noise_reduction, transcription, turn_detection }`
816
817 - `format: optional object { rate, type } or object { type } or object { type }`
818
819 The format of the input audio.
820
821 - `audio/pcm: object { rate, type }`
822
823 The PCM audio format. Only a 24kHz sample rate is supported.
824
825 - `audio/pcmu: object { type }`
826
827 The G.711 μ-law format.
828
829 - `audio/pcma: object { type }`
830
831 The G.711 A-law format.
832
833 - `noise_reduction: optional object { type }`
834
835 Configuration for input audio noise reduction.
836
837 - `type: optional "near_field" or "far_field"`
838
839 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.
840
841 - `"near_field"`
842
843 - `"far_field"`
844
845 - `transcription: optional object { delay, language, model, prompt }`
846
847 - `delay: optional "minimal" or "low" or "medium" or 2 more`
848
849 Controls how long the model waits before emitting transcription text.
850 Higher values can improve transcription accuracy at the cost of latency.
851 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
852
853 - `language: optional string`
854
855 The language of the input audio. Supplying the input language in
856 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
857 will improve accuracy and latency.
858
859 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`
860
861 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
862
863 - `prompt: optional string`
864
865 An optional text to guide the model's style or continue a previous audio
866 segment.
867 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
868 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
869 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
870
871 - `turn_detection: optional object { prefix_padding_ms, silence_duration_ms, threshold, type }`
872
873 Configuration for turn detection. Can be set to `null` to turn off. Server
874 VAD means that the model will detect the start and end of speech based on
875 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
876
877 - `prefix_padding_ms: optional number`
878
879 Amount of audio to include before the VAD detected speech (in
880 milliseconds). Defaults to 300ms.
881
882 - `silence_duration_ms: optional number`
883
884 Duration of silence to detect speech stop (in milliseconds). Defaults
885 to 500ms. With shorter values the model will respond more quickly,
886 but may jump in on short pauses from the user.
887
888 - `threshold: optional number`
889
 Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
 higher threshold requires louder audio to activate the model, and
 thus might perform better in noisy environments.
893
894 - `type: optional string`
895
896 Type of turn detection, only `server_vad` is currently supported.
897
898 - `expires_at: optional number`
899
900 Expiration timestamp for the session, in seconds since epoch.
901
902 - `include: optional array of "item.input_audio_transcription.logprobs"`
903
904 Additional fields to include in server outputs.
905
906 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
907
908 - `"item.input_audio_transcription.logprobs"`
909
910 - `value: string`
911
912 The generated client secret value.
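
As a rough illustration of how a backend might use this response, the sketch below checks how long a client secret remains usable. It assumes the response has already been parsed into a Python dict with the `expires_at` and `value` fields described above; the helper name `seconds_until_expiry` and the sample values are hypothetical.

```python
import time
from typing import Optional


def seconds_until_expiry(response: dict, now: Optional[float] = None) -> float:
    """Return how many seconds remain before the client secret expires.

    `expires_at` is documented as seconds since epoch. A result of 0.0
    means the secret can no longer be used to create new sessions.
    """
    current = time.time() if now is None else now
    return max(0.0, float(response["expires_at"]) - current)


# Hypothetical parsed response; real values come from the API.
sample = {"expires_at": 1_700_000_600, "value": "ek_1234", "session": {}}

remaining = seconds_until_expiry(sample, now=1_700_000_000)
print(remaining)  # prints 600.0
```

Passing `now` explicitly keeps the helper deterministic in tests; in production code it would normally be left unset so the current time is used.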
913
### Example
915
```cli
openai realtime:client-secrets create \
  --api-key 'My API Key'
```
920
#### Response
922
```json
{
  "expires_at": 0,
  "session": {
    "id": "id",
    "object": "realtime.session",
    "type": "realtime",
    "audio": {
      "input": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "noise_reduction": {
          "type": "near_field"
        },
        "transcription": {
          "delay": "minimal",
          "language": "language",
          "model": "string",
          "prompt": "prompt"
        },
        "turn_detection": {
          "type": "server_vad",
          "create_response": true,
          "idle_timeout_ms": 5000,
          "interrupt_response": true,
          "prefix_padding_ms": 0,
          "silence_duration_ms": 0,
          "threshold": 0
        }
      },
      "output": {
        "format": {
          "rate": 24000,
          "type": "audio/pcm"
        },
        "speed": 0.25,
        "voice": "ash"
      }
    },
    "expires_at": 0,
    "include": [
      "item.input_audio_transcription.logprobs"
    ],
    "instructions": "instructions",
    "max_output_tokens": 0,
    "model": "string",
    "output_modalities": [
      "text"
    ],
    "prompt": {
      "id": "id",
      "variables": {
        "foo": "string"
      },
      "version": "version"
    },
    "reasoning": {
      "effort": "minimal"
    },
    "tool_choice": "none",
    "tools": [
      {
        "description": "description",
        "name": "name",
        "parameters": {},
        "type": "function"
      }
    ],
    "tracing": "auto",
    "truncation": "auto"
  },
  "value": "value"
}
```
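
Since most session fields are optional, client code that reads this response probably should not assume that any nested key is present. The following is a minimal sketch, assuming the response body has been parsed into a dict; `get_path` is a hypothetical helper and not part of any OpenAI SDK, and the `"off"` fallback is just a label chosen for this example.

```python
import json
from typing import Any


def get_path(data: dict, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning `default` if any key is missing or null."""
    current: Any = data
    for key in keys:
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Trimmed-down version of the example response above.
body = json.loads("""
{
  "expires_at": 0,
  "session": {
    "type": "realtime",
    "audio": {
      "input": {"turn_detection": {"type": "server_vad"}},
      "output": {"voice": "ash"}
    }
  },
  "value": "value"
}
""")

voice = get_path(body, "session", "audio", "output", "voice")
vad_type = get_path(body, "session", "audio", "input", "turn_detection", "type")
# noise_reduction is absent here, so the fallback label is returned.
noise = get_path(body, "session", "audio", "input", "noise_reduction", "type", default="off")
print(voice, vad_type, noise)  # prints: ash server_vad off
```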
999
## Domain Types
1001
### Realtime Session Create Response
1003
1004- `realtime_session_create_response: object { id, object, type, 13 more }`
1005
1006 A Realtime session configuration object.
1007
1008 - `id: string`
1009
1010 Unique identifier for the session that looks like `sess_1234567890abcdef`.
1011
1012 - `object: "realtime.session"`
1013
1014 The object type. Always `realtime.session`.
1015
1016 - `type: "realtime"`
1017
1018 The type of session to create. Always `realtime` for the Realtime API.
1019
1020 - `audio: optional object { input, output }`
1021
1022 Configuration for input and output audio.
1023
1024 - `input: optional object { format, noise_reduction, transcription, turn_detection }`
1025
1026 - `format: optional object { rate, type } or object { type } or object { type }`
1027
1028 The format of the input audio.
1029
1030 - `audio/pcm: object { rate, type }`
1031
1032 The PCM audio format. Only a 24kHz sample rate is supported.
1033
1034 - `rate: optional 24000`
1035
1036 The sample rate of the audio. Always `24000`.
1037
1038 - `24000`
1039
1040 - `type: optional "audio/pcm"`
1041
1042 The audio format. Always `audio/pcm`.
1043
1044 - `"audio/pcm"`
1045
1046 - `audio/pcmu: object { type }`
1047
1048 The G.711 μ-law format.
1049
1050 - `type: optional "audio/pcmu"`
1051
1052 The audio format. Always `audio/pcmu`.
1053
1054 - `"audio/pcmu"`
1055
1056 - `audio/pcma: object { type }`
1057
1058 The G.711 A-law format.
1059
1060 - `type: optional "audio/pcma"`
1061
1062 The audio format. Always `audio/pcma`.
1063
1064 - `"audio/pcma"`
1065
1066 - `noise_reduction: optional object { type }`
1067
1068 Configuration for input audio noise reduction. This can be set to `null` to turn off.
1069 Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
1070 Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
1071
1072 - `type: optional "near_field" or "far_field"`
1073
1074 Type of noise reduction. `near_field` is for close-talking microphones such as headphones, `far_field` is for far-field microphones such as laptop or conference room microphones.
1075
1076 - `"near_field"`
1077
1078 - `"far_field"`
1079
1080 - `transcription: optional object { delay, language, model, prompt }`
1081
1082 - `delay: optional "minimal" or "low" or "medium" or 2 more`
1083
1084 Controls how long the model waits before emitting transcription text.
1085 Higher values can improve transcription accuracy at the cost of latency.
1086 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
1087
1088 - `"minimal"`
1089
1090 - `"low"`
1091
1092 - `"medium"`
1093
1094 - `"high"`
1095
1096 - `"xhigh"`
1097
1098 - `language: optional string`
1099
1100 The language of the input audio. Supplying the input language in
1101 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
1102 will improve accuracy and latency.
1103
1104 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`
1105
1106 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
1107
1108 - `"whisper-1"`
1109
1110 - `"gpt-4o-mini-transcribe"`
1111
1112 - `"gpt-4o-mini-transcribe-2025-12-15"`
1113
1114 - `"gpt-4o-transcribe"`
1115
1116 - `"gpt-4o-transcribe-diarize"`
1117
1118 - `"gpt-realtime-whisper"`
1119
1120 - `prompt: optional string`
1121
1122 An optional text to guide the model's style or continue a previous audio
1123 segment.
1124 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
1125 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
1126 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
1127
1128 - `turn_detection: optional object { type, create_response, idle_timeout_ms, 4 more } or object { type, create_response, eagerness, interrupt_response }`
1129
 Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger a model response.
1131
1132 Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
1133
1134 Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
1135
1136 For `gpt-realtime-whisper` transcription sessions, turn detection must be
1137 set to `null`; VAD is not supported.
1138
1139 - `server_vad: object { type, create_response, idle_timeout_ms, 4 more }`
1140
1141 Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
1142
1143 - `type: "server_vad"`
1144
1145 Type of turn detection, `server_vad` to turn on simple Server VAD.
1146
1147 - `create_response: optional boolean`
1148
1149 Whether or not to automatically generate a response when a VAD stop event occurs. If `interrupt_response` is set to `false` this may fail to create a response if the model is already responding.
1150
1151 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
1152
1153 - `idle_timeout_ms: optional number`
1154
1155 Optional timeout after which a model response will be triggered automatically. This is
1156 useful for situations in which a long pause from the user is unexpected, such as a phone
1157 call. The model will effectively prompt the user to continue the conversation based
1158 on the current context.
1159
1160 The timeout value will be applied after the last model response's audio has finished playing,
1161 i.e. it's set to the `response.done` time plus audio playback duration.
1162
1163 An `input_audio_buffer.timeout_triggered` event (plus events
1164 associated with the Response) will be emitted when the timeout is reached.
1165 Idle timeout is currently only supported for `server_vad` mode.
1166
1167 - `interrupt_response: optional boolean`
1168
1169 Whether or not to automatically interrupt (cancel) any ongoing response with output to the default
1170 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs. If `true` then the response will be cancelled, otherwise it will continue until complete.
1171
1172 If both `create_response` and `interrupt_response` are set to `false`, the model will never respond automatically but VAD events will still be emitted.
1173
1174 - `prefix_padding_ms: optional number`
1175
1176 Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
1177 milliseconds). Defaults to 300ms.
1178
1179 - `silence_duration_ms: optional number`
1180
1181 Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
1182 to 500ms. With shorter values the model will respond more quickly,
1183 but may jump in on short pauses from the user.
1184
1185 - `threshold: optional number`
1186
 Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A
 higher threshold requires louder audio to activate the model, and
 thus might perform better in noisy environments.
1190
1191 - `semantic_vad: object { type, create_response, eagerness, interrupt_response }`
1192
1193 Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
1194
1195 - `type: "semantic_vad"`
1196
1197 Type of turn detection, `semantic_vad` to turn on Semantic VAD.
1198
1199 - `create_response: optional boolean`
1200
1201 Whether or not to automatically generate a response when a VAD stop event occurs.
1202
1203 - `eagerness: optional "low" or "medium" or "high" or "auto"`
1204
1205 Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking, `high` will respond more quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`, and `high` have max timeouts of 8s, 4s, and 2s respectively.
1206
1207 - `"low"`
1208
1209 - `"medium"`
1210
1211 - `"high"`
1212
1213 - `"auto"`
1214
1215 - `interrupt_response: optional boolean`
1216
1217 Whether or not to automatically interrupt any ongoing response with output to the default
1218 conversation (i.e. `conversation` of `auto`) when a VAD start event occurs.
1219
1220 - `output: optional object { format, speed, voice }`
1221
1222 - `format: optional object { rate, type } or object { type } or object { type }`
1223
1224 The format of the output audio.
1225
1226 - `audio/pcm: object { rate, type }`
1227
1228 The PCM audio format. Only a 24kHz sample rate is supported.
1229
1230 - `audio/pcmu: object { type }`
1231
1232 The G.711 μ-law format.
1233
1234 - `audio/pcma: object { type }`
1235
1236 The G.711 A-law format.
1237
1238 - `speed: optional number`
1239
1240 The speed of the model's spoken response as a multiple of the original speed.
1241 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.
1242
 This parameter is a post-processing adjustment applied to the audio after it is generated;
 it's also possible to prompt the model to speak faster or slower.
1245
1246 - `voice: optional string or "alloy" or "ash" or "ballad" or 7 more`
1247
1248 The voice the model uses to respond. Voice cannot be changed during the
1249 session once the model has responded with audio at least once. Current
1250 voice options are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`,
1251 `shimmer`, `verse`, `marin`, and `cedar`. We recommend `marin` and `cedar` for
1252 best quality.
1253
1254 - `"alloy"`
1255
1256 - `"ash"`
1257
1258 - `"ballad"`
1259
1260 - `"coral"`
1261
1262 - `"echo"`
1263
1264 - `"sage"`
1265
1266 - `"shimmer"`
1267
1268 - `"verse"`
1269
1270 - `"marin"`
1271
1272 - `"cedar"`
1273
1274 - `expires_at: optional number`
1275
1276 Expiration timestamp for the session, in seconds since epoch.
1277
1278 - `include: optional array of "item.input_audio_transcription.logprobs"`
1279
1280 Additional fields to include in server outputs.
1281
1282 `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
1283
1284 - `"item.input_audio_transcription.logprobs"`
1285
1286 - `instructions: optional string`
1287
 The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
1289
1290 Note that the server sets default instructions which will be used if this field is not set and are visible in the `session.created` event at the start of the session.
1291
1292 - `max_output_tokens: optional number or "inf"`
1293
1294 Maximum number of output tokens for a single assistant response,
1295 inclusive of tool calls. Provide an integer between 1 and 4096 to
1296 limit output tokens, or `inf` for the maximum available tokens for a
1297 given model. Defaults to `inf`.
1298
1299 - `union_member_0: number`
1300
1301 - `union_member_1: "inf"`
1302
1303 - `model: optional string or "gpt-realtime" or "gpt-realtime-1.5" or "gpt-realtime-2" or 14 more`
1304
1305 The Realtime model used for this session.
1306
1307 - `"gpt-realtime"`
1308
1309 - `"gpt-realtime-1.5"`
1310
1311 - `"gpt-realtime-2"`
1312
1313 - `"gpt-realtime-2025-08-28"`
1314
1315 - `"gpt-4o-realtime-preview"`
1316
1317 - `"gpt-4o-realtime-preview-2024-10-01"`
1318
1319 - `"gpt-4o-realtime-preview-2024-12-17"`
1320
1321 - `"gpt-4o-realtime-preview-2025-06-03"`
1322
1323 - `"gpt-4o-mini-realtime-preview"`
1324
1325 - `"gpt-4o-mini-realtime-preview-2024-12-17"`
1326
1327 - `"gpt-realtime-mini"`
1328
1329 - `"gpt-realtime-mini-2025-10-06"`
1330
1331 - `"gpt-realtime-mini-2025-12-15"`
1332
1333 - `"gpt-audio-1.5"`
1334
1335 - `"gpt-audio-mini"`
1336
1337 - `"gpt-audio-mini-2025-10-06"`
1338
1339 - `"gpt-audio-mini-2025-12-15"`
1340
1341 - `output_modalities: optional array of "text" or "audio"`
1342
1343 The set of modalities the model can respond with. It defaults to `["audio"]`, indicating
1344 that the model will respond with audio plus a transcript. `["text"]` can be used to make
1345 the model respond with text only. It is not possible to request both `text` and `audio` at the same time.
1346
1347 - `"text"`
1348
1349 - `"audio"`
1350
1351 - `prompt: optional object { id, variables, version }`
1352
1353 Reference to a prompt template and its variables.
1354 [Learn more](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts).
1355
1356 - `id: string`
1357
1358 The unique identifier of the prompt template to use.
1359
1360 - `variables: optional map[string or ResponseInputText or ResponseInputImage or ResponseInputFile]`
1361
1362 Optional map of values to substitute in for variables in your
1363 prompt. The substitution values can either be strings, or other
1364 Response input types like images or files.
1365
1366 - `union_member_0: string`
1367
1368 - `response_input_text: object { text, type }`
1369
1370 A text input to the model.
1371
1372 - `text: string`
1373
1374 The text input to the model.
1375
1376 - `type: "input_text"`
1377
1378 The type of the input item. Always `input_text`.
1379
1380 - `response_input_image: object { detail, type, file_id, image_url }`
1381
1382 An image input to the model. Learn about [image inputs](https://platform.openai.com/docs/guides/vision).
1383
1384 - `detail: "low" or "high" or "auto" or "original"`
1385
1386 The detail level of the image to be sent to the model. One of `high`, `low`, `auto`, or `original`. Defaults to `auto`.
1387
1388 - `"low"`
1389
1390 - `"high"`
1391
1392 - `"auto"`
1393
1394 - `"original"`
1395
1396 - `type: "input_image"`
1397
1398 The type of the input item. Always `input_image`.
1399
1400 - `file_id: optional string`
1401
1402 The ID of the file to be sent to the model.
1403
1404 - `image_url: optional string`
1405
1406 The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
1407
1408 - `response_input_file: object { type, detail, file_data, 3 more }`
1409
1410 A file input to the model.
1411
1412 - `type: "input_file"`
1413
1414 The type of the input item. Always `input_file`.
1415
1416 - `detail: optional "low" or "high"`
1417
1418 The detail level of the file to be sent to the model. Use `low` for the default rendering behavior, or `high` to render the file at higher quality. Defaults to `low`.
1419
1420 - `"low"`
1421
1422 - `"high"`
1423
1424 - `file_data: optional string`
1425
1426 The content of the file to be sent to the model.
1427
1428 - `file_id: optional string`
1429
1430 The ID of the file to be sent to the model.
1431
1432 - `file_url: optional string`
1433
1434 The URL of the file to be sent to the model.
1435
1436 - `filename: optional string`
1437
1438 The name of the file to be sent to the model.
1439
1440 - `version: optional string`
1441
1442 Optional version of the prompt template.
1443
1444 - `reasoning: optional object { effort }`
1445
1446 Configuration for reasoning-capable Realtime models such as `gpt-realtime-2`.
1447
1448 - `effort: optional "minimal" or "low" or "medium" or 2 more`
1449
1450 Constrains effort on reasoning for reasoning-capable Realtime models such as
1451 `gpt-realtime-2`.
1452
1453 - `"minimal"`
1454
1455 - `"low"`
1456
1457 - `"medium"`
1458
1459 - `"high"`
1460
1461 - `"xhigh"`
1462
1463 - `tool_choice: optional ToolChoiceOptions or ToolChoiceFunction or ToolChoiceMcp`
1464
1465 How the model chooses tools. Provide one of the string modes or force a specific
1466 function/MCP tool.
1467
1468 - `tool_choice_options: "none" or "auto" or "required"`
1469
1470 Controls which (if any) tool is called by the model.
1471
1472 `none` means the model will not call any tool and instead generates a message.
1473
1474 `auto` means the model can pick between generating a message or calling one or
1475 more tools.
1476
1477 `required` means the model must call one or more tools.
1478
1479 - `"none"`
1480
1481 - `"auto"`
1482
1483 - `"required"`
1484
1485 - `tool_choice_function: object { name, type }`
1486
1487 Use this option to force the model to call a specific function.
1488
1489 - `name: string`
1490
1491 The name of the function to call.
1492
1493 - `type: "function"`
1494
1495 For function calling, the type is always `function`.
1496
1497 - `tool_choice_mcp: object { server_label, type, name }`
1498
1499 Use this option to force the model to call a specific tool on a remote MCP server.
1500
1501 - `server_label: string`
1502
1503 The label of the MCP server to use.
1504
1505 - `type: "mcp"`
1506
1507 For MCP tools, the type is always `mcp`.
1508
1509 - `name: optional string`
1510
1511 The name of the tool to call on the server.
1512
1513 - `tools: optional array of RealtimeFunctionTool or object { server_label, type, allowed_tools, 7 more }`
1514
1515 Tools available to the model.
1516
1517 - `realtime_function_tool: object { description, name, parameters, type }`
1518
1519 - `description: optional string`
1520
1521 The description of the function, including guidance on when and how
1522 to call it, and guidance about what to tell the user when calling
1523 (if anything).
1524
1525 - `name: optional string`
1526
1527 The name of the function.
1528
1529 - `parameters: optional unknown`
1530
1531 Parameters of the function in JSON Schema.
1532
1533 - `type: optional "function"`
1534
1535 The type of the tool, i.e. `function`.
1536
1537 - `"function"`
1538
1539 - `MCP tool: object { server_label, type, allowed_tools, 7 more }`
1540
1541 Give the model access to additional tools via remote Model Context Protocol
1542 (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
1543
1544 - `server_label: string`
1545
1546 A label for this MCP server, used to identify it in tool calls.
1547
1548 - `type: "mcp"`
1549
1550 The type of the MCP tool. Always `mcp`.
1551
1552 - `allowed_tools: optional array of string or object { read_only, tool_names }`
1553
1554 List of allowed tool names or a filter object.
1555
1556 - `MCP allowed tools: array of string`
1557
1558 A string array of allowed tool names
1559
1560 - `MCP tool filter: object { read_only, tool_names }`
1561
1562 A filter object to specify which tools are allowed.
1563
1564 - `read_only: optional boolean`
1565
1566 Indicates whether or not a tool modifies data or is read-only. If an
1567 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1568 it will match this filter.
1569
1570 - `tool_names: optional array of string`
1571
1572 List of allowed tool names.
1573
1574 - `authorization: optional string`
1575
1576 An OAuth access token that can be used with a remote MCP server, either
1577 with a custom MCP server URL or a service connector. Your application
1578 must handle the OAuth authorization flow and provide the token here.
1579
1580 - `connector_id: optional "connector_dropbox" or "connector_gmail" or "connector_googlecalendar" or 5 more`
1581
1582 Identifier for service connectors, like those available in ChatGPT. One of
1583 `server_url` or `connector_id` must be provided. Learn more about service
1584 connectors [here](https://platform.openai.com/docs/guides/tools-remote-mcp#connectors).
1585
1586 Currently supported `connector_id` values are:
1587
1588 - Dropbox: `connector_dropbox`
1589 - Gmail: `connector_gmail`
1590 - Google Calendar: `connector_googlecalendar`
1591 - Google Drive: `connector_googledrive`
1592 - Microsoft Teams: `connector_microsoftteams`
1593 - Outlook Calendar: `connector_outlookcalendar`
1594 - Outlook Email: `connector_outlookemail`
1595 - SharePoint: `connector_sharepoint`
1596
1597 - `"connector_dropbox"`
1598
1599 - `"connector_gmail"`
1600
1601 - `"connector_googlecalendar"`
1602
1603 - `"connector_googledrive"`
1604
1605 - `"connector_microsoftteams"`
1606
1607 - `"connector_outlookcalendar"`
1608
1609 - `"connector_outlookemail"`
1610
1611 - `"connector_sharepoint"`
1612
1613 - `defer_loading: optional boolean`
1614
1615 Whether this MCP tool is deferred and discovered via tool search.
1616
1617 - `headers: optional map[string]`
1618
1619 Optional HTTP headers to send to the MCP server. Use for authentication
1620 or other purposes.
1621
1622 - `require_approval: optional object { always, never } or "always" or "never"`
1623
1624 Specify which of the MCP server's tools require approval.
1625
1626 - `MCP tool approval filter: object { always, never }`
1627
1628 Specify which of the MCP server's tools require approval. Can be
1629 `always`, `never`, or a filter object associated with tools
1630 that require approval.
1631
1632 - `always: optional object { read_only, tool_names }`
1633
1634 A filter object to specify which tools are allowed.
1635
1636 - `read_only: optional boolean`
1637
1638 Indicates whether or not a tool modifies data or is read-only. If an
1639 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1640 it will match this filter.
1641
1642 - `tool_names: optional array of string`
1643
1644 List of allowed tool names.
1645
1646 - `never: optional object { read_only, tool_names }`
1647
1648 A filter object to specify which tools are allowed.
1649
1650 - `read_only: optional boolean`
1651
1652 Indicates whether or not a tool modifies data or is read-only. If an
1653 MCP server is [annotated with `readOnlyHint`](https://modelcontextprotocol.io/specification/2025-06-18/schema#toolannotations-readonlyhint),
1654 it will match this filter.
1655
1656 - `tool_names: optional array of string`
1657
1658 List of allowed tool names.
1659
1660 - `MCP tool approval setting: "always" or "never"`
1661
 Specify a single approval policy for all tools. One of `always` or
 `never`. When set to `always`, all tools will require approval. When
 set to `never`, no tools will require approval.
1665
1666 - `"always"`
1667
1668 - `"never"`
1669
1670 - `server_description: optional string`
1671
1672 Optional description of the MCP server, used to provide more context.
1673
1674 - `server_url: optional string`
1675
1676 The URL for the MCP server. One of `server_url` or `connector_id` must be
1677 provided.
1678
1679 - `tracing: optional "auto" or object { group_id, metadata, workflow_name }`
1680
1681 Realtime API can write session traces to the [Traces Dashboard](https://platform.openai.com/logs?api=traces). Set to null to disable tracing. Once
1682 tracing is enabled for a session, the configuration cannot be modified.
1683
1684 `auto` will create a trace for the session with default values for the
1685 workflow name, group id, and metadata.
1686
1687 - `auto: "auto"`
1688
1689 Enables tracing and sets default values for tracing configuration options. Always `auto`.
1690
1691 - `Tracing Configuration: object { group_id, metadata, workflow_name }`
1692
1693 Granular configuration for tracing.
1694
1695 - `group_id: optional string`
1696
1697 The group id to attach to this trace to enable filtering and
1698 grouping in the Traces Dashboard.
1699
1700 - `metadata: optional unknown`
1701
1702 The arbitrary metadata to attach to this trace to enable
1703 filtering in the Traces Dashboard.
1704
1705 - `workflow_name: optional string`
1706
1707 The name of the workflow to attach to this trace. This is used to
1708 name the trace in the Traces Dashboard.
1709
1710 - `truncation: optional "auto" or "disabled" or RealtimeTruncationRetentionRatio`
1711
 When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. A 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
1713
1714 Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost.
1715
1716 Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
1717
1718 Truncation can be disabled entirely, which means the server will never truncate but would instead return an error if the conversation exceeds the model's input token limit.
1719
1720 - `RealtimeTruncationStrategy: "auto" or "disabled"`
1721
1722 The truncation strategy to use for the session. `auto` is the default truncation strategy. `disabled` will disable truncation and emit errors when the conversation exceeds the input token limit.
1723
1724 - `"auto"`
1725
1726 - `"disabled"`
1727
1728 - `realtime_truncation_retention_ratio: object { retention_ratio, type, token_limits }`
1729
1730 Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
1731
1732 - `retention_ratio: number`
1733
1734 Fraction of post-instruction conversation tokens to retain (`0.0` - `1.0`) when the conversation exceeds the input token limit. Setting this to `0.8` means that messages will be dropped until 80% of the maximum allowed tokens are used. This helps reduce the frequency of truncations and improve cache rates.
1735
1736 - `type: "retention_ratio"`
1737
1738 Use retention ratio truncation.
1739
1740 - `token_limits: optional object { post_instructions }`
1741
1742 Optional custom token limits for this truncation strategy. If not provided, the model's default token limits will be used.
1743
1744 - `post_instructions: optional number`
1745
 Maximum tokens allowed in the conversation after instructions (which include tool definitions). For example, setting this to 5,000 would mean that truncation would occur when the conversation exceeds 5,000 tokens after instructions. This cannot be higher than the model's context window size minus the maximum output tokens.
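
To make the `retention_ratio` arithmetic concrete, here is a small simulation of the behavior described above: once the conversation exceeds the limit, the oldest messages are dropped until the total is at or below `retention_ratio` times the limit. This is only an illustrative sketch, not the server's actual implementation; the function name, the per-message token counts, and the 5,000-token limit are assumptions chosen for the example.

```python
from typing import List


def truncate_with_retention(
    message_tokens: List[int], token_limit: int, retention_ratio: float
) -> List[int]:
    """Drop oldest messages once the limit is exceeded.

    `message_tokens` lists token counts per message, oldest first.
    Nothing is dropped while the total stays within `token_limit`.
    """
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be between 0.0 and 1.0")
    if sum(message_tokens) <= token_limit:
        return list(message_tokens)

    target = token_limit * retention_ratio
    kept = list(message_tokens)
    while kept and sum(kept) > target:
        kept.pop(0)  # remove the oldest message first
    return kept


history = [1200, 900, 1500, 800, 1000]  # 5,400 tokens in total
kept = truncate_with_retention(history, token_limit=5000, retention_ratio=0.8)
print(kept, sum(kept))  # prints: [1500, 800, 1000] 3300
```

Cutting down to a fraction of the limit, rather than to just under it, leaves headroom so that the next few turns are less likely to trigger another truncation.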
1747
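Several fields of this object carry documented ranges that can be checked before a configuration is sent. The sketch below gathers a few of them: `speed` between 0.25 and 1.5, the Server VAD `threshold` between 0.0 and 1.0, `max_output_tokens` between 1 and 4096 or `inf`, and a single output modality. The function `validate_session` is hypothetical, and it assumes the request accepts the same shape as this response object; the API itself remains the authority on what is accepted.

```python
from typing import List


def validate_session(session: dict) -> List[str]:
    """Return a list of problems found in a session config dict."""
    problems: List[str] = []
    audio = session.get("audio") or {}

    speed = (audio.get("output") or {}).get("speed")
    if speed is not None and not 0.25 <= speed <= 1.5:
        problems.append("audio.output.speed must be between 0.25 and 1.5")

    vad = (audio.get("input") or {}).get("turn_detection") or {}
    threshold = vad.get("threshold")
    if vad.get("type") == "server_vad" and threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            problems.append("turn_detection.threshold must be between 0.0 and 1.0")

    max_tokens = session.get("max_output_tokens")
    if max_tokens is not None and max_tokens != "inf":
        if not isinstance(max_tokens, int) or not 1 <= max_tokens <= 4096:
            problems.append("max_output_tokens must be 1-4096 or 'inf'")

    modalities = session.get("output_modalities") or []
    if "text" in modalities and "audio" in modalities:
        problems.append("output_modalities cannot contain both text and audio")

    return problems


# Hypothetical config with one out-of-range value (speed).
config = {
    "type": "realtime",
    "audio": {
        "input": {"turn_detection": {"type": "server_vad", "threshold": 0.5}},
        "output": {"speed": 2.0, "voice": "marin"},
    },
    "max_output_tokens": "inf",
    "output_modalities": ["audio"],
}
print(validate_session(config))
```
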
### Realtime Transcription Session Create Response
1749
1750- `realtime_transcription_session_create_response: object { id, object, type, 3 more }`
1751
1752 A Realtime transcription session configuration object.
1753
1754 - `id: string`
1755
1756 Unique identifier for the session that looks like `sess_1234567890abcdef`.
1757
1758 - `object: string`
1759
1760 The object type. Always `realtime.transcription_session`.
1761
1762 - `type: "transcription"`
1763
1764 The type of session. Always `transcription` for transcription sessions.
1765
1766 - `audio: optional object { input }`
1767
1768 Configuration for input audio for the session.
1769
1770 - `input: optional object { format, noise_reduction, transcription, turn_detection }`
1771
1772 - `format: optional object { rate, type } or object { type } or object { type }`
1773
 The format of the input audio.
1775
1776 - `audio/pcm: object { rate, type }`
1777
1778 The PCM audio format. Only a 24kHz sample rate is supported.
1779
1780 - `rate: optional 24000`
1781
1782 The sample rate of the audio. Always `24000`.
1783
1784 - `24000`
1785
1786 - `type: optional "audio/pcm"`
1787
1788 The audio format. Always `audio/pcm`.
1789
1790 - `"audio/pcm"`
1791
1792 - `audio/pcmu: object { type }`
1793
1794 The G.711 μ-law format.
1795
1796 - `type: optional "audio/pcmu"`
1797
1798 The audio format. Always `audio/pcmu`.
1799
1800 - `"audio/pcmu"`
1801
1802 - `audio/pcma: object { type }`
1803
1804 The G.711 A-law format.
1805
1806 - `type: optional "audio/pcma"`
1807
1808 The audio format. Always `audio/pcma`.
1809
1810 - `"audio/pcma"`
1811
1812 - `noise_reduction: optional object { type }`
1813
1814 Configuration for input audio noise reduction.
1815
1816 - `type: optional "near_field" or "far_field"`
1817
1818 Type of noise reduction. `near_field` is for close-talking microphones such as headphones; `far_field` is for far-field microphones such as laptop or conference room microphones.
1819
1820 - `"near_field"`
1821
1822 - `"far_field"`
1823
1824 - `transcription: optional object { delay, language, model, prompt }`
1825
1826 - `delay: optional "minimal" or "low" or "medium" or 2 more`
1827
1828 Controls how long the model waits before emitting transcription text.
1829 Higher values can improve transcription accuracy at the cost of latency.
1830 Only supported with `gpt-realtime-whisper` in GA Realtime sessions.
1831
1832 - `"minimal"`
1833
1834 - `"low"`
1835
1836 - `"medium"`
1837
1838 - `"high"`
1839
1840 - `"xhigh"`
1841
1842 - `language: optional string`
1843
1844 The language of the input audio. Supplying the input language in
1845 [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format
1846 will improve accuracy and latency.
1847
1848 - `model: optional string or "whisper-1" or "gpt-4o-mini-transcribe" or "gpt-4o-mini-transcribe-2025-12-15" or 3 more`
1849
1850 The model to use for transcription. Current options are `whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `gpt-4o-transcribe`, `gpt-4o-transcribe-diarize`, and `gpt-realtime-whisper`. Use `gpt-4o-transcribe-diarize` when you need diarization with speaker labels.
1851
1852 - `"whisper-1"`
1853
1854 - `"gpt-4o-mini-transcribe"`
1855
1856 - `"gpt-4o-mini-transcribe-2025-12-15"`
1857
1858 - `"gpt-4o-transcribe"`
1859
1860 - `"gpt-4o-transcribe-diarize"`
1861
1862 - `"gpt-realtime-whisper"`
1863
1864 - `prompt: optional string`
1865
1866 An optional text to guide the model's style or continue a previous audio
1867 segment.
1868 For `whisper-1`, the [prompt is a list of keywords](https://platform.openai.com/docs/guides/speech-to-text#prompting).
1869 For `gpt-4o-transcribe` models (excluding `gpt-4o-transcribe-diarize`), the prompt is a free text string, for example "expect words related to technology".
1870 Prompt is not supported with `gpt-realtime-whisper` in GA Realtime sessions.
1871
1872 - `turn_detection: optional object { prefix_padding_ms, silence_duration_ms, threshold, type }`
1873
1874 Configuration for turn detection. Can be set to `null` to turn off. Server
1875 VAD means that the model will detect the start and end of speech based on
1876 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
1877
1878 - `prefix_padding_ms: optional number`
1879
1880 Amount of audio to include before the VAD detected speech (in
1881 milliseconds). Defaults to 300ms.
1882
1883 - `silence_duration_ms: optional number`
1884
1885 Duration of silence to detect speech stop (in milliseconds). Defaults
1886 to 500ms. With shorter values the model will respond more quickly,
1887 but may jump in on short pauses from the user.
1888
1889 - `threshold: optional number`
1890
1891 Activation threshold for VAD, from 0.0 to 1.0. Defaults to 0.5. A
1892 higher threshold requires louder audio to activate the model, and
1893 thus might perform better in noisy environments.
1894
1895 - `type: optional string`
1896
1897 Type of turn detection. Only `server_vad` is currently supported.
1898
1899 - `expires_at: optional number`
1900
1901 Expiration timestamp for the session, in seconds since epoch.
1902
1903 - `include: optional array of "item.input_audio_transcription.logprobs"`
1904
1905 Additional fields to include in server outputs.
1906
1907 - `item.input_audio_transcription.logprobs`: Include logprobs for input audio transcription.
1908
1909 - `"item.input_audio_transcription.logprobs"`
1910
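The fields above carry several cross-field constraints (fixed `object` and `type` values, the 24kHz PCM rate, and the `gpt-realtime-whisper` restrictions on `delay`, `prompt`, and `turn_detection`). The sketch below shows what a response object of this shape might look like as a Python dictionary, together with a hypothetical helper that checks only those documented constraints. `check_transcription_session` and the `expires_at` value are assumptions made for illustration; neither comes from the API or any SDK.

```python
import copy
from typing import Any

ALLOWED_FORMAT_TYPES = {"audio/pcm", "audio/pcmu", "audio/pcma"}
ALLOWED_DELAYS = {"minimal", "low", "medium", "high", "xhigh"}
WHISPER_REALTIME = "gpt-realtime-whisper"


def check_transcription_session(session: dict[str, Any]) -> list[str]:
    """Return mismatches against the field constraints documented above."""
    problems: list[str] = []
    if session.get("object") != "realtime.transcription_session":
        problems.append("object must be 'realtime.transcription_session'")
    if session.get("type") != "transcription":
        problems.append("type must be 'transcription'")

    # Every nested object is optional, so fall back to empty dicts.
    audio_input = (session.get("audio") or {}).get("input") or {}

    fmt = audio_input.get("format") or {}
    if "type" in fmt and fmt["type"] not in ALLOWED_FORMAT_TYPES:
        problems.append(f"unknown audio format type: {fmt['type']!r}")
    if "rate" in fmt and fmt["rate"] != 24000:
        problems.append("rate must be 24000")

    transcription = audio_input.get("transcription") or {}
    model = transcription.get("model")
    delay = transcription.get("delay")
    if delay is not None:
        if delay not in ALLOWED_DELAYS:
            problems.append(f"unknown delay: {delay!r}")
        if model != WHISPER_REALTIME:
            problems.append("delay is only supported with gpt-realtime-whisper")
    if model == WHISPER_REALTIME:
        if transcription.get("prompt") is not None:
            problems.append("prompt is not supported with gpt-realtime-whisper")
        if audio_input.get("turn_detection") is not None:
            problems.append("turn_detection must be null for gpt-realtime-whisper")
    return problems


example = {
    "id": "sess_1234567890abcdef",
    "object": "realtime.transcription_session",
    "type": "transcription",
    "audio": {
        "input": {
            "format": {"type": "audio/pcm", "rate": 24000},
            "noise_reduction": {"type": "near_field"},
            "transcription": {"model": "gpt-4o-transcribe", "language": "en"},
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 500,
            },
        }
    },
    "expires_at": 1700000000,  # hypothetical timestamp, seconds since epoch
    "include": ["item.input_audio_transcription.logprobs"],
}

print(check_transcription_session(example))  # []

# Switching to gpt-realtime-whisper while keeping server VAD violates the docs.
whisper = copy.deepcopy(example)
whisper["audio"]["input"]["transcription"]["model"] = WHISPER_REALTIME
print(check_transcription_session(whisper))
# ['turn_detection must be null for gpt-realtime-whisper']
```

The helper deliberately does not validate model names, since `model` also accepts an arbitrary string.
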
1911### Realtime Transcription Session Turn Detection
1912
1913- `realtime_transcription_session_turn_detection: object { prefix_padding_ms, silence_duration_ms, threshold, type }`
1914
1915 Configuration for turn detection. Can be set to `null` to turn off. Server
1916 VAD means that the model will detect the start and end of speech based on
1917 audio volume and respond at the end of user speech. For `gpt-realtime-whisper`, this must be `null`; VAD is not supported.
1918
1919 - `prefix_padding_ms: optional number`
1920
1921 Amount of audio to include before the VAD detected speech (in
1922 milliseconds). Defaults to 300ms.
1923
1924 - `silence_duration_ms: optional number`
1925
1926 Duration of silence to detect speech stop (in milliseconds). Defaults
1927 to 500ms. With shorter values the model will respond more quickly,
1928 but may jump in on short pauses from the user.
1929
1930 - `threshold: optional number`
1931
1932 Activation threshold for VAD, from 0.0 to 1.0. Defaults to 0.5. A
1933 higher threshold requires louder audio to activate the model, and
1934 thus might perform better in noisy environments.
1935
1936 - `type: optional string`
1937
1938 Type of turn detection. Only `server_vad` is currently supported.
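
To build intuition for how `threshold`, `prefix_padding_ms`, and `silence_duration_ms` interact, here is a toy volume-based detector. It is a simplified illustration under stated assumptions, not the server's actual VAD: `detect_turns`, the fixed `frame_ms` framing, and the normalized per-frame `levels` input are all hypothetical.

```python
def detect_turns(levels, frame_ms=100, threshold=0.5,
                 prefix_padding_ms=300, silence_duration_ms=500):
    """Toy VAD: return (start_ms, end_ms) spans from per-frame volume levels (0.0 to 1.0)."""
    turns = []
    start_ms = None        # start of the turn in progress, or None while idle
    last_loud_end_ms = 0   # end time of the most recent frame at or above threshold
    prev_end_ms = 0        # end of the previous turn, so padding never overlaps it
    for i, level in enumerate(levels):
        t = i * frame_ms
        if level >= threshold:
            if start_ms is None:
                # Include some audio from before the detected speech.
                start_ms = max(prev_end_ms, t - prefix_padding_ms)
            last_loud_end_ms = t + frame_ms
        elif start_ms is not None and (t + frame_ms) - last_loud_end_ms >= silence_duration_ms:
            # Enough continuous silence: close the turn at the last loud frame.
            turns.append((start_ms, last_loud_end_ms))
            prev_end_ms = last_loud_end_ms
            start_ms = None
    if start_ms is not None:
        # Input ended mid-turn; close the open turn at the last loud frame.
        turns.append((start_ms, last_loud_end_ms))
    return turns


# Speech, a 200 ms pause, more speech, then a long silence (100 ms frames).
levels = [0.1] * 5 + [0.8] * 3 + [0.1] * 2 + [0.9] * 2 + [0.1] * 6

print(detect_turns(levels))                           # [(200, 1200)]
print(detect_turns(levels, silence_duration_ms=200))  # [(200, 800), (800, 1200)]
```

With the default 500 ms of required silence, the short pause is bridged and the speech is treated as one turn. Lowering `silence_duration_ms` to 200 splits it in two, which mirrors the documented trade-off: shorter values respond faster but may cut in on brief pauses.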