Reference Changes, 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC
java/resources/audio/subresources/transcriptions/index.md

java/resources/audio/subresources/transcriptions/index.md 2026-05-18 22:01 UTC to 2026-05-19 06:34 UTC

0 added, 1076 removed.


This document has no rendered page for this history range.

java/resources/audio/subresources/transcriptions/index.md +0 −1076 deleted


# Transcriptions

## Create transcription

`TranscriptionCreateResponse audio().transcriptions().create(TranscriptionCreateParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/audio/transcriptions`

Transcribes audio into the input language.

Returns a transcription object in `json`, `diarized_json`, or `verbose_json`
format, or a stream of transcript events.

### Parameters

- `TranscriptionCreateParams params`

  - `String file`

    The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

  - `AudioModel model`

    ID of the model to use. The options are `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `whisper-1` (which is powered by our open source Whisper V2 model), and `gpt-4o-transcribe-diarize`.

    - `WHISPER_1("whisper-1")`

    - `GPT_4O_TRANSCRIBE("gpt-4o-transcribe")`

    - `GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe")`

    - `GPT_4O_MINI_TRANSCRIBE_2025_12_15("gpt-4o-mini-transcribe-2025-12-15")`

    - `GPT_4O_TRANSCRIBE_DIARIZE("gpt-4o-transcribe-diarize")`

  - `Optional<ChunkingStrategy> chunkingStrategy`

    Controls how the audio is cut into chunks. When set to `"auto"`, the server first normalizes loudness and then uses voice activity detection (VAD) to choose boundaries. A `server_vad` object can be provided to tweak VAD detection parameters manually. If unset, the audio is transcribed as a single block. Required when using `gpt-4o-transcribe-diarize` for inputs longer than 30 seconds.

    - `JsonValue;`

      - `AUTO("auto")`

    - `class VadConfig:`

      - `Type type`

        Must be set to `server_vad` to enable manual chunking using server side VAD.

        - `SERVER_VAD("server_vad")`

      - `Optional<Long> prefixPaddingMs`

        Amount of audio to include before the VAD detected speech (in
        milliseconds).

      - `Optional<Long> silenceDurationMs`

        Duration of silence to detect speech stop (in milliseconds).
        With shorter values the model will respond more quickly,
        but may jump in on short pauses from the user.

      - `Optional<Double> threshold`

        Sensitivity threshold (0.0 to 1.0) for voice activity detection. A
        higher threshold will require louder audio to activate the model, and
        thus might perform better in noisy environments.

  - `Optional<List<TranscriptionInclude>> include`

    Additional information to include in the transcription response.
    `logprobs` will return the log probabilities of the tokens in the
    response to understand the model's confidence in the transcription.
    `logprobs` only works with `response_format` set to `json` and only with
    the models `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, and `gpt-4o-mini-transcribe-2025-12-15`. This field is not supported when using `gpt-4o-transcribe-diarize`.

    - `LOGPROBS("logprobs")`

  - `Optional<List<String>> knownSpeakerNames`

    Optional list of speaker names that correspond to the audio samples provided in `known_speaker_references[]`. Each entry should be a short identifier (for example `customer` or `agent`). Up to 4 speakers are supported.

  - `Optional<List<String>> knownSpeakerReferences`

    Optional list of audio samples (as [data URLs](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs)) that contain known speaker references matching `known_speaker_names[]`. Each sample must be between 2 and 10 seconds, and can use any of the same input audio formats supported by `file`.

  - `Optional<String> language`

    The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (e.g. `en`) format will improve accuracy and latency.

  - `Optional<String> prompt`

    An optional text to guide the model's style or continue a previous audio segment. The [prompt](https://platform.openai.com/docs/guides/speech-to-text#prompting) should match the audio language. This field is not supported when using `gpt-4o-transcribe-diarize`.

  - `Optional<AudioResponseFormat> responseFormat`

    The format of the output, in one of these options: `json`, `text`, `srt`, `verbose_json`, `vtt`, or `diarized_json`. For `gpt-4o-transcribe` and `gpt-4o-mini-transcribe`, the only supported format is `json`. For `gpt-4o-transcribe-diarize`, the supported formats are `json`, `text`, and `diarized_json`, with `diarized_json` required to receive speaker annotations.

  - `Optional<Double> temperature`

    The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use [log probability](https://en.wikipedia.org/wiki/Log_probability) to automatically increase the temperature until certain thresholds are hit.

  - `Optional<List<TimestampGranularity>> timestampGranularities`

    The timestamp granularities to populate for this transcription. `response_format` must be set to `verbose_json` to use timestamp granularities. Either or both of these options are supported: `word` or `segment`. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
    This option is not available for `gpt-4o-transcribe-diarize`.

    - `WORD("word")`

    - `SEGMENT("segment")`
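
Several of the parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch of that idea, not part of the SDK: the class `TranscriptionRequestChecks` and its method names are hypothetical, the rules are transcribed from the parameter descriptions, and the equal-length check on the speaker lists is an assumption drawn from the wording "correspond to". The server remains the authority on what is accepted.

```java
import java.util.Base64;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side checks; not part of the OpenAI Java SDK. */
final class TranscriptionRequestChecks {
    private static final Set<String> ALL_FORMATS =
            Set.of("json", "text", "srt", "verbose_json", "vtt", "diarized_json");
    private static final Set<String> DIARIZE_FORMATS = Set.of("json", "text", "diarized_json");

    private TranscriptionRequestChecks() {}

    /** Encodes raw audio bytes as a data URL, the form used by known_speaker_references[]. */
    static String toDataUrl(byte[] audio, String mimeType) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(audio);
    }

    /** Returns true when the documented rules allow this response format for this model. */
    static boolean isFormatSupported(String model, String responseFormat) {
        switch (model) {
            case "gpt-4o-transcribe":
            case "gpt-4o-mini-transcribe":
                return responseFormat.equals("json");
            case "gpt-4o-transcribe-diarize":
                return DIARIZE_FORMATS.contains(responseFormat);
            default:
                // The text states no per-model restriction for other models,
                // so this sketch only checks that the format is a listed one.
                return ALL_FORMATS.contains(responseFormat);
        }
    }

    /** Checks the speaker lists: at most 4 entries, and (assumed) one reference per name. */
    static boolean areSpeakersValid(List<String> names, List<String> references) {
        return names.size() <= 4 && names.size() == references.size();
    }
}
```

For example, `isFormatSupported("gpt-4o-transcribe", "srt")` returns `false`, which lets a caller fail fast instead of waiting for an error response.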

### Returns

- `class TranscriptionCreateResponse:` A class that can be one of several variants (union).

  Represents a transcription response returned by the model, based on the provided input.

  - `class Transcription:`

    Represents a transcription response returned by the model, based on the provided input.

    - `String text`

      The transcribed text.

    - `Optional<List<Logprob>> logprobs`

      The log probabilities of the tokens in the transcription. Only returned with the models `gpt-4o-transcribe` and `gpt-4o-mini-transcribe` if `logprobs` is added to the `include` array.

      - `Optional<String> token`

        The token in the transcription.

      - `Optional<List<Double>> bytes`

        The bytes of the token.

      - `Optional<Double> logprob`

        The log probability of the token.

    - `Optional<Usage> usage`

      Token usage statistics for the request.

      - `class Tokens:`

        Usage statistics for models billed by token usage.

        - `long inputTokens`

          Number of input tokens billed for this request.

        - `long outputTokens`

          Number of output tokens generated.

        - `long totalTokens`

          Total number of tokens used (input + output).

        - `JsonValue; type "tokens"` (constant)

          The type of the usage object. Always `tokens` for this variant.

          - `TOKENS("tokens")`

        - `Optional<InputTokenDetails> inputTokenDetails`

          Details about the input tokens billed for this request.

          - `Optional<Long> audioTokens`

            Number of audio tokens billed for this request.

          - `Optional<Long> textTokens`

            Number of text tokens billed for this request.

      - `class Duration:`

        Usage statistics for models billed by audio input duration.

        - `double seconds`

          Duration of the input audio in seconds.

        - `JsonValue; type "duration"` (constant)

          The type of the usage object. Always `duration` for this variant.

          - `DURATION("duration")`

  - `class TranscriptionDiarized:`

    Represents a diarized transcription response returned by the model, including the combined transcript and speaker-segment annotations.

    - `double duration`

      Duration of the input audio in seconds.

    - `List<TranscriptionDiarizedSegment> segments`

      Segments of the transcript annotated with timestamps and speaker labels.

      - `String id`

        Unique identifier for the segment.

      - `double end`

        End timestamp of the segment in seconds.

      - `String speaker`

        Speaker label for this segment. When known speakers are provided, the label matches `known_speaker_names[]`. Otherwise speakers are labeled sequentially using capital letters (`A`, `B`, ...).

      - `double start`

        Start timestamp of the segment in seconds.

      - `String text`

        Transcript text for this segment.

      - `JsonValue; type "transcript.text.segment"` (constant)

        The type of the segment. Always `transcript.text.segment`.

        - `TRANSCRIPT_TEXT_SEGMENT("transcript.text.segment")`

    - `JsonValue; task "transcribe"` (constant)

      The type of task that was run. Always `transcribe`.

      - `TRANSCRIBE("transcribe")`

    - `String text`

      The concatenated transcript text for the entire audio input.

    - `Optional<Usage> usage`

      Token or duration usage statistics for the request.

      - `class Tokens:`

        Usage statistics for models billed by token usage.

        - `long inputTokens`

          Number of input tokens billed for this request.

        - `long outputTokens`

          Number of output tokens generated.

        - `long totalTokens`

          Total number of tokens used (input + output).

        - `JsonValue; type "tokens"` (constant)

          The type of the usage object. Always `tokens` for this variant.

          - `TOKENS("tokens")`

        - `Optional<InputTokenDetails> inputTokenDetails`

          Details about the input tokens billed for this request.

          - `Optional<Long> audioTokens`

            Number of audio tokens billed for this request.

          - `Optional<Long> textTokens`

            Number of text tokens billed for this request.

      - `class Duration:`

        Usage statistics for models billed by audio input duration.

        - `double seconds`

          Duration of the input audio in seconds.

        - `JsonValue; type "duration"` (constant)

          The type of the usage object. Always `duration` for this variant.

          - `DURATION("duration")`

  - `class TranscriptionVerbose:`

    Represents a verbose JSON transcription response returned by the model, based on the provided input.

    - `double duration`

      The duration of the input audio.

    - `String language`

      The language of the input audio.

    - `String text`

      The transcribed text.

    - `Optional<List<TranscriptionSegment>> segments`

      Segments of the transcribed text and their corresponding details.

      - `long id`

        Unique identifier of the segment.

      - `double avgLogprob`

        Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.

      - `double compressionRatio`

        Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.

      - `double end`

        End time of the segment in seconds.

      - `double noSpeechProb`

        Probability of no speech in the segment. If the value is higher than 1.0 and the `avg_logprob` is below -1, consider this segment silent.

      - `long seek`

        Seek offset of the segment.

      - `double start`

        Start time of the segment in seconds.

      - `double temperature`

        Temperature parameter used for generating the segment.

      - `String text`

        Text content of the segment.

      - `List<Long> tokens`

        Array of token IDs for the text content.

    - `Optional<Usage> usage`

      Usage statistics for models billed by audio input duration.

      - `double seconds`

        Duration of the input audio in seconds.

      - `JsonValue; type "duration"` (constant)

        The type of the usage object. Always `duration` for this variant.

        - `DURATION("duration")`

    - `Optional<List<TranscriptionWord>> words`

      Extracted words and their corresponding timestamps.

      - `double end`

        End time of the word in seconds.

      - `double start`

        Start time of the word in seconds.

      - `String word`

        The text content of the word.
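
The `usage` field above is itself a union: a `tokens` variant for models billed by token and a `duration` variant for models billed by audio length. The sketch below shows one way calling code might branch on the two shapes. The types `Usage`, `Tokens`, and `Duration` here are simplified, hypothetical stand-ins written for this illustration, not the SDK's generated classes, and the sketch assumes Java 16 or later for records and `instanceof` patterns.

```java
import java.util.Optional;

/** Hypothetical stand-ins for the two documented usage variants; not the SDK's classes. */
final class UsageSketch {
    private UsageSketch() {}

    /** Marker for the union; in the API the "type" field selects the variant. */
    interface Usage {}

    /** Mirrors the documented "tokens" variant (details omitted for brevity). */
    record Tokens(long inputTokens, long outputTokens, long totalTokens) implements Usage {}

    /** Mirrors the documented "duration" variant. */
    record Duration(double seconds) implements Usage {}

    /** Renders a one-line summary for whichever variant is present. */
    static String describe(Optional<Usage> usage) {
        if (usage.isEmpty()) {
            return "no usage reported";
        }
        Usage u = usage.get();
        if (u instanceof Tokens t) {
            return t.totalTokens() + " tokens (" + t.inputTokens() + " in, " + t.outputTokens() + " out)";
        }
        if (u instanceof Duration d) {
            return d.seconds() + " seconds of audio";
        }
        return "unknown usage variant";
    }
}
```

Because `usage` is optional in every response variant, the empty case is handled first rather than treated as an error.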

### Example

```java
package com.openai.example;

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.audio.AudioModel;
import com.openai.models.audio.transcriptions.TranscriptionCreateParams;
import com.openai.models.audio.transcriptions.TranscriptionCreateResponse;
import java.io.ByteArrayInputStream;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Configure the client from environment variables such as OPENAI_API_KEY.
        OpenAIClient client = OpenAIOkHttpClient.fromEnv();

        TranscriptionCreateParams params = TranscriptionCreateParams.builder()
            // Placeholder bytes; substitute the contents of a real audio file.
            .file(new ByteArrayInputStream("Example data".getBytes()))
            .model(AudioModel.GPT_4O_TRANSCRIBE)
            .build();
        TranscriptionCreateResponse transcription = client.audio().transcriptions().create(params);
    }
}
```

#### Response

```json
{
  "text": "text",
  "logprobs": [
    {
      "token": "token",
      "bytes": [
        0
      ],
      "logprob": 0
    }
  ],
  "usage": {
    "input_tokens": 0,
    "output_tokens": 0,
    "total_tokens": 0,
    "type": "tokens",
    "input_token_details": {
      "audio_tokens": 0,
      "text_tokens": 0
    }
  }
}
```

## Domain Types

### Transcription

- `class Transcription:`

  Represents a transcription response returned by the model, based on the provided input.

  - `String text`

    The transcribed text.

  - `Optional<List<Logprob>> logprobs`

    The log probabilities of the tokens in the transcription. Only returned with the models `gpt-4o-transcribe` and `gpt-4o-mini-transcribe` if `logprobs` is added to the `include` array.

    - `Optional<String> token`

      The token in the transcription.

    - `Optional<List<Double>> bytes`

      The bytes of the token.

    - `Optional<Double> logprob`

      The log probability of the token.

  - `Optional<Usage> usage`

    Token usage statistics for the request.

    - `class Tokens:`

      Usage statistics for models billed by token usage.

      - `long inputTokens`

        Number of input tokens billed for this request.

      - `long outputTokens`

        Number of output tokens generated.

      - `long totalTokens`

        Total number of tokens used (input + output).

      - `JsonValue; type "tokens"` (constant)

        The type of the usage object. Always `tokens` for this variant.

        - `TOKENS("tokens")`

      - `Optional<InputTokenDetails> inputTokenDetails`

        Details about the input tokens billed for this request.

        - `Optional<Long> audioTokens`

          Number of audio tokens billed for this request.

        - `Optional<Long> textTokens`

          Number of text tokens billed for this request.

    - `class Duration:`

      Usage statistics for models billed by audio input duration.

      - `double seconds`

        Duration of the input audio in seconds.

      - `JsonValue; type "duration"` (constant)

        The type of the usage object. Always `duration` for this variant.

        - `DURATION("duration")`
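
The `logprobs` entries above are log probabilities, so turning them into an easier-to-read confidence value is a matter of exponentiating. The sketch below assumes the values are natural logarithms (the reference does not state the base) and uses a hypothetical helper class `LogprobConfidence`; the mean per-token probability it computes is just one possible summary, not a metric defined by the API.

```java
import java.util.List;

/** Hypothetical helper for reading logprob values; not part of the SDK. */
final class LogprobConfidence {
    private LogprobConfidence() {}

    /** Converts a log probability (assumed natural log) into a probability in [0, 1]. */
    static double toProbability(double logprob) {
        return Math.exp(logprob);
    }

    /** Mean per-token probability, or NaN when the list is empty. */
    static double meanProbability(List<Double> logprobs) {
        return logprobs.stream()
                .mapToDouble(LogprobConfidence::toProbability)
                .average()
                .orElse(Double.NaN);
    }
}
```

A log probability of `0` maps to a probability of `1.0`, and more negative values map to lower probabilities.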

### Transcription Diarized

- `class TranscriptionDiarized:`

  Represents a diarized transcription response returned by the model, including the combined transcript and speaker-segment annotations.

  - `double duration`

    Duration of the input audio in seconds.

  - `List<TranscriptionDiarizedSegment> segments`

    Segments of the transcript annotated with timestamps and speaker labels.

    - `String id`

      Unique identifier for the segment.

    - `double end`

      End timestamp of the segment in seconds.

    - `String speaker`

      Speaker label for this segment. When known speakers are provided, the label matches `known_speaker_names[]`. Otherwise speakers are labeled sequentially using capital letters (`A`, `B`, ...).

    - `double start`

      Start timestamp of the segment in seconds.

    - `String text`

      Transcript text for this segment.

    - `JsonValue; type "transcript.text.segment"` (constant)

      The type of the segment. Always `transcript.text.segment`.

      - `TRANSCRIPT_TEXT_SEGMENT("transcript.text.segment")`

  - `JsonValue; task "transcribe"` (constant)

    The type of task that was run. Always `transcribe`.

    - `TRANSCRIBE("transcribe")`

  - `String text`

    The concatenated transcript text for the entire audio input.

  - `Optional<Usage> usage`

    Token or duration usage statistics for the request.

    - `class Tokens:`

      Usage statistics for models billed by token usage.

      - `long inputTokens`

        Number of input tokens billed for this request.

      - `long outputTokens`

        Number of output tokens generated.

      - `long totalTokens`

        Total number of tokens used (input + output).

      - `JsonValue; type "tokens"` (constant)

        The type of the usage object. Always `tokens` for this variant.

        - `TOKENS("tokens")`

      - `Optional<InputTokenDetails> inputTokenDetails`

        Details about the input tokens billed for this request.

        - `Optional<Long> audioTokens`

          Number of audio tokens billed for this request.

        - `Optional<Long> textTokens`

          Number of text tokens billed for this request.

    - `class Duration:`

      Usage statistics for models billed by audio input duration.

      - `double seconds`

        Duration of the input audio in seconds.

      - `JsonValue; type "duration"` (constant)

        The type of the usage object. Always `duration` for this variant.

        - `DURATION("duration")`

### Transcription Diarized Segment

- `class TranscriptionDiarizedSegment:`

  A segment of diarized transcript text with speaker metadata.

  - `String id`

    Unique identifier for the segment.

  - `double end`

    End timestamp of the segment in seconds.

  - `String speaker`

    Speaker label for this segment. When known speakers are provided, the label matches `known_speaker_names[]`. Otherwise speakers are labeled sequentially using capital letters (`A`, `B`, ...).

  - `double start`

    Start timestamp of the segment in seconds.

  - `String text`

    Transcript text for this segment.

  - `JsonValue; type "transcript.text.segment"` (constant)

    The type of the segment. Always `transcript.text.segment`.

    - `TRANSCRIPT_TEXT_SEGMENT("transcript.text.segment")`
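
Diarized segments arrive as a flat list, so a readable transcript usually needs consecutive segments from the same speaker joined together. The following sketch illustrates that step under stated assumptions: `Segment` is a hypothetical record carrying only the documented `speaker`, `start`, `end`, and `text` fields, the list is assumed to be in time order, and the `speaker: text` line layout is a choice made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical formatter for diarized segments; not part of the SDK. */
final class DiarizedTranscriptFormatter {
    private DiarizedTranscriptFormatter() {}

    /** Simplified stand-in with the documented segment fields used here. */
    record Segment(String speaker, double start, double end, String text) {}

    /** Joins consecutive segments from the same speaker into "speaker: text" lines. */
    static List<String> toLines(List<Segment> segments) {
        List<String> lines = new ArrayList<>();
        String currentSpeaker = null;
        StringBuilder currentText = new StringBuilder();
        for (Segment s : segments) {
            if (!s.speaker().equals(currentSpeaker)) {
                // Speaker changed: flush the previous speaker's line, if any.
                if (currentSpeaker != null) {
                    lines.add(currentSpeaker + ": " + currentText);
                }
                currentSpeaker = s.speaker();
                currentText.setLength(0);
            } else {
                currentText.append(' ');
            }
            currentText.append(s.text().trim());
        }
        if (currentSpeaker != null) {
            lines.add(currentSpeaker + ": " + currentText);
        }
        return lines;
    }
}
```

With known speakers supplied, the labels in the output would be the names from `known_speaker_names[]`; otherwise they would be the sequential letters described above.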

### Transcription Include

- `enum TranscriptionInclude:`

  - `LOGPROBS("logprobs")`

### Transcription Segment

- `class TranscriptionSegment:`

  - `long id`

    Unique identifier of the segment.

  - `double avgLogprob`

    Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.

  - `double compressionRatio`

    Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.

  - `double end`

    End time of the segment in seconds.

  - `double noSpeechProb`

    Probability of no speech in the segment. If the value is higher than 1.0 and the `avg_logprob` is below -1, consider this segment silent.

  - `long seek`

    Seek offset of the segment.

  - `double start`

    Start time of the segment in seconds.

  - `double temperature`

    Temperature parameter used for generating the segment.

  - `String text`

    Text content of the segment.

  - `List<Long> tokens`

    Array of token IDs for the text content.
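
The thresholds quoted for `avgLogprob` and `compressionRatio` can be applied as a simple filter over the returned segments. This is a minimal sketch rather than a recommended pipeline: `SegmentQualityFilter` and its `Segment` record are hypothetical, only the two thresholds stated above (-1 and 2.4) are used, and the `noSpeechProb` rule is left out of the example.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical filter applying the documented segment heuristics; not part of the SDK. */
final class SegmentQualityFilter {
    /** Below this average logprob, the reference says to consider the logprobs failed. */
    static final double MIN_AVG_LOGPROB = -1.0;
    /** Above this ratio, the reference says to consider the compression failed. */
    static final double MAX_COMPRESSION_RATIO = 2.4;

    private SegmentQualityFilter() {}

    /** Simplified stand-in with only the fields this sketch inspects. */
    record Segment(String text, double avgLogprob, double compressionRatio) {}

    /** True when neither documented failure condition applies. */
    static boolean isReliable(Segment s) {
        return s.avgLogprob() >= MIN_AVG_LOGPROB && s.compressionRatio() <= MAX_COMPRESSION_RATIO;
    }

    /** Keeps only the segments that pass both checks, preserving order. */
    static List<Segment> keepReliable(List<Segment> segments) {
        return segments.stream()
                .filter(SegmentQualityFilter::isReliable)
                .collect(Collectors.toList());
    }
}
```

Whether to drop, flag, or re-transcribe a failing segment is an application decision; the sketch drops them only to keep the example short.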

### Transcription Stream Event

- `class TranscriptionStreamEvent:` A class that can be one of several variants (union).

  Emitted when a diarized transcription returns a completed segment with speaker information. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with `stream` set to `true` and `response_format` set to `diarized_json`.

  - `class TranscriptionTextSegmentEvent:`

    Emitted when a diarized transcription returns a completed segment with speaker information. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with `stream` set to `true` and `response_format` set to `diarized_json`.

    - `String id`

      Unique identifier for the segment.

    - `double end`

      End timestamp of the segment in seconds.

    - `String speaker`

      Speaker label for this segment.

    - `double start`

      Start timestamp of the segment in seconds.

    - `String text`

      Transcript text for this segment.

    - `JsonValue; type "transcript.text.segment"` (constant)

      The type of the event. Always `transcript.text.segment`.

      - `TRANSCRIPT_TEXT_SEGMENT("transcript.text.segment")`

  - `class TranscriptionTextDeltaEvent:`

    Emitted when there is an additional text delta. This is also the first event emitted when the transcription starts. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `stream` parameter set to `true`.

    - `String delta`

      The text delta that was additionally transcribed.

    - `JsonValue; type "transcript.text.delta"` (constant)

      The type of the event. Always `transcript.text.delta`.

      - `TRANSCRIPT_TEXT_DELTA("transcript.text.delta")`

    - `Optional<List<Logprob>> logprobs`

      The log probabilities of the delta. Only included if you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `include[]` parameter set to `logprobs`.

      - `Optional<String> token`

        The token that was used to generate the log probability.

      - `Optional<List<Long>> bytes`

        The bytes that were used to generate the log probability.

      - `Optional<Double> logprob`

        The log probability of the token.

    - `Optional<String> segmentId`

      Identifier of the diarized segment that this delta belongs to. Only present when using `gpt-4o-transcribe-diarize`.

  - `class TranscriptionTextDoneEvent:`

    Emitted when the transcription is complete. Contains the complete transcription text. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `stream` parameter set to `true`.

    - `String text`

      The text that was transcribed.

    - `JsonValue; type "transcript.text.done"` (constant)

      The type of the event. Always `transcript.text.done`.

      - `TRANSCRIPT_TEXT_DONE("transcript.text.done")`

    - `Optional<List<Logprob>> logprobs`

      The log probabilities of the individual tokens in the transcription. Only included if you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `include[]` parameter set to `logprobs`.

      - `Optional<String> token`

        The token that was used to generate the log probability.

      - `Optional<List<Long>> bytes`

        The bytes that were used to generate the log probability.

      - `Optional<Double> logprob`

        The log probability of the token.

    - `Optional<Usage> usage`

      Usage statistics for models billed by token usage.

      - `long inputTokens`

        Number of input tokens billed for this request.

      - `long outputTokens`

        Number of output tokens generated.

      - `long totalTokens`

        Total number of tokens used (input + output).

      - `JsonValue; type "tokens"` (constant)

        The type of the usage object. Always `tokens` for this variant.

        - `TOKENS("tokens")`

      - `Optional<InputTokenDetails> inputTokenDetails`

        Details about the input tokens billed for this request.

        - `Optional<Long> audioTokens`

          Number of audio tokens billed for this request.

        - `Optional<Long> textTokens`

          Number of text tokens billed for this request.
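
The three event variants above are told apart by their constant `type` value, which suggests a small dispatcher on the consuming side: append each delta as it arrives and switch to the complete text once the done event is seen. The sketch below shows that idea with a hypothetical, flattened `Event` record; it does not use the SDK's streaming classes, and ignoring segment events is a simplification made for this example.

```java
/** Hypothetical consumer of transcript stream events; not the SDK's streaming API. */
final class TranscriptStreamAssembler {
    /** Flattened stand-in: the event type plus its delta or text payload (null when absent). */
    record Event(String type, String delta, String text) {}

    private final StringBuilder partial = new StringBuilder();
    private String finalText;

    /** Routes one event by its documented type constant. */
    void accept(Event event) {
        switch (event.type()) {
            case "transcript.text.delta":
                partial.append(event.delta());
                break;
            case "transcript.text.done":
                finalText = event.text();
                break;
            default:
                // For example "transcript.text.segment"; ignored in this sketch.
                break;
        }
    }

    /** The complete text once the done event has arrived, else the deltas received so far. */
    String currentText() {
        return finalText != null ? finalText : partial.toString();
    }
}
```

Preferring the done event's `text` over the accumulated deltas follows from the description above, which says the done event contains the complete transcription text.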

### Transcription Text Delta Event

- `class TranscriptionTextDeltaEvent:`

  Emitted when there is an additional text delta. This is also the first event emitted when the transcription starts. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `stream` parameter set to `true`.

  - `String delta`

    The text delta that was additionally transcribed.

  - `JsonValue; type "transcript.text.delta"` (constant)

    The type of the event. Always `transcript.text.delta`.

    - `TRANSCRIPT_TEXT_DELTA("transcript.text.delta")`

  - `Optional<List<Logprob>> logprobs`

    The log probabilities of the delta. Only included if you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `include[]` parameter set to `logprobs`.

    - `Optional<String> token`

      The token that was used to generate the log probability.

    - `Optional<List<Long>> bytes`

      The bytes that were used to generate the log probability.

    - `Optional<Double> logprob`

      The log probability of the token.

  - `Optional<String> segmentId`

    Identifier of the diarized segment that this delta belongs to. Only present when using `gpt-4o-transcribe-diarize`.

### Transcription Text Done Event

- `class TranscriptionTextDoneEvent:`

  Emitted when the transcription is complete. Contains the complete transcription text. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `stream` parameter set to `true`.

  - `String text`

    The text that was transcribed.

  - `JsonValue; type "transcript.text.done"` (constant)

    The type of the event. Always `transcript.text.done`.

    - `TRANSCRIPT_TEXT_DONE("transcript.text.done")`

  - `Optional<List<Logprob>> logprobs`

    The log probabilities of the individual tokens in the transcription. Only included if you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with the `include[]` parameter set to `logprobs`.

    - `Optional<String> token`

      The token that was used to generate the log probability.

    - `Optional<List<Long>> bytes`

      The bytes that were used to generate the log probability.

    - `Optional<Double> logprob`

      The log probability of the token.

  - `Optional<Usage> usage`

    Usage statistics for models billed by token usage.

    - `long inputTokens`

      Number of input tokens billed for this request.

    - `long outputTokens`

      Number of output tokens generated.

    - `long totalTokens`

      Total number of tokens used (input + output).

    - `JsonValue; type "tokens"` (constant)

      The type of the usage object. Always `tokens` for this variant.

      - `TOKENS("tokens")`

    - `Optional<InputTokenDetails> inputTokenDetails`

      Details about the input tokens billed for this request.

      - `Optional<Long> audioTokens`

        Number of audio tokens billed for this request.

      - `Optional<Long> textTokens`

        Number of text tokens billed for this request.

### Transcription Text Segment Event

- `class TranscriptionTextSegmentEvent:`

  Emitted when a diarized transcription returns a completed segment with speaker information. Only emitted when you [create a transcription](https://platform.openai.com/docs/api-reference/audio/create-transcription) with `stream` set to `true` and `response_format` set to `diarized_json`.

  - `String id`

    Unique identifier for the segment.

  - `double end`

    End timestamp of the segment in seconds.

  - `String speaker`

    Speaker label for this segment.

  - `double start`

    Start timestamp of the segment in seconds.

  - `String text`

    Transcript text for this segment.

  - `JsonValue; type "transcript.text.segment"` (constant)

    The type of the event. Always `transcript.text.segment`.

    - `TRANSCRIPT_TEXT_SEGMENT("transcript.text.segment")`

### Transcription Verbose

- `class TranscriptionVerbose:`

  Represents a verbose JSON transcription response returned by the model, based on the provided input.

  - `double duration`

    The duration of the input audio.

  - `String language`

    The language of the input audio.

  - `String text`

    The transcribed text.

  - `Optional<List<TranscriptionSegment>> segments`

    Segments of the transcribed text and their corresponding details.

    - `long id`

      Unique identifier of the segment.

    - `double avgLogprob`

      Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.

    - `double compressionRatio`

      Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.

    - `double end`

      End time of the segment in seconds.

    - `double noSpeechProb`

      Probability of no speech in the segment. If the value is higher than 1.0 and the `avg_logprob` is below -1, consider this segment silent.

    - `long seek`

      Seek offset of the segment.

    - `double start`

      Start time of the segment in seconds.

    - `double temperature`

      Temperature parameter used for generating the segment.

    - `String text`

      Text content of the segment.

    - `List<Long> tokens`

      Array of token IDs for the text content.

  - `Optional<Usage> usage`

    Usage statistics for models billed by audio input duration.

    - `double seconds`

      Duration of the input audio in seconds.

    - `JsonValue; type "duration"` (constant)

      The type of the usage object. Always `duration` for this variant.

      - `DURATION("duration")`

  - `Optional<List<TranscriptionWord>> words`

    Extracted words and their corresponding timestamps.

    - `double end`

      End time of the word in seconds.

    - `double start`

      Start time of the word in seconds.

    - `String word`

      The text content of the word.

### Transcription Word

- `class TranscriptionWord:`

  - `double end`

    End time of the word in seconds.

  - `double start`

    Start time of the word in seconds.

  - `String word`

    The text content of the word.
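
Word timestamps are plain seconds, so displaying them as captions needs a formatting step. As a rough illustration, the sketch below converts the `start` and `end` values into the `HH:MM:SS,mmm` layout used by SubRip files. `WordTimestamps` and its `Word` record are hypothetical names for this example, and the single-line cue layout is an assumption rather than anything the API produces.

```java
import java.util.Locale;

/** Hypothetical helper for rendering word timestamps; not part of the SDK. */
final class WordTimestamps {
    private WordTimestamps() {}

    /** Simplified stand-in carrying the three documented word fields. */
    record Word(String word, double start, double end) {}

    /** Formats a time in seconds as HH:MM:SS,mmm, rounding to the nearest millisecond. */
    static String toSrtTime(double seconds) {
        long millis = Math.round(seconds * 1000.0);
        long hours = millis / 3_600_000;
        long minutes = (millis / 60_000) % 60;
        long secs = (millis / 1000) % 60;
        long ms = millis % 1000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, secs, ms);
    }

    /** Renders one word as "start --> end word". */
    static String toCue(Word w) {
        return toSrtTime(w.start()) + " --> " + toSrtTime(w.end()) + " " + w.word();
    }
}
```

Working in whole milliseconds before splitting into fields avoids rounding a value such as 59.9996 seconds up to an invalid `60` in the seconds position.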