cli/resources/fine_tuning/subresources/alpha/index.md +0 −611 deleted
File Deleted View Diff
1# Alpha
2
3# Graders
4
5## Run grader
6
7`$ openai fine-tuning:alpha:graders run`
8
9**post** `/fine_tuning/alpha/graders/run`
10
11Run a grader.
12
13### Parameters
14
15- `--grader: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more`
16
17 The grader used for the fine-tuning job.
18
19- `--model-sample: string`
20
21 The model sample to be evaluated. This value will be used to populate
22 the `sample` namespace. See [the guide](https://platform.openai.com/docs/guides/graders) for more details.
23 The `output_json` variable will be populated if the model sample is a
24 valid JSON string.
25
26- `--item: optional unknown`
27
28 The dataset item provided to the grader. This will be used to populate
29 the `item` namespace. See [the guide](https://platform.openai.com/docs/guides/graders) for more details.
30
31### Returns
32
33- `FineTuningAlphaGraderRunResponse: object { metadata, model_grader_token_usage_per_model, reward, sub_rewards }`
34
35 - `metadata: object { errors, execution_time, name, 4 more }`
36
37 - `errors: object { formula_parse_error, invalid_variable_error, model_grader_parse_error, 11 more }`
38
39 - `formula_parse_error: boolean`
40
41 - `invalid_variable_error: boolean`
42
43 - `model_grader_parse_error: boolean`
44
45 - `model_grader_refusal_error: boolean`
46
47 - `model_grader_server_error: boolean`
48
49 - `model_grader_server_error_details: string`
50
51 - `other_error: boolean`
52
53 - `python_grader_runtime_error: boolean`
54
55 - `python_grader_runtime_error_details: string`
56
57 - `python_grader_server_error: boolean`
58
59 - `python_grader_server_error_type: string`
60
61 - `sample_parse_error: boolean`
62
63 - `truncated_observation_error: boolean`
64
65 - `unresponsive_reward_error: boolean`
66
67 - `execution_time: number`
68
69 - `name: string`
70
71 - `sampled_model_name: string`
72
73 - `scores: map[unknown]`
74
75 - `token_usage: number`
76
77 - `type: string`
78
79 - `model_grader_token_usage_per_model: map[unknown]`
80
81 - `reward: number`
82
83 - `sub_rewards: map[unknown]`
84
85### Example
86
87```cli
88openai fine-tuning:alpha:graders run \
89 --api-key 'My API Key' \
90 --grader '{input: input, name: name, operation: eq, reference: reference, type: string_check}' \
91 --model-sample model_sample
92```
93
94#### Response
95
96```json
97{
98 "metadata": {
99 "errors": {
100 "formula_parse_error": true,
101 "invalid_variable_error": true,
102 "model_grader_parse_error": true,
103 "model_grader_refusal_error": true,
104 "model_grader_server_error": true,
105 "model_grader_server_error_details": "model_grader_server_error_details",
106 "other_error": true,
107 "python_grader_runtime_error": true,
108 "python_grader_runtime_error_details": "python_grader_runtime_error_details",
109 "python_grader_server_error": true,
110 "python_grader_server_error_type": "python_grader_server_error_type",
111 "sample_parse_error": true,
112 "truncated_observation_error": true,
113 "unresponsive_reward_error": true
114 },
115 "execution_time": 0,
116 "name": "name",
117 "sampled_model_name": "sampled_model_name",
118 "scores": {
119 "foo": "bar"
120 },
121 "token_usage": 0,
122 "type": "type"
123 },
124 "model_grader_token_usage_per_model": {
125 "foo": "bar"
126 },
127 "reward": 0,
128 "sub_rewards": {
129 "foo": "bar"
130 }
131}
132```
133
134## Validate grader
135
136`$ openai fine-tuning:alpha:graders validate`
137
138**post** `/fine_tuning/alpha/graders/validate`
139
140Validate a grader.
141
142### Parameters
143
144- `--grader: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more`
145
146 The grader used for the fine-tuning job.
147
148### Returns
149
150- `FineTuningAlphaGraderValidateResponse: object { grader }`
151
152 - `grader: optional StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more`
153
154 The grader used for the fine-tuning job.
155
156 - `string_check_grader: object { input, name, operation, 2 more }`
157
158 A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
159
160 - `input: string`
161
162 The input text. This may include template strings.
163
164 - `name: string`
165
166 The name of the grader.
167
168 - `operation: "eq" or "ne" or "like" or "ilike"`
169
170 The string check operation to perform. One of `eq`, `ne`, `like`, or `ilike`.
171
172 - `"eq"`
173
174 - `"ne"`
175
176 - `"like"`
177
178 - `"ilike"`
179
180 - `reference: string`
181
182 The reference text. This may include template strings.
183
184 - `type: "string_check"`
185
186 The object type, which is always `string_check`.
187
188 - `text_similarity_grader: object { evaluation_metric, input, name, 2 more }`
189
190 A TextSimilarityGrader object which grades text based on similarity metrics.
191
192 - `evaluation_metric: "cosine" or "fuzzy_match" or "bleu" or 8 more`
193
194 The evaluation metric to use. One of `cosine`, `fuzzy_match`, `bleu`,
195 `gleu`, `meteor`, `rouge_1`, `rouge_2`, `rouge_3`, `rouge_4`, `rouge_5`,
196 or `rouge_l`.
197
198 - `"cosine"`
199
200 - `"fuzzy_match"`
201
202 - `"bleu"`
203
204 - `"gleu"`
205
206 - `"meteor"`
207
208 - `"rouge_1"`
209
210 - `"rouge_2"`
211
212 - `"rouge_3"`
213
214 - `"rouge_4"`
215
216 - `"rouge_5"`
217
218 - `"rouge_l"`
219
220 - `input: string`
221
222 The text being graded.
223
224 - `name: string`
225
226 The name of the grader.
227
228 - `reference: string`
229
230 The text being graded against.
231
232 - `type: "text_similarity"`
233
234 The type of grader.
235
236 - `python_grader: object { name, source, type, image_tag }`
237
238 A PythonGrader object that runs a python script on the input.
239
240 - `name: string`
241
242 The name of the grader.
243
244 - `source: string`
245
246 The source code of the python script.
247
248 - `type: "python"`
249
250 The object type, which is always `python`.
251
252 - `image_tag: optional string`
253
254 The image tag to use for the python script.
255
256 - `score_model_grader: object { input, model, name, 3 more }`
257
258 A ScoreModelGrader object that uses a model to assign a score to the input.
259
260 - `input: array of object { content, role, type }`
261
262 The input messages evaluated by the grader. Supports text, output text, input image, and input audio content blocks, and may include template strings.
263
264 - `content: string or ResponseInputText or object { text, type } or 3 more`
265
266 Inputs to the model - can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.
267
268 - `Text input: string`
269
270 A text input to the model.
271
272 - `response_input_text: object { text, type }`
273
274 A text input to the model.
275
276 - `text: string`
277
278 The text input to the model.
279
280 - `type: "input_text"`
281
282 The type of the input item. Always `input_text`.
283
284 - `Output text: object { text, type }`
285
286 A text output from the model.
287
288 - `text: string`
289
290 The text output from the model.
291
292 - `type: "output_text"`
293
294 The type of the output text. Always `output_text`.
295
296 - `Input image: object { image_url, type, detail }`
297
298 An image input block used within EvalItem content arrays.
299
300 - `image_url: string`
301
302 The URL of the image input.
303
304 - `type: "input_image"`
305
306 The type of the image input. Always `input_image`.
307
308 - `detail: optional string`
309
310 The detail level of the image to be sent to the model. One of `high`, `low`, or `auto`. Defaults to `auto`.
311
312 - `response_input_audio: object { input_audio, type }`
313
314 An audio input to the model.
315
316 - `input_audio: object { data, format }`
317
318 - `data: string`
319
320 Base64-encoded audio data.
321
322 - `format: "mp3" or "wav"`
323
324 The format of the audio data. Currently supported formats are `mp3` and
325 `wav`.
326
327 - `"mp3"`
328
329 - `"wav"`
330
331 - `type: "input_audio"`
332
333 The type of the input item. Always `input_audio`.
334
335 - `grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more`
336
337 A list of inputs, each of which may be either an input text, output text, input
338 image, or input audio object.
339
340 - `Text input: string`
341
342 A text input to the model.
343
344 - `response_input_text: object { text, type }`
345
346 A text input to the model.
347
348 - `Output text: object { text, type }`
349
350 A text output from the model.
351
352 - `text: string`
353
354 The text output from the model.
355
356 - `type: "output_text"`
357
358 The type of the output text. Always `output_text`.
359
360 - `Input image: object { image_url, type, detail }`
361
362 An image input block used within EvalItem content arrays.
363
364 - `image_url: string`
365
366 The URL of the image input.
367
368 - `type: "input_image"`
369
370 The type of the image input. Always `input_image`.
371
372 - `detail: optional string`
373
374 The detail level of the image to be sent to the model. One of `high`, `low`, or `auto`. Defaults to `auto`.
375
376 - `response_input_audio: object { input_audio, type }`
377
378 An audio input to the model.
379
380 - `role: "user" or "assistant" or "system" or "developer"`
381
382 The role of the message input. One of `user`, `assistant`, `system`, or
383 `developer`.
384
385 - `"user"`
386
387 - `"assistant"`
388
389 - `"system"`
390
391 - `"developer"`
392
393 - `type: optional "message"`
394
395 The type of the message input. Always `message`.
396
397 - `"message"`
398
399 - `model: string`
400
401 The model to use for the evaluation.
402
403 - `name: string`
404
405 The name of the grader.
406
407 - `type: "score_model"`
408
409 The object type, which is always `score_model`.
410
411 - `range: optional array of number`
412
413 The range of the score. Defaults to `[0, 1]`.
414
415 - `sampling_params: optional object { max_completions_tokens, reasoning_effort, seed, 2 more }`
416
417 The sampling parameters for the model.
418
419 - `max_completions_tokens: optional number`
420
421 The maximum number of tokens the grader model may generate in its response.
422
423 - `reasoning_effort: optional "none" or "minimal" or "low" or 3 more`
424
425 Constrains effort on reasoning for
426 [reasoning models](https://platform.openai.com/docs/guides/reasoning).
427 Currently supported values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. Reducing
428 reasoning effort can result in faster responses and fewer tokens used
429 on reasoning in a response.
430
431 - `gpt-5.1` defaults to `none`, which does not perform reasoning. The supported reasoning values for `gpt-5.1` are `none`, `low`, `medium`, and `high`. Tool calls are supported for all reasoning values in gpt-5.1.
432 - All models before `gpt-5.1` default to `medium` reasoning effort, and do not support `none`.
433 - The `gpt-5-pro` model defaults to (and only supports) `high` reasoning effort.
434 - `xhigh` is supported for all models after `gpt-5.1-codex-max`.
435
436 - `"none"`
437
438 - `"minimal"`
439
440 - `"low"`
441
442 - `"medium"`
443
444 - `"high"`
445
446 - `"xhigh"`
447
448 - `seed: optional number`
449
450 A seed value to initialize the randomness, during sampling.
451
452 - `temperature: optional number`
453
454 A higher temperature increases randomness in the outputs.
455
456 - `top_p: optional number`
457
458 An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
459
460 - `multi_grader: object { calculate_output, graders, name, type }`
461
462 A MultiGrader object combines the output of multiple graders to produce a single score.
463
464 - `calculate_output: string`
465
466 A formula to calculate the output based on grader results.
467
468 - `graders: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more`
469
470 A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
471
472 - `string_check_grader: object { input, name, operation, 2 more }`
473
474 A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
475
476 - `text_similarity_grader: object { evaluation_metric, input, name, 2 more }`
477
478 A TextSimilarityGrader object which grades text based on similarity metrics.
479
480 - `python_grader: object { name, source, type, image_tag }`
481
482 A PythonGrader object that runs a python script on the input.
483
484 - `score_model_grader: object { input, model, name, 3 more }`
485
486 A ScoreModelGrader object that uses a model to assign a score to the input.
487
488 - `label_model_grader: object { input, labels, model, 3 more }`
489
490 A LabelModelGrader object which uses a model to assign labels to each item
491 in the evaluation.
492
493 - `input: array of object { content, role, type }`
494
495 - `content: string or ResponseInputText or object { text, type } or 3 more`
496
497 Inputs to the model - can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.
498
499 - `Text input: string`
500
501 A text input to the model.
502
503 - `response_input_text: object { text, type }`
504
505 A text input to the model.
506
507 - `Output text: object { text, type }`
508
509 A text output from the model.
510
511 - `text: string`
512
513 The text output from the model.
514
515 - `type: "output_text"`
516
517 The type of the output text. Always `output_text`.
518
519 - `Input image: object { image_url, type, detail }`
520
521 An image input block used within EvalItem content arrays.
522
523 - `image_url: string`
524
525 The URL of the image input.
526
527 - `type: "input_image"`
528
529 The type of the image input. Always `input_image`.
530
531 - `detail: optional string`
532
533 The detail level of the image to be sent to the model. One of `high`, `low`, or `auto`. Defaults to `auto`.
534
535 - `response_input_audio: object { input_audio, type }`
536
537 An audio input to the model.
538
539 - `grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more`
540
541 A list of inputs, each of which may be either an input text, output text, input
542 image, or input audio object.
543
544 - `role: "user" or "assistant" or "system" or "developer"`
545
546 The role of the message input. One of `user`, `assistant`, `system`, or
547 `developer`.
548
549 - `"user"`
550
551 - `"assistant"`
552
553 - `"system"`
554
555 - `"developer"`
556
557 - `type: optional "message"`
558
559 The type of the message input. Always `message`.
560
561 - `"message"`
562
563 - `labels: array of string`
564
565 The labels to assign to each item in the evaluation.
566
567 - `model: string`
568
569 The model to use for the evaluation. Must support structured outputs.
570
571 - `name: string`
572
573 The name of the grader.
574
575 - `passing_labels: array of string`
576
577 The labels that indicate a passing result. Must be a subset of labels.
578
579 - `type: "label_model"`
580
581 The object type, which is always `label_model`.
582
583 - `name: string`
584
585 The name of the grader.
586
587 - `type: "multi"`
588
589 The object type, which is always `multi`.
590
591### Example
592
593```cli
594openai fine-tuning:alpha:graders validate \
595 --api-key 'My API Key' \
596 --grader '{input: input, name: name, operation: eq, reference: reference, type: string_check}'
597```
598
599#### Response
600
601```json
602{
603 "grader": {
604 "input": "input",
605 "name": "name",
606 "operation": "eq",
607 "reference": "reference",
608 "type": "string_check"
609 }
610}
611```