Alpha
Graders
Run grader
$ openai fine-tuning:alpha:graders run
post /fine_tuning/alpha/graders/run
Run a grader.
Parameters
- --grader: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more
  The grader used for the fine-tuning job.
- --model-sample: string
  The model sample to be evaluated. This value will be used to populate the sample namespace. See the guide for more details. The output_json variable will be populated if the model sample is a valid JSON string.
- --item: optional unknown
  The dataset item provided to the grader. This will be used to populate the item namespace. See the guide for more details.
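The --model-sample description above says that output_json is populated only when the sample is a valid JSON string. The Python sketch below illustrates that rule locally. The helper name build_sample_namespace and the output_text key are assumptions made for this illustration and are not confirmed by this reference.

```python
import json
from typing import Any, Dict


def build_sample_namespace(model_sample: str) -> Dict[str, Any]:
    """Hypothetical helper: mimic how a sample namespace might be populated."""
    # Assumption: output_text is an illustrative key holding the raw sample.
    namespace: Dict[str, Any] = {"output_text": model_sample}
    try:
        # output_json is populated only when the sample is a valid JSON string.
        namespace["output_json"] = json.loads(model_sample)
    except json.JSONDecodeError:
        # Not valid JSON: leave output_json out of the namespace.
        pass
    return namespace


if __name__ == "__main__":
    print(build_sample_namespace('{"answer": 42}'))
    print(build_sample_namespace("plain text answer"))
```

A grader that reads output_json should therefore be prepared for the variable to be absent when the model returns free-form text.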
Returns
- FineTuningAlphaGraderRunResponse: object { metadata, model_grader_token_usage_per_model, reward, sub_rewards }
  - metadata: object { errors, execution_time, name, 4 more }
    - errors: object { formula_parse_error, invalid_variable_error, model_grader_parse_error, 11 more }
      - formula_parse_error: boolean
      - invalid_variable_error: boolean
      - model_grader_parse_error: boolean
      - model_grader_refusal_error: boolean
      - model_grader_server_error: boolean
      - model_grader_server_error_details: string
      - other_error: boolean
      - python_grader_runtime_error: boolean
      - python_grader_runtime_error_details: string
      - python_grader_server_error: boolean
      - python_grader_server_error_type: string
      - sample_parse_error: boolean
      - truncated_observation_error: boolean
      - unresponsive_reward_error: boolean
    - execution_time: number
    - name: string
    - sampled_model_name: string
    - scores: map[unknown]
    - token_usage: number
    - type: string
  - model_grader_token_usage_per_model: map[unknown]
  - reward: number
  - sub_rewards: map[unknown]
Example
openai fine-tuning:alpha:graders run \
--api-key 'My API Key' \
--grader '{input: input, name: name, operation: eq, reference: reference, type: string_check}' \
--model-sample model_sample
Response
{
  "metadata": {
    "errors": {
      "formula_parse_error": true,
      "invalid_variable_error": true,
      "model_grader_parse_error": true,
      "model_grader_refusal_error": true,
      "model_grader_server_error": true,
      "model_grader_server_error_details": "model_grader_server_error_details",
      "other_error": true,
      "python_grader_runtime_error": true,
      "python_grader_runtime_error_details": "python_grader_runtime_error_details",
      "python_grader_server_error": true,
      "python_grader_server_error_type": "python_grader_server_error_type",
      "sample_parse_error": true,
      "truncated_observation_error": true,
      "unresponsive_reward_error": true
    },
    "execution_time": 0,
    "name": "name",
    "sampled_model_name": "sampled_model_name",
    "scores": {
      "foo": "bar"
    },
    "token_usage": 0,
    "type": "type"
  },
  "model_grader_token_usage_per_model": {
    "foo": "bar"
  },
  "reward": 0,
  "sub_rewards": {
    "foo": "bar"
  }
}
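Because metadata.errors mixes boolean flags with string detail fields, it can help to reduce a response to the flags that are actually set. The following Python sketch does that for a response shaped like the one above. The function name raised_errors and the sample values in the demo are illustrative assumptions, not output from the service.

```python
from typing import Any, Dict, List


def raised_errors(response: Dict[str, Any]) -> List[str]:
    """Return the names of boolean error flags set to true in metadata.errors."""
    errors = response.get("metadata", {}).get("errors", {})
    # Only boolean flags count; string fields such as *_details are skipped.
    return sorted(name for name, value in errors.items() if value is True)


if __name__ == "__main__":
    # Illustrative response fragment; the values are made up for this demo.
    example = {
        "metadata": {
            "errors": {
                "formula_parse_error": False,
                "python_grader_runtime_error": True,
                "python_grader_runtime_error_details": "illustrative details",
                "sample_parse_error": True,
            }
        },
        "reward": 0,
    }
    print(raised_errors(example))
```

An empty list means no error flag was set, in which case reward and sub_rewards can be read as the grader's result.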
Validate grader
$ openai fine-tuning:alpha:graders validate
post /fine_tuning/alpha/graders/validate
Validate a grader.
Parameters
- --grader: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more
  The grader used for the fine-tuning job.
Returns
- FineTuningAlphaGraderValidateResponse: object { grader }
  - grader: optional StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more
    The grader used for the fine-tuning job.
    - string_check_grader: object { input, name, operation, 2 more }
      A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
      - input: string
        The input text. This may include template strings.
      - name: string
        The name of the grader.
      - operation: "eq" or "ne" or "like" or "ilike"
        The string check operation to perform. One of eq, ne, like, or ilike.
        - "eq"
        - "ne"
        - "like"
        - "ilike"
      - reference: string
        The reference text. This may include template strings.
      - type: "string_check"
        The object type, which is always string_check.
    - text_similarity_grader: object { evaluation_metric, input, name, 2 more }
      A TextSimilarityGrader object which grades text based on similarity metrics.
      - evaluation_metric: "cosine" or "fuzzy_match" or "bleu" or 8 more
        The evaluation metric to use. One of cosine, fuzzy_match, bleu, gleu, meteor, rouge_1, rouge_2, rouge_3, rouge_4, rouge_5, or rouge_l.
        - "cosine"
        - "fuzzy_match"
        - "bleu"
        - "gleu"
        - "meteor"
        - "rouge_1"
        - "rouge_2"
        - "rouge_3"
        - "rouge_4"
        - "rouge_5"
        - "rouge_l"
      - input: string
        The text being graded.
      - name: string
        The name of the grader.
      - reference: string
        The text being graded against.
      - type: "text_similarity"
        The type of grader.
    - python_grader: object { name, source, type, image_tag }
      A PythonGrader object that runs a python script on the input.
      - name: string
        The name of the grader.
      - source: string
        The source code of the python script.
      - type: "python"
        The object type, which is always python.
      - image_tag: optional string
        The image tag to use for the python script.
    - score_model_grader: object { input, model, name, 3 more }
      A ScoreModelGrader object that uses a model to assign a score to the input.
      - input: array of object { content, role, type }
        The input messages evaluated by the grader. Supports text, output text, input image, and input audio content blocks, and may include template strings.
        - content: string or ResponseInputText or object { text, type } or 3 more
          Inputs to the model; can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.
          - Text input: string
            A text input to the model.
          - response_input_text: object { text, type }
            A text input to the model.
            - text: string
              The text input to the model.
            - type: "input_text"
              The type of the input item. Always input_text.
          - Output text: object { text, type }
            A text output from the model.
            - text: string
              The text output from the model.
            - type: "output_text"
              The type of the output text. Always output_text.
          - Input image: object { image_url, type, detail }
            An image input block used within EvalItem content arrays.
            - image_url: string
              The URL of the image input.
            - type: "input_image"
              The type of the image input. Always input_image.
            - detail: optional string
              The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
          - response_input_audio: object { input_audio, type }
            An audio input to the model.
            - input_audio: object { data, format }
              - data: string
                Base64-encoded audio data.
              - format: "mp3" or "wav"
                The format of the audio data. Currently supported formats are mp3 and wav.
                - "mp3"
                - "wav"
            - type: "input_audio"
              The type of the input item. Always input_audio.
          - grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more
            A list of inputs, each of which may be either an input text, output text, input image, or input audio object.
            - Text input: string
              A text input to the model.
            - response_input_text: object { text, type }
              A text input to the model.
            - Output text: object { text, type }
              A text output from the model.
              - text: string
                The text output from the model.
              - type: "output_text"
                The type of the output text. Always output_text.
            - Input image: object { image_url, type, detail }
              An image input block used within EvalItem content arrays.
              - image_url: string
                The URL of the image input.
              - type: "input_image"
                The type of the image input. Always input_image.
              - detail: optional string
                The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
            - response_input_audio: object { input_audio, type }
              An audio input to the model.
        - role: "user" or "assistant" or "system" or "developer"
          The role of the message input. One of user, assistant, system, or developer.
          - "user"
          - "assistant"
          - "system"
          - "developer"
        - type: optional "message"
          The type of the message input. Always message.
          - "message"
      - model: string
        The model to use for the evaluation.
      - name: string
        The name of the grader.
      - type: "score_model"
        The object type, which is always score_model.
      - range: optional array of number
        The range of the score. Defaults to [0, 1].
      - sampling_params: optional object { max_completions_tokens, reasoning_effort, seed, 2 more }
        The sampling parameters for the model.
        - max_completions_tokens: optional number
          The maximum number of tokens the grader model may generate in its response.
        - reasoning_effort: optional "none" or "minimal" or "low" or 3 more
          Constrains effort on reasoning for reasoning models. Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
          - gpt-5.1 defaults to none, which does not perform reasoning. The supported reasoning values for gpt-5.1 are none, low, medium, and high. Tool calls are supported for all reasoning values in gpt-5.1.
          - All models before gpt-5.1 default to medium reasoning effort, and do not support none.
          - The gpt-5-pro model defaults to (and only supports) high reasoning effort.
          - xhigh is supported for all models after gpt-5.1-codex-max.
          - "none"
          - "minimal"
          - "low"
          - "medium"
          - "high"
          - "xhigh"
        - seed: optional number
          A seed value to initialize the randomness during sampling.
        - temperature: optional number
          A higher temperature increases randomness in the outputs.
        - top_p: optional number
          An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
    - multi_grader: object { calculate_output, graders, name, type }
      A MultiGrader object combines the output of multiple graders to produce a single score.
      - calculate_output: string
        A formula to calculate the output based on grader results.
      - graders: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more
        A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
        - string_check_grader: object { input, name, operation, 2 more }
          A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
        - text_similarity_grader: object { evaluation_metric, input, name, 2 more }
          A TextSimilarityGrader object which grades text based on similarity metrics.
        - python_grader: object { name, source, type, image_tag }
          A PythonGrader object that runs a python script on the input.
        - score_model_grader: object { input, model, name, 3 more }
          A ScoreModelGrader object that uses a model to assign a score to the input.
        - label_model_grader: object { input, labels, model, 3 more }
          A LabelModelGrader object which uses a model to assign labels to each item in the evaluation.
          - input: array of object { content, role, type }
            - content: string or ResponseInputText or object { text, type } or 3 more
              Inputs to the model; can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.
              - Text input: string
                A text input to the model.
              - response_input_text: object { text, type }
                A text input to the model.
              - Output text: object { text, type }
                A text output from the model.
                - text: string
                  The text output from the model.
                - type: "output_text"
                  The type of the output text. Always output_text.
              - Input image: object { image_url, type, detail }
                An image input block used within EvalItem content arrays.
                - image_url: string
                  The URL of the image input.
                - type: "input_image"
                  The type of the image input. Always input_image.
                - detail: optional string
                  The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
              - response_input_audio: object { input_audio, type }
                An audio input to the model.
              - grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more
                A list of inputs, each of which may be either an input text, output text, input image, or input audio object.
            - role: "user" or "assistant" or "system" or "developer"
              The role of the message input. One of user, assistant, system, or developer.
              - "user"
              - "assistant"
              - "system"
              - "developer"
            - type: optional "message"
              The type of the message input. Always message.
              - "message"
          - labels: array of string
            The labels to assign to each item in the evaluation.
          - model: string
            The model to use for the evaluation. Must support structured outputs.
          - name: string
            The name of the grader.
          - passing_labels: array of string
            The labels that indicate a passing result. Must be a subset of labels.
          - type: "label_model"
            The object type, which is always label_model.
      - name: string
        The name of the grader.
      - type: "multi"
        The object type, which is always multi.
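The string_check grader listed above names four operations but does not define them. The Python sketch below is a local approximation under stated assumptions: eq and ne are exact comparisons, like is a case-sensitive test that the input contains the reference, and ilike is the same test ignoring case. These semantics, the function name string_check, and the 1.0/0.0 return values are assumptions for illustration; the grader guide is the authority on actual behavior.

```python
def string_check(input_text: str, reference: str, operation: str) -> float:
    """Hypothetical local approximation of a string_check grader (1.0 or 0.0)."""
    if operation == "eq":
        passed = input_text == reference
    elif operation == "ne":
        passed = input_text != reference
    elif operation == "like":
        # Assumption: "like" is a case-sensitive containment test.
        passed = reference in input_text
    elif operation == "ilike":
        # Assumption: "ilike" is the case-insensitive variant of "like".
        passed = reference.lower() in input_text.lower()
    else:
        raise ValueError(f"unsupported operation: {operation!r}")
    return 1.0 if passed else 0.0


if __name__ == "__main__":
    print(string_check("The capital is Paris", "paris", "ilike"))
    print(string_check("The capital is Paris", "paris", "like"))
```

A local approximation like this can be handy for a quick sanity check of reference strings before calling the run endpoint, but it is no substitute for the service's own result.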
Example
openai fine-tuning:alpha:graders validate \
--api-key 'My API Key' \
--grader '{input: input, name: name, operation: eq, reference: reference, type: string_check}'
Response
{
  "grader": {
    "input": "input",
    "name": "name",
    "operation": "eq",
    "reference": "reference",
    "type": "string_check"
  }
}
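Before sending a grader to the validate endpoint, a client could run a cheap structural check against the non-optional fields listed in the schema above. The following Python sketch covers only three grader types and is not a replacement for server-side validation. The names precheck_grader and validate_request_body are hypothetical, and the request body shape {"grader": ...} is an assumption inferred from the --grader parameter and the response shape.

```python
import json
from typing import Any, Dict, List

# Non-optional fields per grader type, taken from the schema listed above.
# Only three grader types are covered in this sketch.
REQUIRED_FIELDS = {
    "string_check": {"input", "name", "operation", "reference", "type"},
    "text_similarity": {"evaluation_metric", "input", "name", "reference", "type"},
    "python": {"name", "source", "type"},
}
STRING_CHECK_OPERATIONS = {"eq", "ne", "like", "ilike"}


def precheck_grader(grader: Dict[str, Any]) -> List[str]:
    """Hypothetical client-side check; returns a list of problems (empty if none)."""
    grader_type = grader.get("type")
    if grader_type not in REQUIRED_FIELDS:
        return [f"unsupported or missing type: {grader_type!r}"]
    problems = [
        f"missing field: {field}"
        for field in sorted(REQUIRED_FIELDS[grader_type] - set(grader))
    ]
    if grader_type == "string_check" and "operation" in grader:
        if grader["operation"] not in STRING_CHECK_OPERATIONS:
            problems.append(f"invalid operation: {grader['operation']!r}")
    return problems


def validate_request_body(grader: Dict[str, Any]) -> str:
    """Serialize an assumed body for POST /fine_tuning/alpha/graders/validate."""
    return json.dumps({"grader": grader})


if __name__ == "__main__":
    grader = {
        "input": "input",
        "name": "name",
        "operation": "eq",
        "reference": "reference",
        "type": "string_check",
    }
    print(precheck_grader(grader))
    print(validate_request_body(grader))
```

Returning a list of problems instead of raising on the first one lets a caller report every structural issue in a single pass.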