Methods

Domain Types

Dpo Hyperparameters

dpo_hyperparameters: object { batch_size, beta, learning_rate_multiplier, n_epochs }

The hyperparameters used for the DPO fine-tuning job.
- batch_size: optional "auto" or number
  
  Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
  - union_member_0: "auto"
  - union_member_1: number
- beta: optional "auto" or number
  
  The beta value for the DPO method. A higher beta value will increase the weight of the penalty between the policy and reference model.
  - union_member_0: "auto"
  - union_member_1: number
- learning_rate_multiplier: optional "auto" or number
  
  Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
  - union_member_0: "auto"
  - union_member_1: number
- n_epochs: optional "auto" or number
  
  The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
  - union_member_0: "auto"
  - union_member_1: number

Dpo Method

dpo_method: object { hyperparameters }

Configuration for the DPO fine-tuning method.
- hyperparameters: optional object { batch_size, beta, learning_rate_multiplier, n_epochs }
  
  The hyperparameters used for the DPO fine-tuning job.
  - batch_size: optional "auto" or number
    
    Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
    - union_member_0: "auto"
    - union_member_1: number
  - beta: optional "auto" or number
    
    The beta value for the DPO method. A higher beta value will increase the weight of the penalty between the policy and reference model.
    - union_member_0: "auto"
    - union_member_1: number
  - learning_rate_multiplier: optional "auto" or number
    
    Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
    - union_member_0: "auto"
    - union_member_1: number
  - n_epochs: optional "auto" or number
    
    The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
    - union_member_0: "auto"
    - union_member_1: number

Reinforcement Hyperparameters

reinforcement_hyperparameters: object { batch_size, compute_multiplier, eval_interval, 4 more }

The hyperparameters used for the reinforcement fine-tuning job.
- batch_size: optional "auto" or number
  
  Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
  - union_member_0: "auto"
  - union_member_1: number
- compute_multiplier: optional "auto" or number
  
  Multiplier on amount of compute used for exploring search space during training.
  - union_member_0: "auto"
  - union_member_1: number
- eval_interval: optional "auto" or number
  
  The number of training steps between evaluation runs.
  - union_member_0: "auto"
  - union_member_1: number
- eval_samples: optional "auto" or number
  
  Number of evaluation samples to generate per training step.
  - union_member_0: "auto"
  - union_member_1: number
- learning_rate_multiplier: optional "auto" or number
  
  Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
  - union_member_0: "auto"
  - union_member_1: number
- n_epochs: optional "auto" or number
  
  The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
  - union_member_0: "auto"
  - union_member_1: number
- reasoning_effort: optional "default" or "low" or "medium" or "high"
  
  Level of reasoning effort.
  - "default"
  - "low"
  - "medium"
  - "high"

Reinforcement Method

reinforcement_method: object { grader, hyperparameters }

Configuration for the reinforcement fine-tuning method.
- grader: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more
  
  The grader used for the fine-tuning job.
  - string_check_grader: object { input, name, operation, 2 more }
    
    A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
    - input: string
      
      The input text. This may include template strings.
    - name: string
      
      The name of the grader.
    - operation: "eq" or "ne" or "like" or "ilike"
      
      The string check operation to perform. One of eq, ne, like, or ilike.
      - "eq"
      - "ne"
      - "like"
      - "ilike"
    - reference: string
      
      The reference text. This may include template strings.
    - type: "string_check"
      
      The object type, which is always string_check.
  - text_similarity_grader: object { evaluation_metric, input, name, 2 more }
    
    A TextSimilarityGrader object which grades text based on similarity metrics.
    - evaluation_metric: "cosine" or "fuzzy_match" or "bleu" or 8 more
      
      The evaluation metric to use. One of cosine, fuzzy_match, bleu, gleu, meteor, rouge_1, rouge_2, rouge_3, rouge_4, rouge_5, or rouge_l.
      - "cosine"
      - "fuzzy_match"
      - "bleu"
      - "gleu"
      - "meteor"
      - "rouge_1"
      - "rouge_2"
      - "rouge_3"
      - "rouge_4"
      - "rouge_5"
      - "rouge_l"
    - input: string
      
      The text being graded.
    - name: string
      
      The name of the grader.
    - reference: string
      
      The text being graded against.
    - type: "text_similarity"
      
      The type of grader.
  - python_grader: object { name, source, type, image_tag }
    
    A PythonGrader object that runs a python script on the input.
    - name: string
      
      The name of the grader.
    - source: string
      
      The source code of the python script.
    - type: "python"
      
      The object type, which is always python.
    - image_tag: optional string
      
      The image tag to use for the python script.
  - score_model_grader: object { input, model, name, 3 more }
    
    A ScoreModelGrader object that uses a model to assign a score to the input.
    - input: array of object { content, role, type }
      
      The input messages evaluated by the grader. Supports text, output text, input image, and input audio content blocks, and may include template strings.
      - content: string or ResponseInputText or object { text, type } or 3 more
        
        Inputs to the model - can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.
        
        Text input: string
        
        A text input to the model.
        
        response_input_text: object { text, type }
        
        A text input to the model.
        
        text: string
        
        The text input to the model.
        
        type: "input_text"
        
        The type of the input item. Always input_text.
        
        Output text: object { text, type }
        
        A text output from the model.
        
        text: string
        
        The text output from the model.
        
        type: "output_text"
        
        The type of the output text. Always output_text.
        
        Input image: object { image_url, type, detail }
        
        An image input block used within EvalItem content arrays.
        
        image_url: string
        
        The URL of the image input.
        
        type: "input_image"
        
        The type of the image input. Always input_image.
        
        detail: optional string
        
        The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
        
        response_input_audio: object { input_audio, type }
        
        An audio input to the model.
        
        input_audio: object { data, format }
        
        data: string
        
        Base64-encoded audio data.
        
        format: "mp3" or "wav"
        
        The format of the audio data. Currently supported formats are mp3 and wav.
        
        "mp3"
        
        "wav"
        
        type: "input_audio"
        
        The type of the input item. Always input_audio.
        
        grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more
        
        A list of inputs, each of which may be either an input text, output text, input image, or input audio object.
        
        Text input: string
        
        A text input to the model.
        
        response_input_text: object { text, type }
        
        A text input to the model.
        
        Output text: object { text, type }
        
        A text output from the model.
        
        text: string
        
        The text output from the model.
        
        type: "output_text"
        
        The type of the output text. Always output_text.
        
        Input image: object { image_url, type, detail }
        
        An image input block used within EvalItem content arrays.
        
        image_url: string
        
        The URL of the image input.
        
        type: "input_image"
        
        The type of the image input. Always input_image.
        
        detail: optional string
        
        The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
        
        response_input_audio: object { input_audio, type }
        
        An audio input to the model.
      - role: "user" or "assistant" or "system" or "developer"
        
        The role of the message input. One of user, assistant, system, or developer.
        
        "user"
        
        "assistant"
        
        "system"
        
        "developer"
      - type: optional "message"
        
        The type of the message input. Always message.
        
        "message"
    - model: string
      
      The model to use for the evaluation.
    - name: string
      
      The name of the grader.
    - type: "score_model"
      
      The object type, which is always score_model.
    - range: optional array of number
      
      The range of the score. Defaults to [0, 1].
    - sampling_params: optional object { max_completions_tokens, reasoning_effort, seed, 2 more }
      
      The sampling parameters for the model.
      - max_completions_tokens: optional number
        
        The maximum number of tokens the grader model may generate in its response.
      - reasoning_effort: optional "none" or "minimal" or "low" or 3 more
        
        Constrains effort on reasoning for reasoning models. Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
        
        gpt-5.1 defaults to none, which does not perform reasoning. The supported reasoning values for gpt-5.1 are none, low, medium, and high. Tool calls are supported for all reasoning values in gpt-5.1.
        
        All models before gpt-5.1 default to medium reasoning effort, and do not support none.
        
        The gpt-5-pro model defaults to (and only supports) high reasoning effort.
        
        xhigh is supported for all models after gpt-5.1-codex-max.
        
        "none"
        
        "minimal"
        
        "low"
        
        "medium"
        
        "high"
        
        "xhigh"
      - seed: optional number
        
        A seed value to initialize the randomness, during sampling.
      - temperature: optional number
        
        A higher temperature increases randomness in the outputs.
      - top_p: optional number
        
        An alternative to temperature for nucleus sampling; 1.0 includes all tokens.
  - multi_grader: object { calculate_output, graders, name, type }
    
    A MultiGrader object combines the output of multiple graders to produce a single score.
    - calculate_output: string
      
      A formula to calculate the output based on grader results.
    - graders: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more
      
      A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
      - string_check_grader: object { input, name, operation, 2 more }
        
        A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.
      - text_similarity_grader: object { evaluation_metric, input, name, 2 more }
        
        A TextSimilarityGrader object which grades text based on similarity metrics.
      - python_grader: object { name, source, type, image_tag }
        
        A PythonGrader object that runs a python script on the input.
      - score_model_grader: object { input, model, name, 3 more }
        
        A ScoreModelGrader object that uses a model to assign a score to the input.
      - label_model_grader: object { input, labels, model, 3 more }
        
        A LabelModelGrader object which uses a model to assign labels to each item in the evaluation.
        
        input: array of object { content, role, type }
        
        content: string or ResponseInputText or object { text, type } or 3 more
        
        Inputs to the model - can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.
        
        Text input: string
        
        A text input to the model.
        
        response_input_text: object { text, type }
        
        A text input to the model.
        
        Output text: object { text, type }
        
        A text output from the model.
        
        text: string
        
        The text output from the model.
        
        type: "output_text"
        
        The type of the output text. Always output_text.
        
        Input image: object { image_url, type, detail }
        
        An image input block used within EvalItem content arrays.
        
        image_url: string
        
        The URL of the image input.
        
        type: "input_image"
        
        The type of the image input. Always input_image.
        
        detail: optional string
        
        The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
        
        response_input_audio: object { input_audio, type }
        
        An audio input to the model.
        
        grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more
        
        A list of inputs, each of which may be either an input text, output text, input image, or input audio object.
        
        role: "user" or "assistant" or "system" or "developer"
        
        The role of the message input. One of user, assistant, system, or developer.
        
        "user"
        
        "assistant"
        
        "system"
        
        "developer"
        
        type: optional "message"
        
        The type of the message input. Always message.
        
        "message"
        
        labels: array of string
        
        The labels to assign to each item in the evaluation.
        
        model: string
        
        The model to use for the evaluation. Must support structured outputs.
        
        name: string
        
        The name of the grader.
        
        passing_labels: array of string
        
        The labels that indicate a passing result. Must be a subset of labels.
        
        type: "label_model"
        
        The object type, which is always label_model.
    - name: string
      
      The name of the grader.
    - type: "multi"
      
      The object type, which is always multi.
- hyperparameters: optional object { batch_size, compute_multiplier, eval_interval, 4 more }
  
  The hyperparameters used for the reinforcement fine-tuning job.
  - batch_size: optional "auto" or number
    
    Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
    - union_member_0: "auto"
    - union_member_1: number
  - compute_multiplier: optional "auto" or number
    
    Multiplier on amount of compute used for exploring search space during training.
    - union_member_0: "auto"
    - union_member_1: number
  - eval_interval: optional "auto" or number
    
    The number of training steps between evaluation runs.
    - union_member_0: "auto"
    - union_member_1: number
  - eval_samples: optional "auto" or number
    
    Number of evaluation samples to generate per training step.
    - union_member_0: "auto"
    - union_member_1: number
  - learning_rate_multiplier: optional "auto" or number
    
    Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
    - union_member_0: "auto"
    - union_member_1: number
  - n_epochs: optional "auto" or number
    
    The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
    - union_member_0: "auto"
    - union_member_1: number
  - reasoning_effort: optional "default" or "low" or "medium" or "high"
    
    Level of reasoning effort.
    - "default"
    - "low"
    - "medium"
    - "high"

Supervised Hyperparameters

supervised_hyperparameters: object { batch_size, learning_rate_multiplier, n_epochs }

The hyperparameters used for the fine-tuning job.
- batch_size: optional "auto" or number
  
  Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
  - union_member_0: "auto"
  - union_member_1: number
- learning_rate_multiplier: optional "auto" or number
  
  Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
  - union_member_0: "auto"
  - union_member_1: number
- n_epochs: optional "auto" or number
  
  The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
  - union_member_0: "auto"
  - union_member_1: number

Supervised Method

supervised_method: object { hyperparameters }

Configuration for the supervised fine-tuning method.
- hyperparameters: optional object { batch_size, learning_rate_multiplier, n_epochs }
  
  The hyperparameters used for the fine-tuning job.
  - batch_size: optional "auto" or number
    
    Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
    - union_member_0: "auto"
    - union_member_1: number
  - learning_rate_multiplier: optional "auto" or number
    
    Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
    - union_member_0: "auto"
    - union_member_1: number
  - n_epochs: optional "auto" or number
    
    The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
    - union_member_0: "auto"
    - union_member_1: number

cli/resources/fine_tuning/subresources/methods/index.md +722 −0 created

1# Methods

3## Domain Types

5### Dpo Hyperparameters

7- `dpo_hyperparameters: object { batch_size, beta, learning_rate_multiplier, n_epochs }`

9 The hyperparameters used for the DPO fine-tuning job.

11 - `batch_size: optional "auto" or number`

13 Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

15 - `union_member_0: "auto"`

17 - `union_member_1: number`

19 - `beta: optional "auto" or number`

21 The beta value for the DPO method. A higher beta value will increase the weight of the penalty between the policy and reference model.

23 - `union_member_0: "auto"`

25 - `union_member_1: number`

27 - `learning_rate_multiplier: optional "auto" or number`

29 Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

31 - `union_member_0: "auto"`

33 - `union_member_1: number`

35 - `n_epochs: optional "auto" or number`

37 The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

39 - `union_member_0: "auto"`

41 - `union_member_1: number`

43### Dpo Method

45- `dpo_method: object { hyperparameters }`

47 Configuration for the DPO fine-tuning method.

49 - `hyperparameters: optional object { batch_size, beta, learning_rate_multiplier, n_epochs }`

51 The hyperparameters used for the DPO fine-tuning job.

53 - `batch_size: optional "auto" or number`

55 Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

57 - `union_member_0: "auto"`

59 - `union_member_1: number`

61 - `beta: optional "auto" or number`

63 The beta value for the DPO method. A higher beta value will increase the weight of the penalty between the policy and reference model.

65 - `union_member_0: "auto"`

67 - `union_member_1: number`

69 - `learning_rate_multiplier: optional "auto" or number`

71 Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

73 - `union_member_0: "auto"`

75 - `union_member_1: number`

77 - `n_epochs: optional "auto" or number`

79 The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

81 - `union_member_0: "auto"`

83 - `union_member_1: number`

85### Reinforcement Hyperparameters

87- `reinforcement_hyperparameters: object { batch_size, compute_multiplier, eval_interval, 4 more }`

89 The hyperparameters used for the reinforcement fine-tuning job.

91 - `batch_size: optional "auto" or number`

93 Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

95 - `union_member_0: "auto"`

97 - `union_member_1: number`

99 - `compute_multiplier: optional "auto" or number`

100

101 Multiplier on amount of compute used for exploring search space during training.

102

103 - `union_member_0: "auto"`

104

105 - `union_member_1: number`

106

107 - `eval_interval: optional "auto" or number`

108

109 The number of training steps between evaluation runs.

110

111 - `union_member_0: "auto"`

112

113 - `union_member_1: number`

114

115 - `eval_samples: optional "auto" or number`

116

117 Number of evaluation samples to generate per training step.

118

119 - `union_member_0: "auto"`

120

121 - `union_member_1: number`

122

123 - `learning_rate_multiplier: optional "auto" or number`

124

125 Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

126

127 - `union_member_0: "auto"`

128

129 - `union_member_1: number`

130

131 - `n_epochs: optional "auto" or number`

132

133 The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

134

135 - `union_member_0: "auto"`

136

137 - `union_member_1: number`

138

139 - `reasoning_effort: optional "default" or "low" or "medium" or "high"`

140

141 Level of reasoning effort.

142

143 - `"default"`

144

145 - `"low"`

146

147 - `"medium"`

148

149 - `"high"`

150

151### Reinforcement Method

152

153- `reinforcement_method: object { grader, hyperparameters }`

154

155 Configuration for the reinforcement fine-tuning method.

156

157 - `grader: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more`

158

159 The grader used for the fine-tuning job.

160

161 - `string_check_grader: object { input, name, operation, 2 more }`

162

163 A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.

164

165 - `input: string`

166

167 The input text. This may include template strings.

168

169 - `name: string`

170

171 The name of the grader.

172

173 - `operation: "eq" or "ne" or "like" or "ilike"`

174

175 The string check operation to perform. One of `eq`, `ne`, `like`, or `ilike`.

176

177 - `"eq"`

178

179 - `"ne"`

180

181 - `"like"`

182

183 - `"ilike"`

184

185 - `reference: string`

186

187 The reference text. This may include template strings.

188

189 - `type: "string_check"`

190

191 The object type, which is always `string_check`.

192

193 - `text_similarity_grader: object { evaluation_metric, input, name, 2 more }`

194

195 A TextSimilarityGrader object which grades text based on similarity metrics.

196

197 - `evaluation_metric: "cosine" or "fuzzy_match" or "bleu" or 8 more`

198

199 The evaluation metric to use. One of `cosine`, `fuzzy_match`, `bleu`,

200 `gleu`, `meteor`, `rouge_1`, `rouge_2`, `rouge_3`, `rouge_4`, `rouge_5`,

201 or `rouge_l`.

202

203 - `"cosine"`

204

205 - `"fuzzy_match"`

206

207 - `"bleu"`

208

209 - `"gleu"`

210

211 - `"meteor"`

212

213 - `"rouge_1"`

214

215 - `"rouge_2"`

216

217 - `"rouge_3"`

218

219 - `"rouge_4"`

220

221 - `"rouge_5"`

222

223 - `"rouge_l"`

224

225 - `input: string`

226

227 The text being graded.

228

229 - `name: string`

230

231 The name of the grader.

232

233 - `reference: string`

234

235 The text being graded against.

236

237 - `type: "text_similarity"`

238

239 The type of grader.

240

241 - `python_grader: object { name, source, type, image_tag }`

242

243 A PythonGrader object that runs a python script on the input.

244

245 - `name: string`

246

247 The name of the grader.

248

249 - `source: string`

250

251 The source code of the python script.

252

253 - `type: "python"`

254

255 The object type, which is always `python`.

256

257 - `image_tag: optional string`

258

259 The image tag to use for the python script.

260

261 - `score_model_grader: object { input, model, name, 3 more }`

262

263 A ScoreModelGrader object that uses a model to assign a score to the input.

264

265 - `input: array of object { content, role, type }`

266

267 The input messages evaluated by the grader. Supports text, output text, input image, and input audio content blocks, and may include template strings.

268

269 - `content: string or ResponseInputText or object { text, type } or 3 more`

270

271 Inputs to the model - can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.

272

273 - `Text input: string`

274

275 A text input to the model.

276

277 - `response_input_text: object { text, type }`

278

279 A text input to the model.

280

281 - `text: string`

282

283 The text input to the model.

284

285 - `type: "input_text"`

286

287 The type of the input item. Always `input_text`.

288

289 - `Output text: object { text, type }`

290

291 A text output from the model.

292

293 - `text: string`

294

295 The text output from the model.

296

297 - `type: "output_text"`

298

299 The type of the output text. Always `output_text`.

300

301 - `Input image: object { image_url, type, detail }`

302

303 An image input block used within EvalItem content arrays.

304

305 - `image_url: string`

306

307 The URL of the image input.

308

309 - `type: "input_image"`

310

311 The type of the image input. Always `input_image`.

312

313 - `detail: optional string`

314

315 The detail level of the image to be sent to the model. One of `high`, `low`, or `auto`. Defaults to `auto`.

316

317 - `response_input_audio: object { input_audio, type }`

318

319 An audio input to the model.

320

321 - `input_audio: object { data, format }`

322

323 - `data: string`

324

325 Base64-encoded audio data.

326

327 - `format: "mp3" or "wav"`

328

329 The format of the audio data. Currently supported formats are `mp3` and

330 `wav`.

331

332 - `"mp3"`

333

334 - `"wav"`

335

336 - `type: "input_audio"`

337

338 The type of the input item. Always `input_audio`.

339

340 - `grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more`

341

342 A list of inputs, each of which may be either an input text, output text, input

343 image, or input audio object.

344

345 - `Text input: string`

346

347 A text input to the model.

348

349 - `response_input_text: object { text, type }`

350

351 A text input to the model.

352

353 - `Output text: object { text, type }`

354

355 A text output from the model.

356

357 - `text: string`

358

359 The text output from the model.

360

361 - `type: "output_text"`

362

363 The type of the output text. Always `output_text`.

364

365 - `Input image: object { image_url, type, detail }`

366

367 An image input block used within EvalItem content arrays.

368

369 - `image_url: string`

370

371 The URL of the image input.

372

373 - `type: "input_image"`

374

375 The type of the image input. Always `input_image`.

376

377 - `detail: optional string`

378

379 The detail level of the image to be sent to the model. One of `high`, `low`, or `auto`. Defaults to `auto`.

380

381 - `response_input_audio: object { input_audio, type }`

382

383 An audio input to the model.

384

385 - `role: "user" or "assistant" or "system" or "developer"`

386

387 The role of the message input. One of `user`, `assistant`, `system`, or

388 `developer`.

389

390 - `"user"`

391

392 - `"assistant"`

393

394 - `"system"`

395

396 - `"developer"`

397

398 - `type: optional "message"`

399

400 The type of the message input. Always `message`.

401

402 - `"message"`

403

404 - `model: string`

405

406 The model to use for the evaluation.

407

408 - `name: string`

409

410 The name of the grader.

411

412 - `type: "score_model"`

413

414 The object type, which is always `score_model`.

415

416 - `range: optional array of number`

417

418 The range of the score. Defaults to `[0, 1]`.

419

420 - `sampling_params: optional object { max_completions_tokens, reasoning_effort, seed, 2 more }`

421

422 The sampling parameters for the model.

423

424 - `max_completions_tokens: optional number`

425

426 The maximum number of tokens the grader model may generate in its response.

427

428 - `reasoning_effort: optional "none" or "minimal" or "low" or 3 more`

429

430 Constrains effort on reasoning for

431 [reasoning models](https://platform.openai.com/docs/guides/reasoning).

432 Currently supported values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. Reducing

433 reasoning effort can result in faster responses and fewer tokens used

434 on reasoning in a response.

435

436 - `gpt-5.1` defaults to `none`, which does not perform reasoning. The supported reasoning values for `gpt-5.1` are `none`, `low`, `medium`, and `high`. Tool calls are supported for all reasoning values in gpt-5.1.

437 - All models before `gpt-5.1` default to `medium` reasoning effort, and do not support `none`.

438 - The `gpt-5-pro` model defaults to (and only supports) `high` reasoning effort.

439 - `xhigh` is supported for all models after `gpt-5.1-codex-max`.

440

441 - `"none"`

442

443 - `"minimal"`

444

445 - `"low"`

446

447 - `"medium"`

448

449 - `"high"`

450

451 - `"xhigh"`

452

453 - `seed: optional number`

454

455 A seed value to initialize the randomness, during sampling.

456

457 - `temperature: optional number`

458

459 A higher temperature increases randomness in the outputs.

460

461 - `top_p: optional number`

462

463 An alternative to temperature for nucleus sampling; 1.0 includes all tokens.

464

465 - `multi_grader: object { calculate_output, graders, name, type }`

466

467 A MultiGrader object combines the output of multiple graders to produce a single score.

468

469 - `calculate_output: string`

470

471 A formula to calculate the output based on grader results.

472

473 - `graders: StringCheckGrader or TextSimilarityGrader or PythonGrader or 2 more`

474

475 A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.

476

477 - `string_check_grader: object { input, name, operation, 2 more }`

478

479 A StringCheckGrader object that performs a string comparison between input and reference using a specified operation.

480

481 - `text_similarity_grader: object { evaluation_metric, input, name, 2 more }`

482

483 A TextSimilarityGrader object which grades text based on similarity metrics.

484

485 - `python_grader: object { name, source, type, image_tag }`

486

487 A PythonGrader object that runs a python script on the input.

488

489 - `score_model_grader: object { input, model, name, 3 more }`

490

491 A ScoreModelGrader object that uses a model to assign a score to the input.

492

493 - `label_model_grader: object { input, labels, model, 3 more }`

494

495 A LabelModelGrader object which uses a model to assign labels to each item

496 in the evaluation.

497

498 - `input: array of object { content, role, type }`

499

500 - `content: string or ResponseInputText or object { text, type } or 3 more`

501

502 Inputs to the model - can contain template strings. Supports text, output text, input images, and input audio, either as a single item or an array of items.

503

504 - `Text input: string`

505

506 A text input to the model.

507

508 - `response_input_text: object { text, type }`

509

510 A text input to the model.

511

512 - `Output text: object { text, type }`

513

514 A text output from the model.

515

516 - `text: string`

517

518 The text output from the model.

519

520 - `type: "output_text"`

521

522 The type of the output text. Always `output_text`.

523

524 - `Input image: object { image_url, type, detail }`

525

526 An image input block used within EvalItem content arrays.

527

528 - `image_url: string`

529

530 The URL of the image input.

531

532 - `type: "input_image"`

533

534 The type of the image input. Always `input_image`.

535

536 - `detail: optional string`

537

538 The detail level of the image to be sent to the model. One of `high`, `low`, or `auto`. Defaults to `auto`.

539

540 - `response_input_audio: object { input_audio, type }`

541

542 An audio input to the model.

543

544 - `grader_inputs: array of string or ResponseInputText or object { text, type } or 2 more`

545

546 A list of inputs, each of which may be either an input text, output text, input

547 image, or input audio object.

548

549 - `role: "user" or "assistant" or "system" or "developer"`

550

551 The role of the message input. One of `user`, `assistant`, `system`, or

552 `developer`.

553

554 - `"user"`

555

556 - `"assistant"`

557

558 - `"system"`

559

560 - `"developer"`

561

562 - `type: optional "message"`

563

564 The type of the message input. Always `message`.

565

566 - `"message"`

567

568 - `labels: array of string`

569

570 The labels to assign to each item in the evaluation.

571

572 - `model: string`

573

574 The model to use for the evaluation. Must support structured outputs.

575

576 - `name: string`

577

578 The name of the grader.

579

580 - `passing_labels: array of string`

581

582 The labels that indicate a passing result. Must be a subset of labels.

583

584 - `type: "label_model"`

585

586 The object type, which is always `label_model`.

587

588 - `name: string`

589

590 The name of the grader.

591

592 - `type: "multi"`

593

594 The object type, which is always `multi`.

595

596 - `hyperparameters: optional object { batch_size, compute_multiplier, eval_interval, 4 more }`

597

598 The hyperparameters used for the reinforcement fine-tuning job.

599

600 - `batch_size: optional "auto" or number`

601

602 Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

603

604 - `union_member_0: "auto"`

605

606 - `union_member_1: number`

607

608 - `compute_multiplier: optional "auto" or number`

609

610 Multiplier on amount of compute used for exploring search space during training.

611

612 - `union_member_0: "auto"`

613

614 - `union_member_1: number`

615

616 - `eval_interval: optional "auto" or number`

617

618 The number of training steps between evaluation runs.

619

620 - `union_member_0: "auto"`

621

622 - `union_member_1: number`

623

624 - `eval_samples: optional "auto" or number`

625

626 Number of evaluation samples to generate per training step.

627

628 - `union_member_0: "auto"`

629

630 - `union_member_1: number`

631

632 - `learning_rate_multiplier: optional "auto" or number`

633

634 Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

635

636 - `union_member_0: "auto"`

637

638 - `union_member_1: number`

639

640 - `n_epochs: optional "auto" or number`

641

642 The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

643

644 - `union_member_0: "auto"`

645

646 - `union_member_1: number`

647

648 - `reasoning_effort: optional "default" or "low" or "medium" or "high"`

649

650 Level of reasoning effort.

651

652 - `"default"`

653

654 - `"low"`

655

656 - `"medium"`

657

658 - `"high"`

659

660### Supervised Hyperparameters

661

662- `supervised_hyperparameters: object { batch_size, learning_rate_multiplier, n_epochs }`

663

664 The hyperparameters used for the fine-tuning job.

665

666 - `batch_size: optional "auto" or number`

667

668 Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

669

670 - `union_member_0: "auto"`

671

672 - `union_member_1: number`

673

674 - `learning_rate_multiplier: optional "auto" or number`

675

676 Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

677

678 - `union_member_0: "auto"`

679

680 - `union_member_1: number`

681

682 - `n_epochs: optional "auto" or number`

683

684 The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

685

686 - `union_member_0: "auto"`

687

688 - `union_member_1: number`

689

690### Supervised Method

691

692- `supervised_method: object { hyperparameters }`

693

694 Configuration for the supervised fine-tuning method.

695

696 - `hyperparameters: optional object { batch_size, learning_rate_multiplier, n_epochs }`

697

698 The hyperparameters used for the fine-tuning job.

699

700 - `batch_size: optional "auto" or number`

701

702 Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

703

704 - `union_member_0: "auto"`

705

706 - `union_member_1: number`

707

708 - `learning_rate_multiplier: optional "auto" or number`

709

710 Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

711

712 - `union_member_0: "auto"`

713

714 - `union_member_1: number`

715

716 - `n_epochs: optional "auto" or number`

717

718 The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

719

720 - `union_member_0: "auto"`

721

722 - `union_member_1: number`