Function calling

Function calling (also known as tool calling) provides a powerful and flexible way for OpenAI models to interface with external systems and access data outside their training data. This guide shows how you can connect a model to data and actions provided by your application. We'll show how to use function tools (defined by a JSON schema) and custom tools which work with free form text inputs and outputs.

If your application has many functions or large schemas, you can pair function calling with tool search to defer rarely used tools and load them only when the model needs them. Only gpt-5.4 and later models support tool_search.

How it works

Let's begin by understanding a few key terms about tool calling. After we have a shared vocabulary for tool calling, we'll show you how it's done with some practical examples.

Tools - functionality we give the model

A function or tool refers in the abstract to a piece of functionality that we tell the model it has access to. As a model generates a response to a prompt, it may decide that it needs data or functionality provided by a tool to follow the prompt's instructions.

You could give the model access to tools that:

Get today's weather for a location
Access account details for a given user ID
Issue refunds for a lost order

Or anything else you'd like the model to be able to know or do as it responds to a prompt.

When we make an API request to the model with a prompt, we can include a list of tools the model could consider using. For example, if we wanted the model to be able to answer questions about the current weather somewhere in the world, we might give it access to a get_weather tool that takes location as an argument.

Tool calls - requests from the model to use tools

A function call or tool call refers to a special kind of response we can get from the model if it examines a prompt, and then determines that in order to follow the instructions in the prompt, it needs to call one of the tools we made available to it.

If the model receives a prompt like "what is the weather in Paris?" in an API request, it could respond to that prompt with a tool call for the get_weather tool, with Paris as the location argument.

Tool call outputs - output we generate for the model

A function call output or tool call output refers to the response a tool generates using the input from a model's tool call. The tool call output can either be structured JSON or plain text, and it should contain a reference to a specific model tool call (referenced by call_id in the examples to come). To complete our weather example:

The model has access to a get_weather tool that takes location as an argument.
In response to a prompt like "what's the weather in Paris?" the model returns a tool call that contains a location argument with a value of Paris
The tool call output might return a JSON object (e.g., {"temperature": "25", "unit": "C"}, indicating a current temperature of 25 degrees), Image contents, or File contents.

We then send all of the tool definition, the original prompt, the model's tool call, and the tool call output back to the model to finally receive a text response like:

The weather in Paris today is 25C.

Functions versus tools

A function is a specific kind of tool, defined by a JSON schema. A function definition allows the model to pass data to your application, where your code can access data or take actions suggested by the model.
In addition to function tools, there are custom tools (described in this guide) that work with free text inputs and outputs.
There are also built-in tools that are part of the OpenAI platform. These tools enable the model to search the web, execute code, access the functionality of an MCP server, and more.

The tool calling flow

Tool calling is a multi-step conversation between your application and a model via the OpenAI API. The tool calling flow has five high level steps:

Make a request to the model with tools it could call
Receive a tool call from the model
Execute code on the application side with input from the tool call
Make a second request to the model with the tool output
Receive a final response from the model (or more tool calls)

Function Calling Diagram Steps

Function tool example

Let's look at an end-to-end tool calling flow for a get_horoscope function that gets a daily horoscope for an astrological sign.

Note that for reasoning models like GPT-5 or o4-mini, any reasoning items returned in model responses with tool calls must also be passed back with tool call outputs.

Defining functions

Functions are usually declared in the tools parameter of each API request. With tool search, your application can also load deferred functions later in the interaction. Either way, each callable function uses the same schema shape. A function definition has the following properties:

Field	Description
`type`	This should always be `function`
`name`	The function's name (e.g. `get_weather`)
`description`	Details on when and how to use the function
`parameters`	JSON schema defining the function's input arguments
`strict`	Whether to enforce strict mode for the function call

Here is an example function definition for a get_weather function

{
  "type": "function",
  "name": "get_weather",
  "description": "Retrieves current weather for the given location.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City and country e.g. Bogotá, Colombia"
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Units the temperature will be returned in."
      }
    },
    "required": ["location", "units"],
    "additionalProperties": false
  },
  "strict": true
}

Because the parameters are defined by a JSON schema, you can leverage many of its rich features like property types, enums, descriptions, nested objects, and, recursive objects.

Defining namespaces

Use namespaces to group related tools by domain, such as crm, billing, or shipping. Namespaces help organize similar tools and are especially useful when the model must choose between tools that serve different systems or purposes, such as one search tool for your CRM and another for your support ticketing system.

{
  "type": "namespace",
  "name": "crm",
  "description": "CRM tools for customer lookup and order management.",
  "tools": [
    {
      "type": "function",
      "name": "get_customer_profile",
      "description": "Fetch a customer profile by customer ID.",
      "parameters": {
        "type": "object",
        "properties": {
          "customer_id": { "type": "string" }
        },
        "required": ["customer_id"],
        "additionalProperties": false
      }
    },
    {
      "type": "function",
      "name": "list_open_orders",
      "description": "List open orders for a customer ID.",
      "defer_loading": true,
      "parameters": {
        "type": "object",
        "properties": {
          "customer_id": { "type": "string" }
        },
        "required": ["customer_id"],
        "additionalProperties": false
      }
    }
  ]
}

Tool search

If you need to give the model access to a large ecosystem of tools, you can defer loading some or all of those tools with tool_search. The tool_search tool lets the model search for relevant tools, add them to the model context, and then use them. Only gpt-5.4 and later models support it. Read the tool search guide to learn more.

Best practices for defining functions

Write clear and detailed function names, parameter descriptions, and instructions.
- Explicitly describe the purpose of the function and each parameter (and its format), and what the output represents.
- Use the system prompt to describe when (and when not) to use each function. Generally, tell the model exactly what to do.
- Include examples and edge cases, especially to rectify any recurring failures. (Note: Adding examples may hurt performance for reasoning models.)
- For deferred tools, put detailed guidance in the function description and keep the namespace description concise. The namespace helps the model choose what to load; the function description helps it use the loaded tool correctly.
Apply software engineering best practices.
- Make the functions obvious and intuitive. (principle of least surprise)
- Use enums and object structure to make invalid states unrepresentable. (e.g. toggle_light(on: bool, off: bool) allows for invalid calls)
- Pass the intern test. Can an intern/human correctly use the function given nothing but what you gave the model? (If not, what questions do they ask you? Add the answers to the prompt.)
Offload the burden from the model and use code where possible.
- Don't make the model fill arguments you already know. For example, if you already have an order_id based on a previous menu, don't have an order_id param – instead, have no params submit_refund() and pass the order_id with code.
- Combine functions that are always called in sequence. For example, if you always call mark_location() after query_location(), just move the marking logic into the query function call.
Keep the number of initially available functions small for higher accuracy.
- Evaluate your performance with different numbers of functions.
- Aim for fewer than 20 functions available at the start of a turn at any one time, though this is just a soft suggestion.
- Use tool search to defer large or infrequently used parts of your tool surface instead of exposing everything up front.
Leverage OpenAI resources.
- Generate and iterate on function schemas in the Playground.
- Consider fine-tuning to increase function calling accuracy for large numbers of functions or difficult tasks. (cookbook)

Token Usage

Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means callable function definitions count against the model's context limit and are billed as input tokens. If you run into token limits, we suggest limiting the number of functions loaded up front, shortening descriptions where possible, or using tool search so deferred tools are loaded only when needed.

It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined in your tools specification.

Handling function calls

When the model calls a function, you must execute it and return the result. Since model responses can include zero, one, or multiple calls, it is best practice to assume there are several.

The response output array contains an entry with the type having a value of function_call. Each entry with a call_id (used later to submit the function result), name, and JSON-encoded arguments.

If you are using tool search, you may also see tool_search_call and tool_search_output items before a function_call. Once the function is loaded, handle the function call in the same way shown here.

In the example above, we have a hypothetical call_function to route each call. Here’s a possible implementation:

Formatting results

The result you pass in the function_call_output message should typically be a string, where the format is up to you (JSON, error codes, plain text, etc.). The model will interpret that string as needed.

For functions that return images or files, you can pass an array of image or file objects instead of a string.

If your function has no return value (e.g. send_email), simply return a string that indicates success or failure. (e.g. "success")

Incorporating results into response

After appending the results to your input, you can send them back to the model to get a final response.

Additional configurations

Tool choice

By default the model will determine when and how many tools to use. You can force specific behavior with the tool_choice parameter.

Auto: (Default) Call zero, one, or multiple functions. tool_choice: "auto"
Required: Call one or more functions. tool_choice: "required"
Forced Function: Call exactly one specific function. tool_choice: {"type": "function", "name": "get_weather"}
Allowed tools: Restrict the tool calls the model can make to a subset of the tools available to the model.

When to use allowed_tools

You might want to configure an allowed_tools list in case you want to make only a subset of tools available across model requests, but not modify the list of tools you pass in, so you can maximize savings from prompt caching.

"tool_choice": {
    "type": "allowed_tools",
    "mode": "auto",
    "tools": [
        { "type": "function", "name": "get_weather" },
        { "type": "function", "name": "search_docs" }
    ]
  }
}

You can also set tool_choice to "none" to imitate the behavior of passing no functions.

When you use tool search, tool_choice still applies to the tools that are currently callable in the turn. This is most useful after you load a subset of tools and want to constrain the model to that subset.

Parallel function calling

Parallel function calling is not possible when using built-in tools.

The model may choose to call multiple functions in a single turn. You can prevent this by setting parallel_tool_calls to false, which ensures exactly zero or one tool is called.

Note: Currently, if you are using a fine tuned model and the model calls multiple functions in one turn then strict mode will be disabled for those calls.

Note for gpt-4.1-nano-2025-04-14: This snapshot of gpt-4.1-nano can sometimes include multiple tools calls for the same tool if parallel tool calls are enabled. It is recommended to disable this feature when using this nano snapshot.

Strict mode

Setting strict to true will ensure function calls reliably adhere to the function schema, instead of being best effort. We recommend always enabling strict mode.

Under the hood, strict mode works by leveraging our structured outputs feature and therefore introduces a couple requirements:

additionalProperties must be set to false for each object in the parameters.
All fields in properties must be marked as required.

You can denote optional fields by adding null as a type option (see example below).

If you send strict: true and your schema does not meet the requirements above, the request will be rejected with details about the missing constraints. If you omit strict, the default depends on the API: Responses requests will attempt to normalize your schema into strict mode when possible, and will fall back to non-strict, best-effort function calling if the schema cannot be made compatible with strict mode. When fallback happens, the response tool will show strict: false. Chat Completions requests remain non-strict by default. To opt out of strict mode in Responses and keep non-strict, best-effort function calling, explicitly set strict: false.

All schemas generated in the playground have strict mode enabled.

While we recommend you enable strict mode, it has a few limitations:

Some features of JSON schema are not supported. (See supported schemas.)

Specifically for fine tuned models:

Schemas undergo additional processing on the first request (and are then cached). If your schemas vary from request to request, this may result in higher latencies.
Schemas are cached for performance, and are not eligible for zero data retention.

Streaming

Streaming can be used to surface progress by showing which function is called as the model fills its arguments, and even displaying the arguments in real time.

Streaming function calls is very similar to streaming regular responses: you set stream to true and get different event objects.

Instead of aggregating chunks into a single content string, however, you're aggregating chunks into an encoded arguments JSON object.

When the model calls one or more functions an event of type response.output_item.added will be emitted for each function call that contains the following fields:

Field	Description
`response_id`	The id of the response that the function call belongs to
`output_index`	The index of the output item in the response. This represents the individual function calls in the response.
`item`	The in-progress function call item that includes a `name`, `arguments` and `id` field

Afterwards you will receive a series of events of type response.function_call_arguments.delta which will contain the delta of the arguments field. These events contain the following fields:

Field	Description
`response_id`	The id of the response that the function call belongs to
`item_id`	The id of the function call item that the delta belongs to
`output_index`	The index of the output item in the response. This represents the individual function calls in the response.
`delta`	The delta of the `arguments` field.

Below is a code snippet demonstrating how to aggregate the deltas into a final tool_call object.

When the model has finished calling the functions an event of type response.function_call_arguments.done will be emitted. This event contains the entire function call including the following fields:

Field	Description
`response_id`	The id of the response that the function call belongs to
`output_index`	The index of the output item in the response. This represents the individual function calls in the response.
`item`	The function call item that includes a `name`, `arguments` and `id` field.

Custom tools

Custom tools work in much the same way as JSON schema-driven function tools. But rather than providing the model explicit instructions on what input your tool requires, the model can pass an arbitrary string back to your tool as input. This is useful to avoid unnecessarily wrapping a response in JSON, or to apply a custom grammar to the response (more on this below).

The following code sample shows creating a custom tool that expects to receive a string of text containing Python code as a response.

Just as before, the output array will contain a tool call generated by the model. Except this time, the tool call input is given as plain text.

[
  {
    "id": "rs_6890e972fa7c819ca8bc561526b989170694874912ae0ea6",
    "type": "reasoning",
    "content": [],
    "summary": []
  },
  {
    "id": "ctc_6890e975e86c819c9338825b3e1994810694874912ae0ea6",
    "type": "custom_tool_call",
    "status": "completed",
    "call_id": "call_aGiFQkRWSWAIsMQ19fKqxUgb",
    "input": "print(\"hello world\")",
    "name": "code_exec"
  }
]

Context-free grammars

A context-free grammar (CFG) is a set of rules that define how to produce valid text in a given format. For custom tools, you can provide a CFG that will constrain the model's text input for a custom tool.

You can provide a custom CFG using the grammar parameter when configuring a custom tool. Currently, we support two CFG syntaxes when defining grammars: lark and regex.

Lark CFG

The output from the tool should then conform to the Lark CFG that you defined:

[
  {
    "id": "rs_6890ed2b6374819dbbff5353e6664ef103f4db9848be4829",
    "type": "reasoning",
    "content": [],
    "summary": []
  },
  {
    "id": "ctc_6890ed2f32e8819daa62bef772b8c15503f4db9848be4829",
    "type": "custom_tool_call",
    "status": "completed",
    "call_id": "call_pmlLjmvG33KJdyVdC4MVdk5N",
    "input": "4 + 4",
    "name": "math_exp"
  }
]

Grammars are specified using a variation of Lark. Model sampling is constrained using LLGuidance. Some features of Lark are not supported:

Lookarounds in lexer regexes
Lazy modifiers (*?, +?, ??) in lexer regexes
Priorities of terminals
Templates
Imports (other than built-in %import common)
%declares

We recommend using the Lark IDE to experiment with custom grammars.

Keep grammars simple

Try to make your grammar as simple as possible. The OpenAI API may return an error if the grammar is too complex, so you should ensure that your desired grammar is compatible before using it in the API.

Lark grammars can be tricky to perfect. While simple grammars perform most reliably, complex grammars often require iteration on the grammar definition itself, the prompt, and the tool description to ensure that the model does not go out of distribution.

Correct versus incorrect patterns

Correct (single, bounded terminal):

start: SENTENCE
SENTENCE: /[A-Za-z, ]*(the hero|a dragon|an old man|the princess)[A-Za-z, ]*(fought|saved|found|lost)[A-Za-z, ]*(a treasure|the kingdom|a secret|his way)[A-Za-z, ]*\./

Do NOT do this (splitting across rules/terminals). This attempts to let rules partition free text between terminals. The lexer will greedily match the free-text pieces and you'll lose control:

start: sentence
sentence: /[A-Za-z, ]+/ subject /[A-Za-z, ]+/ verb /[A-Za-z, ]+/ object /[A-Za-z, ]+/

Lowercase rules don't influence how terminals are cut from the input—only terminal definitions do. When you need “free text between anchors,” make it one giant regex terminal so the lexer matches it exactly once with the structure you intend.

Terminals versus rules

Lark uses terminals for lexer tokens (by convention, UPPERCASE) and rules for parser productions (by convention, lowercase). The most practical way to stay within the supported subset and avoid surprises is to keep your grammar simple and explicit, and to use terminals and rules with a clear separation of concerns.

The regex syntax used by terminals is the Rust regex crate syntax, not Python's re module.

Key ideas and best practices

Lexer runs before the parser

Terminals are matched by the lexer (greedily / longest match wins) before any CFG rule logic is applied. If you try to "shape" a terminal by splitting it across several rules, the lexer cannot be guided by those rules—only by terminal regexes.

Prefer one terminal when you're carving text out of freeform spans

If you need to recognize a pattern embedded in arbitrary text (e.g., natural language with “anything” between anchors), express that as a single terminal. Do not try to interleave free‑text terminals with parser rules; the greedy lexer will not respect your intended boundaries and it is highly likely the model will go out of distribution.

Use rules to compose discrete tokens

Rules are ideal when you're combining clearly delimited terminals (numbers, keywords, punctuation) into larger structures. They're not the right tool for constraining "the stuff in between" two terminals.

Keep terminals simple, bounded, and self-contained

Favor explicit character classes and bounded quantifiers ({0,10}, not unbounded * everywhere). If you need "any text up to a period", prefer something like /[^.\n]{0,10}*\./ rather than /.+\./ to avoid runaway growth.

Use rules to combine tokens, not to steer regex internals

Good rule usage example:

start: expr
NUMBER: /[0-9]+/
PLUS: "+"
MINUS: "-"
expr: term (("+"|"-") term)*
term: NUMBER

Treat whitespace explicitly

Don't rely on open-ended %ignore directives. Using unbounded ignore directives may cause the grammar to be too complex and/or may cause the model to go out of distribution. Prefer threading explicit terminals wherever whitespace is allowed.

Troubleshooting

If the API rejects the grammar because it is too complex, simplify the rules and terminals and remove unbounded %ignores.
If custom tools are called with unexpected tokens, confirm terminals aren’t overlapping; check greedy lexer.
When the model drifts "out‑of‑distribution" (shows up as the model producing excessively long or repetitive outputs, it is syntactically valid but is semantically wrong):
- Tighten the grammar.
- Iterate on the prompt (add few-shot examples) and tool description (explain the grammar and instruct the model to reason and conform to it).
- Experiment with a higher reasoning effort (e.g, bump from medium to high).

Regex CFG

The output from the tool should then conform to the Regex CFG that you defined:

[
  {
    "id": "rs_6894f7a3dd4c81a1823a723a00bfa8710d7962f622d1c260",
    "type": "reasoning",
    "content": [],
    "summary": []
  },
  {
    "id": "ctc_6894f7ad7fb881a1bffa1f377393b1a40d7962f622d1c260",
    "type": "custom_tool_call",
    "status": "completed",
    "call_id": "call_8m4XCnYvEmFlzHgDHbaOCFlK",
    "input": "August 7th 2025 at 10AM",
    "name": "timestamp"
  }
]

As with the Lark syntax, regexes use the Rust regex crate syntax, not Python's re module.

Some features of Regex are not supported:

Lookarounds
Lazy modifiers (*?, +?, ??)

Key ideas and best practices

Pattern must be on one line

If you need to match a newline in the input, use the escaped sequence \n. Do not use verbose/extended mode, which allows patterns to span multiple lines.

Provide the regex as a plain pattern string

Don't enclose the pattern in //.

guides/function-calling.md 2026-06-05 06:45 UTC to 2026-06-09 06:34 UTC

Function calling

How it works

The tool calling flow

Function tool example

Defining functions

Defining namespaces

Tool search

Best practices for defining functions

Token Usage

Handling function calls

Formatting results

Incorporating results into response

Additional configurations

Tool choice

Parallel function calling

Strict mode

Streaming

Custom tools

Context-free grammars

Lark CFG

Keep grammars simple

Correct versus incorrect patterns

Terminals versus rules

Key ideas and best practices

Troubleshooting

Regex CFG

Key ideas and best practices

guides/function-calling.md +8 −7