Documentation — Spybara

Files

model-capabilities
- audio
  - voice-agent.md
- images
  - generation.md
- imagine.md
tools
- citations.md
- function-calling.md

model-capabilities/audio/voice-agent.md +37 −0


163| `audio.input.transcription.language_hint` | string | BCP-47 language code (e.g. `"ja"`, `"ar"`, `"es-MX"`, `"pt-BR"`) to bias ASR transcription toward a specific language. Can be updated mid-session. See [Language Hint](#language-hint). |163| `audio.input.transcription.language_hint` | string | BCP-47 language code (e.g. `"ja"`, `"ar"`, `"es-MX"`, `"pt-BR"`) to bias ASR transcription toward a specific language. Can be updated mid-session. See [Language Hint](#language-hint). |

164| `audio.output.speed` | number | Playback speed multiplier for assistant audio output. Range: 0.7–1.5. Default: `1.0`. Values below 1.0 slow down speech; values above 1.0 speed it up. |164| `audio.output.speed` | number | Playback speed multiplier for assistant audio output. Range: 0.7–1.5. Default: `1.0`. Values below 1.0 slow down speech; values above 1.0 speed it up. |

165| `replace` | object | Optional. Map of phrases to spoken substitutions applied to the model's output before TTS, e.g. `{"Acme Mobile": "Acme Mobull"}`. Fixes pronunciation by changing the spoken audio without altering the transcript. See [Pronunciation Replacements](#pronunciation-replacements). |

165 166

166## Available Voices167## Available Voices

167 168

416}417}

417```418```

418 419

420## Pronunciation Replacements

421

422Use the `replace` parameter to fix how the model pronounces specific words or phrases. Each key is matched (case-insensitively) in the model's output and swapped for its replacement value **before** text-to-speech — so only the spoken audio changes; the transcript the user sees keeps the original text.

423

424This is useful for brand names, acronyms, or domain terms the model mispronounces. For example, mapping `"Acme Mobile"` to `"Acme Mobull"` makes the audio say it correctly while the transcript still reads "Acme Mobile".

425

426```python customLanguage="pythonWithoutSDK"

427await ws.send(json.dumps({

428 "type": "session.update",

429 "session": {

430 "voice": "eve",

431 "instructions": "You are a helpful assistant.",

432 "replace": {"Acme Mobile": "Acme Mobull"}

433 }

434}))

435```

436

437```javascript customLanguage="javascriptWithoutSDK"

438ws.send(JSON.stringify({

439 type: "session.update",

440 session: {

441 voice: "eve",

442 instructions: "You are a helpful assistant.",

443 replace: { "Acme Mobile": "Acme Mobull" }

444 }

445}));

446```

447

448Matching behavior:

449

450* Matching is case-insensitive; the replacement is spoken using the casing you provide.

451* Keys match only as whole words or phrases, with spacing and punctuation as written, so `Acme, Mobile`, `Acme-Mobile`, and `Acme Mobiles` do **not** match.

452* When multiple keys share a prefix, the longest match wins.

453* The map can be updated mid-session with another `session.update`; the applied map is echoed back on `session.updated`.

454
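The matching rules above can be approximated locally, for example to unit-test a replacement map before sending it in a `session.update`. The sketch below is only an illustration under stated assumptions: `apply_replacements` is a hypothetical helper, and treating a "word boundary" as "not adjacent to a letter, digit, or underscore" is a guess at the service's behavior, not its actual implementation.

```python
import re


def apply_replacements(text: str, replace: dict[str, str]) -> str:
    """Approximate the documented matching rules on a piece of text.

    Hypothetical local helper; the real substitution happens server-side.
    """
    if not replace:
        return text
    # Case-insensitive lookup from matched text back to its spoken form.
    lookup = {key.lower(): spoken for key, spoken in replace.items()}
    # Longest keys first, so a longer phrase wins over a shared prefix.
    keys = sorted(replace, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    # The replacement keeps the casing supplied in the map.
    return pattern.sub(
        lambda m: lookup.get(m.group(0).lower(), m.group(0)), text
    )


if __name__ == "__main__":
    mapping = {"Acme": "Ak-mee", "Acme Mobile": "Acme Mobull"}
    print(apply_replacements("Thanks for calling acme mobile.", mapping))
```

Running candidate phrases through a helper like this is a cheap way to catch keys that would never match (for instance, a key with different punctuation than the model tends to produce) before testing with live audio.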

419## Supported Languages455## Supported Languages

420 456

421The Voice Agent API supports 20+ languages with native-quality accents. The model automatically detects the input language and responds naturally in the same language — no configuration required.457The Voice Agent API supports 20+ languages with native-quality accents. The model automatically detects the input language and responds naturally in the same language — no configuration required.

1610|---|---|1646|---|---|

1611| `force_message` | New `conversation.item.create` item type for TTS-synthesized scripted utterances. See [Force Message](#force-message). |1647| `force_message` | New `conversation.item.create` item type for TTS-synthesized scripted utterances. See [Force Message](#force-message). |

1612| `resumption` | Field on `session.update` that caches conversation turns and replays them on reconnect. See [Session Resumption](#session-resumption). |1648| `resumption` | Field on `session.update` that caches conversation turns and replays them on reconnect. See [Session Resumption](#session-resumption). |

1649| `replace` | Field on `session.update` that maps phrases to spoken substitutions applied before TTS to fix pronunciation without changing the transcript. See [Pronunciation Replacements](#pronunciation-replacements). |

model-capabilities/images/generation.md +0 −4


4 4

5Generate images from text prompts with Grok Imagine models. The API supports batch generation of multiple images, and control over aspect ratio and resolution.5Generate images from text prompts with Grok Imagine models. The API supports batch generation of multiple images, and control over aspect ratio and resolution.

6 6

~~7> [!WARNING]~~

~~8>~~

~~9> **`grok-imagine-image-pro` will be deprecated as of May 15, 2026.** Use `grok-imagine-image-quality` for all new image generation requests. Existing `-pro` requests will continue to work during a transition period, but we recommend migrating promptly.~~

11## Quick Start7## Quick Start

12 8

13Generate an image with a single API call:9Generate an image with a single API call:

model-capabilities/imagine.md +2 −2


6 6

7## Pricing7## Pricing

8 8

~~9Image generation uses flat per-image pricing regardless of prompt length. Each generated image incurs a fixed fee. Image edits are billed for both the input image and the generated output image. Video generation uses per-second pricing where both duration and resolution affect the total cost. For full pricing details, see the [models page](/developers/models#imagine-pricing).~~9Image generation uses flat per-image pricing regardless of prompt length. Each generated image incurs a fixed fee. Image edits are billed for both the input image and the generated output image. Video generation uses per-second pricing where both duration and resolution affect the total cost. For full pricing details, see the [pricing page](/developers/pricing#imagine-api-pricing).

10 10

11## Image Generation11## Image Generation

12 12

193* **[Multi-Image Editing](/developers/model-capabilities/images/multi-image-editing)** — Combine up to 3 source images in a single edit for compositing subjects, transferring styles, and building scenes from multiple references.193* **[Multi-Image Editing](/developers/model-capabilities/images/multi-image-editing)** — Combine up to 3 source images in a single edit for compositing subjects, transferring styles, and building scenes from multiple references.

194* **[Video Generation](/developers/model-capabilities/video/generation)** — Generate videos from text prompts with configurable duration (up to 15s), aspect ratio, and resolution.194* **[Video Generation](/developers/model-capabilities/video/generation)** — Generate videos from text prompts with configurable duration (up to 15s), aspect ratio, and resolution.

195* **[Video Editing](/developers/model-capabilities/video/editing)** — Modify an existing video with a text prompt while preserving the rest of the scene.195* **[Video Editing](/developers/model-capabilities/video/editing)** — Modify an existing video with a text prompt while preserving the rest of the scene.

~~196* **[Reference-to-Video](/developers/model-capabilities/video/reference-to-video)** — Guide a generated video with one or more reference images that influence the output without forcing the first frame.~~196* **[Reference-to-Video](/developers/model-capabilities/video/reference-to-video)** — Guide a generated video with one or more reference images that influence the output without forcing the first frame. Requires `grok-imagine-video` — `grok-imagine-video-1.5` does not support this mode.

197* **[Video Extension](/developers/model-capabilities/video/extension)** — Continue an existing video from its last frame, combining the original and extension into one clip.197* **[Video Extension](/developers/model-capabilities/video/extension)** — Continue an existing video from its last frame, combining the original and extension into one clip.

198* **[Files API Integration](/developers/model-capabilities/imagine/files)** — Reference stored files as Imagine inputs by ID, persist generated assets to the Files API, and optionally create a permanent shareable public URL — all in a single request.198* **[Files API Integration](/developers/model-capabilities/imagine/files)** — Reference stored files as Imagine inputs by ID, persist generated assets to the Files API, and optionally create a permanent shareable public URL — all in a single request.

199 199

tools/citations.md +129 −50


32 32

33**Important**: Enabling inline citations does not guarantee that the model will cite sources on every answer. The model decides when and where to include citations based on the context and nature of the query.33**Important**: Enabling inline citations does not guarantee that the model will cite sources on every answer. The model decides when and where to include citations based on the context and nature of the query.

34 34

~~35### Enabling Inline Citations~~35### Configuring Inline Citations

36 36

~~37Inline citations are returned by default with the Responses API. For the xAI SDK, you can explicitly request them with `include=["inline_citations"]`:~~37Inline citation behavior differs between the **Responses API** and the **xAI Python SDK** (gRPC chat API).

38 38

~~39```bash customLanguage="bash"~~39The **Responses API** behavior applies to the following clients:

41* cURL against `/v1/responses`

42* Python (OpenAI SDK)

43* JavaScript (AI SDK via `xai.responses()`)

44* JavaScript (OpenAI SDK)

46| | Responses API (cURL, Python/JS OpenAI SDK, JS AI SDK) | xAI Python SDK |

47|---|---|---|

48| **Default** | Enabled — response text may include `[[N]](url)` links without extra configuration | Disabled — omit `include`, or do not pass `"inline_citations"` |

49| **Enable** | Enabled by default, no additional action needed. | Pass `include=["inline_citations"]` to the `chat.create()` method |

50| **Disable** | Pass `include=["no_inline_citations"]` | Disabled by default |

52When inline citations are disabled, the response text will not contain any `[[N]](url)` markdown links. The `annotations` field on `output_text` content blocks may still be present, but annotations only list sources encountered during search — they will not have positional references into the response text.

54#### Enabled (default for Responses API; opt-in for xAI Python SDK)

56```bash customLanguage="bash" highlightedLines="9"

40curl https://api.x.ai/v1/responses \57curl https://api.x.ai/v1/responses \

41 -H "Content-Type: application/json" \58 -H "Content-Type: application/json" \

42 -H "Authorization: Bearer $XAI_API_KEY" \59 -H "Authorization: Bearer $XAI_API_KEY" \

45 "input": [62 "input": [

46 {"role": "user", "content": "What is xAI?"}63 {"role": "user", "content": "What is xAI?"}

47 ],64 ],

~~48 "tools": [{"type": "web_search"}]~~65 "tools": [{"type": "web_search"}]

49}'66}'

50```67```

51 68

~~52```python customLanguage="pythonXAI"~~69```python customLanguage="pythonXAI" highlightedLines="14"

53import os70import os

54 71

55from xai_sdk import Client72from xai_sdk import Client

63 web_search(),80 web_search(),

64 x_search(),81 x_search(),

65 ],82 ],

~~66 include=["inline_citations"], # Enable inline citations~~83 include=["inline_citations"], # Enable inline citations (opt-in for xAI Python SDK)

67)84)

68 85

69chat.append(user("What is xAI?"))86chat.append(user("What is xAI?"))

73print(response.content)90print(response.content)

74```91```

75 92

~~76```python customLanguage="pythonOpenAISDK"~~93```python customLanguage="pythonOpenAISDK" highlightedLines="15"

77import os94import os

78from openai import OpenAI95from openai import OpenAI

79 96

88 {"role": "user", "content": "What is xAI?"}105 {"role": "user", "content": "What is xAI?"}

89 ],106 ],

90 tools=[107 tools=[

~~91 {"type": "web_search"},~~108 {"type": "web_search"}, # inline citations are enabled by default

92 ],109 ],

93)110)

94 111

100 print(content.text)117 print(content.text)

101```118```

102 119

103```javascript customLanguage="javascriptAISDK"120```javascript customLanguage="javascriptAISDK" highlightedLines="8"

104import { xai } from '@ai-sdk/xai';121import { xai } from '@ai-sdk/xai';

105import { generateText } from 'ai';122import { generateText } from 'ai';

106 123

108 model: xai.responses('grok-4.3'),125 model: xai.responses('grok-4.3'),

109 prompt: 'What is xAI?',126 prompt: 'What is xAI?',

110 tools: {127 tools: {

111 web_search: xai.tools.webSearch(),128 web_search: xai.tools.webSearch(), // inline citations are enabled by default

112 },129 },

113});130});

114 131

119console.log('Sources:', sources);136console.log('Sources:', sources);

120```137```

121 138

122```javascript customLanguage="javascriptOpenAISDK"139```javascript customLanguage="javascriptOpenAISDK" highlightedLines="13"

123import OpenAI from 'openai';140import OpenAI from 'openai';

124 141

125const client = new OpenAI({142const client = new OpenAI({

132 input: [149 input: [

133 { role: 'user', content: 'What is xAI?' }150 { role: 'user', content: 'What is xAI?' }

134 ],151 ],

135 tools: [{ type: 'web_search' }],152 tools: [{ type: 'web_search' }], // inline citations are enabled by default

136});153});

137 154

138// Get the message with inline citations155// Get the message with inline citations

147}164}

148```165```

149 166

150### Markdown Citation Format167#### Disabled (opt-out for Responses API; default for xAI Python SDK)

~~151~~

152When inline citations are enabled, the model will insert markdown-style citation links directly into the response text:

~~153~~

154```output

155The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[[1]](https://x.ai/news/)[[2]](https://x.ai/)[[3]](https://x.com/i/status/1991284813727474073)

156```

~~157~~

158When rendered as markdown, this displays as clickable links:

~~159~~

160> The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[\[1\]](https://x.ai/news/)[\[2\]](https://x.ai/)[\[3\]](https://x.com/i/status/1991284813727474073)

~~161~~

162The format is `[[N]](url)` where:

~~163~~

~~164* `N` is the sequential display number for the citation **starting from 1**~~

165* `url` is the source URL

~~166~~

167**Citation numbering**: Citation numbers always start from 1 and increment sequentially. If the same source is cited again later in the response, the original citation number will be reused.

~~168~~

169### Image Embeds

~~170~~

171When `enable_image_search` is enabled on the `web_search` tool, Grok may embed image results as Markdown images instead of numbered text citations:

~~172~~

173```output

174Here are images of Starship on the launch pad:

175![Why the SpaceX Starship launch pad matters](https://www.astronomy.com/wp-content/uploads/2024/09/starship-test-flight-mission-scaled.jpg)

176```

~~177~~

178The format is `![alt](url)` where:

~~179~~

180* `alt` is a short description or title for the image

181* `url` is the image source URL

~~182~~

183### Disabling Inline Citations

~~184~~

185To disable inline citations in the Responses API, add `"no_inline_citations"` to the `include` field. For the xAI SDK, simply omit `"inline_citations"` from the `include` field (inline citations are opt-in for the xAI SDK).

~~186~~

187When disabled, the response text will not contain any `[[N]](url)` markdown links. The `annotations` field on `output_text` content blocks will still be present, but the annotations will only represent the sources that the tool encountered during the search — they will not have positional references into the response text.

188 168

189```python customLanguage="pythonOpenAISDK" highlightedLines="17"169```python customLanguage="pythonOpenAISDK" highlightedLines="17"

190import os170import os

256}'236}'

257```237```

258 238

239### Markdown Citation Format

240

241When inline citations are enabled, the model will insert markdown-style citation links directly into the response text:

242

243```output

244The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[[1]](https://x.ai/news/)[[2]](https://x.ai/)[[3]](https://x.com/i/status/1991284813727474073)

245```

246

247When rendered as markdown, this displays as clickable links:

248

249> The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[\[1\]](https://x.ai/news/)[\[2\]](https://x.ai/)[\[3\]](https://x.com/i/status/1991284813727474073)

250

251The format is `[[N]](url)` where:

252

253* `N` is the sequential display number for the citation **starting from 1**

254* `url` is the source URL

255

256**Citation numbering**: Citation numbers always start from 1 and increment sequentially. If the same source is cited again later in the response, the original citation number will be reused.

257
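Because the `[[N]](url)` format is regular, the citations can be pulled out of the response text with a small parser. The following is a minimal sketch, not an official client utility: `extract_citations` and `strip_citations` are hypothetical helper names, and the pattern assumes URLs contain no whitespace and at most one level of parentheses (as in some Wikipedia links).

```python
import re

# [[N]](url): N is the display number; the URL may contain one level of
# balanced parentheses, e.g. https://en.wikipedia.org/wiki/XAI_(company)
CITATION = re.compile(r"\[\[(\d+)\]\]\(((?:[^()\s]|\([^()\s]*\))+)\)")


def extract_citations(text: str) -> dict[int, str]:
    """Map each citation number to its URL, keeping the first occurrence."""
    citations: dict[int, str] = {}
    for match in CITATION.finditer(text):
        citations.setdefault(int(match.group(1)), match.group(2))
    return citations


def strip_citations(text: str) -> str:
    """Return the text with all inline citation links removed."""
    return CITATION.sub("", text)


if __name__ == "__main__":
    sample = (
        "xAI posts updates on its site.[[1]](https://x.ai/news/)"
        "[[2]](https://x.ai/) See the news page again.[[1]](https://x.ai/news/)"
    )
    print(extract_citations(sample))
    print(strip_citations(sample))
```

Since a repeated source reuses its original number, keying the result by `N` naturally collapses repeat citations into one entry.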

258### Image Embeds

259

260When `enable_image_search` is enabled on the `web_search` tool, Grok may embed image results as Markdown images instead of numbered text citations:

261

262```output

263Here are images of Starship on the launch pad:

264![Why the SpaceX Starship launch pad matters](https://www.astronomy.com/wp-content/uploads/2024/09/starship-test-flight-mission-scaled.jpg)

265```

266

267The format is `![alt](url)` where:

268

269* `alt` is a short description or title for the image

270* `url` is the image source URL

271
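A client that renders plain text may want to separate image embeds from the surrounding prose. The sketch below shows one way to do that with a regular expression; `extract_image_embeds` is a hypothetical helper, and the pattern assumes the alt text contains no `]` and the URL contains no whitespace.

```python
import re

# ![alt](url): alt is the description, url is the image source.
IMAGE_EMBED = re.compile(r"!\[([^\]]*)\]\(((?:[^()\s]|\([^()\s]*\))+)\)")


def extract_image_embeds(text: str) -> list[tuple[str, str]]:
    """Return (alt, url) pairs for every Markdown image in the text."""
    return [(m.group(1), m.group(2)) for m in IMAGE_EMBED.finditer(text)]


if __name__ == "__main__":
    sample = (
        "Here are images of Starship on the launch pad:\n"
        "![Why the SpaceX Starship launch pad matters]"
        "(https://www.astronomy.com/wp-content/uploads/2024/09/"
        "starship-test-flight-mission-scaled.jpg)"
    )
    for alt, url in extract_image_embeds(sample):
        print(alt, "->", url)
```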

259## Accessing Structured Inline Citation Data272## Accessing Structured Inline Citation Data

260 273

261Structured inline citation data provides precise positional information about each citation in the response text.274Structured inline citation data provides precise positional information about each citation in the response text.

262 275

263### Response Format276### Response Format

264 277

278When inline citations are enabled, each `output_text` content block includes an `annotations` array with structured citation metadata (URL, character offsets, and label):

279

280```json highlightedLines="16-45"

281{

282 "created_at": 1781829888,

283 "completed_at": 1781829888,

284 "id": "5808284d-ae14-9981-9289-73515f67ebda",

285 "max_output_tokens": null,

286 "model": "grok-4.3",

287 "object": "response",

288 "output": [

289 ...

290 {

291 "content": [

292 {

293 "type": "output_text",

294 "text": "**xAI is an artificial intelligence company founded by Elon Musk in March 2023.** Its stated mission is to \"understand the universe\" by building advanced AI systems that accelerate human scientific discovery.[[1]](https://x.ai/company)\n\n### Key Details\n- **Flagship product**: Grok, a family of frontier AI models focused on reasoning, code, voice, image generation, and video. These are trained on massive infrastructure, including what the company describes as the world's largest supercluster (Colossus). Grok powers chatbots, APIs, and multimodal tools available via a unified API.[[2]](https://x.ai/)\n- **Current status (as of mid-2026)**: xAI operates as a subsidiary of SpaceX following an acquisition in February 2026. It is also connected to the X social platform (formerly Twitter), which xAI effectively became the parent of in 2025. The company has expanded into data centers and enterprise AI offerings (e.g., integrations with Amazon Bedrock and Databricks).[[3]](https://en.wikipedia.org/wiki/XAI_(company))\n- **Headquarters and team**: Based in the Stanford Research Park in Palo Alto, California. It was initially founded with a team of AI researchers and is led by Elon Musk as CEO.\n\nxAI positions itself as building maximally truth-seeking AI, distinct from other labs in its approach. Its official website (x.ai) highlights developer tools, API access, and ongoing model releases. Note that there is an unrelated blockchain/gaming project called Xai (xai.games), but the primary reference to \"xAI\" in this context is Musk's AI venture.[[4]](https://xai.games/)\n\nFor the latest updates, check x.ai or @xai on X.",

295 "logprobs": [],

296 "annotations": [

297 {

298 "type": "url_citation",

299 "url": "https://x.ai/company",

300 "start_index": 208,

301 "end_index": 235,

302 "title": "1"

303 },

304 {

305 "type": "url_citation",

306 "url": "https://x.ai/",

307 "start_index": 585,

308 "end_index": 605,

309 "title": "2"

310 },

311 {

312 "type": "url_citation",

313 "url": "https://en.wikipedia.org/wiki/XAI_(company)",

314 "start_index": 972,

315 "end_index": 1022,

316 "title": "3"

317 },

318 {

319 "type": "url_citation",

320 "url": "https://xai.games/",

321 "start_index": 1555,

322 "end_index": 1580,

323 "title": "4"

324 }

325 ]

326 }

327 ],

328 "id": "msg_5808284d-ae14-9981-9289-73515f67ebda",

329 "role": "assistant",

330 "type": "message",

331 "status": "completed"

332 }

333 ],

334 "parallel_tool_calls": true,

335 "previous_response_id": null,

336 "reasoning": {

337 "effort": "low",

338 "summary": "detailed"

339 },

340 ...

341}

342```

343
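As a rough illustration of consuming a response shaped like the one above, the sketch below walks `output` → `content` → `annotations` and slices each cited marker out of the text. It assumes the response has already been parsed into a Python `dict`, and that `start_index`/`end_index` are character offsets with an exclusive end (which is what the sample values suggest); `cited_spans` is a hypothetical helper name.

```python
def cited_spans(response: dict) -> list[dict]:
    """Collect label, URL, and the cited marker text for each annotation."""
    spans = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message output items such as tool calls
        for block in item.get("content", []):
            if block.get("type") != "output_text":
                continue
            text = block["text"]
            for ann in block.get("annotations", []):
                if ann.get("type") != "url_citation":
                    continue
                spans.append(
                    {
                        "label": ann["title"],
                        "url": ann["url"],
                        # Assumption: end_index is exclusive.
                        "marker": text[ann["start_index"]:ann["end_index"]],
                    }
                )
    return spans


if __name__ == "__main__":
    marker = "[[1]](https://x.ai/company)"
    text = "xAI builds Grok." + marker
    start = text.index(marker)
    response = {
        "output": [
            {
                "type": "message",
                "content": [
                    {
                        "type": "output_text",
                        "text": text,
                        "annotations": [
                            {
                                "type": "url_citation",
                                "url": "https://x.ai/company",
                                "start_index": start,
                                "end_index": start + len(marker),
                                "title": "1",
                            }
                        ],
                    }
                ],
            }
        ]
    }
    print(cited_spans(response))
```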

265Each citation annotation contains:344Each citation annotation contains:

266 345

tools/function-calling.md +25 −2


407}407}

408```408```

409 409

~~410The root of a `parameters` schema must be an object (`"type": "object"`); nest any other types inside `properties`.~~410The root of a `parameters` schema must be an object (`"type": "object"`); nest any other types inside `properties`. A root `anyOf` or `oneOf` also works when every branch is itself an object, letting you define a tool that accepts one of several object variants:

411

412```json

413{

414 "oneOf": [

415 {

416 "type": "object",

417 "properties": {

418 "kind": { "const": "email" },

419 "address": { "type": "string" }

420 },

421 "required": ["kind", "address"]

422 },

423 {

424 "type": "object",

425 "properties": {

426 "kind": { "const": "sms" },

427 "phone": { "type": "string" }

428 },

429 "required": ["kind", "phone"]

430 }

431 ]

432}

433```

411 434

412> [!WARNING]435> [!WARNING]

413>436>

~~414> A tool whose `parameters` root is not an object (for example, a scalar or array) cannot be compiled into a tool-call grammar and is rejected with a `400` error that names the tool.~~437> A tool whose `parameters` root is neither an object nor a union of objects (for example, a scalar, an array, or an `anyOf`/`oneOf` with a non-object branch) cannot be compiled into a tool-call grammar and is rejected with a `400` error that names the tool.

415 438
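To catch the `400` before sending a request, the root-shape rule can be checked client-side. The sketch below is an approximation under stated assumptions: `root_is_object_or_object_union` is a hypothetical helper, it only inspects each branch's `"type"` keyword, and the server's actual validation may treat cases such as `$ref` or nested unions differently.

```python
def root_is_object_or_object_union(schema: dict) -> bool:
    """Check whether a tool's `parameters` root looks acceptable.

    Accepts a root object, or a root anyOf/oneOf whose branches are all
    objects. This mirrors the documented rule, not the server's code.
    """
    if schema.get("type") == "object":
        return True
    for keyword in ("anyOf", "oneOf"):
        branches = schema.get(keyword)
        if isinstance(branches, list) and branches:
            return all(
                isinstance(branch, dict) and branch.get("type") == "object"
                for branch in branches
            )
    return False


if __name__ == "__main__":
    union = {
        "oneOf": [
            {"type": "object", "properties": {"kind": {"const": "email"}}},
            {"type": "object", "properties": {"kind": {"const": "sms"}}},
        ]
    }
    print(root_is_object_or_object_union(union))
    print(root_is_object_or_object_union({"type": "array"}))
```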

416## Complete Vercel AI SDK Example439## Complete Vercel AI SDK Example

417 440

Documentation 2026-06-19 05:59 UTC to 2026-06-22 20:59 UTC