SpyBara

Documentation 2026-06-19 05:59 UTC to 2026-06-22 20:59 UTC

5 files changed +193 −58.
Details

162| `audio.output.format.rate` | number | Output sample rate (PCM only): 8000, 16000, 22050, 24000, 32000, 44100, 48000 |162| `audio.output.format.rate` | number | Output sample rate (PCM only): 8000, 16000, 22050, 24000, 32000, 44100, 48000 |

163| `audio.input.transcription.language_hint` | string | BCP-47 language code (e.g. `"ja"`, `"ar"`, `"es-MX"`, `"pt-BR"`) to bias ASR transcription toward a specific language. Can be updated mid-session. See [Language Hint](#language-hint). |163| `audio.input.transcription.language_hint` | string | BCP-47 language code (e.g. `"ja"`, `"ar"`, `"es-MX"`, `"pt-BR"`) to bias ASR transcription toward a specific language. Can be updated mid-session. See [Language Hint](#language-hint). |

164| `audio.output.speed` | number | Playback speed multiplier for assistant audio output. Range: 0.7–1.5. Default: `1.0`. Values below 1.0 slow down speech; values above 1.0 speed it up. |164| `audio.output.speed` | number | Playback speed multiplier for assistant audio output. Range: 0.7–1.5. Default: `1.0`. Values below 1.0 slow down speech; values above 1.0 speed it up. |

165| `replace` | object | Optional. Map of phrases to spoken substitutions applied to the model's output before TTS, e.g. `{"Acme Mobile": "Acme Mobull"}`. Fixes pronunciation by changing the spoken audio without altering the transcript. See [Pronunciation Replacements](#pronunciation-replacements). |

165 166 

166## Available Voices167## Available Voices

167 168 


416}417}

417```418```

418 419 

420## Pronunciation Replacements

421 

422Use the `replace` parameter to fix how the model pronounces specific words or phrases. Each key is matched (case-insensitively) in the model's output and swapped for its replacement value **before** text-to-speech — so only the spoken audio changes; the transcript the user sees keeps the original text.

423 

424This is useful for brand names, acronyms, or domain terms the model mispronounces. For example, mapping `"Acme Mobile"` to `"Acme Mobull"` makes the audio say it correctly while the transcript still reads "Acme Mobile".

425 

426```python customLanguage="pythonWithoutSDK"

427await ws.send(json.dumps({

428 "type": "session.update",

429 "session": {

430 "voice": "eve",

431 "instructions": "You are a helpful assistant.",

432 "replace": {"Acme Mobile": "Acme Mobull"}

433 }

434}))

435```

436 

437```javascript customLanguage="javascriptWithoutSDK"

438ws.send(JSON.stringify({

439 type: "session.update",

440 session: {

441 voice: "eve",

442 instructions: "You are a helpful assistant.",

443 replace: { "Acme Mobile": "Acme Mobull" }

444 }

445}));

446```

447 

448Matching behavior:

449 

450* Matching is case-insensitive; the replacement is spoken using the casing you provide.

451* Keys match only as exact phrases on whole-word boundaries, so `Acme, Mobile`, `Acme-Mobile`, and `Acme Mobiles` do **not** match.

452* When multiple keys share a prefix, the longest match wins.

453* The map can be updated mid-session with another `session.update`; the applied map is echoed back on `session.updated`.

454 
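The matching rules above can be approximated locally, for example to preview which parts of a script would be rewritten before audio is generated. This is a minimal sketch, not the service's implementation: `apply_replacements` is a hypothetical helper, and treating a word boundary as "no adjacent word character" is an assumption about how the server tokenizes.

```python
import re


def apply_replacements(text: str, replace: dict[str, str]) -> str:
    """Approximate the documented `replace` matching rules on plain text."""
    if not replace:
        return text
    # Lowercased lookup so matching is case-insensitive while the
    # replacement keeps the casing supplied in the map.
    lookup = {key.lower(): value for key, value in replace.items()}
    # Try longer keys first so the longest match wins on shared prefixes.
    keys = sorted(replace, key=len, reverse=True)
    # (?<!\w) and (?!\w) reject matches that start or end inside a word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(key) for key in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


rules = {"Acme": "Ack-mee", "Acme Mobile": "Acme Mobull"}
print(apply_replacements("Welcome to acme mobile, part of Acme.", rules))
```

Sorting keys by length before building the alternation is what makes `Acme Mobile` take priority over `Acme` in this sketch; the real service may resolve overlaps differently.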

419## Supported Languages455## Supported Languages

420 456 

421The Voice Agent API supports 20+ languages with native-quality accents. The model automatically detects the input language and responds naturally in the same language — no configuration required.457The Voice Agent API supports 20+ languages with native-quality accents. The model automatically detects the input language and responds naturally in the same language — no configuration required.


1610|---|---|1646|---|---|

1611| `force_message` | New `conversation.item.create` item type for TTS-synthesized scripted utterances. See [Force Message](#force-message). |1647| `force_message` | New `conversation.item.create` item type for TTS-synthesized scripted utterances. See [Force Message](#force-message). |

1612| `resumption` | Field on `session.update` that caches conversation turns and replays them on reconnect. See [Session Resumption](#session-resumption). |1648| `resumption` | Field on `session.update` that caches conversation turns and replays them on reconnect. See [Session Resumption](#session-resumption). |

1649| `replace` | Field on `session.update` that maps phrases to spoken substitutions applied before TTS to fix pronunciation without changing the transcript. See [Pronunciation Replacements](#pronunciation-replacements). |

Details

4 4 

5Generate images from text prompts with Grok Imagine models. The API supports batch generation of multiple images, and control over aspect ratio and resolution.5Generate images from text prompts with Grok Imagine models. The API supports batch generation of multiple images, and control over aspect ratio and resolution.

6 6 

7> [!WARNING]

8>

9> **`grok-imagine-image-pro` will be deprecated as of May 15, 2026.** Use `grok-imagine-image-quality` for all new image generation requests. Existing `-pro` requests will continue to work during a transition period, but we recommend migrating promptly.

10 

11## Quick Start7## Quick Start

12 8 

13Generate an image with a single API call:9Generate an image with a single API call:

Details

6 6 

7## Pricing7## Pricing

8 8 

9Image generation uses flat per-image pricing regardless of prompt length. Each generated image incurs a fixed fee. Image edits are billed for both the input image and the generated output image. Video generation uses per-second pricing where both duration and resolution affect the total cost. For full pricing details, see the [models page](/developers/models#imagine-pricing).9Image generation uses flat per-image pricing regardless of prompt length. Each generated image incurs a fixed fee. Image edits are billed for both the input image and the generated output image. Video generation uses per-second pricing where both duration and resolution affect the total cost. For full pricing details, see the [pricing page](/developers/pricing#imagine-api-pricing).

10 10 

11## Image Generation11## Image Generation

12 12 


193* **[Multi-Image Editing](/developers/model-capabilities/images/multi-image-editing)** — Combine up to 3 source images in a single edit for compositing subjects, transferring styles, and building scenes from multiple references.193* **[Multi-Image Editing](/developers/model-capabilities/images/multi-image-editing)** — Combine up to 3 source images in a single edit for compositing subjects, transferring styles, and building scenes from multiple references.

194* **[Video Generation](/developers/model-capabilities/video/generation)** — Generate videos from text prompts with configurable duration (up to 15s), aspect ratio, and resolution.194* **[Video Generation](/developers/model-capabilities/video/generation)** — Generate videos from text prompts with configurable duration (up to 15s), aspect ratio, and resolution.

195* **[Video Editing](/developers/model-capabilities/video/editing)** — Modify an existing video with a text prompt while preserving the rest of the scene.195* **[Video Editing](/developers/model-capabilities/video/editing)** — Modify an existing video with a text prompt while preserving the rest of the scene.

196* **[Reference-to-Video](/developers/model-capabilities/video/reference-to-video)** — Guide a generated video with one or more reference images that influence the output without forcing the first frame.196* **[Reference-to-Video](/developers/model-capabilities/video/reference-to-video)** — Guide a generated video with one or more reference images that influence the output without forcing the first frame. Requires `grok-imagine-video` — `grok-imagine-video-1.5` does not support this mode.

197* **[Video Extension](/developers/model-capabilities/video/extension)** — Continue an existing video from its last frame, combining the original and extension into one clip.197* **[Video Extension](/developers/model-capabilities/video/extension)** — Continue an existing video from its last frame, combining the original and extension into one clip.

198* **[Files API Integration](/developers/model-capabilities/imagine/files)** — Reference stored files as Imagine inputs by ID, persist generated assets to the Files API, and optionally create a permanent shareable public URL — all in a single request.198* **[Files API Integration](/developers/model-capabilities/imagine/files)** — Reference stored files as Imagine inputs by ID, persist generated assets to the Files API, and optionally create a permanent shareable public URL — all in a single request.

199 199 

tools/citations.md +129 −50

Details

32 32 

33**Important**: Enabling inline citations does not guarantee that the model will cite sources on every answer. The model decides when and where to include citations based on the context and nature of the query.33**Important**: Enabling inline citations does not guarantee that the model will cite sources on every answer. The model decides when and where to include citations based on the context and nature of the query.

34 34 

35### Enabling Inline Citations35### Configuring Inline Citations

36 36 

37Inline citations are returned by default with the Responses API. For the xAI SDK, you can explicitly request them with `include=["inline_citations"]`:37Inline citation behavior differs between the **Responses API** and the **xAI Python SDK** (gRPC chat API).

38 38 

39```bash customLanguage="bash"39The **Responses API** behavior applies to the following clients:

40 

41* cURL against `/v1/responses`

42* Python (OpenAI SDK)

43* JavaScript (AI SDK via `xai.responses()`)

44* JavaScript (OpenAI SDK)

45 

46| | Responses API (cURL, Python/JS OpenAI SDK, JS AI SDK) | xAI Python SDK |

47|---|---|---|

48| **Default** | Enabled — response text may include `[[N]](url)` links without extra configuration | Disabled — omit `include`, or do not pass `"inline_citations"` |

49| **Enable** | Enabled by default; no additional action needed | Pass `include=["inline_citations"]` to the `chat.create()` method |

50| **Disable** | Pass `include=["no_inline_citations"]` | Disabled by default |

51 

52When inline citations are disabled, the response text will not contain any `[[N]](url)` markdown links. The `annotations` field on `output_text` content blocks may still be present, but annotations only list sources encountered during search — they will not have positional references into the response text.

53 

54#### Enabled (default for Responses API; opt-in for xAI Python SDK)

55 

56```bash customLanguage="bash" highlightedLines="9"

40curl https://api.x.ai/v1/responses \57curl https://api.x.ai/v1/responses \

41 -H "Content-Type: application/json" \58 -H "Content-Type: application/json" \

42 -H "Authorization: Bearer $XAI_API_KEY" \59 -H "Authorization: Bearer $XAI_API_KEY" \


45 "input": [62 "input": [

46 {"role": "user", "content": "What is xAI?"}63 {"role": "user", "content": "What is xAI?"}

47 ],64 ],

48 "tools": [{"type": "web_search"}]65 "tools": [{"type": "web_search"}]

49}'66}'

50```67```

51 68 

52```python customLanguage="pythonXAI"69```python customLanguage="pythonXAI" highlightedLines="14"

53import os70import os

54 71 

55from xai_sdk import Client72from xai_sdk import Client


63 web_search(),80 web_search(),

64 x_search(),81 x_search(),

65 ],82 ],

66 include=["inline_citations"], # Enable inline citations83 include=["inline_citations"], # Enable inline citations (opt-in for xAI Python SDK)

67)84)

68 85 

69chat.append(user("What is xAI?"))86chat.append(user("What is xAI?"))


73print(response.content)90print(response.content)

74```91```

75 92 

76```python customLanguage="pythonOpenAISDK"93```python customLanguage="pythonOpenAISDK" highlightedLines="15"

77import os94import os

78from openai import OpenAI95from openai import OpenAI

79 96 


88 {"role": "user", "content": "What is xAI?"}105 {"role": "user", "content": "What is xAI?"}

89 ],106 ],

90 tools=[107 tools=[

91 {"type": "web_search"},108 {"type": "web_search"}, # inline citations are enabled by default

92 ],109 ],

93)110)

94 111 


100 print(content.text)117 print(content.text)

101```118```

102 119 

103```javascript customLanguage="javascriptAISDK"120```javascript customLanguage="javascriptAISDK" highlightedLines="8"

104import { xai } from '@ai-sdk/xai';121import { xai } from '@ai-sdk/xai';

105import { generateText } from 'ai';122import { generateText } from 'ai';

106 123 


108 model: xai.responses('grok-4.3'),125 model: xai.responses('grok-4.3'),

109 prompt: 'What is xAI?',126 prompt: 'What is xAI?',

110 tools: {127 tools: {

111 web_search: xai.tools.webSearch(),128 web_search: xai.tools.webSearch(), // inline citations are enabled by default

112 },129 },

113});130});

114 131 


119console.log('Sources:', sources);136console.log('Sources:', sources);

120```137```

121 138 

122```javascript customLanguage="javascriptOpenAISDK"139```javascript customLanguage="javascriptOpenAISDK" highlightedLines="13"

123import OpenAI from 'openai';140import OpenAI from 'openai';

124 141 

125const client = new OpenAI({142const client = new OpenAI({


132 input: [149 input: [

133 { role: 'user', content: 'What is xAI?' }150 { role: 'user', content: 'What is xAI?' }

134 ],151 ],

135 tools: [{ type: 'web_search' }],152 tools: [{ type: 'web_search' }], // inline citations are enabled by default

136});153});

137 154 

138// Get the message with inline citations155// Get the message with inline citations


147}164}

148```165```

149 166 

150### Markdown Citation Format167#### Disabled (opt-out for Responses API; default for xAI Python SDK)

151 

152When inline citations are enabled, the model will insert markdown-style citation links directly into the response text:

153 

154```output

155The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[[1]](https://x.ai/news/)[[2]](https://x.ai/)[[3]](https://x.com/i/status/1991284813727474073)

156```

157 

158When rendered as markdown, this displays as clickable links:

159 

160> The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[\[1\]](https://x.ai/news/)[\[2\]](https://x.ai/)[\[3\]](https://x.com/i/status/1991284813727474073)

161 

162The format is `[[N]](url)` where:

163 

164* `N` is the sequential display number for the citation **starting from 1**

165* `url` is the source URL

166 

167**Citation numbering**: Citation numbers always start from 1 and increment sequentially. If the same source is cited again later in the response, the original citation number will be reused.

168 

169### Image Embeds

170 

171When `enable_image_search` is enabled on the `web_search` tool, Grok may embed image results as Markdown images instead of numbered text citations:

172 

173```output

174Here are images of Starship on the launch pad:

175![Why the SpaceX Starship launch pad matters](https://www.astronomy.com/wp-content/uploads/2024/09/starship-test-flight-mission-scaled.jpg)

176```

177 

178The format is `![alt](url)` where:

179 

180* `alt` is a short description or title for the image

181* `url` is the image source URL

182 

183### Disabling Inline Citations

184 

185To disable inline citations in the Responses API, add `"no_inline_citations"` to the `include` field. For the xAI SDK, simply omit `"inline_citations"` from the `include` field (inline citations are opt-in for the xAI SDK).

186 

187When disabled, the response text will not contain any `[[N]](url)` markdown links. The `annotations` field on `output_text` content blocks will still be present, but the annotations will only represent the sources that the tool encountered during the search — they will not have positional references into the response text.

188 168 

189```python customLanguage="pythonOpenAISDK" highlightedLines="17"169```python customLanguage="pythonOpenAISDK" highlightedLines="17"

190import os170import os


256}'236}'

257```237```

258 238 

239### Markdown Citation Format

240 

241When inline citations are enabled, the model will insert markdown-style citation links directly into the response text:

242 

243```output

244The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[[1]](https://x.ai/news/)[[2]](https://x.ai/)[[3]](https://x.com/i/status/1991284813727474073)

245```

246 

247When rendered as markdown, this displays as clickable links:

248 

249> The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[\[1\]](https://x.ai/news/)[\[2\]](https://x.ai/)[\[3\]](https://x.com/i/status/1991284813727474073)

250 

251The format is `[[N]](url)` where:

252 

253* `N` is the sequential display number for the citation **starting from 1**

254* `url` is the source URL

255 

256**Citation numbering**: Citation numbers always start from 1 and increment sequentially. If the same source is cited again later in the response, the original citation number will be reused.

257 
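Because the `[[N]](url)` format is regular, the inline links can be pulled out of the response text, or removed from it, with a short regular expression. This is a rough sketch under stated assumptions: `extract_citations` and `strip_citations` are hypothetical helper names, and the pattern only tolerates one level of parentheses inside a URL.

```python
import re

# [[N]](url), allowing one level of nested parentheses in the URL,
# e.g. https://en.wikipedia.org/wiki/XAI_(company)
CITATION_RE = re.compile(r"\[\[(\d+)\]\]\(((?:[^()\s]|\([^()\s]*\))+)\)")


def extract_citations(text: str) -> dict[int, str]:
    """Map each citation number to its URL; reused numbers appear once."""
    citations: dict[int, str] = {}
    for number, url in CITATION_RE.findall(text):
        citations.setdefault(int(number), url)
    return citations


def strip_citations(text: str) -> str:
    """Return the response text with all inline citation links removed."""
    return CITATION_RE.sub("", text)


sample = "xAI news.[[1]](https://x.ai/news/)[[2]](https://x.ai/) More.[[1]](https://x.ai/news/)"
print(extract_citations(sample))
print(strip_citations(sample))
```

For exact positions, the structured `annotations` data is a more dependable source than pattern matching on the text.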

258### Image Embeds

259 

260When `enable_image_search` is enabled on the `web_search` tool, Grok may embed image results as Markdown images instead of numbered text citations:

261 

262```output

263Here are images of Starship on the launch pad:

264![Why the SpaceX Starship launch pad matters](https://www.astronomy.com/wp-content/uploads/2024/09/starship-test-flight-mission-scaled.jpg)

265```

266 

267The format is `![alt](url)` where:

268 

269* `alt` is a short description or title for the image

270* `url` is the image source URL

271 
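Image embeds can be collected from the response text in a similar way. The sketch below assumes the `![alt](url)` form shown above, with URLs that contain no whitespace or closing parenthesis; `extract_image_embeds` is a hypothetical helper name.

```python
import re

# ![alt](url): alt text may be empty; the URL ends at the first ")".
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def extract_image_embeds(text: str) -> list[tuple[str, str]]:
    """Return (alt, url) pairs for every Markdown image in the text."""
    return IMAGE_RE.findall(text)


sample = (
    "Here are images of Starship on the launch pad:\n"
    "![Why the SpaceX Starship launch pad matters]"
    "(https://www.astronomy.com/wp-content/uploads/2024/09/starship-test-flight-mission-scaled.jpg)"
)
for alt, url in extract_image_embeds(sample):
    print(alt, url)
```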

259## Accessing Structured Inline Citation Data272## Accessing Structured Inline Citation Data

260 273 

261Structured inline citation data provides precise positional information about each citation in the response text.274Structured inline citation data provides precise positional information about each citation in the response text.

262 275 

263### Response Format276### Response Format

264 277 

278When inline citations are enabled, each `output_text` content block includes an `annotations` array with structured citation metadata (URL, character offsets, and label):

279 

280```json highlightedLines="16-45"

281{

282 "created_at": 1781829888,

283 "completed_at": 1781829888,

284 "id": "5808284d-ae14-9981-9289-73515f67ebda",

285 "max_output_tokens": null,

286 "model": "grok-4.3",

287 "object": "response",

288 "output": [

289 ...

290 {

291 "content": [

292 {

293 "type": "output_text",

294 "text": "**xAI is an artificial intelligence company founded by Elon Musk in March 2023.** Its stated mission is to \"understand the universe\" by building advanced AI systems that accelerate human scientific discovery.[[1]](https://x.ai/company)\n\n### Key Details\n- **Flagship product**: Grok, a family of frontier AI models focused on reasoning, code, voice, image generation, and video. These are trained on massive infrastructure, including what the company describes as the world's largest supercluster (Colossus). Grok powers chatbots, APIs, and multimodal tools available via a unified API.[[2]](https://x.ai/)\n- **Current status (as of mid-2026)**: xAI operates as a subsidiary of SpaceX following an acquisition in February 2026. It is also connected to the X social platform (formerly Twitter), which xAI effectively became the parent of in 2025. The company has expanded into data centers and enterprise AI offerings (e.g., integrations with Amazon Bedrock and Databricks).[[3]](https://en.wikipedia.org/wiki/XAI_(company))\n- **Headquarters and team**: Based in the Stanford Research Park in Palo Alto, California. It was initially founded with a team of AI researchers and is led by Elon Musk as CEO.\n\nxAI positions itself as building maximally truth-seeking AI, distinct from other labs in its approach. Its official website (x.ai) highlights developer tools, API access, and ongoing model releases. Note that there is an unrelated blockchain/gaming project called Xai (xai.games), but the primary reference to \"xAI\" in this context is Musk's AI venture.[[4]](https://xai.games/)\n\nFor the latest updates, check x.ai or @xai on X.",

295 "logprobs": [],

296 "annotations": [

297 {

298 "type": "url_citation",

299 "url": "https://x.ai/company",

300 "start_index": 208,

301 "end_index": 235,

302 "title": "1"

303 },

304 {

305 "type": "url_citation",

306 "url": "https://x.ai/",

307 "start_index": 585,

308 "end_index": 605,

309 "title": "2"

310 },

311 {

312 "type": "url_citation",

313 "url": "https://en.wikipedia.org/wiki/XAI_(company)",

314 "start_index": 972,

315 "end_index": 1022,

316 "title": "3"

317 },

318 {

319 "type": "url_citation",

320 "url": "https://xai.games/",

321 "start_index": 1555,

322 "end_index": 1580,

323 "title": "4"

324 }

325 ]

326 }

327 ],

328 "id": "msg_5808284d-ae14-9981-9289-73515f67ebda",

329 "role": "assistant",

330 "type": "message",

331 "status": "completed"

332 }

333 ],

334 "parallel_tool_calls": true,

335 "previous_response_id": null,

336 "reasoning": {

337 "effort": "low",

338 "summary": "detailed"

339 },

340 ...

341}

342```

343 
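One way to consume these annotations is to slice the response text with each annotation's offsets. The sketch below assumes `start_index` and `end_index` are character offsets into `text` with an exclusive end, which is consistent with the sample above (for the first annotation, `end_index - start_index` is 27, the length of `[[1]](https://x.ai/company)`). `cited_spans` is a hypothetical helper that operates on an already-parsed `output_text` block.

```python
def cited_spans(block: dict) -> list[tuple[str, str, str]]:
    """Return (title, url, matched_text) for each url_citation annotation."""
    text = block.get("text", "")
    spans = []
    for annotation in block.get("annotations", []):
        if annotation.get("type") != "url_citation":
            continue  # skip annotation types this sketch does not handle
        start, end = annotation["start_index"], annotation["end_index"]
        spans.append((annotation["title"], annotation["url"], text[start:end]))
    return spans


# Small synthetic block in the same shape as the response above.
block = {
    "type": "output_text",
    "text": "Fact.[[1]](https://x.ai/)",
    "annotations": [
        {
            "type": "url_citation",
            "url": "https://x.ai/",
            "start_index": 5,
            "end_index": 25,
            "title": "1",
        }
    ],
}
for title, url, matched in cited_spans(block):
    print(title, url, matched)
```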

265Each citation annotation contains:344Each citation annotation contains:

266 345 

267| Field | Type | Description |346| Field | Type | Description |

Details

407}407}

408```408```

409 409 

410The root of a `parameters` schema must be an object (`"type": "object"`); nest any other types inside `properties`.410The root of a `parameters` schema must be an object (`"type": "object"`); nest any other types inside `properties`. A root `anyOf` or `oneOf` also works when every branch is itself an object, letting you define a tool that accepts one of several object variants:

411 

412```json

413{

414 "oneOf": [

415 {

416 "type": "object",

417 "properties": {

418 "kind": { "const": "email" },

419 "address": { "type": "string" }

420 },

421 "required": ["kind", "address"]

422 },

423 {

424 "type": "object",

425 "properties": {

426 "kind": { "const": "sms" },

427 "phone": { "type": "string" }

428 },

429 "required": ["kind", "phone"]

430 }

431 ]

432}

433```

411 434 

412> [!WARNING]435> [!WARNING]

413>436>

414> A tool whose `parameters` root is not an object (for example, a scalar or array) cannot be compiled into a tool-call grammar and is rejected with a `400` error that names the tool.437> A tool whose `parameters` root is neither an object nor a union of objects (for example, a scalar, an array, or an `anyOf`/`oneOf` with a non-object branch) cannot be compiled into a tool-call grammar and is rejected with a `400` error that names the tool.

415 438 
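A client-side check can catch an invalid schema root before the request is sent. This is a minimal sketch of the rule as described above, not the server's validation logic: `is_valid_parameters_root` is a hypothetical helper, and it only inspects the top-level `type`, `anyOf`, and `oneOf` keywords.

```python
def is_valid_parameters_root(schema: dict) -> bool:
    """True if the root is an object, or an anyOf/oneOf whose branches are all objects."""
    if schema.get("type") == "object":
        return True
    for keyword in ("anyOf", "oneOf"):
        branches = schema.get(keyword)
        if isinstance(branches, list) and branches:
            # Every branch must itself declare "type": "object".
            return all(
                isinstance(branch, dict) and branch.get("type") == "object"
                for branch in branches
            )
    return False


union_schema = {
    "oneOf": [
        {"type": "object", "properties": {"kind": {"const": "email"}}},
        {"type": "object", "properties": {"kind": {"const": "sms"}}},
    ]
}
print(is_valid_parameters_root(union_schema))
print(is_valid_parameters_root({"type": "array", "items": {"type": "string"}}))
```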

416## Complete Vercel AI SDK Example439## Complete Vercel AI SDK Example

417 440