SpyBara

Documentation 2026-06-14 22:02 UTC to 2026-06-15 23:02 UTC

4 files changed +204 −5. View all changes and history on the product overview
Details

# Batch API

The Batch API lets you process large volumes of requests asynchronously with reduced pricing and higher rate limits. For pricing details, see [Batch API Pricing](/developers/pricing#batch-api-pricing). If you need lower latency on real-time requests instead, see [Priority Processing](/developers/advanced-api-usage/priority-processing).

## What is the Batch API?

Details

#### Advanced API Usage

# Priority Processing

Priority Processing gives your xAI API requests higher scheduling priority, which typically results in lower time-to-first-token (TTFT) and lower inter-token latency (ITL), especially during periods of high demand. Add `service_tier: "priority"` to any request body to opt in; no capacity reservations or advance provisioning are required. The parameter is supported on text inference endpoints: Chat Completions and Responses.

When priority capacity is available, requests are scheduled ahead of standard traffic. The response always includes a `service_tier` field indicating whether priority was granted; check it to confirm.

## How it works

Add the `service_tier` field to any supported request. The API returns the tier that was actually used in the response, so you can confirm the upgrade took effect.

The `service_tier` field accepts the following values:

| Value | Meaning |
|-------|---------|
| `"default"` | Standard processing. This is the same as omitting the field entirely. |
| `"priority"` | Request higher scheduling priority at a premium token price. |

Priority requests are billed at a premium per-token rate. Cache discounts still apply to cached input tokens before the multiplier. For current per-model rates and the exact priority premium, see the [Pricing](/developers/pricing) page.
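
As a rough illustration of the billing order described above, the sketch below prices cached input tokens at their discounted rate first and then applies a priority multiplier. All rates and the multiplier are hypothetical placeholders, and applying the multiplier to the whole request cost is an assumption; the Pricing page is the authority on actual values.

```python
def priority_cost_usd(
    uncached_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    *,
    input_rate: float,    # USD per million uncached input tokens (hypothetical)
    cached_rate: float,   # USD per million cached input tokens (hypothetical)
    output_rate: float,   # USD per million output tokens (hypothetical)
    multiplier: float,    # priority premium (hypothetical)
) -> float:
    """Estimate the cost of one priority request.

    Assumption: the cache discount is applied first, then the multiplier
    scales the whole request cost.
    """
    base = (
        uncached_input_tokens * input_rate
        + cached_input_tokens * cached_rate
        + output_tokens * output_rate
    ) / 1_000_000
    return base * multiplier


# Example with made-up numbers: 1,000 uncached and 3,000 cached input tokens,
# 500 output tokens.
estimate = priority_cost_usd(
    1_000, 3_000, 500,
    input_rate=2.0, cached_rate=0.5, output_rate=10.0, multiplier=2.0,
)
print(f"Estimated cost: ${estimate:.4f}")
```

With these placeholder numbers, the base cost is $0.0085 and the priority estimate is twice that.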

## Quick start

Pass `service_tier: "priority"` in your request body. The response includes a `service_tier` field confirming which tier was used.

```bash customLanguage="bash"
curl https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "input": "Explain the Riemann hypothesis in one paragraph.",
    "service_tier": "priority"
  }'
```

```python customLanguage="pythonXAI"
import os

from xai_sdk import Client
from xai_sdk.chat import user

client = Client(api_key=os.getenv("XAI_API_KEY"))

# Opt in to priority scheduling when creating the chat.
chat = client.chat.create(
    model="grok-4.3",
    service_tier="priority",
)
chat.append(user("Explain the Riemann hypothesis in one paragraph."))

response = chat.sample()

print(response.content)
# The returned tier shows whether priority was actually granted.
print(f"Tier used: {response.service_tier}")
```

```python customLanguage="pythonOpenAISDK"
import os

from openai import OpenAI

# Point the OpenAI SDK at the xAI API.
client = OpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1",
)

response = client.responses.create(
    model="grok-4.3",
    input="Explain the Riemann hypothesis in one paragraph.",
    service_tier="priority",
)

print(response.output_text)
# The returned tier shows whether priority was actually granted.
print(f"Tier used: {response.service_tier}")
```

```javascript customLanguage="javascriptOpenAISDK"
import OpenAI from "openai";

// Point the OpenAI SDK at the xAI API.
const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

const response = await client.responses.create({
  model: "grok-4.3",
  input: "Explain the Riemann hypothesis in one paragraph.",
  service_tier: "priority",
});

console.log(response.output_text);
// The returned tier shows whether priority was actually granted.
console.log(`Tier used: ${response.service_tier}`);
```

The response includes `"service_tier": "priority"` when the request was served at the priority tier, or `"service_tier": "default"` if it was served at the default tier instead. You are only billed at the priority rate when the response confirms `"priority"`.

```json customLanguage="json"
{
  "id": "resp_abc123",
  "model": "grok-4.3",
  "service_tier": "priority",
  "usage": {
    "input_tokens": 42,
    "output_tokens": 156,
    "cost_in_usd_ticks": 37756000
  }
}
```
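
A minimal sketch of checking the returned tier from a raw response body, using only the fields shown in the sample above. The helper name `served_at_priority` is hypothetical, and treating a missing field as "not priority" is an assumption made for safety.

```python
import json


def served_at_priority(response_body: str) -> bool:
    """Return True only when the response confirms the priority tier."""
    data = json.loads(response_body)
    # A missing field is treated as "not priority" (assumption).
    return data.get("service_tier") == "priority"


sample = """
{
  "id": "resp_abc123",
  "model": "grok-4.3",
  "service_tier": "priority",
  "usage": {"input_tokens": 42, "output_tokens": 156}
}
"""

if served_at_priority(sample):
    print("Served and billed at the priority tier")
else:
    print("Served at the default tier")
```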

## Best practices

* **Latency-sensitive paths first**: Priority Processing is most valuable for user-facing requests where response time directly affects experience. Background jobs, evaluations, and bulk processing are better served by the [Batch API](/developers/advanced-api-usage/batch-api).
* **Monitor the `service_tier` field**: Log the returned tier to track how often your requests are served at priority versus default and to correlate with your latency metrics.
* **Combine with prompt caching**: Cached input tokens are discounted before the priority multiplier is applied, so [prompt caching](/developers/advanced-api-usage/prompt-caching) and priority processing complement each other well.
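
One way to follow the monitoring recommendation is to record the returned tier next to a latency measurement for each request. The sketch below is an in-memory illustration only; the `TierStats` class and its methods are hypothetical names, and a real deployment would more likely send these values to an existing metrics system.

```python
from collections import defaultdict
from statistics import mean


class TierStats:
    """Collects per-tier latencies so tier share and latency can be compared."""

    def __init__(self) -> None:
        self._latencies: dict[str, list[float]] = defaultdict(list)

    def record(self, tier: str, latency_s: float) -> None:
        """Store the tier returned by the API and the observed latency."""
        self._latencies[tier].append(latency_s)

    def share(self, tier: str) -> float:
        """Fraction of recorded requests served at the given tier."""
        total = sum(len(v) for v in self._latencies.values())
        return len(self._latencies.get(tier, [])) / total if total else 0.0

    def mean_latency(self, tier: str) -> float:
        """Average latency in seconds for the given tier."""
        return mean(self._latencies[tier])


stats = TierStats()
stats.record("priority", 0.25)
stats.record("priority", 0.75)
stats.record("default", 1.5)

print(f"Priority share: {stats.share('priority'):.0%}")
print(f"Mean priority latency: {stats.mean_latency('priority'):.2f}s")
```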

Details

This section outlines key recommendations for building low-latency, reliable, and natural-feeling voice experiences using the xAI Voice Agent API.

### Minimize perceived latency with parallel initialization

Start the WebSocket connection and microphone input streaming in parallel.

* Initiate the WebSocket connection (including authentication via ephemeral token or API key) **as early as possible**, ideally when the voice interface loads or the user opens the mic-enabled screen.
* Simultaneously begin capturing microphone audio (using `getUserMedia` in browsers or equivalent APIs on mobile/native platforms).
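
The parallel start-up pattern can be sketched with `asyncio.gather`. Both coroutines below are hypothetical stand-ins that only sleep; a real client would open the WebSocket and the audio capture device in their place, and a browser client would use the equivalent JavaScript APIs.

```python
import asyncio

events: list[str] = []  # records the order in which steps start and finish


async def open_websocket() -> str:
    """Hypothetical stand-in for connecting and authenticating."""
    events.append("ws:start")
    await asyncio.sleep(0.02)  # simulated network handshake
    events.append("ws:ready")
    return "websocket"


async def start_microphone() -> str:
    """Hypothetical stand-in for opening the audio capture device."""
    events.append("mic:start")
    await asyncio.sleep(0.01)  # simulated device start-up
    events.append("mic:ready")
    return "microphone"


async def init_voice_session() -> list[str]:
    # gather() starts both coroutines before waiting on either, so the
    # total wait is roughly the slower of the two, not their sum.
    return await asyncio.gather(open_websocket(), start_microphone())


ws, mic = asyncio.run(init_voice_session())
print(events)
```

The recorded events show both steps starting before either one finishes.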


* **Domain Expertise**: Precise transcription of medical, legal, financial, and technical terminology; names, codes, and addresses.

## Telephony Providers

Use Direct SIP to route calls from your carrier or PBX into a voice agent. Configure your provider to send calls to the xAI SIP host.

| Setting | Value |
|---------|-------|
| SIP host | `sip.voice.x.ai` |
| SIP URI | `sip:{number}@sip.voice.x.ai;transport=tls` |

Replace `{number}` with your Direct SIP phone number. If you restrict inbound calls by source IP, add your provider's signaling CIDR ranges to the allowlist before testing.
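
A small helper can fill in the SIP URI template so the same value is reused across provider configurations. The function name is hypothetical, and the example number is a placeholder; the expected number format is not specified here, so the helper only rejects empty input.

```python
SIP_HOST = "sip.voice.x.ai"


def direct_sip_uri(number: str) -> str:
    """Build the agent's SIP URI from a Direct SIP phone number."""
    number = number.strip()
    if not number:
        raise ValueError("a Direct SIP phone number is required")
    return f"sip:{number}@{SIP_HOST};transport=tls"


# Placeholder number for illustration only.
print(direct_sip_uri("+15551234567"))
```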

#### Twilio

1. In the Twilio Console, create a TwiML Bin and paste the following:

```text
<Response>
  <Dial answerOnBridge="true">
    <Sip>sip:{number}@sip.voice.x.ai;transport=tls</Sip>
  </Dial>
</Response>
```

2. Open your number's Voice configuration and set the handler to this TwiML Bin.
3. Place a test call to confirm the agent answers.
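
If you prefer to generate the TwiML rather than paste it by hand, the same document can be produced with Python's standard library. This is a sketch: the function name and the example number are hypothetical, and it builds only the three elements shown in the TwiML Bin above.

```python
import xml.etree.ElementTree as ET


def dial_sip_twiml(sip_uri: str) -> str:
    """Return TwiML that dials the given SIP URI with answerOnBridge enabled."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial", {"answerOnBridge": "true"})
    sip = ET.SubElement(dial, "Sip")
    sip.text = sip_uri
    return ET.tostring(response, encoding="unicode")


# Placeholder number for illustration only.
twiml = dial_sip_twiml("sip:+15551234567@sip.voice.x.ai;transport=tls")
print(twiml)
```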

#### Telnyx

1. In the Telnyx Portal, create a SIP Connection (FQDN) and an Outbound Voice Profile.
2. Set the connection's outbound destination to `sip.voice.x.ai`.
3. Place a test call to confirm the agent answers.

#### Microsoft Teams

1. Stand up a Microsoft-certified SBC (Ribbon, AudioCodes, or Cisco CUBE) with an outbound SIP trunk to `sip.voice.x.ai`.
2. In the Teams admin center, go to **Voice** → **Direct Routing** → **Add** and register the SBC's public FQDN as a PSTN gateway.
3. From the Teams PowerShell module, create a voice route and routing policy (`New-CsOnlineVoiceRoute` / `New-CsOnlineVoiceRoutingPolicy`) and grant the policy to your users.

#### Cisco Webex Calling

1. Stand up a Local Gateway (Cisco CUBE or equivalent SBC) with an outbound dial-peer to `sip.voice.x.ai`.
2. In Webex Control Hub, go to **Calling** → **Locations** → **Add trunk**, choose **Premises-based**, and point it at the Local Gateway's FQDN.
3. Under **Calling** → **Dial Plans**, route the relevant numbers or prefixes to the new trunk.

#### Genesys Cloud

1. In Genesys Cloud Admin, go to **Telephony** → **Trunks** → **Create New** and pick an External Trunk of type SIP.
2. Under **SIP Servers or Proxies**, add `sip.voice.x.ai`.
3. In **Routing** → **Architect Flows**, add a Transfer to External action to the agent's SIP URI and assign a DID to the flow.

#### NICE CXone

1. In CXone Admin, go to **Voice** → **SIP Connectivity** → **Create New** and pick External SIP.
2. Set the trunk's Destination to `sip.voice.x.ai`.
3. In Studio, add a SIP Transfer action targeting the agent's SIP URI and assign a DID to the script.

#### Amazon Chime SDK

1. In the AWS console, open **Amazon Chime SDK** → **Voice Connectors** → **Create Voice Connector** and enable Encryption.
2. On the connector's **Termination** tab, add `sip.voice.x.ai` and allowlist your origination CIDR ranges.
3. Assign your DIDs, then add a SIP rule that bridges inbound calls out to the agent's SIP URI.

#### Amazon Connect

1. Create an Amazon Chime SDK Voice Connector with Encryption enabled.
2. On the Voice Connector's **Termination** tab, add `sip.voice.x.ai`.
3. In your Connect contact flow, add a Lambda block that dials out through the Voice Connector to the agent's SIP URI, passing contact attributes as SIP headers.

#### RingCentral

1. Confirm BYOC is enabled on your plan (Ultimate or Premium).
2. In the admin portal, go to **Phone System** → **Phone Numbers** → **Carriers** → **Add Carrier** and set the endpoint to `sip.voice.x.ai`.
3. Under **Phone Numbers**, route the DIDs you want through this carrier.

#### Zoom Phone

1. Make sure BYOC is enabled on your Zoom account.
2. In the Zoom admin, go to **Phone System Management** → **Carrier Configuration** → **Add Carrier Trunk** and set the trunk address to `sip.voice.x.ai`.
3. Under the BYOC settings, assign the DIDs you want routed through this trunk.

#### Generic SIP / Other

1. In your carrier or PBX, create an outbound route or SIP trunk.
2. Point its destination at `sip.voice.x.ai`.
3. Place a test call to confirm the agent answers.

## Migrating from OpenAI Realtime

If you have an existing application built on the [OpenAI Realtime API](https://developers.openai.com/api/docs/guides/realtime-conversations), switching to the Grok Voice Agent API requires only a few changes: update the base URL, swap your API key, and choose a Grok voice model.

rate-limits.md +2 −2

Details

| grok-4.20-0309-non-reasoning | T0: 1.8K, T1: 2.4K, T2: 3.6K, T3: 6K, T4: 10K | T0: 10M, T1: 15M, T2: 25M, T3: 45M, T4: 85M |
| grok-build-0.1 | T0: 1.8K, T1: 2.4K, T2: 3.6K, T3: 6K, T4: 10K | T0: 10M, T1: 15M, T2: 25M, T3: 45M, T4: 85M |
| grok-4.20-multi-agent-0309 | T0: 450, T1: 600, T2: 900, T3: 1.5K, T4: 2.7K | T0: 2.5M, T1: 3.7M, T2: 6.2M, T3: 11M, T4: 21M |
| grok-imagine-image-quality | 300 | 0 |
| grok-imagine-image | 300 | 0 |
| grok-imagine-video | 70 | 0 |
| grok-imagine-video-1.5-preview | 60 | 0 |

### What counts toward TPM