Documentation 2026-06-24 22:02 UTC to 2026-06-27 00:02 UTC

4 files changed +336 −115.
Details

1226 1226 * **Opt-in both ways.** No history replays unless the resuming session also sends `resumption.enabled: true`.

1227 1227 * **Expiry.** History is dropped after 30 minutes of inactivity.

1228 1228 

1229## DTMF (SIP Phone Keypresses)

1230 

1231When using the Voice Agent API over SIP, phone keypresses (DTMF tones) are automatically buffered and flushed to the model as text input. The client receives `input_audio_buffer.dtmf_event_received` events as an audit trail of each keypress.

1232 

1233### Flush Triggers

1234 

1235Buffered digits are submitted to the model when any of the following occurs:

1236 

1237* The user presses `#` (submit key)

1238* 2.5 seconds of idle time after the last keypress

1239* The user begins speaking (preempts the digit buffer)

1240 

1241### Audit Event

1242 

1243Each keypress is reported to the client WebSocket:

1244 

1245```json customLanguage="json"

1246{

1247 "type": "input_audio_buffer.dtmf_event_received",

1248 "event": "5",

1249 "received_at": 1730000000

1250}

1251```

1252 

1253> [!NOTE]

1254>

1255> DTMF is only available on SIP sessions — it is not emitted on direct WebSocket connections.

1256 

1257 1229 ## Best Practices

1258 1230 

1259 1231 This section outlines key recommendations for building low-latency, reliable, and natural-feeling voice experiences using the xAI Voice Agent API.


1375 1347 

1376 1348 * **Domain Expertise** — Precise transcription of medical, legal, financial, and technical terminology — names, codes, and addresses.

1377 1349 

1378## Telephony Providers

1350## SIP phone calls

1379 

1380Use Direct SIP to route calls from your carrier or PBX into a voice agent. Configure your provider to send calls to the xAI SIP host.

1381 

1382| Value | Use |

1383|-------|-----|

1384| SIP host | `sip.voice.x.ai` |

1385| SIP URI | `sip:{number}@sip.voice.x.ai;transport=tls` |

1386 

1387Replace `{number}` with your Direct SIP phone number. If you restrict inbound calls by source IP, add your provider's signaling CIDR ranges to the allowlist before testing.

1388 

1389#### Twilio

1390 

13911. In the Twilio Console, create a TwiML Bin and paste the following:

1392 

1393```text

1394<Response>

1395 <Dial answerOnBridge="true">

1396 <Sip>sip:{number}@sip.voice.x.ai;transport=tls</Sip>

1397 </Dial>

1398</Response>

1399```

1400 

14012. Open your number's Voice configuration and set the handler to this TwiML Bin.

14023. Place a test call to confirm the agent answers.

1403 

1404#### Telnyx

1405 

14061. In the Telnyx Portal, create a SIP Connection (FQDN) and an Outbound Voice Profile.

14072. Set the connection's outbound destination to `sip.voice.x.ai`.

14083. Place a test call to confirm the agent answers.

1409 

1410#### Microsoft Teams

1411 

14121. Stand up a Microsoft-certified SBC (Ribbon, AudioCodes, or Cisco CUBE) with an outbound SIP trunk to `sip.voice.x.ai`.

14132. In the Teams admin center, go to **Voice** → **Direct Routing** → **Add** and register the SBC's public FQDN as a PSTN gateway.

14143. From the Teams PowerShell module, create a voice route and routing policy (`New-CsOnlineVoiceRoute` / `New-CsOnlineVoiceRoutingPolicy`) and grant the policy to your users.

1415 

1416#### Cisco Webex Calling

1417 

14181. Stand up a Local Gateway (Cisco CUBE or equivalent SBC) with an outbound dial-peer to `sip.voice.x.ai`.

14192. In Webex Control Hub, go to **Calling** → **Locations** → **Add trunk**, choose **Premises-based**, and point it at the Local Gateway's FQDN.

14203. Under **Calling** → **Dial Plans**, route the relevant numbers or prefixes to the new trunk.

1421 

1422#### Genesys Cloud

1423 

14241. In Genesys Cloud Admin, go to **Telephony** → **Trunks** → **Create New** and pick an External Trunk of type SIP.

14252. Under **SIP Servers or Proxies**, add `sip.voice.x.ai`.

14263. In **Routing** → **Architect Flows**, add a Transfer to External action to the agent's SIP URI and assign a DID to the flow.

1427 

1428#### NICE CXone

1429 

14301. In CXone Admin, go to **Voice** → **SIP Connectivity** → **Create New** and pick External SIP.

14312. Set the trunk's Destination to `sip.voice.x.ai`.

14323. In Studio, add a SIP Transfer action targeting the agent's SIP URI and assign a DID to the script.

1433 

1434#### Amazon Chime SDK

1435 

14361. In the AWS console, open **Amazon Chime SDK** → **Voice Connectors** → **Create Voice Connector** and enable Encryption.

14372. On the connector's **Termination** tab, add `sip.voice.x.ai` and allowlist your origination CIDR ranges.

14383. Assign your DIDs, then add a SIP rule that bridges inbound calls out to the agent's SIP URI.

1439 

1440#### Amazon Connect

1441 

14421. Create an Amazon Chime SDK Voice Connector with Encryption enabled.

14432. On the Voice Connector's **Termination** tab, add `sip.voice.x.ai`.

14443. In your Connect contact flow, add a Lambda block that dials out through the Voice Connector to the agent's SIP URI, passing contact attributes as SIP headers.

1445 

1446#### RingCentral

1447 

14481. Confirm BYOC is enabled on your plan (Ultimate or Premium).

14492. In the admin portal, go to **Phone System** → **Phone Numbers** → **Carriers** → **Add Carrier** and set the endpoint to `sip.voice.x.ai`.

14503. Under **Phone Numbers**, route the DIDs you want through this carrier.

1451 

1452#### Zoom Phone

1453 

14541. Make sure BYOC is enabled on your Zoom account.

14552. In the Zoom admin, go to **Phone System Management** → **Carrier Configuration** → **Add Carrier Trunk** and set the trunk address to `sip.voice.x.ai`.

14563. Under the BYOC settings, assign the DIDs you want routed through this trunk.

1457 

1458#### Generic SIP / Other

1459 1351 

14601. In your carrier or PBX, create an outbound route or SIP trunk.

1352Route PSTN, contact-center, or PBX calls into a Voice Agent API session. See [SIP Phone Calls](/developers/model-capabilities/audio/voice-agent/sip) for Agent Builder setup, API integration with `CreatePhoneNumberV2`, call control, DTMF, and telephony provider examples.

14612. Point its destination at `sip.voice.x.ai`.

14623. Place a test call to confirm the agent answers.

1463 1353 

1464 1354 ## Migrating from OpenAI Realtime

1465 1355 

Details

1#### Voice Agent API

2 

3# SIP Phone Calls

4 

5SIP lets you route PSTN, contact-center, or PBX calls into a Voice Agent API session.

6 

7### 1. Register the phone number

8 

9Create a Direct SIP phone number and include the webhook details that should receive incoming-call events. Use `origin: "byo_trunk"` for a customer-owned number. xAI creates the webhook endpoint alongside the phone-number route and returns the webhook signing secret once in the response.

10 

11Choose one SIP authentication method.

12 

13The response includes a signing secret after you register the phone number. Store it securely; xAI returns it only once.

14 

15Configure your carrier or PBX to route calls to:

16 

17 

18 

19If you provide `allowed_addresses`, make sure the list contains your provider's SIP signaling CIDR ranges. If you provide SIP digest credentials, configure your carrier with the same username and password; xAI never returns the password after creation.

20 

21### 2. Handle the incoming-call webhook

22 

23When a caller dials the number, xAI sends a signed `realtime.call.incoming` webhook to the webhook URL. Verify the `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers using the signing secret returned after you register the phone number, then read `data.call_id` from the payload.

24 

25The webhook has this shape:

26 

```json
{
  "object": "event",
  "id": "evt_123",
  "type": "realtime.call.incoming",
  "created_at": 1750000000,
  "data": {
    "call_id": "00000000-0000-0000-0000-000000000000",
    "sip_headers": [
      { "name": "From", "value": "+14155550100" },
      { "name": "To", "value": "+18005550199" }
    ],
    "metadata": {}
  }
}
```

43 

44### 3. Join the call over WebSocket

45 

46Open `wss://api.x.ai/v1/realtime?call_id={call_id}` with your xAI API key. Then send `session.update` to configure the voice agent for this call, followed by `response.create` when the agent should begin speaking.

47 

48After connecting, the WebSocket behaves like any other Voice Agent API session. The SIP caller's audio is bridged into the session, and assistant audio is played back to the caller.

49 

```python customLanguage="pythonWithoutSDK"
import asyncio
import json
import os
import websockets

async def handle_sip_call(call_id: str):
    async with websockets.connect(
        f"wss://api.x.ai/v1/realtime?call_id={call_id}",
        additional_headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    ) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "eve",
                "instructions": "You are a helpful phone support agent.",
                "turn_detection": {"type": "server_vad"},
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))

        async for msg in ws:
            event = json.loads(msg)
            print(event["type"])

asyncio.run(handle_sip_call("00000000-0000-0000-0000-000000000000"))
```

77 

```javascript customLanguage="javascriptWithoutSDK"
import WebSocket from "ws";

const callId = "00000000-0000-0000-0000-000000000000";
const ws = new WebSocket(`wss://api.x.ai/v1/realtime?call_id=${callId}`, {
  headers: { Authorization: `Bearer ${process.env.XAI_API_KEY}` },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      voice: "eve",
      instructions: "You are a helpful phone support agent.",
      turn_detection: { type: "server_vad" },
    },
  }));
  ws.send(JSON.stringify({ type: "response.create" }));
});

ws.on("message", data => {
  const event = JSON.parse(data.toString());
  console.log(event.type);
});
```

103 

104## Call control

105 

106Use `refer` to transfer the caller to another PSTN or SIP destination:

107 

```bash customLanguage="bash"
curl -X POST "https://api.x.ai/v1/realtime/calls/$CALL_ID/refer" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"target_uri": "sip:agent@example.com"}'
```

114 

115Use `hangup` when your application should end the call:

116 

```bash customLanguage="bash"
curl -X POST "https://api.x.ai/v1/realtime/calls/$CALL_ID/hangup" \
  -H "Authorization: Bearer $XAI_API_KEY"
```

121 

122## DTMF phone keypresses

123 

124When using the Voice Agent API over SIP, phone keypresses (DTMF tones) are automatically buffered and flushed to the model as text input. The client receives `input_audio_buffer.dtmf_event_received` events as an audit trail of each keypress.

125 

126### Flush triggers

127 

128Buffered digits are submitted to the model when any of the following occurs:

129 

130* The user presses `#` (submit key)

131* 2.5 seconds of idle time after the last keypress

132* The user begins speaking (preempts the digit buffer)
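
The three triggers above can be modeled as a small buffer. This is a toy illustration of the documented rules, useful for reasoning about or testing client logic; it is not xAI's server implementation. The class and method names are hypothetical, and dropping the `#` key from the flushed text is an assumption, since the documentation does not say whether it is included.

```python
IDLE_FLUSH_S = 2.5  # idle timeout from the list above


class DtmfBuffer:
    """Toy model of the documented flush rules (not the server's actual code)."""

    def __init__(self):
        self.digits = []
        self.last_key_at = None

    def _flush(self):
        text = "".join(self.digits)
        self.digits.clear()
        self.last_key_at = None
        return text or None

    def tick(self, now: float):
        """Flush if at least 2.5 s have passed since the last keypress."""
        if self.last_key_at is not None and now - self.last_key_at >= IDLE_FLUSH_S:
            return self._flush()
        return None

    def keypress(self, key: str, now: float):
        """Record a keypress; '#' submits the buffered digits.

        Callers are expected to call tick() regularly so idle flushes
        happen before the next key arrives.
        """
        if key == "#":
            return self._flush()
        self.digits.append(key)
        self.last_key_at = now
        return None

    def speech_started(self):
        """The caller began speaking: flush whatever is buffered."""
        return self._flush()
```

Each method returns the flushed digit string when a trigger fires and `None` otherwise, so a caller can treat any non-`None` result as the text that would be submitted to the model.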

133 

134### Audit event

135 

136Each keypress is reported to the client WebSocket:

137 

```json customLanguage="json"
{
  "type": "input_audio_buffer.dtmf_event_received",
  "event": "5",
  "received_at": 1730000000
}
```

145 

146> [!NOTE]

147>

148> DTMF is only available on SIP sessions — it is not emitted on direct WebSocket connections.

149 

150## Telephony providers

151 

152In every provider, the destination is the xAI SIP URI for your registered number:

153 

154`sip:{number}@sip.voice.x.ai;transport=tls`

155 

156Replace `{number}` with your Direct SIP phone number. If you configured `allowed_addresses` when registering the number, include your provider's SIP signaling CIDR ranges.

157 

158### Twilio

159 

1601. In the Twilio Console, go to **Voice** → **Elastic SIP Trunking** and create a trunk.

1612. Open the trunk's **Origination** settings and add this origination URI: `sip:{number}@sip.voice.x.ai;transport=tls`.

1623. Assign a Twilio phone number to the trunk, or purchase a new number and attach it.

1634. If your application transfers calls mid-session, enable call transfer on the trunk.

164 

165### Telnyx

166 

1671. In the Telnyx Portal, go to **Voice Suite** → **SIP Trunking** and create an FQDN SIP Connection.

1682. In **Authentication and Routing**, add `sip.voice.x.ai` as the primary FQDN on port `5060` with record type `A`.

1693. In **Inbound settings**, set the destination number format to **E.164**.

1704. Enable at least one supported codec: G.711 μ-law, G.711 A-law, or G.722.

1715. Assign a phone number to the SIP Connection.

172 

173### Plivo

174 

1751. In the Plivo Console, go to **SIP Trunking** and create a SIP trunk.

1762. Choose **Inbound**, then create a new URI with FQDN `sip.voice.x.ai`.

1773. Link an existing phone number to the trunk, or buy a new number and attach it.

178 

179### Bring Your Own SIP Provider

180 

1811. In your carrier, contact center, or PBX, create an outbound route or SIP trunk.

1822. Set the destination to `sip:{number}@sip.voice.x.ai;transport=tls`.

rate-limits.md +1 −1

Details

40 40 | grok-4.20-multi-agent-0309 | T0: 7, T1: 10, T2: 15, T3: 25, T4: 45 | T0: 2.5M, T1: 3.7M, T2: 6.2M, T3: 11M, T4: 21M |

41 41 | grok-imagine-image-quality | 5 | — |

42 42 | grok-imagine-image | 5 | — |

43| grok-imagine-video-1.5 | 1 | — |

44 43 | grok-imagine-video | 1 | — |

44| grok-imagine-video-1.5 | 1 | — |

45 45 

46 46 ### What counts toward TPM

47 47 

Details

90 90 

91 91 ***

92 92 

93## POST /v2/phone-numbers

94 

95Create a phone number for API-controlled SIP calls.

96 

97### Request Body

98 

99* `origin` ("xai_provisioned" | "byo_trunk", required) — Use `byo_trunk` for customer-owned Direct SIP numbers.

100 

101* `name` (string, required)

102 

103* `agent_id` (string) — Route calls to this agent. Mutually exclusive with `webhook`.

104 

105* `area_code` (string) — xAI-provisioned only: optional 3-digit US area code filter.

106 

107* `phone_number` (string) — BYO trunk only: customer-owned phone number in E.164 format.

108 

109* `sip_auth` (object)

110 

111 * `auth_username` (string) — SIP digest username. Must be provided together with `auth_password`.

112 

113 * `auth_password` (string) — SIP digest password. Stored encrypted and never returned by read endpoints.

114 

115 * `allowed_addresses` (array<string>) — Source CIDR ranges permitted to send INVITEs.

116 

117* `webhook` (object)

118 

119 * `name` (string) — Optional display name. Defaults to the phone number's name when omitted.

120 

121 * `url` (string, required) — URL xAI POSTs signed `realtime.call.incoming` events to.

122 

123 * `auth_url` (string) — Optional OAuth token-exchange URL paired with `auth_token`.

124 

125 * `auth_token` (string) — Optional bearer credential, or OAuth client credential when paired with `auth_url`.

126 
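
A minimal sketch of assembling this request body in Python, with client-side checks for the constraints the field list states: `agent_id` and `webhook` are mutually exclusive, and `auth_username` must be provided together with `auth_password`. The function name `build_phone_number_body` and its keyword arguments are hypothetical conveniences, and the checks are only a convenience; the server remains the authority on validation.

```python
def build_phone_number_body(name: str, origin: str, *, phone_number=None,
                            agent_id=None, webhook_url=None,
                            auth_username=None, auth_password=None,
                            allowed_addresses=None) -> dict:
    """Assemble a JSON-serializable body for POST /v2/phone-numbers."""
    if origin not in ("xai_provisioned", "byo_trunk"):
        raise ValueError("origin must be 'xai_provisioned' or 'byo_trunk'")
    if agent_id and webhook_url:
        raise ValueError("agent_id and webhook are mutually exclusive")
    if bool(auth_username) != bool(auth_password):
        raise ValueError("auth_username and auth_password must be provided together")

    body = {"origin": origin, "name": name}
    if phone_number:
        body["phone_number"] = phone_number  # BYO trunk only, E.164 format
    if agent_id:
        body["agent_id"] = agent_id
    if webhook_url:
        body["webhook"] = {"url": webhook_url}

    # sip_auth is included only when at least one of its fields is set.
    sip_auth = {}
    if auth_username:
        sip_auth.update(auth_username=auth_username, auth_password=auth_password)
    if allowed_addresses:
        sip_auth["allowed_addresses"] = list(allowed_addresses)
    if sip_auth:
        body["sip_auth"] = sip_auth
    return body
```

For a customer-owned number that dispatches to a webhook, a call such as `build_phone_number_body("Support SIP trunk", "byo_trunk", phone_number="+18005550199", webhook_url="https://example.com/hook", allowed_addresses=["203.0.113.0/24"])` yields a body that can be serialized with `json.dumps`. The webhook URL here is a placeholder.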

127### Response Body

128 

129* `phone_number` (object)

130 

131 * `phone_number_id` (string)

132 

133 * `team_id` (string)

134 

135 * `phone_number` (string) — Phone number in E.164 format.

136 

137 * `name` (string)

138 

139 * `agent_id` (string) — Agent this number routes to.

140 

141 * `webhook_id` (string) — Webhook endpoint this number dispatches `realtime.call.incoming` events to.

142 

143 * `origin` ("xai_provisioned" | "byo_trunk")

144 

145 * `sip_host` (string) — SIP host your carrier or PBX should route calls to.

146 

147 * `inbound_trunk_id` (string) — Read-only SIP trunk identifier.

148 

149 * `sip_auth` (object)

150 

151 * `auth_username` (string) — SIP digest username. Present only when digest auth is configured.

152 

153 * `allowed_addresses` (array<string>) — Source CIDR ranges permitted to send INVITEs.

154 

155 * `created_at` (string)

156 

157 * `updated_at` (string)

158 

159 * `agent_name` (string)

160 

161* `webhook` (object)

162 

163 * `webhook_id` (string)

164 

165 * `dispatch_signing_secret` (string) — Standard Webhooks v1 HMAC-SHA256 signing secret. Returned only once and cannot be recovered.

166 

167**Response example:**

168 

```json
{
  "phone_number": {
    "phone_number_id": "phone_abc123",
    "team_id": "00000000-0000-0000-0000-000000000000",
    "phone_number": "+18005550199",
    "name": "Support SIP trunk",
    "webhook_id": "webhook_abc123",
    "origin": "byo_trunk",
    "sip_host": "sip.voice.x.ai",
    "sip_auth": {
      "allowed_addresses": [
        "203.0.113.0/24"
      ]
    },
    "created_at": "2026-06-19T00:00:00Z",
    "updated_at": "2026-06-19T00:00:00Z"
  },
  "webhook": {
    "webhook_id": "webhook_abc123",
    "dispatch_signing_secret": "whsec_..."
  }
}
```

193 

194***

195 

93 196 ## Realtime

94 197 

95 198 WebSocket endpoint: `wss://api.x.ai/v1/realtime`

96 199 

97Real-time voice conversations with Grok models via WebSocket. The connection begins with an HTTP GET that is upgraded to WebSocket (status 101). Once connected, the client and server exchange JSON messages to configure the session, stream audio, and receive responses.

200Real-time voice conversations with Grok models via WebSocket. The connection begins with an HTTP GET that is upgraded to WebSocket (status 101). Once connected, the client and server exchange JSON messages to configure the session, stream audio, and receive responses. For SIP calls, connect with the `call_id` from a `realtime.call.incoming` webhook.

98 201 

99 202 Full schemas and examples: [`/voice-realtime.ws.json`](/voice-realtime.ws.json)

100 203 

101 204 ### Query Parameters

102 205 

103* `model` (string, optional, default: grok-voice-latest) — Model to use for the session. Use grok-voice-latest for the best experience.

206* `call_id` (string, optional) — SIP call identifier from a `realtime.call.incoming` webhook. When provided, the WebSocket connects to that inbound SIP call. Authenticate with an xAI API key; ephemeral client secrets are not supported for SIP `call_id` sessions.

207 

208* `model` (string, optional, default: grok-voice-latest) — Model to use for the session. Ignored when `call_id` is provided because the session is bound to the inbound SIP call. Use grok-voice-latest for the best experience on direct WebSocket sessions.

104 209 

105 210 * `reasoning.effort` (string, optional, default: high) — Controls whether the model uses reasoning. Defaults to `high`. Supported only with grok-voice-latest and grok-voice-think-fast-1.0.

106 211 


238 343 

239 344 ***

240 345 

346## POST /v1/realtime/calls/{call_id}/refer

347 

348Transfer an active SIP call to a PSTN or SIP destination.

349 

350### Path Parameters

351 

352* `call_id` (string, required) — SIP call identifier from the `realtime.call.incoming` webhook.

353 

354### Request Body

355 

356* `target_uri` (string, required) — Destination for the SIP REFER. Use `tel:+E.164` for PSTN destinations or `sip:user@host` for direct SIP routing.

357 

358**Request example:**

359 

```json
{
  "target_uri": "sip:agent@example.com"
}
```

365 

366**Response example:**

367 

```json
{}
```

371 

372***

373 

374## POST /v1/realtime/calls/{call_id}/hangup

375 

376End an active SIP call.

377 

378### Path Parameters

379 

380* `call_id` (string, required) — SIP call identifier from the `realtime.call.incoming` webhook.

381 

382**Response example:**

383 

```json
{}
```

387 

388***

389 

241 390 ## POST /v1/tts

242 391 

243 392 Convert text into speech audio.