Documentation — Spybara

Files

community
- google-cloud-vertex-ai.md
- microsoft-foundry.md
model-capabilities
- audio
  - voice-agent.md
rest-api-reference
- inference
  - models.md

community/google-cloud-vertex-ai.md +184 −0 created


#### Community Integrations

# Google Cloud Vertex AI

Access xAI’s Grok models through Google Cloud’s managed platform with enterprise security, governance, and unified billing.

This guide walks through setting up and using Grok models on Google Cloud Vertex AI / Gemini Enterprise Agent Platform. Grok on Vertex AI is accessed as a partner model through the OpenAI-compatible API, including the Responses API and Chat Completions. Models are enabled through Model Garden.

## Prerequisites

Before you begin, ensure you have:

* An active Google Cloud Platform (GCP) project with billing enabled.
* Permissions to enable APIs and access Model Garden, such as the Vertex AI User or Project Editor role.
* The `aiplatform.googleapis.com` API, or equivalent Agent Platform API, enabled in your project.
* Google Cloud CLI (`gcloud`) installed and authenticated for Application Default Credentials (ADC).

Set up ADC and your project:

```bash customLanguage="bash"
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```

Enable the required API if it is not already enabled:

```bash customLanguage="bash"
gcloud services enable aiplatform.googleapis.com
```

## Install required packages

```bash customLanguage="bash"
pip install -U openai google-cloud-aiplatform
```

## Enable Grok models in Model Garden

1. Go to the Google Cloud Console Model Garden, or search for “Model Garden” in the console.
2. Search for “Grok”, or browse by publisher xAI.
3. Select the desired Grok model, such as Grok 4.2 or Grok 4.3.
4. Review the model card for capabilities, quotas, pricing, and regions.
5. Click **Enable** or **Deploy / request access** if prompted.
6. Once enabled, the model becomes available for API calls.

Use the model ID shown in Model Garden. Vertex model names may use a publisher prefix, for example:

* `xai/grok-4.3`

Model availability generally matches the xAI API, subject to Google Cloud regional availability and quotas.

## Make your first API call

Grok on Vertex uses the OpenAI-compatible interface, so you can use the standard `openai` Python library.

### Authentication

Use Application Default Credentials (ADC) from your `gcloud` login or a service account. The `openai` client does not read ADC itself, so exchange the credentials for a short-lived access token and pass that token as the API key.

Set the Vertex OpenAI-compatible base URL with an environment variable or directly in the client. Use the exact endpoint from the model card or Google documentation for the Agent Platform.

```bash customLanguage="bash"
export OPENAI_BASE_URL="https://YOUR_VERTEX_ENDPOINT"
```

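### Responses API example

```python customLanguage="pythonOpenAISDK"
import os

import google.auth
import google.auth.transport.requests
from openai import OpenAI

# Exchange ADC for a short-lived access token; the OpenAI client does not read ADC on its own.
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# The token expires, so refresh it and rebuild the client in long-running processes.
client = OpenAI(base_url=os.environ["OPENAI_BASE_URL"], api_key=credentials.token)

response = client.responses.create(
    model="xai/grok-4.3",
    input="Explain the advantages of using Grok for agentic workflows with parallel tool calling.",
    max_output_tokens=800,
)

print(response.output_text)
```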

### Chat Completions example

```python customLanguage="pythonOpenAISDK"
from openai import OpenAI

# OpenAI() reads OPENAI_BASE_URL and OPENAI_API_KEY; with ADC, pass a fresh access token as api_key.
client = OpenAI()

response = client.chat.completions.create(
    model="xai/grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Which city has a higher temperature right now, Boston or New Delhi, and by how much in Fahrenheit?",
        }
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    tool_choice="auto",
)

message = response.choices[0].message
# With tools enabled, the model may answer with tool calls instead of text.
if message.tool_calls:
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(message.content)
```

Streaming is supported on both interfaces for lower-latency experiences.

## Function calling and tool use

Grok excels at tool use and parallel function calling across the Responses API and Chat Completions interfaces. Define clear, strict schemas for tools so the model can select and call them reliably.
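
To make the tool-use flow concrete, here is a minimal sketch of dispatching Chat Completions tool calls to local Python functions and building the `tool` messages to send back. It works on plain dictionaries shaped like the OpenAI-compatible response, so it runs without network access. `run_tool_calls` is a hypothetical helper (not part of any SDK), and `get_current_weather` is a stub that returns made-up values.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    # Stub for illustration only: a real tool would call a weather service.
    fake_temps_f = {"Boston, MA": 50.0, "New Delhi, India": 95.0}
    temp_f = fake_temps_f.get(location, 68.0)
    temp = temp_f if unit == "fahrenheit" else round((temp_f - 32) * 5 / 9, 1)
    return {"location": location, "temperature": temp, "unit": unit}


TOOLS = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls, registry=TOOLS):
    """Execute each tool call and return one `tool` message per call."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        if name in registry:
            result = registry[name](**args)
        else:
            result = {"error": f"unknown tool: {name}"}
        messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        )
    return messages


# Two parallel tool calls, shaped like `message.tool_calls` converted to dicts.
example_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'},
    },
    {
        "id": "call_2",
        "type": "function",
        "function": {"name": "get_current_weather", "arguments": '{"location": "New Delhi, India"}'},
    },
]

tool_messages = run_tool_calls(example_calls)
for tool_message in tool_messages:
    print(tool_message)
```

In a real loop, you would append the assistant message and these `tool` messages to `messages` and call `client.chat.completions.create` again so the model can produce its final answer.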

## Data retention and compliance

Data retention and processing for Grok models on Google Cloud are governed by Google Cloud Vertex AI policies.

* Many deployments support Zero Data Retention (ZDR) options.
* Review the specific model card and your organization’s Google Cloud data governance settings.
* Activity logging can be enabled with Vertex AI request-response logging for audit and debugging purposes.

See Google Cloud documentation on Vertex AI data governance and logging for details.

## Feature support

Supported capabilities include:

* Responses API and Chat Completions.
* Function calling and tool use, including parallel function calling.
* Reasoning modes / extended thinking.
* Structured outputs / JSON mode.
* Streaming.
* Fixed quotas and committed use discounts through Google Cloud.

Context windows vary by model. Check the specific Grok model card in Model Garden for the current limit.

## Global, multi-region, and regional endpoints

Vertex AI / Gemini Enterprise Agent Platform offers flexible endpoint routing:

* **Global endpoints:** maximum availability with dynamic routing; recommended for most use cases.
* **Regional endpoints:** routing through specific regions for strict compliance requirements.

## Best practices

* Choose the Grok model and endpoint configuration that match your latency, throughput, and reasoning requirements.
* Prefer Application Default Credentials and IAM roles over long-lived keys. Use service accounts for production workloads.
* Monitor usage in Google Cloud Billing and Quotas pages. Request quota increases as needed.
* Use clear tool schemas and explicit output formats.
* Enable request logging and integrate with Google Cloud Monitoring / Logging.
* When migrating from the direct xAI API, update the base URL, client configuration, and model prefix. Most prompts and tool definitions transfer with minimal changes.
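
As a rough illustration of the migration step in the last bullet, the sketch below rewrites a client configuration from the direct xAI API to Vertex. The helper names, the `config` dictionary layout, and the `https://api.x.ai/v1` starting URL are assumptions for the example; the `xai/` prefix follows the model ID shown earlier, and the Vertex base URL remains the placeholder from this guide.

```python
def to_vertex_model_id(xai_model_id, publisher="xai"):
    """Add the publisher prefix used by Vertex model IDs, if it is missing."""
    prefix = f"{publisher}/"
    if xai_model_id.startswith(prefix):
        return xai_model_id
    return prefix + xai_model_id


def migrate_client_config(config, vertex_base_url):
    """Return a copy of `config` pointing at Vertex instead of the direct xAI API."""
    migrated = dict(config)
    migrated["base_url"] = vertex_base_url
    migrated["model"] = to_vertex_model_id(config["model"])
    return migrated


# Hypothetical configuration for the direct xAI API.
direct_config = {"base_url": "https://api.x.ai/v1", "model": "grok-4.3"}

vertex_config = migrate_client_config(direct_config, "https://YOUR_VERTEX_ENDPOINT")
print(vertex_config)
```

Authentication is not covered here: on Vertex the API key is replaced by an access token obtained through ADC.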

## Troubleshooting

| Issue | What to check |
|---|---|
| Authentication errors | Run `gcloud auth application-default login` and verify project permissions. |
| Model not found | Confirm the model is enabled in Model Garden and use the exact `xai/...` ID. |
| Quota exceeded | Check quotas in the Google Cloud console and request increases as needed. |
| Endpoint / base URL issues | Use the exact endpoint or environment variable from the model card or Google documentation. |

Start in the Google Cloud console playground / Model Garden interface when available, then move to code.

## Next steps

* Explore enabled models in Model Garden.
* Build agentic applications that use Grok’s tool-calling strengths.
* Integrate with Google Cloud services such as Cloud Functions and Vertex AI Pipelines.
* Review the full xAI Grok documentation and model cards for prompting tips and capabilities.

community/microsoft-foundry.md +300 −0 created


#### Community Integrations

# Microsoft Foundry

Access xAI’s frontier reasoning and agentic models through Azure AI Foundry with enterprise-grade security, governance, and unified billing.

This guide walks through setting up and using Grok models on Microsoft Foundry. Grok models on Foundry give you strong reasoning, native tool use, enterprise authentication through Microsoft Entra ID, Azure-native monitoring, and an OpenAI-compatible API.

Usage is billed through the Azure Marketplace / your Azure subscription. Grok models are delivered through the xAI–Microsoft partnership with Azure-managed endpoints and optional Azure AI Content Safety layers. Review the specific model card in the Foundry catalog for the latest details on data processing, retention, and terms.

Grok on Foundry works with the official OpenAI Python/TypeScript SDKs, `azure-ai-projects`, LangChain, Semantic Kernel, LlamaIndex, and most OpenAI-compatible frameworks. Streaming, tool calling, and structured outputs are supported.

## Prerequisites

Before you begin, ensure you have:

* An active Azure subscription.
* Access to Azure AI Foundry.
* Sufficient permissions to create or manage Foundry resources/projects and deploy models, typically Contributor or custom roles with model deployment rights.
* Optional but recommended: the Azure CLI installed for resource management and authentication testing.
* Python 3.10+ for the examples in this guide.

## Install required packages

```bash customLanguage="bash"
pip install -U openai azure-identity
```

Optional, for higher-level project client patterns:

```bash customLanguage="bash"
pip install azure-ai-projects
```

## Provisioning

Foundry organizes work into resources for security, billing, and networking, and projects for deployments and collaboration. Create a resource/project first, then deploy one or more Grok model instances inside it.

The deployment name you choose becomes the value passed in the `model` parameter of your API requests.

### Create or select a Foundry resource and project

1. Navigate to the Foundry portal.
2. Create a new Foundry resource, or select an existing one.
3. Within the resource, create a new project if your workflow uses projects.
4. Configure access management:
   * Use Microsoft Entra ID with role-based access control (RBAC).
   * Assign the Cognitive Services OpenAI User role, or equivalent, to identities that will call models.
   * Optionally configure private networking through Azure Virtual Network.
5. Note your resource name and project name for later.

The resulting endpoint base will be:

```text customLanguage="text"
https://{resource-name}.services.ai.azure.com/api/projects/{project-name}/openai/v1
```
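
The endpoint pattern above can be assembled programmatically. The following is a small sketch, assuming the pattern holds for your resource; `foundry_base_url` is a hypothetical helper, and the resource and project names in the usage line are invented.

```python
from urllib.parse import quote


def foundry_base_url(resource_name, project_name):
    """Build the OpenAI-compatible base URL for a Foundry project."""
    if not resource_name or not project_name:
        raise ValueError("resource_name and project_name are both required")
    # Percent-encode the project name so unusual characters cannot break the path.
    project = quote(project_name, safe="")
    return (
        f"https://{resource_name}.services.ai.azure.com"
        f"/api/projects/{project}/openai/v1"
    )


base_url = foundry_base_url("contoso-ai", "support-bot")
print(base_url)
```

Always confirm the result against the endpoint shown for your project in the Foundry portal.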

### Deploy a Grok model

1. In the Foundry portal, go to your resource or project, then **Models + endpoints**.
2. Click **+ Deploy model** → **Deploy base model**, or open the Model catalog directly.
3. Search the catalog for the desired Grok model, for example, `grok-4.3`.
4. Review the model card for capabilities, context window, tool calling support, safety evaluations, pricing, and deployment options.
5. Click **Deploy**.
6. Configure deployment settings:
   * **Deployment name:** Choose a clear, stable name, such as `grok-4.3`. This name cannot be changed after creation and is the value you use in the `model` parameter.
   * **Deployment type / SKU:** Select Serverless for pay-as-you-go workloads, or Provisioned Throughput Units (PTU) for predictable high-volume performance.
7. Review and select **Deploy**. Wait for the deployment to reach **Ready / Running** status.

Once deployed, you can test in the built-in Playground, view generated code snippets, manage keys/endpoints if API key auth is enabled, and monitor usage and metrics.

## Authentication

Grok on Foundry uses Azure-native authentication. The recommended approach is Microsoft Entra ID (keyless) with `DefaultAzureCredential`. API keys from the portal may also be supported depending on your resource configuration.

All requests go to your Foundry project’s OpenAI-compatible endpoint:

```text customLanguage="text"
https://{resource-name}.services.ai.azure.com/api/projects/{project-name}/openai/v1
```

### Recommended: Entra ID authentication

Use `azure.identity` and `get_bearer_token_provider`. This approach supports RBAC and managed identities and avoids secret management.

```python customLanguage="pythonOpenAISDK"
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import OpenAI

project_endpoint = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/api/projects/YOUR_PROJECT_NAME"
base_url = project_endpoint.rstrip("/") + "/openai/v1"

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential,
    "https://ai.azure.com/.default",
)

client = OpenAI(
    base_url=base_url,
    api_key=token_provider,
)

response = client.responses.create(
    model="grok-4.3",  # your deployment name
    input="Explain the significance of Grok's tool-calling capabilities for building reliable agents. Be concise but insightful.",
    max_output_tokens=800,
)

print(response.output_text)
```

Important:

* Assign the Cognitive Services OpenAI User, or appropriate role, to the identity running this code.
* `DefaultAzureCredential` handles local development through Azure CLI / VS Code, managed identities, service principals, and other supported flows.
* No `api-version` query parameter is needed; the `/openai/v1` path handles compatibility.

### Alternative: API key authentication

If your Foundry resource exposes keys under Keys and Endpoint, copy a primary or secondary key and use it directly as the `api_key`:

```python customLanguage="pythonOpenAISDK"
from openai import OpenAI

# base_url is the project endpoint plus "/openai/v1", as in the Entra ID example.
client = OpenAI(
    base_url=base_url,
    api_key="your-foundry-api-key-here",
)
```

Prefer Entra ID + RBAC in production. Never commit keys to source control, and rotate keys regularly.

## Make your first API call

### Simple reasoning call

```python customLanguage="pythonOpenAISDK"
response = client.responses.create(
    model="grok-4.3",
    input="Walk through the first-principles reasoning to determine why reusable rockets dramatically reduce the cost of space access.",
    max_output_tokens=1500,
)
print(response.output_text)
```

### Tool calling example

Grok excels at tool use. Here is a pattern for parallel tool calling:

```python customLanguage="pythonOpenAISDK"
# The Responses API takes flat function definitions; the nested "function" key
# is the Chat Completions shape.
tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g., San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
    {
        "type": "function",
        "name": "search_web",
        "description": "Search the web for recent information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
            },
            "required": ["query"],
        },
    },
]

response = client.responses.create(
    model="grok-4.3",
    input="What's the weather like in Palo Alto right now and any major tech news from today?",
    tools=tools,
    # parallel_tool_calls=True,  # enable if supported in your deployment
    max_output_tokens=1000,
)

# Requested function calls, if any, appear as items in response.output.
print(response)
```

In a real agent loop, execute the tool calls and continue the conversation with the tool results.
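
One step of such a loop might look like the sketch below. It assumes the Responses API item shapes used by the OpenAI SDK (`function_call` items carrying `call_id`, `name`, and JSON-string `arguments`, answered with `function_call_output` items) and operates on plain dictionaries so it runs offline. `tool_outputs_for` is a hypothetical helper, and both tools are stubs.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    # Stub: returns a fixed, made-up reading.
    return {"location": location, "temperature": 72, "unit": unit}


def search_web(query):
    # Stub: a real implementation would call a search service.
    return {"query": query, "results": ["(stub result)"]}


REGISTRY = {"get_current_weather": get_current_weather, "search_web": search_web}


def tool_outputs_for(output_items, registry=REGISTRY):
    """Run every function_call item and build the matching function_call_output items."""
    outputs = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip text, reasoning, and other item types
        handler = registry.get(item["name"])
        if handler is None:
            result = {"error": f"unknown tool: {item['name']}"}
        else:
            result = handler(**json.loads(item["arguments"]))
        outputs.append(
            {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(result),
            }
        )
    return outputs


# Simulated `response.output` containing two parallel function calls.
example_output = [
    {"type": "reasoning", "summary": []},
    {
        "type": "function_call",
        "call_id": "call_a",
        "name": "get_current_weather",
        "arguments": '{"location": "Palo Alto, CA"}',
    },
    {
        "type": "function_call",
        "call_id": "call_b",
        "name": "search_web",
        "arguments": '{"query": "tech news today"}',
    },
]

outputs = tool_outputs_for(example_output)
print(outputs)
```

With a live client, the usual pattern is to send the original input, the model's output items, and these `function_call_output` items back through another `client.responses.create` call with the same `tools`; verify the exact continuation behavior for your deployment.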

### Streaming response

```python customLanguage="pythonOpenAISDK"
stream = client.responses.create(
    model="grok-4.3",
    input="Write a short, helpful onboarding guide for a new engineer joining xAI.",
    max_output_tokens=600,
    stream=True,
)

for event in stream:
    # The Responses API streams typed events rather than Chat Completions chunks;
    # incremental text arrives in "response.output_text.delta" events.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```

Start with the Playground in the Foundry portal for rapid prompt iteration, then move to code.

## Correlation IDs and debugging

Foundry includes standard Azure request identifiers in response headers, such as `request-id`, `apim-request-id`, and `x-ms-request-id`. When contacting Microsoft or xAI support, include these IDs with your deployment name and approximate timestamp.
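
A minimal sketch of collecting those identifiers from a response's headers follows. The header names come from the paragraph above; `extract_request_ids` is a hypothetical helper, and the sample header values are invented.

```python
REQUEST_ID_HEADERS = ("request-id", "apim-request-id", "x-ms-request-id")


def extract_request_ids(headers):
    """Return whichever known request-ID headers are present (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {name: lowered[name] for name in REQUEST_ID_HEADERS if name in lowered}


# Invented header values for illustration.
sample_headers = {
    "Content-Type": "application/json",
    "Apim-Request-Id": "1f2e3d4c",
    "x-ms-request-id": "a1b2c3d4",
}

request_ids = extract_request_ids(sample_headers)
print(request_ids)
```

In the OpenAI Python SDK, raw response headers are generally reachable through the `with_raw_response` accessor; treat that as an assumption to check against the SDK version you use.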

## Feature support and capabilities

| Capability | Notes |
|---|---|
| Reasoning | Strong first-principles reasoning. “Think mode” style prompting works well. |
| Tool / function calling | Native support for reliable agentic workflows. |
| Structured outputs / JSON mode | Supported. Set `response_format`, or instruct JSON explicitly in the prompt. |
| Streaming | Supported for low-latency user experiences. |
| Long context | Check the specific model card for current context windows. |
| Code generation | Strong performance for code generation and editing tasks. |
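
When you rely on prompt instructions for JSON rather than an enforced schema, it helps to validate the reply before using it. The sketch below is one way to do that under stated assumptions: `parse_json_reply` is a hypothetical helper, the expected keys are invented, and the fence stripping only handles a reply wrapped in a single Markdown code fence.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_json_reply(text, required_keys):
    """Parse a model reply as a JSON object and check that required keys exist."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Simulated model reply wrapped in a Markdown fence.
reply = FENCE + 'json\n{"city": "Boston", "temp_f": 50}\n' + FENCE

parsed = parse_json_reply(reply, ["city", "temp_f"])
print(parsed)
```

A failed parse is a reasonable trigger for retrying the request with a stricter instruction.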

## Safety and responsible AI

Grok models include xAI’s safety training and alignment. On Foundry, Azure AI Content Safety is available and often enabled by default or easily integrated.

Before production deployment:

* Review the full model card and safety benchmark tab in the Foundry catalog.
* Use clear system prompts that define safety boundaries and desired behavior.
* Implement Azure Content Safety filters for input/output where appropriate.
* Conduct your own red-teaming and evaluations.
* Monitor usage and any required mitigations.

## Limitations

* Feature parity with the direct xAI API at `api.x.ai` may differ slightly, especially for the latest experimental features.
* Validate vision/multimodal support and exact parameter availability for your chosen model/deployment.
* Rate limits and quotas are managed at the Azure resource level.

For the authoritative list of supported parameters and behaviors, consult the model card inside Azure AI Foundry and xAI Grok documentation linked from the catalog.

## Best practices for production

### Model selection

* Use full Grok reasoning models for maximum reasoning depth and capability.
* Use balanced reasoning settings for simpler tasks.
* Choose the deployment settings that match your latency, throughput, and cost requirements.

### Prompting Grok effectively

* Encourage step-by-step reasoning when needed.
* Specify the desired output format.
* Use clear tool schemas.

### Cost management

* Monitor spend in Azure Cost Management + Billing.
* Use serverless for spiky or experimental workloads; use PTU for steady high throughput.
* Right-size the model and deployment type for your expected traffic pattern.

### Security and compliance

* Prefer Entra ID + RBAC over long-lived keys.
* Use private endpoints / VNet injection where required.
* Log requests with correlation IDs for auditability.

### Observability

* Integrate Azure Monitor, Application Insights, or Log Analytics.
* Track token usage, latency, and error rates per deployment.

## Troubleshooting

| Issue | What to check |
|---|---|
| 401 Unauthorized | Missing or incorrect Entra role; wrong token scope; check the `DefaultAzureCredential` chain. |
| 404 Not Found / model not found | Wrong deployment name; it must match exactly what you created in the portal. |
| Deployment does not reach “Ready / Running” | Check region quotas, resource health, portal notifications, or try redeploying. |
| Slow responses or high latency | Consider Provisioned Throughput. Check the network path to the Azure region. |
| Tool calls not executing as expected | Verify the tool schema and whether parallel tool calling is enabled/supported for the deployment. |
| Content filtered / blocked | Review Azure Content Safety configuration and your system prompt. Adjust safety thresholds if needed. |

## Next steps

* Use the Playground inside your Foundry project.
* Combine Grok with Azure AI Agent Service or popular frameworks such as LangChain, Semantic Kernel, and CrewAI.
* Add retrieval, memory, and orchestration layers for production systems.
* Use Foundry tracing and your internal eval harness to evaluate and improve behavior.
* When migrating from the direct xAI API, update authentication and endpoint configuration. Most prompts and tool schemas transfer with minimal changes.

model-capabilities/audio/voice-agent.md +1 −1


1350 ## SIP phone calls

1352 − Route PSTN, contact-center, or PBX calls into a Voice Agent API session. See [SIP Phone Calls](/developers/model-capabilities/audio/voice-agent/sip) for Agent Builder setup, API integration with `CreatePhoneNumberV2`, call control, DTMF, and telephony provider examples.

1352 + Route PSTN, contact-center, or PBX calls into a Voice Agent API session. See [SIP Phone Calls](/developers/model-capabilities/audio/voice-agent/sip) for API integration with `CreatePhoneNumberV2`, call control, DTMF, and telephony provider examples.

1354 ## Migrating from OpenAI Realtime

rest-api-reference/inference/models.md +1 −1


245 "cached_prompt_text_token_price": 2000,

246 "prompt_image_token_price": 0,

247 "completion_text_token_price": 80000,

248 − "search_price": 250000000,

248 + "search_price": 0,

249 "prompt_text_token_price_long_context": 40000,

250 "cached_prompt_text_token_price_long_context": 0,

251 "completion_text_token_price_long_context": 160000,

Documentation 2026-06-27 00:02 UTC to 2026-06-29 23:02 UTC