SpyBara
Go Premium

google-vertex-ai.md 2026-04-01 21:12 UTC to 2026-04-02 21:08 UTC

3 added, 3 removed.

2026
Wed 29 21:21 Tue 28 21:21 Mon 27 21:20 Sun 26 04:08 Sat 25 21:10 Fri 24 18:11 Thu 23 18:19 Wed 22 21:15 Tue 21 21:14 Mon 20 21:14 Sat 18 18:09 Fri 17 21:13 Thu 16 21:13 Wed 15 18:20 Tue 14 21:14 Mon 13 21:14 Sat 11 00:11 Fri 10 21:09 Thu 9 21:14 Wed 8 21:13 Tue 7 21:14 Sat 4 18:05 Fri 3 21:07 Thu 2 21:08 Wed 1 21:12

Claude Code on Google Vertex AI

Learn about configuring Claude Code through Google Vertex AI, including setup, IAM configuration, and troubleshooting.

Prerequisites

Before configuring Claude Code with Vertex AI, ensure you have:

  • A Google Cloud Platform (GCP) account with billing enabled
  • A GCP project with Vertex AI API enabled
  • Access to desired Claude models (for example, Claude Sonnet 4.6)
  • Google Cloud SDK (gcloud) installed and configured
  • Quota allocated in desired GCP region

Region Configuration

Claude Code can be used with both Vertex AI global and regional endpoints.

Setup

1. Enable Vertex AI API

Enable the Vertex AI API in your GCP project:

# Set your project ID
gcloud config set project YOUR-PROJECT-ID

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com

2. Request model access

Request access to Claude models in Vertex AI:

  1. Navigate to the Vertex AI Model Garden
  2. Search for "Claude" models
  3. Request access to desired Claude models (for example, Claude Sonnet 4.6)
  4. Wait for approval (may take 24-48 hours)

3. Configure GCP credentials

Claude Code uses standard Google Cloud authentication.

For more information, see Google Cloud authentication documentation.

4. Configure Claude Code

Set the following environment variables:

# Enable Vertex AI integration
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=global
export ANTHROPIC_VERTEX_PROJECT_ID=YOUR-PROJECT-ID

# Optional: Override the Vertex endpoint URL for custom endpoints or gateways
# export ANTHROPIC_VERTEX_BASE_URL=https://aiplatform.googleapis.com

# Optional: Disable prompt caching if needed
export DISABLE_PROMPT_CACHING=1

# When CLOUD_ML_REGION=global, override region for models that don't support global endpoints
export VERTEX_REGION_CLAUDE_HAIKU_4_5=us-east5
export VERTEX_REGION_CLAUDE_4_6_SONNET=europe-west1

Most model versions have a corresponding VERTEX_REGION_CLAUDE_* variable. See the Environment variables reference for the full list. Check Vertex Model Garden to determine which models support global endpoints versus regional only.

Prompt caching is automatically supported when you specify the cache_control ephemeral flag. To disable it, set DISABLE_PROMPT_CACHING=1. For heightened rate limits, contact Google Cloud support. When using Vertex AI, the /login and /logout commands are disabled since authentication is handled through Google Cloud credentials.

5. Pin model versions

Set these environment variables to specific Vertex AI model IDs:

export ANTHROPIC_DEFAULT_OPUS_MODEL='claude-opus-4-6'
export ANTHROPIC_DEFAULT_SONNET_MODEL='claude-sonnet-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='claude-haiku-4-5@20251001'

For current and legacy model IDs, see Models overview. See Model configuration for the full list of environment variables.

Claude Code uses these default models when no pinning variables are set:

Model type Default value
Primary model claude-sonnet-4-5@20250929
Small/fast model claude-haiku-4-5@20251001

To customize models further:

export ANTHROPIC_MODEL='claude-opus-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='claude-haiku-4-5@20251001'

IAM configuration

Assign the required IAM permissions:

The roles/aiplatform.user role includes the required permissions:

  • aiplatform.endpoints.predict - Required for model invocation and token counting

For more restrictive permissions, create a custom role with only the permissions above.

For details, see Vertex IAM documentation.

1M token context window

Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support the 1M token context window on Vertex AI. Claude Code automatically enables the extended context window when you select a 1M model variant.

To enable the 1M context window for your pinned model, append [1m] to the model ID. See Pin models for third-party deployments for details.

Troubleshooting

If you encounter quota issues:

  • Check current quotas or request quota increase through Cloud Console

If you encounter "model not found" 404 errors:

  • Confirm model is Enabled in Model Garden
  • Verify you have access to the specified region
  • If using CLOUD_ML_REGION=global, check that your models support global endpoints in Model Garden under "Supported features". For models that don't support global endpoints, either:
    • Specify a supported model via ANTHROPIC_MODEL or ANTHROPIC_DEFAULT_HAIKU_MODEL, or
    • Set a regional endpoint using VERTEX_REGION_<MODEL_NAME> environment variables

If you encounter 429 errors:

  • For regional endpoints, ensure the primary model and small/fast model are supported in your selected region
  • Consider switching to CLOUD_ML_REGION=global for better availability

Additional resources