> ## Documentation Index
> Fetch the complete documentation index at: https://code.claude.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Observability with OpenTelemetry

> Export traces, metrics, and events from the Agent SDK to your observability backend using OpenTelemetry.

When you run agents in production, you need visibility into what they did:

* which tools they called
* how long each model request took
* how many tokens were spent
* where failures occurred

The Agent SDK can export this data as OpenTelemetry traces, metrics, and log events to any backend that accepts the OpenTelemetry Protocol (OTLP), such as Honeycomb, Datadog, Grafana, Langfuse, or a self-hosted collector.

This guide explains how the SDK emits telemetry, how to configure the export, and how to tag and filter the data once it reaches your backend. To read token usage and cost directly from the SDK response stream instead of exporting to a backend, see [Track cost and usage](/en/agent-sdk/cost-tracking).

## How telemetry flows from the SDK

The Agent SDK runs the Claude Code CLI as a child process and communicates with it over a local pipe. The CLI has OpenTelemetry instrumentation built in: it records spans around each model request and tool execution, emits metrics for token and cost counters, and emits structured log events for prompts and tool results. The SDK does not produce telemetry of its own. Instead, it passes configuration through to the CLI process, and the CLI exports directly to your collector.

Configuration is passed as environment variables. By default, the child process inherits your application's environment, so you can configure telemetry in either of two places:

* **Process environment:** set the variables in your shell, container, or orchestrator before your application starts. Every `query()` call picks them up automatically with no code change. This is the recommended approach for production deployments.
* **Per-call options:** set the variables in `ClaudeAgentOptions.env` (Python) or `options.env` (TypeScript). Use this when different agents in the same process need different telemetry settings. In Python, `env` is merged on top of the inherited environment. In TypeScript, `env` replaces the inherited environment entirely, so include `...process.env` in the object you pass.

The CLI exports three independent OpenTelemetry signals. Each has its own enable switch and its own exporter, so you can turn on only the ones you need.

| Signal     | What it contains                                                            | Enable with                                                         |
| ---------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| Metrics    | Counters for tokens, cost, sessions, lines of code, and tool decisions      | `OTEL_METRICS_EXPORTER`                                             |
| Log events | Structured records for each prompt, API request, API error, and tool result | `OTEL_LOGS_EXPORTER`                                                |
| Traces     | Spans for each interaction, model request, tool call, and hook (beta)       | `OTEL_TRACES_EXPORTER` plus `CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1` |

For the complete list of metric names, event names, and attributes, see the Claude Code [Monitoring](/en/monitoring-usage) reference. The Agent SDK emits the same data because it runs the same CLI. Span names are listed in [Read agent traces](#read-agent-traces) below.

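The per-signal switches in the table above can be sketched as a small helper that builds an environment dictionary for only the signals you request. This is a minimal illustration, not an SDK API: the function name `build_otel_env` and its parameters are hypothetical, and it assumes the `otlp` exporter for every signal. Only the environment variable names come from this guide.

```python
def build_otel_env(signals, endpoint):
    """Return environment variables that enable only the requested signals.

    Hypothetical helper, not part of the Agent SDK. `signals` is any
    iterable drawn from {"metrics", "logs", "traces"}.
    """
    exporter_vars = {
        "metrics": "OTEL_METRICS_EXPORTER",
        "logs": "OTEL_LOGS_EXPORTER",
        "traces": "OTEL_TRACES_EXPORTER",
    }
    requested = set(signals)
    unknown = requested - set(exporter_vars)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")

    env = {
        # Master switch: telemetry stays off without it.
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }
    for signal in sorted(requested):
        env[exporter_vars[signal]] = "otlp"
    # Traces are in beta and need the extra flag; metrics and logs do not.
    if "traces" in requested:
        env["CLAUDE_CODE_ENHANCED_TELEMETRY_BETA"] = "1"
    return env
```

Calling `build_otel_env({"metrics", "logs"}, "http://collector.example.com:4318")` leaves the tracing beta flag unset. The resulting dictionary can be passed as `ClaudeAgentOptions(env=...)` in Python, or spread after `process.env` in TypeScript.
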
## Enable telemetry export

Telemetry is off until you set `CLAUDE_CODE_ENABLE_TELEMETRY=1` and choose at least one exporter. The most common configuration sends all three signals over OTLP HTTP to a collector.

The following example sets the variables in a dictionary and passes them through `options.env`. The agent runs a single task, and the CLI exports spans, metrics, and events to the collector at `collector.example.com` while the loop consumes the response stream:

<CodeGroup>
  ```python Python theme={null}
  import asyncio
  from claude_agent_sdk import query, ClaudeAgentOptions

  OTEL_ENV = {
      "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
      # Required for traces, which are in beta. Metrics and log events do not need this.
      "CLAUDE_CODE_ENHANCED_TELEMETRY_BETA": "1",
      # Choose an exporter per signal. Use otlp for the SDK; see the Note below.
      "OTEL_TRACES_EXPORTER": "otlp",
      "OTEL_METRICS_EXPORTER": "otlp",
      "OTEL_LOGS_EXPORTER": "otlp",
      # Standard OTLP transport configuration.
      "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
      "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.example.com:4318",
      "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer your-token",
  }


  async def main():
      options = ClaudeAgentOptions(env=OTEL_ENV)
      async for message in query(
          prompt="List the files in this directory", options=options
      ):
          print(message)


  asyncio.run(main())
  ```

  ```typescript TypeScript theme={null}
  import { query } from "@anthropic-ai/claude-agent-sdk";

  const otelEnv = {
    CLAUDE_CODE_ENABLE_TELEMETRY: "1",
    // Required for traces, which are in beta. Metrics and log events do not need this.
    CLAUDE_CODE_ENHANCED_TELEMETRY_BETA: "1",
    // Choose an exporter per signal. Use otlp for the SDK; see the Note below.
    OTEL_TRACES_EXPORTER: "otlp",
    OTEL_METRICS_EXPORTER: "otlp",
    OTEL_LOGS_EXPORTER: "otlp",
    // Standard OTLP transport configuration.
    OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf",
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://collector.example.com:4318",
    OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Bearer your-token",
  };

  for await (const message of query({
    prompt: "List the files in this directory",
    // env replaces the inherited environment in TypeScript, so spread
    // process.env first to keep PATH, ANTHROPIC_API_KEY, and other variables.
    options: { env: { ...process.env, ...otelEnv } },
  })) {
    console.log(message);
  }
  ```
</CodeGroup>

Because the child process inherits your application's environment by default, you can achieve the same result by exporting these variables in a Dockerfile, Kubernetes manifest, or shell profile and omitting `options.env` entirely.

<Note>
  The `console` exporter writes telemetry to standard output, which the SDK uses
  as its message channel. Do not set `console` as an exporter value when running
  through the SDK. To inspect telemetry locally, point
  `OTEL_EXPORTER_OTLP_ENDPOINT` at a local collector or an all-in-one Jaeger
  container instead.
</Note>

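One way to guard against the `console` pitfall described in the Note is to check the environment dictionary before passing it to the SDK. The sketch below is illustrative only: `find_console_exporters` is a hypothetical name, and the code assumes an exporter variable may hold a comma-separated list such as `otlp,console`.

```python
EXPORTER_VARS = (
    "OTEL_TRACES_EXPORTER",
    "OTEL_METRICS_EXPORTER",
    "OTEL_LOGS_EXPORTER",
)


def find_console_exporters(env):
    """Return the exporter variables in `env` whose value includes `console`.

    Hypothetical pre-flight check, not part of the Agent SDK.
    """
    offending = []
    for name in EXPORTER_VARS:
        # Assumption: values may be comma-separated, for example "otlp,console".
        values = [v.strip().lower() for v in env.get(name, "").split(",")]
        if "console" in values:
            offending.append(name)
    return offending
```

If the returned list is non-empty, fail fast or rewrite those values to `otlp` before calling `query()`. The check only inspects the dictionary you give it, so run it against `os.environ` as well if the exporters are configured in the process environment.
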
### Flush telemetry from short-lived calls

The CLI batches telemetry and exports on an interval. On a clean process exit it attempts to flush pending data, but the flush is bounded by a short timeout, so spans can still be dropped if the collector is slow to respond. If your process is killed before the CLI shuts down, anything still in the batch buffer is lost. Lowering the export intervals reduces both windows.

By default, metrics export every 60 seconds and traces and logs export every 5 seconds. The following example shortens all three intervals so that data reaches the collector while a short task is still running:

<CodeGroup>
  ```python Python theme={null}
  OTEL_ENV = {
      # ... exporter configuration from the previous example ...
      "OTEL_METRIC_EXPORT_INTERVAL": "1000",
      "OTEL_LOGS_EXPORT_INTERVAL": "1000",
      "OTEL_TRACES_EXPORT_INTERVAL": "1000",
  }
  ```

  ```typescript TypeScript theme={null}
  const otelEnv = {
    // ... exporter configuration from the previous example ...
    OTEL_METRIC_EXPORT_INTERVAL: "1000",
    OTEL_LOGS_EXPORT_INTERVAL: "1000",
    OTEL_TRACES_EXPORT_INTERVAL: "1000",
  };
  ```
</CodeGroup>

## Read agent traces

Traces give you the most detailed view of an agent run. With `CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1` set, each step of the agent loop becomes a span you can inspect in your tracing backend:

* **`claude_code.interaction`:** wraps a single turn of the agent loop, from receiving a prompt to producing a response.
* **`claude_code.llm_request`:** wraps each call to the Claude API, with model name, latency, and token counts as attributes.
* **`claude_code.tool`:** wraps each tool invocation, with child spans for the permission wait (`claude_code.tool.blocked_on_user`) and the execution itself (`claude_code.tool.execution`).
* **`claude_code.hook`:** wraps each [hook](/en/agent-sdk/hooks) execution. Requires detailed beta tracing (`ENABLE_BETA_TRACING_DETAILED=1` and `BETA_TRACING_ENDPOINT`) in addition to the variables above.

The `llm_request`, `tool`, and `hook` spans are children of the enclosing `claude_code.interaction` span. When the agent spawns a subagent through the Task tool, the subagent's `llm_request` and `tool` spans nest under the parent agent's `claude_code.tool` span, so the full delegation chain appears as one trace.

Spans carry a `session.id` attribute by default. When you make several `query()` calls against the same [session](/en/agent-sdk/sessions), filter on `session.id` in your backend to see them as one timeline. The attribute is omitted if `OTEL_METRICS_INCLUDE_SESSION_ID` is set to a falsy value.

<Note>
  Tracing is in beta. Span names and attributes may change between releases. See
  [Traces (beta)](/en/monitoring-usage#traces-beta) in the Monitoring reference
  for the trace exporter configuration variables.
</Note>

## Link traces to your application

The SDK automatically propagates W3C trace context into the CLI subprocess. When you call `query()` while an OpenTelemetry span is active in your application, the SDK injects `TRACEPARENT` and `TRACESTATE` into the child process environment, and the CLI reads them so its `claude_code.interaction` span becomes a child of your span. The agent run then appears inside your application's trace instead of as a disconnected root.

The CLI also forwards `TRACEPARENT` to every Bash and PowerShell command it runs. If a command launched through the Bash tool emits its own OpenTelemetry spans, those spans nest under the `claude_code.tool.execution` span that wraps the command.

Auto-injection is skipped when you set `TRACEPARENT` explicitly in `options.env`, so you can pin a specific parent context if needed. Interactive CLI sessions ignore inbound `TRACEPARENT` entirely; only Agent SDK and `claude -p` runs honor it. See [Traces (beta)](/en/monitoring-usage#traces-beta) in the Monitoring reference for the full span and attribute reference.

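If you pin a parent context by setting `TRACEPARENT` yourself, the value has to follow the W3C Trace Context format: `00-<32 hex trace ID>-<16 hex parent span ID>-<2 hex flags>`. The sketch below builds and validates such a value using only the standard library. The helper names `make_traceparent` and `is_valid_traceparent` are hypothetical and are not provided by the SDK.

```python
import re
import secrets

# Version 00 of the W3C traceparent format, lowercase hex only.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def is_valid_traceparent(value):
    """Check the version-00 layout and reject all-zero trace or span IDs."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        return False
    trace_id, span_id, _flags = match.groups()
    return set(trace_id) != {"0"} and set(span_id) != {"0"}


def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent value, generating random IDs when none are given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex characters
    span_id = span_id or secrets.token_hex(8)     # 8 bytes -> 16 hex characters
    value = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    if not is_valid_traceparent(value):
        raise ValueError(f"invalid traceparent: {value}")
    return value
```

You could then pass `{"TRACEPARENT": make_traceparent(trace_id, span_id)}` in `env`. In a real application the trace and span IDs would normally come from your tracer's active span rather than being generated at random, so that the agent run joins an existing trace.
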
## Tag telemetry from your agent

By default, the CLI reports `service.name` as `claude-code`. If you run several agents, or run the SDK alongside other services that export to the same collector, override the service name and add resource attributes so you can filter by agent in your backend.

The following example renames the service and attaches deployment metadata. These values are applied as OpenTelemetry resource attributes on every span, metric, and event the agent emits:

<CodeGroup>
  ```python Python theme={null}
  options = ClaudeAgentOptions(
      env={
          # ... exporter configuration ...
          "OTEL_SERVICE_NAME": "support-triage-agent",
          "OTEL_RESOURCE_ATTRIBUTES": "service.version=1.4.0,deployment.environment=production",
      },
  )
  ```

  ```typescript TypeScript theme={null}
  const options = {
    env: {
      ...process.env,
      // ... exporter configuration ...
      OTEL_SERVICE_NAME: "support-triage-agent",
      OTEL_RESOURCE_ATTRIBUTES:
        "service.version=1.4.0,deployment.environment=production",
    },
  };
  ```
</CodeGroup>

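When resource attributes come from configuration rather than a literal string, it can help to assemble the `OTEL_RESOURCE_ATTRIBUTES` value programmatically. The following is a minimal sketch: `format_resource_attributes` is a hypothetical helper, and it assumes the CLI follows the usual OpenTelemetry convention of percent-decoding keys and values, so that a comma, equals sign, or space inside a value is not read as a separator.

```python
from urllib.parse import quote


def format_resource_attributes(attributes):
    """Serialize a dict as a comma-separated list of key=value pairs.

    Hypothetical helper, not part of the Agent SDK.
    """
    pairs = []
    for key, value in attributes.items():
        # Percent-encode everything outside the unreserved set so that
        # "," and "=" inside a key or value cannot split the list.
        pairs.append(f"{quote(str(key), safe='')}={quote(str(value), safe='')}")
    return ",".join(pairs)
```

For the attributes in the example above, `format_resource_attributes({"service.version": "1.4.0", "deployment.environment": "production"})` produces the same string that the example writes by hand.
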
## Control sensitive data in exports

Telemetry is structural by default. Durations, model names, and tool names are recorded on every span; token counts are recorded when the underlying API request returns usage data, so spans for failed or aborted requests may omit them. The content your agent reads and writes is not recorded by default. These opt-in variables add content to the exported data:

| Variable                  | Adds |
| ------------------------- | ---- |
| `OTEL_LOG_USER_PROMPTS=1` | Prompt text on `claude_code.user_prompt` events and on the `claude_code.interaction` span |
| `OTEL_LOG_TOOL_DETAILS=1` | Tool input arguments (file paths, shell commands, search patterns) on `claude_code.tool_result` events |
| `OTEL_LOG_TOOL_CONTENT=1` | Full tool input and output bodies as span events on `claude_code.tool`, truncated at 60 KB. Requires [tracing](#read-agent-traces) to be enabled |
| `OTEL_LOG_RAW_API_BODIES` | Full Anthropic Messages API request and response JSON as `claude_code.api_request_body` and `claude_code.api_response_body` log events. Set to `1` for inline bodies truncated at 60 KB, or `file:<dir>` for untruncated bodies on disk with a `body_ref` path in the event. Bodies include the entire conversation history and have extended-thinking content redacted. Enabling this implies consent to everything the three variables above would reveal |

Leave these unset unless your observability pipeline is approved to store the data your agent handles. See [Security and privacy](/en/monitoring-usage#security-and-privacy) in the Monitoring reference for the full list of attributes and redaction behavior.

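To keep the opt-in variables from the table out of a configuration by accident, you might filter the environment dictionary before handing it to the SDK. The sketch below is one possible approach; `without_content_logging` and its `allowed` parameter are hypothetical names, not SDK features.

```python
CONTENT_VARS = (
    "OTEL_LOG_USER_PROMPTS",
    "OTEL_LOG_TOOL_DETAILS",
    "OTEL_LOG_TOOL_CONTENT",
    "OTEL_LOG_RAW_API_BODIES",
)


def without_content_logging(env, allowed=()):
    """Return a copy of `env` with the content opt-in variables removed.

    Variables named in `allowed` are kept unchanged. Hypothetical helper,
    not part of the Agent SDK.
    """
    keep = set(allowed)
    return {
        name: value
        for name, value in env.items()
        if name not in CONTENT_VARS or name in keep
    }
```

This only filters the dictionary it receives. Because the Python SDK merges `env` on top of the inherited environment, a variable that is set in the parent process is still inherited by the CLI, so the process environment needs the same review.
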
## Related documentation

These guides cover adjacent topics for monitoring and deploying agents:

* [Track cost and usage](/en/agent-sdk/cost-tracking): read token and cost data from the message stream without an external backend.
* [Hosting the Agent SDK](/en/agent-sdk/hosting): deploy agents in containers where you can set OpenTelemetry variables at the environment level.
* [Monitoring](/en/monitoring-usage): the complete reference for every environment variable, metric, and event the CLI emits.