3. The session connects over WebRTC in the browser or WebSocket on the server.
4. The agent handles audio turns, tools, interruptions, and handoffs inside that session.

Start a realtime voice session

```typescript
import { RealtimeAgent, RealtimeSession } from "@openai/agents/realtime";

// The agent definition holds the instructions (and later tools and handoffs).
const agent = new RealtimeAgent({
  name: "Assistant",
  instructions: "You are a helpful voice assistant.",
});

// The session owns the transport and the realtime model connection.
const session = new RealtimeSession(agent, {
  model: "gpt-realtime-2",
});

// Connect with a short-lived client key issued by your server.
await session.connect({
  apiKey: "ek_...(ephemeral key from your server)",
});
```

From there, attach tools, handoffs, and guardrails to the `RealtimeAgent` the same way you would attach them to a text agent. Keep audio transport concerns in the session layer, and keep business logic in the agent definition.

Start with the transport docs when you need lower-level control:

This is often the better fit for support flows, approval-heavy flows, or cases where you want durable transcripts and deterministic logic between each stage.

Run a chained voice pipeline

```python
import asyncio
import numpy as np

from agents import Agent, function_tool
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    return f"The weather in {city} is sunny."


agent = Agent(
    name="Assistant",
    instructions="You are a helpful voice assistant.",
    model="gpt-5.5",
    tools=[get_weather],
)


async def main() -> None:
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    # Three seconds of silence at 24 kHz stands in for real microphone audio.
    audio_input = AudioInput(buffer=np.zeros(24000 * 3, dtype=np.int16))
    result = await pipeline.run(audio_input)
    # Consume synthesized speech as it streams back from the pipeline.
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            print("Received audio bytes", len(event.data))


if __name__ == "__main__":
    asyncio.run(main())
```

Use this path when each stage needs to be visible or replaceable. For example, you might store the transcript, run policy checks before the text agent responds, call internal systems, then generate speech only after the workflow reaches an approved answer.
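The staged flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SDK's API: `transcribe`, `passes_policy`, `run_text_agent`, and `synthesize` are hypothetical stand-ins for your speech-to-text, policy, agent, and text-to-speech stages, and `transcript_store` is an in-memory list standing in for durable storage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnResult:
    transcript: str
    reply_text: Optional[str]
    audio: Optional[bytes]
    stages: list[str] = field(default_factory=list)


# Hypothetical stage stubs. Replace each with a real service call.
def transcribe(audio: bytes) -> str:
    # Stand-in for speech-to-text: treat the bytes as UTF-8 text.
    return audio.decode("utf-8")


def passes_policy(transcript: str) -> bool:
    # Stand-in policy check: block requests that mention passwords.
    return "password" not in transcript.lower()


def run_text_agent(transcript: str) -> str:
    # Stand-in for the text agent and any internal system calls.
    return f"You said: {transcript}"


def synthesize(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the reply as bytes.
    return text.encode("utf-8")


def handle_turn(audio: bytes, transcript_store: list[str]) -> TurnResult:
    """Run one turn, recording each stage so it stays visible."""
    transcript = transcribe(audio)
    transcript_store.append(transcript)  # Store before any response logic.
    stages = ["transcribe", "store"]

    if not passes_policy(transcript):
        # Stop here: no agent call and no speech for a rejected turn.
        stages.append("policy_blocked")
        return TurnResult(transcript, None, None, stages)
    stages.append("policy_ok")

    reply = run_text_agent(transcript)
    stages.append("agent")

    audio_out = synthesize(reply)  # Speech is generated only after approval.
    stages.append("speech")
    return TurnResult(transcript, reply, audio_out, stages)
```

Because each stage is an ordinary function, any one of them can be swapped or logged without touching the others, which is the property this path is meant to provide.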

## Voice agents still use the same core agent building blocks