SpyBara

Documentation 2026-06-15 23:02 UTC to 2026-06-16 21:57 UTC

3 files changed +91 −1.
Details

The assistant might say something like:

Assistant commentary message

```json
{
  "role": "assistant",
  "phase": "commentary",
  "content": "I'm checking the logs and comparing them to the last successful deploy."
}
```

That is not the answer. It is a progress note. Later, the assistant might say:

Assistant final answer message

```json
{
  "role": "assistant",
  "phase": "final_answer",
  "content": "The deploy failed because the migration referenced a column that does not exist in production."
}
```
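Because the two messages differ only in `phase`, client code can route them on that field. The following is a minimal Python sketch, assuming messages arrive as plain dictionaries shaped like the examples above. The `history` list, the user message in it, and both helper names are hypothetical and not part of any SDK.

```python
from typing import Optional

# Hypothetical history; the two assistant messages mirror the examples above.
history = [
    {"role": "user", "content": "Why did the deploy fail?"},
    {
        "role": "assistant",
        "phase": "commentary",
        "content": "I'm checking the logs and comparing them to the last successful deploy.",
    },
    {
        "role": "assistant",
        "phase": "final_answer",
        "content": "The deploy failed because the migration referenced a column that does not exist in production.",
    },
]


def progress_notes(messages: list[dict]) -> list[str]:
    """Return the text of assistant messages marked as commentary."""
    return [
        m["content"]
        for m in messages
        if m.get("role") == "assistant" and m.get("phase") == "commentary"
    ]


def final_answer(messages: list[dict]) -> Optional[str]:
    """Return the most recent assistant message marked as the final answer, if any."""
    for m in reversed(messages):
        if m.get("role") == "assistant" and m.get("phase") == "final_answer":
            return m["content"]
    return None
```

A client could show `progress_notes(history)` in a status area and render `final_answer(history)` as the reply. The helpers only read the messages, so every dictionary keeps its `phase` field if the history is sent on afterward.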

This is useful in long-running or tool-heavy workflows where the assistant may
produce visible progress updates before it finishes. When you send that history
back to the model, preserve `phase` on assistant messages so the model can tell

Details

```plain
{
  "type": "input_image",
  "image_url": "https://api.nga.gov/iiif/a2e6da57-3cd1-4235-b20e-95dcaefed6c8/full/!800,800/0/default.jpg",
  "detail": "original"
}
```

Use the following guidance to choose a detail level:

| Detail level | Best for |
| ------------ | -------- |

Details

3. The session connects over WebRTC in the browser or WebSocket on the server.
4. The agent handles audio turns, tools, interruptions, and handoffs inside that session.

Start a realtime voice session

```typescript
import { RealtimeAgent, RealtimeSession } from "@openai/agents/realtime";

const agent = new RealtimeAgent({
  name: "Assistant",
  instructions: "You are a helpful voice assistant.",
});

const session = new RealtimeSession(agent, {
  model: "gpt-realtime-2",
});

await session.connect({
  apiKey: "ek_...(ephemeral key from your server)",
});
```

From there, attach tools, handoffs, and guardrails to the `RealtimeAgent` the same way you would attach them to a text agent. Keep audio transport concerns in the session layer, and keep business logic in the agent definition.

Start with the transport docs when you need lower-level control:


This is often the better fit for support flows, approval-heavy flows, or cases where you want durable transcripts and deterministic logic between each stage.

Run a chained voice pipeline

```python
import asyncio
import numpy as np

from agents import Agent, function_tool
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    return f"The weather in {city} is sunny."


agent = Agent(
    name="Assistant",
    instructions="You are a helpful voice assistant.",
    model="gpt-5.5",
    tools=[get_weather],
)


async def main() -> None:
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    audio_input = AudioInput(buffer=np.zeros(24000 * 3, dtype=np.int16))
    result = await pipeline.run(audio_input)
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            print("Received audio bytes", len(event.data))


if __name__ == "__main__":
    asyncio.run(main())
```

Use this path when each stage needs to be visible or replaceable. For example, you might store the transcript, run policy checks before the text agent responds, call internal systems, then generate speech only after the workflow reaches an approved answer.
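The staged flow described above can be sketched without any SDK. This is only an illustration under stated assumptions: every name here (`TurnRecord`, `transcribe`, `passes_policy`, `run_text_agent`, `synthesize`, `handle_turn`, `BLOCKED_TERMS`) is hypothetical, and each stage is a stub standing in for a real speech-to-text, policy, agent, or text-to-speech call.

```python
from dataclasses import dataclass, field
from typing import Optional

# Placeholder policy terms for illustration only, not a real rule set.
BLOCKED_TERMS = {"password", "ssn"}


@dataclass
class TurnRecord:
    """What the workflow keeps from one voice turn (hypothetical structure)."""
    transcript: str
    reply_text: str = ""
    approved: bool = False
    log: list[str] = field(default_factory=list)


def transcribe(audio: bytes) -> str:
    # Stub speech-to-text: the "audio" here is just UTF-8 encoded text.
    return audio.decode("utf-8")


def passes_policy(text: str) -> bool:
    # Stub policy check: reject transcripts that contain a blocked term.
    return not (set(text.lower().split()) & BLOCKED_TERMS)


def run_text_agent(text: str) -> str:
    # Stub text agent; a real workflow would call the model and tools here.
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    # Stub text-to-speech: returns the reply as UTF-8 bytes.
    return text.encode("utf-8")


def handle_turn(audio: bytes) -> tuple[TurnRecord, Optional[bytes]]:
    """Run one turn; speech is produced only after the reply is approved."""
    record = TurnRecord(transcript=transcribe(audio))
    record.log.append("transcribed")
    if not passes_policy(record.transcript):
        record.log.append("blocked by policy")
        return record, None
    record.reply_text = run_text_agent(record.transcript)
    record.approved = True
    record.log.append("agent replied")
    speech = synthesize(record.reply_text)
    record.log.append("speech generated")
    return record, speech
```

Because each stage is a separate function, any one of them can be swapped or inspected on its own, and the `TurnRecord` keeps the transcript and stage log even when no speech is generated.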

## Voice agents still use the same core agent building blocks