guides/trace-grading.md

Trace grading

Trace grading is the process of assigning structured scores or labels to an agent's trace (the end-to-end log of decisions, tool calls, and reasoning steps) to assess correctness, quality, or adherence to expectations. These annotations show where the agent did well and where it made mistakes, enabling targeted improvements in orchestration or behavior.
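
As a rough illustration of the idea, the sketch below grades a single trace in plain Python. The `Step`, `Trace`, and `Grade` types and the `grade_tool_usage` check are hypothetical names made up for this example; they are not the dashboard's grader format or any SDK's API.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One entry in a trace (hypothetical shape, for illustration only)."""
    kind: str           # "reasoning", "tool_call", or "output"
    name: str = ""      # tool name, used when kind == "tool_call"
    content: str = ""   # free-text payload of the step


@dataclass
class Trace:
    trace_id: str
    steps: list[Step] = field(default_factory=list)


@dataclass
class Grade:
    trace_id: str
    label: str    # "pass" or "fail"
    score: float  # fraction of required tools that were called
    notes: str


def grade_tool_usage(trace: Trace, required_tools: set[str]) -> Grade:
    """Label a trace by whether it called every required tool."""
    called = {s.name for s in trace.steps if s.kind == "tool_call"}
    missing = sorted(required_tools - called)
    if required_tools:
        score = 1 - len(missing) / len(required_tools)
    else:
        score = 1.0
    label = "fail" if missing else "pass"
    notes = f"missing: {', '.join(missing)}" if missing else "all required tools called"
    return Grade(trace.trace_id, label, score, notes)


example = Trace("trace_001", [
    Step("reasoning", content="User wants order status."),
    Step("tool_call", name="lookup_order"),
    Step("output", content="Your order shipped."),
])
grade = grade_tool_usage(example, {"lookup_order", "send_email"})
print(grade.label, grade.score, grade.notes)
```

The grade records a label, a numeric score, and a note about which step was missing, so a failing trace points at the specific behavior to fix.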

Trace evals use those graded traces to systematically evaluate agent performance across many examples, helping to benchmark changes, identify regressions, or validate improvements. Unlike black-box evaluations, which score only the final output, trace evals expose the intermediate steps, so you can see why an agent succeeds or fails.
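
A minimal sketch of that aggregation step, assuming each graded example has already been reduced to a `"pass"` or `"fail"` label keyed by an example id. The `pass_rate` and `find_regressions` helpers and the sample data are hypothetical, written only to show how comparing two graded runs can surface regressions.

```python
from collections import Counter


def pass_rate(labels: list[str]) -> float:
    """Fraction of labels equal to "pass" (0.0 for an empty list)."""
    total = len(labels)
    return Counter(labels)["pass"] / total if total else 0.0


def find_regressions(baseline: dict[str, str], candidate: dict[str, str]) -> list[str]:
    """Ids of examples that passed in the baseline run but fail in the candidate run."""
    return sorted(
        example_id
        for example_id, label in baseline.items()
        if label == "pass" and candidate.get(example_id) == "fail"
    )


# Hypothetical grades for the same four examples before and after a change.
baseline = {"t1": "pass", "t2": "pass", "t3": "fail", "t4": "pass"}
candidate = {"t1": "pass", "t2": "fail", "t3": "pass", "t4": "pass"}

baseline_rate = pass_rate(list(baseline.values()))
candidate_rate = pass_rate(list(candidate.values()))
regressions = find_regressions(baseline, candidate)
print(baseline_rate, candidate_rate, regressions)
```

In this sample both runs have the same overall pass rate, yet one example regressed. Per-example comparison catches what the aggregate number hides.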

Use both features to track, analyze, and optimize the performance of groups of agents.

Get started with traces

1. In the dashboard, navigate to Logs > Traces.

2. Select a workflow. You'll see traces from SDK-based apps, and from existing Agent Builder workflows during the transition window.

3. Select a trace to inspect your workflow.

4. Create a grader, and run it to grade your agents' performance against grader criteria.

Trace grading is a valuable tool for error identification at scale, which is critical for building resilience into your AI applications. Learn more about our recommended process in our cookbook.

Evaluate traces with runs

1. Select Grade all. This takes you to the evaluation dashboard.

2. In the evaluation dashboard, add and edit test criteria.

3. Add a run to evaluate outputs. You can configure run options like model, date range, and tool calls to get more specificity in your eval.
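
The run options mentioned above (model, date range, tool calls) act as filters on which traces a run evaluates. The sketch below shows that narrowing in plain Python. `TraceRecord`, `RunOptions`, `select_traces`, and the model names are hypothetical stand-ins, not the dashboard's actual configuration schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TraceRecord:
    """Hypothetical summary of a stored trace."""
    trace_id: str
    model: str
    created: date
    tool_calls: tuple[str, ...]


@dataclass
class RunOptions:
    """Hypothetical run filters; None means "do not filter on this field"."""
    model: Optional[str] = None
    start: Optional[date] = None
    end: Optional[date] = None
    tool_call: Optional[str] = None


def select_traces(traces: list[TraceRecord], opts: RunOptions) -> list[TraceRecord]:
    """Keep only the traces that match every option that is set."""
    selected = []
    for t in traces:
        if opts.model is not None and t.model != opts.model:
            continue
        if opts.start is not None and t.created < opts.start:
            continue
        if opts.end is not None and t.created > opts.end:
            continue
        if opts.tool_call is not None and opts.tool_call not in t.tool_calls:
            continue
        selected.append(t)
    return selected


traces = [
    TraceRecord("t1", "model-a", date(2026, 6, 1), ("lookup_order",)),
    TraceRecord("t2", "model-a", date(2026, 6, 3), ("lookup_order", "send_email")),
    TraceRecord("t3", "model-b", date(2026, 6, 3), ("send_email",)),
    TraceRecord("t4", "model-a", date(2026, 6, 9), ("send_email",)),
]
opts = RunOptions(model="model-a", start=date(2026, 6, 2), end=date(2026, 6, 5), tool_call="send_email")
selected_ids = [t.trace_id for t in select_traces(traces, opts)]
print(selected_ids)
```

Narrowing the input this way keeps each run focused on one question, such as how a single model handled a particular tool during a given week.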

Learn more about how you can use evals here.

guides/trace-grading.md +1 −1

@@ -9,7 +9,7 @@
 ## Get started with traces

 1. In the dashboard, navigate to Logs > [Traces](https://platform.openai.com/logs?api=traces).
-1. Select a worfklow. You'll see logs from any workflows you created in [Agent Builder](https://developers.openai.com/api/docs/guides/agent-builder).
+1. Select a workflow. You'll see traces from SDK-based apps, and from existing [Agent Builder](https://developers.openai.com/api/docs/guides/agent-builder) workflows during the transition window.
 1. Select a trace to inspect your workflow.
 1. Create a grader, and run it to grade your agents' performance against grader criteria.


guides/trace-grading.md 2026-06-04 06:52 UTC to 2026-06-05 06:45 UTC
