Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.

Advanced

## Starter prompt

I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

Before changing anything:
- Read `AGENTS.md`.
- Find the script or command that scores the current output.

Iteration loop:
- Make one focused improvement at a time.
- Re-run the eval command after each meaningful change.
- Log the scores and what changed.
- Inspect generated artifacts directly. If the output is visual, use `view_image`.
- Keep going until both the overall score and the LLM average are above 90%.

Constraints:
- Do not stop at the first acceptable result.
- Do not revert to an earlier version unless the new result is clearly worse in scores or artifacts.
- If the eval improves but is still below target, explain the bottleneck and continue.

Output:
- current best scores
- log of major iterations
- remaining risks or weak spots
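
The logging and stopping rule in the prompt above can be sketched as a small score log. This is a minimal illustration, not part of Codex: the names `EvalLog`, `Iteration`, and `TARGET` are hypothetical, scores are assumed to be fractions between 0 and 1, and ranking iterations by their weaker score is an assumption made here because the prompt requires both scores to clear the threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Threshold from the prompt: both scores must be above 90%.
TARGET = 0.90


@dataclass
class Iteration:
    """One entry in the log: what changed and the scores that followed."""
    change: str
    overall: float       # overall eval score, assumed to be in [0, 1]
    llm_average: float   # LLM-judge average, assumed to be in [0, 1]


@dataclass
class EvalLog:
    """Hypothetical helper that records iterations and checks the target."""
    iterations: List[Iteration] = field(default_factory=list)

    def record(self, change: str, overall: float, llm_average: float) -> None:
        """Append one iteration after re-running the eval command."""
        self.iterations.append(Iteration(change, overall, llm_average))

    def best(self) -> Optional[Iteration]:
        """Return the iteration whose weaker score is highest (an assumption)."""
        return max(
            self.iterations,
            key=lambda it: min(it.overall, it.llm_average),
            default=None,
        )

    def target_met(self) -> bool:
        """True only when the best iteration clears the target on both scores."""
        top = self.best()
        return top is not None and top.overall > TARGET and top.llm_average > TARGET
```

In a loop like the one the prompt describes, each eval run would end with a call to `record(...)`, and work would continue while `target_met()` returns `False`. The accumulated `iterations` list doubles as the "log of major iterations" requested in the output.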