use-cases/iterate-on-difficult-problems diff

use-cases/iterate-on-difficult-problems.md +25 −4


2 2

3 3 [← All use cases](https://developers.openai.com/codex/use-cases)

4 4

5 Copy page [Export as PDF](https://developers.openai.com/codex/use-cases/iterate-on-difficult-problems/?export=pdf)

5 7 Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.

6 8

7 9 Advanced

20 22

21 23 ## Starter prompt

22 24

25 I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

26 Before changing anything:

27 - Read `AGENTS.md`.

28 - Find the script or command that scores the current output.

29 Iteration loop:

30 - Make one focused improvement at a time.

31 - Re-run the eval command after each meaningful change.

32 - Log the scores and what changed.

33 - Inspect generated artifacts directly. If the output is visual, use `view_image`.

34 - Keep going until both the overall score and the LLM average are above 90%.

35 Constraints:

36 - Do not stop at the first acceptable result.

37 - Do not revert to an earlier version unless the new result is clearly worse in scores or artifacts.

38 - If the eval improves but is still below target, explain the bottleneck and continue.

39 Output:

40 - current best scores

41 - log of major iterations

42 - remaining risks or weak spots

23 44 I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

24 45 Before changing anything:

25 46 - Read `AGENTS.md`.
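The added starter prompt ends the loop only when both the overall score and the LLM average are above 90%. It also asks for a log of scores and changes.

Below is a minimal Python sketch of that bookkeeping. It assumes each eval run produces two scores expressed as fractions. `IterationResult`, `target_met`, `best_result`, and `format_log` are hypothetical names and are not part of Codex. Ranking the "best" iteration by its weaker score is also an assumption; the prompt does not specify it.

```python
from dataclasses import dataclass

# Hypothetical record of one iteration in the eval loop. The field names
# are illustrative; they are not defined by Codex or by any eval script.
@dataclass
class IterationResult:
    change: str         # short description of the one focused change
    overall: float      # overall score as a fraction (0.0 to 1.0)
    llm_average: float  # LLM-judged average as a fraction (0.0 to 1.0)


TARGET = 0.90  # "above 90%" from the starter prompt


def target_met(result: IterationResult, target: float = TARGET) -> bool:
    """True only when BOTH scores are strictly above the target."""
    return result.overall > target and result.llm_average > target


def best_result(log: list[IterationResult]) -> IterationResult:
    """Return the strongest iteration so far.

    Assumption: iterations are ranked by the weaker of their two scores,
    because the stopping rule needs both to clear the target.
    """
    return max(log, key=lambda r: min(r.overall, r.llm_average))


def format_log(log: list[IterationResult]) -> str:
    """Render one line per iteration: what changed and the resulting scores."""
    lines = []
    for i, r in enumerate(log, start=1):
        status = "target met" if target_met(r) else "below target"
        lines.append(
            f"{i}. {r.change}: overall={r.overall:.0%}, "
            f"llm={r.llm_average:.0%} ({status})"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    log = [
        IterationResult("baseline", 0.72, 0.65),
        IterationResult("tighten layout rules", 0.88, 0.91),
        IterationResult("fix clipped labels", 0.93, 0.92),
    ]
    print(format_log(log))
    best = best_result(log)
    print(f"best: {best.change}, stop: {target_met(best)}")
```

With these made-up scores, only the third iteration clears both thresholds. The second does not: its LLM average passes, but its overall score of 88% does not. This is the case the "do not stop at the first acceptable result" constraint guards against.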

127 148

128 149 Use Codex to turn a game brief into first a well-defined plan, and then a real browser-based...

129 150

130 − Engineering Code](https://developers.openai.com/codex/use-cases/browser-games)[![](/images/codex/codex-wallpaper-2.webp)

151 + Engineering Code](https://developers.openai.com/codex/use-cases/browser-games)[![](/images/codex/codex-wallpaper-1.webp)

131 152

132 − ### Analyze datasets and ship reports

153 + ### Learn a new concept

133 154

134 − Use Codex to clean data, join sources, explore hypotheses, model results, and package the...

155 + Use Codex to study material such as research papers or courses, split the reading across...

135 156

136 − Data Analysis](https://developers.openai.com/codex/use-cases/datasets-and-reports)

157 + Knowledge Work Data](https://developers.openai.com/codex/use-cases/learn-a-new-concept)

use-cases/iterate-on-difficult-problems.md Codex Docs, 2026-04-07 00:40 UTC → 2026-04-15 06:44 UTC