use-cases/iterate-on-difficult-problems diff

use-cases/iterate-on-difficult-problems.md +21 −0


Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.


Advanced


## Starter prompt


I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

Before changing anything:

- Read `AGENTS.md`.
- Find the script or command that scores the current output.

Iteration loop:

- Make one focused improvement at a time.
- Re-run the eval command after each meaningful change.
- Log the scores and what changed.
- Inspect generated artifacts directly. If the output is visual, use `view_image`.
- Keep going until both the overall score and the LLM average are above 90%.

Constraints:

- Do not stop at the first acceptable result.
- Do not revert to an earlier version unless the new result is clearly worse in scores or artifacts.
- If the eval improves but is still below target, explain the bottleneck and continue.

Output:

- current best scores
- log of major iterations
- remaining risks or weak spots
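
The loop this prompt describes can be sketched in a few lines of Python. This is only an illustration of the control flow, not how Codex implements it: `Scores`, `LogEntry`, `run_eval_loop`, the `evaluate` and `improve` callables, and the 0.02 "clearly worse" margin are all hypothetical names and values chosen for the example. Only the 90% target on both scores comes from the prompt.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

TARGET = 0.90  # from the prompt: both scores must be above 90%


@dataclass
class Scores:
    overall: float
    llm_average: float

    def meets_target(self, target: float = TARGET) -> bool:
        # Both scores must clear the target, not just one of them.
        return self.overall > target and self.llm_average > target

    def clearly_worse_than(self, other: "Scores", margin: float = 0.02) -> bool:
        # Assumption: "clearly worse" means either score drops by more than `margin`.
        return (self.overall < other.overall - margin
                or self.llm_average < other.llm_average - margin)


@dataclass
class LogEntry:
    iteration: int
    note: str       # what changed in this iteration
    scores: Scores
    kept: bool      # False if the change was reverted


def run_eval_loop(
    candidate: Any,
    evaluate: Callable[[Any], Scores],
    improve: Callable[[Any, Scores], Tuple[Any, str]],
    max_iterations: int = 20,
) -> Tuple[Any, Scores, List[LogEntry]]:
    best, best_scores = candidate, evaluate(candidate)
    log = [LogEntry(0, "baseline", best_scores, kept=True)]
    for iteration in range(1, max_iterations + 1):
        if best_scores.meets_target():
            break
        # One focused improvement, then re-run the eval and log the result.
        attempt, note = improve(best, best_scores)
        scores = evaluate(attempt)
        kept = not scores.clearly_worse_than(best_scores)
        log.append(LogEntry(iteration, note, scores, kept))
        if kept:  # revert only when the new result is clearly worse
            best, best_scores = attempt, scores
    return best, best_scores, log


# Toy stand-ins: the "candidate" is an integer quality level in percent.
def toy_evaluate(level: int) -> Scores:
    return Scores(overall=level / 100, llm_average=(level - 5) / 100)


def toy_improve(level: int, scores: Scores) -> Tuple[int, str]:
    return level + 10, f"raised level from {level} to {level + 10}"


best, best_scores, log = run_eval_loop(60, toy_evaluate, toy_improve)
```

The returned triple lines up with the prompt's requested output: the current best scores, the log of iterations, and (through entries with `kept=False`) a record of changes that were tried and reverted.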


use-cases/iterate-on-difficult-problems.md Codex Docs, 2026-04-12 06:38 UTC → 2026-04-15 06:44 UTC