use-cases/iterate-on-difficult-problems diff

use-cases/iterate-on-difficult-problems.md +47 −0


# Iterate on difficult problems | Codex use cases


# Iterate on difficult problems

Use Codex as a scored improvement loop to solve hard tasks.

Difficulty **Advanced**

Time horizon **Long-running**

Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.

## Best for

- Problems where each iteration can be scored, but the best result usually takes many passes
- Tasks with visual or subjective outputs that need both deterministic checks and an LLM-as-a-judge score
- Long-running Codex sessions where you want progress tracked clearly instead of relying on context
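As a rough illustration of pairing deterministic checks with an LLM-as-a-judge score, the sketch below blends the two into a single result. Everything in it is an assumption made for illustration: the names `EvalResult` and `score_iteration`, the 0 to 1 scale, and the equal default weighting are hypothetical and are not defined by the Codex docs. How the judge scores are produced is left out entirely.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalResult:
    check_pass_rate: float  # fraction of deterministic checks that passed (0..1)
    llm_average: float      # mean of the judge scores (0..1)
    overall: float          # weighted blend of the two (0..1)


def score_iteration(
    checks: dict[str, bool],
    judge_scores: list[float],
    check_weight: float = 0.5,  # hypothetical weighting, tune per task
) -> EvalResult:
    """Combine pass/fail checks and judge scores into one result."""
    if not checks or not judge_scores:
        raise ValueError("need at least one check and one judge score")
    pass_rate = sum(checks.values()) / len(checks)
    llm_avg = mean(judge_scores)
    overall = check_weight * pass_rate + (1 - check_weight) * llm_avg
    return EvalResult(pass_rate, llm_avg, overall)


if __name__ == "__main__":
    result = score_iteration(
        {"renders_without_error": True, "matches_size": False},
        [0.8, 1.0],
    )
    print(result)
```

Reporting the LLM average separately from the blended score keeps a weak judge score from hiding behind passing checks.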


## Starter prompt

I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

Before changing anything:

- Read `AGENTS.md`.
- Find the script or command that scores the current output.

Iteration loop:

- Make one focused improvement at a time.
- Re-run the eval command after each meaningful change.
- Log the scores and what changed.
- Inspect generated artifacts directly. If the output is visual, use `view_image`.
- Keep going until both the overall score and the LLM average are above 90%.

Constraints:

- Do not stop at the first acceptable result.
- Do not revert to an earlier version unless the new result is clearly worse in scores or artifacts.
- If the eval improves but is still below target, explain the bottleneck and continue.

Output:

- current best scores
- log of major iterations
- remaining risks or weak spots
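The logging and stopping steps in the prompt could be backed by a small helper such as the one below. This is a minimal sketch under stated assumptions, not part of Codex: the JSON Lines log format, the field names, and the functions `target_met`, `log_iteration`, and `best_iteration` are all hypothetical, and scores are assumed to be on a 0 to 1 scale so that "above 90%" becomes `> 0.90`.

```python
import json
from pathlib import Path

TARGET = 0.90  # "above 90%" from the prompt, on an assumed 0..1 scale


def target_met(overall: float, llm_average: float, target: float = TARGET) -> bool:
    """Both scores must be strictly above the target."""
    return overall > target and llm_average > target


def log_iteration(
    log_path: Path, iteration: int, change: str, overall: float, llm_average: float
) -> dict:
    """Append one JSON line per iteration so progress does not depend on context."""
    entry = {
        "iteration": iteration,
        "change": change,
        "overall": overall,
        "llm_average": llm_average,
        "target_met": target_met(overall, llm_average),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def best_iteration(log_path: Path) -> dict:
    """Return the logged entry with the highest overall score (LLM average breaks ties)."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    return max(entries, key=lambda e: (e["overall"], e["llm_average"]))


if __name__ == "__main__":
    log = Path("iteration_log.jsonl")  # hypothetical file name
    log_iteration(log, 1, "baseline", 0.62, 0.70)
    entry = log_iteration(log, 2, "tightened layout spacing", 0.93, 0.92)
    print("stop" if entry["target_met"] else "continue", best_iteration(log))
```

An append-only log also gives the "do not revert unless clearly worse" constraint something concrete to compare against, since the best earlier entry is always recoverable.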


use-cases/iterate-on-difficult-problems.md Codex Docs, 2026-04-13 00:44 UTC → 2026-04-16 00:46 UTC