# Iterate on difficult problems

Use Codex as a scored improvement loop to solve hard tasks.

Difficulty **Advanced**

Time horizon **Long-running**

Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.

## Best for

- Problems where each iteration can be scored, but the best result usually takes many passes
- Tasks with visual or subjective outputs that need both deterministic checks and an LLM-as-a-judge score
- Long-running Codex sessions where you want progress tracked clearly instead of relying on context
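
As a rough illustration of what such an evaluation system might report, the sketch below combines deterministic checks with LLM-as-a-judge ratings into one result. Everything in it is hypothetical: the `EvalResult` type, the `score_iteration` function, the 0-to-1 scale, and the equal weighting are assumptions for illustration, not part of Codex. The judge ratings are passed in as plain numbers instead of being fetched from a model.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvalResult:
    deterministic: float  # fraction of deterministic checks that passed, 0.0 to 1.0
    llm_average: float    # mean of the judge ratings, 0.0 to 1.0
    overall: float        # combined score, 0.0 to 1.0


def score_iteration(checks: dict[str, bool], judge_ratings: list[float]) -> EvalResult:
    """Combine pass/fail checks and judge ratings into one result.

    Assumption: the overall score is the plain average of the two parts.
    A real eval may weight them differently.
    """
    if not checks or not judge_ratings:
        raise ValueError("need at least one check and one judge rating")
    deterministic = sum(checks.values()) / len(checks)
    llm_average = mean(judge_ratings)
    overall = (deterministic + llm_average) / 2
    return EvalResult(deterministic, llm_average, overall)


if __name__ == "__main__":
    # Hypothetical check names and ratings, for illustration only.
    result = score_iteration(
        checks={"file_exists": True, "dimensions_match": True, "no_overflow": False},
        judge_ratings=[0.8, 0.9, 0.85],
    )
    print(result)
```

Reporting the deterministic part and the judge average separately, alongside the combined number, makes it easier to see which side is holding the score back on a given pass.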

## Starter prompt

I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

Before changing anything:

- Read `AGENTS.md`.
- Find the script or command that scores the current output.

Iteration loop:

- Make one focused improvement at a time.
- Re-run the eval command after each meaningful change.
- Log the scores and what changed.
- Inspect generated artifacts directly. If the output is visual, use `view_image`.
- Keep going until both the overall score and the LLM average are above 90%.

Constraints:

- Do not stop at the first acceptable result.
- Do not revert to an earlier version unless the new result is clearly worse in scores or artifacts.
- If the eval improves but is still below target, explain the bottleneck and continue.

Output:

- current best scores
- log of major iterations
- remaining risks or weak spots
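
The loop rules in this prompt can be read as a small piece of bookkeeping. The sketch below is one possible reading, not something Codex ships: `Iteration`, `TARGET`, `meets_target`, `should_keep`, and `append_log` are hypothetical names, scores are assumed to be percentages, and "clearly worse" is modeled as a drop of more than a tolerance in either score. The prompt also allows judging by artifacts, which this sketch does not cover.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path

TARGET = 90.0  # the prompt asks for both scores to be above 90%


@dataclass(frozen=True)
class Iteration:
    number: int
    change: str         # what changed in this pass
    overall: float      # overall score, in percent
    llm_average: float  # LLM-as-a-judge average, in percent


def meets_target(it: Iteration, target: float = TARGET) -> bool:
    """Return True only when both scores are strictly above the target."""
    return it.overall > target and it.llm_average > target


def should_keep(new: Iteration, best: Iteration, tolerance: float = 1.0) -> bool:
    """Keep the new version unless it is clearly worse than the best so far.

    Assumption: "clearly worse" means either score dropped by more than
    `tolerance` percentage points.
    """
    return (
        new.overall >= best.overall - tolerance
        and new.llm_average >= best.llm_average - tolerance
    )


def append_log(path: Path, it: Iteration) -> None:
    """Append one iteration as a JSON line so progress is tracked outside the session context."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(it)) + "\n")


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    best = Iteration(1, "baseline", overall=72.0, llm_average=68.0)
    new = Iteration(2, "tightened layout", overall=84.0, llm_average=80.5)
    print("keep new version:", should_keep(new, best))
    print("target reached:", meets_target(new))
```

An append-only log file is used here because a long-running session can then recover its best scores and iteration history from disk instead of relying on conversation context.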