use-cases/iterate-on-difficult-problems diff

use-cases/iterate-on-difficult-problems.md +53 −4


# Iterate on difficult problems | Codex use cases


# Iterate on difficult problems

Use Codex as a scored improvement loop to solve hard tasks.

Difficulty **Advanced**

Time horizon **Long-running**

Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.

## Best for

- Problems where each iteration can be scored, but the best result usually takes many passes
- Tasks with visual or subjective outputs that need both deterministic checks and an LLM-as-a-judge score
- Long-running Codex sessions where you want progress tracked clearly instead of relying on context
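One way to make such a task scoreable is to fold the deterministic checks and the judge ratings into a single report per iteration. The sketch below is illustrative only: `EvalReport`, `build_report`, the 0-100 scale, and the 50/50 weighting are assumptions made for this example and are not part of Codex.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalReport:
    """Hypothetical score report for one iteration (all values 0-100)."""
    deterministic: float  # share of scripted checks that passed
    llm_average: float    # mean of the LLM-as-a-judge ratings
    overall: float        # weighted blend of the two


def build_report(check_results: list[bool], judge_scores: list[float],
                 llm_weight: float = 0.5) -> EvalReport:
    """Combine pass/fail checks and judge ratings into one report.

    The 50/50 default weighting is an assumption; tune it per task.
    """
    if not check_results or not judge_scores:
        raise ValueError("need at least one check and one judge score")
    deterministic = 100.0 * sum(check_results) / len(check_results)
    llm_average = mean(judge_scores)
    overall = (1 - llm_weight) * deterministic + llm_weight * llm_average
    return EvalReport(deterministic, llm_average, overall)
```

Reporting the LLM average separately from the overall score keeps a weak judge rating from being hidden behind passing deterministic checks.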


## Starter prompt

I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

Before changing anything:
- Read `AGENTS.md`.
- Find the script or command that scores the current output.

Iteration loop:
- Make one focused improvement at a time.
- Re-run the eval command after each meaningful change.
- Log the scores and what changed.
- Inspect generated artifacts directly. If the output is visual, use `view_image`.
- Keep going until both the overall score and the LLM average are above 90%.

Constraints:
- Do not stop at the first acceptable result.
- Do not revert to an earlier version unless the new result is clearly worse in scores or artifacts.
- If the eval improves but is still below target, explain the bottleneck and continue.

Output:
- current best scores
- log of major iterations
- remaining risks or weak spots
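The logging and stop rules in this prompt can be backed by a small helper in the workspace. The following is a minimal sketch under stated assumptions: the function names, the JSON Lines log file, and the record fields are hypothetical choices for illustration, not something Codex provides or requires. Only the 90% threshold on both scores comes from the prompt itself.

```python
import json
from pathlib import Path

TARGET = 90.0  # threshold from the prompt: both scores must exceed 90%


def target_met(overall: float, llm_average: float,
               target: float = TARGET) -> bool:
    """Stop condition: both scores strictly above the target."""
    return overall > target and llm_average > target


def log_iteration(log_path: Path, iteration: int, change: str,
                  overall: float, llm_average: float) -> dict:
    """Append one iteration record as a JSON line and return it."""
    entry = {
        "iteration": iteration,
        "change": change,
        "overall": overall,
        "llm_average": llm_average,
        "target_met": target_met(overall, llm_average),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def best_iteration(log_path: Path) -> dict:
    """Return the logged entry with the highest overall score."""
    with log_path.open(encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return max(entries, key=lambda e: e["overall"])
```

Keeping the log on disk means progress survives a long session, and `best_iteration` gives a concrete baseline when deciding whether a new result is clearly worse than an earlier one.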



Use Codex to turn a game brief into first a well-defined plan, and then a real browser-based...

Engineering Code](https://developers.openai.com/codex/use-cases/browser-games)[![](/images/codex/codex-wallpaper-2.webp) → Engineering Code](https://developers.openai.com/codex/use-cases/browser-games)[![](/images/codex/codex-wallpaper-1.webp)

### Analyze datasets and ship reports → ### Learn a new concept

Use Codex to clean data, join sources, explore hypotheses, model results, and package the... → Use Codex to study material such as research papers or courses, split the reading across...

Data Analysis](https://developers.openai.com/codex/use-cases/datasets-and-reports) → Knowledge Work Data](https://developers.openai.com/codex/use-cases/learn-a-new-concept)

use-cases/iterate-on-difficult-problems.md Codex Docs, 2026-03-28 06:26 UTC → 2026-04-25 00:42 UTC