use-cases/iterate-on-difficult-problems diff

use-cases/iterate-on-difficult-problems.md +28 −0

Details

1# Iterate on difficult problems | Codex use cases1# Iterate on difficult problems | Codex use cases

2 2

3Codex use cases

5![](/assets/OpenAI-black-wordmark.svg)

7![Codex](/assets/OAI_Codex-Lockup_Fallback_Black.svg)

9Codex use case

11# Iterate on difficult problems

13Use Codex as a scored improvement loop to solve hard tasks.

15Difficulty **Advanced**

17Time horizon **Long-running**

19Give Codex an evaluation system, such as scripts and reviewable artifacts, so it can keep improving a hard task until the scores are good enough.

21## Best for

23- Problems where each iteration can be scored, but the best result usually takes many passes

24- Tasks with visual or subjective outputs that need both deterministic checks and an LLM-as-a-judge score

25- Long-running Codex sessions where you want progress tracked clearly instead of relying on context

27# Contents

3[← All use cases](https://developers.openai.com/codex/use-cases)29[← All use cases](https://developers.openai.com/codex/use-cases)

4 30

5Copy page [Export as PDF](https://developers.openai.com/codex/use-cases/iterate-on-difficult-problems/?export=pdf)31Copy page [Export as PDF](https://developers.openai.com/codex/use-cases/iterate-on-difficult-problems/?export=pdf)

41 - log of major iterations67 - log of major iterations

42 - remaining risks or weak spots68 - remaining risks or weak spots

43 69

70[Open in the Codex app](codex://new?prompt=I+have+a+difficult+task+in+this+workspace+and+I+want+you+to+run+it+as+an+eval-driven+improvement+loop.%0A%0ABefore+changing+anything%3A%0A-+Read+%60AGENTS.md%60.%0A-+Find+the+script+or+command+that+scores+the+current+output.%0A%0AIteration+loop%3A%0A-+Make+one+focused+improvement+at+a+time.%0A-+Re-run+the+eval+command+after+each+meaningful+change.%0A-+Log+the+scores+and+what+changed.%0A-+Inspect+generated+artifacts+directly.+If+the+output+is+visual%2C+use+%60view_image%60.%0A-+Keep+going+until+both+the+overall+score+and+the+LLM+average+are+above+90%25.%0A%0AConstraints%3A%0A-+Do+not+stop+at+the+first+acceptable+result.%0A-+Do+not+revert+to+an+earlier+version+unless+the+new+result+is+clearly+worse+in+scores+or+artifacts.%0A-+If+the+eval+improves+but+is+still+below+target%2C+explain+the+bottleneck+and+continue.%0A%0AOutput%3A%0A-+current+best+scores%0A-+log+of+major+iterations%0A-+remaining+risks+or+weak+spots "Open in the Codex app")

44I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.72I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

45 Before changing anything:73 Before changing anything:

46 - Read `AGENTS.md`.74 - Read `AGENTS.md`.

use-cases/iterate-on-difficult-problems.md Codex Docs, 2026-04-15 06:44 UTC → 2026-04-24 12:28 UTC