SpyBara
15 Apr 2026, 06:44
7 May 2026, 20:02

After 2026-05-02 06:45 UTC, this monitor no longer uses markdownified HTML/MDX. Comparisons across that boundary can therefore show more extensive diffs.


# Iterate on difficult problems | Codex use cases

Advanced
Long-running

---
name: Iterate on difficult problems
tagline: Use Codex as a scored improvement loop to solve hard tasks.
summary: Give Codex an evaluation system, such as scripts and reviewable
  artifacts, so it can keep improving a hard task until the scores are good
  enough.
bestFor:
  - Problems where each iteration can be scored, but the best result usually
    takes many passes
  - Tasks with visual or subjective outputs that need both deterministic checks
    and an LLM-as-a-judge score
  - Long-running Codex sessions where you want progress tracked clearly instead
    of relying on context
starterPrompt:
  title: Keep Iterating Until the Eval Passes
  body: >-
    I have a difficult task in this workspace and I want you to run it as an
    eval-driven improvement loop.


    Before changing anything:

    - Read `AGENTS.md`.

    - Find the script or command that scores the current output.


    Iteration loop:

    - Make one focused improvement at a time.

    - Re-run the eval command after each meaningful change.

    - Log the scores and what changed.

    - Inspect generated artifacts directly. If the output is visual, use
    `view_image`.

    - Keep going until both the overall score and the LLM average are above 90%.


    Constraints:

    - Do not stop at the first acceptable result.

    - Do not revert to an earlier version unless the new result is clearly worse
    in scores or artifacts.

    - If the eval improves but is still below target, explain the bottleneck and
    continue.


    Output:

    - current best scores

    - log of major iterations

    - remaining risks or weak spots
relatedLinks:
  - label: Custom instructions with AGENTS.md
    url: /codex/guides/agents-md
  - label: Codex workflows
    url: /codex/workflows
---
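
The stopping rule and the revert constraint in the starter prompt can be sketched in Python. This is a minimal illustration under stated assumptions, not something Codex provides: the `Scores` type, the 0-to-1 score scale, and the `margin` that stands in for "clearly worse" are all hypothetical, and artifact inspection is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Hypothetical eval result; both values are assumed to lie in [0, 1]."""
    overall: float
    llm_average: float


def thresholds_met(scores: Scores, target: float = 0.90) -> bool:
    # The prompt asks for both scores to be above 90%, so compare strictly.
    return scores.overall > target and scores.llm_average > target


def should_revert(best: Scores, candidate: Scores, margin: float = 0.02) -> bool:
    # "Clearly worse" is modeled as a drop larger than an assumed margin
    # in either score. Small regressions are kept so the loop can continue.
    return (
        best.overall - candidate.overall > margin
        or best.llm_average - candidate.llm_average > margin
    )
```

With these definitions, a run scoring 0.95 overall and exactly 0.90 on the LLM average would not yet count as finished, because the prompt requires both scores to be above the target.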

## Introduction


6. Continue until the thresholds are met.

This discipline matters. If each iteration changes too many things at once, Codex cannot tell which idea improved the score. If it skips logging, the session becomes hard to trust and hard to resume.
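
One way to make the logging step concrete is an append-only log with one record per iteration. The sketch below is an assumption about how such a log could look, not a format Codex prescribes: the JSON Lines file, the field names, and the helper functions are all hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def log_iteration(log_path: Path, iteration: int, change: str,
                  overall: float, llm_average: float) -> None:
    """Append one record per iteration so the history outlives the session."""
    record = {
        "iteration": iteration,
        "change": change,  # the single focused change made in this pass
        "overall": overall,
        "llm_average": llm_average,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_log(log_path: Path) -> list[dict]:
    """Read the log back, for example when resuming a session."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_iteration(records: list[dict]) -> dict | None:
    # Rank by the weaker of the two scores, since both must pass the target.
    return max(
        records,
        key=lambda r: min(r["overall"], r["llm_average"]),
        default=None,
    )
```

Because each record names exactly one change, a drop or gain in score can be traced to a single idea, and a resumed session can call `load_log` instead of relying on context.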


## Related use cases

### [Understand large codebases](https://developers.openai.com/codex/use-cases/codebase-onboarding)

Use Codex to map unfamiliar codebases, explain different modules and data flow, and point...

Engineering Analysis

### [Create browser-based games](https://developers.openai.com/codex/use-cases/browser-games)

Use Codex to turn a game brief into first a well-defined plan, and then a real browser-based...

Engineering Code

### [Learn a new concept](https://developers.openai.com/codex/use-cases/learn-a-new-concept)

Use Codex to study material such as research papers or courses, split the reading across...

Knowledge Work Data