
# Analyze datasets and ship reports | Codex use cases

---
name: Analyze datasets and ship reports
tagline: Turn messy data into clear analysis and visualizations.
summary: Use Codex to clean data, join sources, explore hypotheses, model
  results, and package the output as a reusable artifact.
skills:
  - token: $spreadsheet
    description: Inspect CSV, TSV, and Excel files when formulas, exports, or quick
      spreadsheet checks matter.
  - token: $jupyter-notebook
    url: https://github.com/openai/skills/tree/main/skills/.curated/jupyter-notebook
    description: Create or refactor notebooks for exploratory analysis, experiments,
      and reusable walkthroughs.
  - token: $doc
    url: https://github.com/openai/skills/tree/main/skills/.curated/doc
    description: Produce stakeholder-ready `.docx` reports when layout, tables, or
      comments matter.
  - token: $pdf
    url: https://github.com/openai/skills/tree/main/skills/.curated/pdf
    description: Render PDF outputs and check the final analysis artifact before you
      share it.
bestFor:
  - Data analysis that starts with messy files and should end with a chart,
    memo, dashboard, or report
  - Analysts who want Codex to help with cleanup, joins, exploratory analysis,
    and reproducible scripts
  - Teams that need reviewable artifacts instead of one-off notebook state
starterPrompt:
  title: Turn the Dataset Into a Reproducible Analysis
  body: >-
    I'm doing a data analysis project in this workspace.


    Goal:

    - Figure out whether houses near the highway have lower property valuations.


    Start by:

    - reading `AGENTS.md` and explaining the recommended Python environment

    - loading the dataset(s) at [dataset path]

    - describing what each file contains, likely join keys, and obvious data
    quality issues

    - proposing a reproducible workflow from import and tidy through
    visualization, modeling, and report output


    Constraints:

    - prefer scripts and saved artifacts over one-off notebook state

    - do not invent missing values or merge keys

    - suggest any skills or worktree splits that would make the workflow more
    reproducible


    Output:

    - setup plan

    - data inventory

    - analysis plan

    - first commands or files to create
relatedLinks:
  - label: Agent skills
    url: /codex/skills
  - label: Worktrees in the Codex app
    url: /codex/app/worktrees
techStack:
  - need: Analysis stack
    goodDefault: "[pandas](https://pandas.pydata.org/) with
      [matplotlib](https://matplotlib.org/) or
      [seaborn](https://seaborn.pydata.org/)"
    why: Good defaults for import, profiling, joins, cleaning, and the first round
      of charts.
  - need: Modeling
    goodDefault: "[statsmodels](https://www.statsmodels.org/) or
      [scikit-learn](https://scikit-learn.org/stable/)"
    why: Start with interpretable baselines before moving to more complex predictive
      models.
---

## Introduction

At its core, data analysis is about using data to inform decisions. The goal isn't analysis for its own sake. It's to produce an artifact that helps someone act: a chart for leadership, an experiment readout for a product team, a model evaluation for researchers, or a dashboard that guides daily operations.

A useful framework, popularized by _R for Data Science_, is a loop: import and tidy data, then iterate between transform, visualize, and model to build understanding before you communicate results. Programming surrounds that whole cycle.

Codex fits well into this workflow. It helps you move around the loop faster by cleaning data, exploring hypotheses, generating analyses, and producing reproducible artifacts. The target isn't a one-off notebook. The target is a workflow that other people can review, trust, and rerun.

## Define your use case

Choose one concrete question you want to answer with your data.

The more specific the question, the better. It will help Codex understand what you want to achieve and how to help you get there.

### Running example: Property values near the highway

As an example, we'll explore the following question:

> To what extent do houses near the highway have lower property valuations?

Suppose one dataset contains property values or sale prices, and another contains location, parcel, or highway-proximity information. The work isn't only to run a model. It's to make the inputs trustworthy, document the joins, pressure-test the result, and end with an artifact that somebody else can use.

## Set up the environment

When you start a new data analysis project, you need to set up the environment and define the rules of the project.

- **Environment:** Codex should know which Python environment, package manager, folders, and output conventions are canonical for the project.
- **Skills:** Repeated workflows such as notebook cleanup, spreadsheet exports, or final report packaging should move into reusable skills instead of being re-explained in every prompt.
- **Worktrees:** Separate explorations into separate worktrees so one hypothesis, merge strategy, or visualization branch doesn't bleed into another.

To learn more about how to install and use skills, see our [skills documentation](https://developers.openai.com/codex/skills).

### Guide Codex's behavior

Before touching the data, tell Codex how to behave in the repo. Put personal defaults in `~/.codex/AGENTS.md`, and put project rules in the repository `AGENTS.md`.

A small `AGENTS.md` is often enough:

```md
## Data analysis defaults

- Use `uv run` or the project's existing Python environment.
- Keep source data in `data/raw/` and write cleaned data to `data/processed/`.
- Put exploratory notebooks in `analysis/` and final artifacts in `output/`.
- Never overwrite raw files.
- Prefer scripts or checked-in notebooks over unnamed scratch cells.
- Before merging datasets, report candidate keys, null rates, and join coverage.
```

If the repo doesn't already define a Python environment, ask Codex to create a reproducible setup and explain how to run it. For data analysis work, that step matters more than jumping straight into charts.

## Import the data

Often the fastest way to start is to paste the file path and ask Codex to inspect it. This is where Codex helps you answer basic but important questions:

- What file formats are here?
- What does each dataset seem to represent?
- Which columns might be targets, identifiers, dates, locations, or measures?
- Where are the clear quality issues?

Don't ask for conclusions yet. Ask for inventory and explanation first.
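
If it helps to see the shape of that first pass, here is a minimal sketch of the kind of inventory script Codex might write, assuming CSV inputs under `data/raw/` (the folder convention from the `AGENTS.md` example above); the glob pattern and columns are whatever your data actually contains:

```python
from pathlib import Path

import pandas as pd

# Inventory pass: describe each file without drawing conclusions.
for path in sorted(Path("data/raw").glob("*.csv")):
    df = pd.read_csv(path)
    print(f"\n=== {path.name}: {len(df)} rows, {df.shape[1]} columns ===")
    print(df.dtypes)                    # candidate identifiers, dates, measures
    print(df.isna().mean().round(3))    # null rate per column flags quality issues
    print(df.head(3))                   # a few rows show what each column holds
```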

## Tidy and merge the inputs

Most real work starts here. You have two or more datasets, the primary key isn't clear, and a naive merge could lose data or create duplicates.

Ask Codex to profile the merge before performing it:

- Check uniqueness for candidate keys.
- Measure null rates and formatting differences.
- Normalize clear formatting issues such as casing, whitespace, or address formatting.
- Run trial joins and report match rates.
- Recommend the safest merge strategy before it writes the final merged file.

If you need to derive the best key, such as a normalized address, a parcel identifier built from a few columns, or a location join, make Codex explain the tradeoffs and edge cases before you accept the merge.
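
As a sketch of what that profiling can look like in pandas (the file names and the `parcel_id` key are hypothetical stand-ins for your own datasets):

```python
import pandas as pd

values = pd.read_csv("data/raw/property_values.csv")    # hypothetical inputs
parcels = pd.read_csv("data/raw/parcel_locations.csv")
key = "parcel_id"                                       # candidate join key

# Uniqueness: duplicated keys on either side silently multiply rows.
print("dup keys:", values[key].duplicated().sum(), parcels[key].duplicated().sum())

# Null rates on the key itself.
print("null rate:", values[key].isna().mean(), parcels[key].isna().mean())

# Trial join with indicator=True to measure match coverage before committing.
trial = values.merge(parcels, on=key, how="outer", indicator=True)
print(trial["_merge"].value_counts(normalize=True).round(3))
```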

## Explore with charts and separate worktrees

Exploratory data analysis is where Codex benefits from clean isolation. One worktree can test address cleanup or feature engineering while another focuses on charts or alternate model directions. That keeps each diff reviewable and prevents one long thread from mixing incompatible ideas.

The Codex app includes built-in worktree support. If you are working in a terminal, plain Git worktrees work well too:

```bash
git worktree add ../analysis-highway-eda -b analysis/highway-eda
git worktree add ../analysis-model-comparison -b analysis/highway-modeling
```

In the running example, this step is where you would compare homes near the highway against homes farther away, examine outliers, inspect missing-value patterns, and decide whether the observed effect looks real or reflects neighborhood composition, home size, or other factors.
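
A sketch of how that comparison might start, assuming the merged output lives at `data/processed/merged.csv` with hypothetical `distance_to_highway_m`, `sale_price`, and `living_area_sqft` columns:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("data/processed/merged.csv")    # hypothetical merged output

# Flag homes within an assumed 500 m of the highway, compare distributions.
df["near_highway"] = df["distance_to_highway_m"] < 500
print(df.groupby("near_highway")["sale_price"].describe())

# Does the gap survive a coarse size control, or is it composition?
df["size_band"] = pd.qcut(df["living_area_sqft"], 4)
print(df.groupby(["size_band", "near_highway"], observed=True)["sale_price"].median())

# Save the chart as an artifact rather than leaving it in notebook state.
df.boxplot(column="sale_price", by="near_highway")
plt.savefig("output/near_highway_price.png")
```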

## Model the question

Not every analysis needs a complex model. Start with an interpretable baseline.

For the highway question, a sensible first pass is a regression or other transparent model that estimates the relationship between highway proximity and property value while controlling for relevant factors such as size, age, and location.

Ask Codex to be explicit about:

- The target variable and feature definitions.
- Which controls to include and why.
- Leakage risks and exclusions.
- How it chose the split, evaluation, or uncertainty estimate.
- What the result means in plain language.

If the first model is weak, that's still useful. It tells you whether the problem is the model, the features, the join quality, or the question itself.
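
To make that concrete, here is a minimal statsmodels baseline over the same hypothetical columns; the controls are illustrative, not a recommended specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data/processed/merged.csv")    # hypothetical merged output
df["near_highway"] = df["distance_to_highway_m"] < 500

# OLS on a log target so the proximity coefficient reads roughly as a
# percent effect, with size, age, and neighborhood as controls.
model = smf.ols(
    "np.log(sale_price) ~ near_highway + living_area_sqft"
    " + house_age + C(neighborhood)",
    data=df,
).fit()
print(model.summary())
print(f"approx. near-highway effect: {model.params['near_highway[T.True]']:.1%}")
```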

## Communicate the result

The analysis is only useful when someone else can consume it. Ask Codex to produce the artifact the audience needs:

- A Markdown memo for technical collaborators.
- A spreadsheet or CSV for downstream operations work.
- A `.docx` brief using `$doc` when formatting and tables matter.
- A rendered appendix or final deliverable using `$pdf`.
- A lightweight dashboard or static report site deployed with `$vercel-deploy`.

This is also where you ask for caveats. If the join quality is imperfect, sampling bias is present, or the model assumptions are fragile, Codex should say that plainly in the deliverable.
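
One lightweight way to keep the caveats attached to the result is to write them into the memo itself; every line of the memo body in this sketch is a placeholder to fill from your actual findings:

```python
from pathlib import Path

# Ship the headline number and its caveats in the same artifact.
memo = """\
# Highway proximity and property values

Estimated effect: [fill in from the model output]

## Caveats

- [join coverage / unmatched rows]
- [sampling or measurement bias]
- [fragile model assumptions]
"""
Path("output").mkdir(exist_ok=True)
Path("output/memo.md").write_text(memo)
```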

## Skills to consider

The curated skills that fit this workflow especially well are:

- `$spreadsheet` for CSV, TSV, and Excel editing or exports.
- `$jupyter-notebook` when the deliverable should stay notebook-native.
- `$doc` and `$pdf` for stakeholder-facing outputs.
- `$vercel-deploy` when you want to share the result as a URL.

Once the workflow stabilizes, create repo-local skills for the repeated parts, such as `refresh-data`, `merge-and-qa`, or `publish-weekly-report`. That's a better long-term pattern than pasting the same procedural prompt into every thread.

## Suggested prompts

**Set Up the Analysis Environment**

**Load the Dataset and Explain It**

**Profile the Merge Before You Join**

**Open a Fresh Exploration Worktree**

**Build an Interpretable First Model**

**Package the Results for Stakeholders**