use-cases/clean-messy-data.md +129 −0 added
# Clean and prepare messy data

Codex use case

Process tabular data without affecting the original.

Difficulty **Easy**

Time horizon **5m**

Drag in or mention a messy CSV or spreadsheet, describe the problems you see, and ask Codex to write a cleaned copy while keeping the original file unchanged.

## Best for

- CSV or spreadsheet exports with mixed dates, currencies, duplicates, summary rows, or missing values.
- Teams who work with data from multiple sources.

## Related links

- [Analyze data with Codex](https://developers.openai.com/codex/use-cases/analyze-data-export)
- [File inputs](https://developers.openai.com/api/docs/guides/file-inputs)
- [Agent skills](https://developers.openai.com/codex/skills)

## Skills & Plugins

| Skill | Why use it |
| --- | --- |
| [Spreadsheet](https://github.com/openai/skills/tree/main/skills/.curated/spreadsheet) | Inspect tabular files, clean columns, and produce reviewable outputs. |

## Starter prompt

    Clean @marketplace-risk-rollout-export.csv.

    What's wrong:
    - dates are mixed between MM/DD/YYYY and YYYY-MM-DD
    - currency values include $, commas, and blank cells
    - a few duplicate customer rows came from repeated exports
    - region and category names use several aliases
    - there are pasted summary rows mixed into the data

    What I want:
    - write a cleaned CSV
    - keep the original file unchanged
    - use one date format
    - keep blank currency cells blank
    - preserve source row IDs when possible
    - add a short data-quality note with rows you changed, removed, or could not clean confidently

[Open in the Codex app](codex://new?prompt=Clean+%40marketplace-risk-rollout-export.csv.%0A%0AWhat%27s+wrong%3A%0A-+dates+are+mixed+between+MM%2FDD%2FYYYY+and+YYYY-MM-DD%0A-+currency+values+include+%24%2C+commas%2C+and+blank+cells%0A-+a+few+duplicate+customer+rows+came+from+repeated+exports%0A-+region+and+category+names+use+several+aliases%0A-+there+are+pasted+summary+rows+mixed+into+the+data%0A%0AWhat+I+want%3A%0A-+write+a+cleaned+CSV%0A-+keep+the+original+file+unchanged%0A-+use+one+date+format%0A-+keep+blank+currency+cells+blank%0A-+preserve+source+row+IDs+when+possible%0A-+add+a+short+data-quality+note+with+rows+you+changed%2C+removed%2C+or+could+not+clean+confidently "Open in the Codex app")

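To make the prompt's date and currency rules concrete, here is a minimal Python sketch of the kind of helpers a cleaning script might contain. It is not what Codex actually generates. The function names are hypothetical, and the choice of `YYYY-MM-DD` as the single output format and two-decimal amounts are assumptions made for illustration.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# The two input formats named in the starter prompt.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def normalize_date(value: str) -> str:
    """Return the date as YYYY-MM-DD, or the original text if no format matches."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Leave unparseable values untouched so they can be listed in the data-quality note.
    return value


def normalize_currency(value: str) -> str:
    """Strip '$' and thousands separators; keep blank cells blank."""
    text = value.strip()
    if not text:
        return ""
    cleaned = text.replace("$", "").replace(",", "")
    try:
        return format(Decimal(cleaned), ".2f")
    except InvalidOperation:
        # Not a number after stripping symbols: return it unchanged for review.
        return value
```

Returning unparseable values unchanged, rather than guessing, matches the prompt's request to report rows that could not be cleaned confidently.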
## Introduction

Codex is great at systematically cleaning tabular data.
When a CSV or spreadsheet has mixed dates, duplicate rows, currency strings, blank cells, aliases, or pasted summary rows, ask Codex to clean a copy and leave the original file unchanged.

[Video](https://cdn.openai.com/codex/docs/developers-website/use-cases/data-analysis-cleaning-csv.mp4)

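As a rough illustration of three of the problems listed above (aliases, duplicate rows, and pasted summary rows), the following Python sketch cleans a list of row dictionaries and records a note for each removed row. The column names `row_id`, `customer_id`, and `region`, the alias table, and the summary-row heuristic are all hypothetical. A real export would need its own rules.

```python
# Hypothetical alias table; a real one would come from inspecting the data.
REGION_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "u.s.": "United States",
}


def clean_rows(rows):
    """Return (kept_rows, notes) without modifying the input rows."""
    kept, notes, seen = [], [], set()
    for row in rows:
        row_id = row.get("row_id", "?")
        customer = row.get("customer_id", "").strip()

        # Assumed heuristic: summary rows have no customer ID or start with "Total".
        if not customer or customer.lower().startswith(("total", "subtotal")):
            notes.append(f"row {row_id}: removed (looks like a summary row)")
            continue

        cleaned = dict(row)  # copy so the caller's data stays untouched
        region = cleaned.get("region", "").strip()
        cleaned["region"] = REGION_ALIASES.get(region.lower(), region)

        # Compare rows without row_id, since repeated exports may renumber rows.
        key = tuple(sorted((k, v) for k, v in cleaned.items() if k != "row_id"))
        if key in seen:
            notes.append(f"row {row_id}: removed (duplicate)")
            continue

        seen.add(key)
        kept.append(cleaned)
    return kept, notes
```

Aliases are resolved before the duplicate check so that rows differing only by alias spelling are treated as the same record. The `notes` list plays the role of the short data-quality note requested in the starter prompt.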
## How to use

1. Drag the file into Codex or mention it in your prompt, such as `@customer-export.csv`.
2. Describe the problems you already see.
3. Tell Codex what the cleaned version should be: CSV, spreadsheet tab, or upload-ready file.
4. Review the cleaned copy before using it.

Use the starter prompt on this page for the first cleaning pass. Replace the file name and bullets with your own. The most useful details are the problems you already see and the output you need next: a cleaned CSV, a clean spreadsheet tab, or an upload-ready file. After Codex writes the clean copy, open the cleaned file and the data-quality note from the thread before using the data downstream.

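The review step can be supported by a small check of your own. The sketch below is one possible approach, not part of Codex. It assumes you recorded a SHA-256 hash of the original file before the cleaning pass, then confirms the original is byte-for-byte unchanged and compares data-row counts. The function names and the assumption of a single header row are hypothetical.

```python
import csv
import hashlib


def file_sha256(path):
    """Hash the file contents so any later modification can be detected."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def data_row_count(path):
    """Count CSV rows, excluding one header row (assumed to be present)."""
    with open(path, newline="", encoding="utf-8") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def review_summary(original_path, cleaned_path, original_hash_before):
    """Summarize whether the original is untouched and how row counts differ."""
    return {
        "original_unchanged": file_sha256(original_path) == original_hash_before,
        "original_rows": data_row_count(original_path),
        "cleaned_rows": data_row_count(cleaned_path),
    }
```

A drop in row count is expected when duplicates and summary rows are removed. The difference should match the number of removed rows listed in the data-quality note.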
## Related use cases

- [Query tabular data](https://developers.openai.com/codex/use-cases/analyze-data-export) (Data, Knowledge Work): Use Codex with a CSV, spreadsheet, dashboard export, Google Sheet, or local data file to...
- [Turn feedback into actions](https://developers.openai.com/codex/use-cases/feedback-synthesis) (Data, Integrations): Connect Codex to multiple data sources such as Slack, GitHub, Linear, or Google Drive to...
- [Coordinate new-hire onboarding](https://developers.openai.com/codex/use-cases/new-hire-onboarding) (Integrations, Data): Use Codex to gather approved new-hire context, stage tracker updates, draft team-by-team...