Real inputs
~/.codex/sessions: 1,938 JSONL files, 6.8 GB.
~/.claude/projects: 8,838 JSONL files, 5.7 GB.
OVM can benchmark Codex and Claude against disposable synthetic histories. This note checks whether that synthetic store is close enough to use as research evidence.
The corrected fixture now preserves local JSONL file counts and the heavy-tailed byte distribution. That makes it useful for startup and backfill pressure tests. It still does not synthesize Codex function-call events, function-call outputs, or Claude tool sidecar files, so parser-shape conclusions need a separate fixture upgrade.
Real stores were measured from the local macOS session directories.
Synthetic stores were generated into a scratch home with `ovm-benchmark fixture --match-real --scale 0.1 --count 20`. No private transcript text is copied into the fixture.
The fixture is meant to preserve file count, total bytes, median size, and p90 size by drawing from sorted real file-size targets.
Synthetic JSONL currently contains message events only. Tool-call event shape is not represented yet.
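
One way to picture the sorted-target approach is quantile subsampling: sort the real file sizes, then pick evenly spaced entries so the smaller set keeps roughly the same shape. This is a minimal sketch under that assumption, not the `ovm-benchmark` implementation; the function name `scaled_size_targets` and its parameters are hypothetical.

```python
def scaled_size_targets(real_sizes, scale):
    """Pick round(len * scale) sizes at evenly spaced quantiles of the sorted input.

    Hypothetical helper for illustration only; the real fixture generator
    may select targets differently.
    """
    if not real_sizes or scale <= 0:
        return []
    ordered = sorted(real_sizes)
    total = len(ordered)
    n = max(1, round(total * scale))
    targets = []
    for i in range(n):
        # Midpoint of the i-th of n equal-width buckets, in integer arithmetic
        # so the index never depends on floating-point rounding.
        idx = ((2 * i + 1) * total) // (2 * n)
        targets.append(ordered[idx])
    return targets


if __name__ == "__main__":
    sizes = list(range(1, 101))  # stand-in for real file sizes in bytes
    print(scaled_size_targets(sizes, 0.1))
```

A selection like this keeps the median and upper percentiles close to the source, but in a heavy-tailed store the total byte count can drift depending on which of the largest files land on a sampled quantile.
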
| store | files | bytes | avg file | p50 file | p90 file |
|---|---|---|---|---|---|
| Codex real | 1,938 | 6,813 MB | 3,516 KB | 378 KB | 5,051 KB |
| Codex synthetic 0.1 | 194 | 296 MB | 1,526 KB | 408 KB | 4,849 KB |
| Claude real | 8,838 | 5,716 MB | 647 KB | 102 KB | 751 KB |
| Claude synthetic 0.1 | 884 | 520 MB | 588 KB | 172 KB | 819 KB |
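
The columns in the table above can be reproduced for any store with a short script. The sketch below assumes every `*.jsonl` file under the root counts as a session file and uses nearest-rank percentiles; the report does not say which percentile method was used, so small differences from the table are possible. The name `store_stats` is hypothetical.

```python
from pathlib import Path


def store_stats(root):
    """Return file count, total bytes, mean, p50, and p90 of *.jsonl sizes under root."""
    sizes = sorted(
        p.stat().st_size
        for p in Path(root).expanduser().rglob("*.jsonl")
        if p.is_file()
    )
    if not sizes:
        return {"files": 0, "bytes": 0, "avg": 0.0, "p50": 0, "p90": 0}

    def nearest_rank(percent):
        # Nearest-rank percentile: smallest value with at least
        # percent% of the files at or below it.
        rank = max(1, (percent * len(sizes) + 99) // 100)
        return sizes[rank - 1]

    return {
        "files": len(sizes),
        "bytes": sum(sizes),
        "avg": sum(sizes) / len(sizes),
        "p50": nearest_rank(50),
        "p90": nearest_rank(90),
    }


if __name__ == "__main__":
    for store in ("~/.codex/sessions", "~/.claude/projects"):
        print(store, store_stats(store))
```
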

| product | environment | cold | warm samples | interpretation |
|---|---|---|---|---|
| Codex | synthetic 0.1 scratch home | 7,012 ms | 5,332 / 3,078 / 6,506 ms | Captures cold/backfill direction; warm is still noisier than real. |
| Codex | real home, warm | not run | 2,558 / 2,103 / 2,249 ms | Real warm launch is faster than the synthetic scratch run. |
| Claude | synthetic 0.1 scratch home | 1,219 ms | 1,105 / 1,245 / 1,620 ms | Close enough for startup-size pressure checks. |
| Claude | real home, warm | not run | 1,275 / 1,837 / 2,458 ms | Same order of magnitude as the synthetic run. |
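
The warm samples above are easier to compare once reduced to a median and a range. The snippet below does only that arithmetic on the numbers from the table; the `summarize` helper is illustrative, and three samples per row is too few to treat the spread as more than a rough indicator.

```python
from statistics import median

# Warm launch samples in milliseconds, copied from the table above.
warm_ms = {
    "Codex synthetic 0.1": [5332, 3078, 6506],
    "Codex real": [2558, 2103, 2249],
    "Claude synthetic 0.1": [1105, 1245, 1620],
    "Claude real": [1275, 1837, 2458],
}


def summarize(samples):
    """Return the median and the max-min range of a list of timings."""
    return {"median": median(samples), "range": max(samples) - min(samples)}


if __name__ == "__main__":
    for name, samples in warm_ms.items():
        s = summarize(samples)
        print(f"{name}: median {s['median']} ms, range {s['range']} ms")
```

On these samples the Codex synthetic median (5,332 ms) is roughly 2.4 times the real-home median (2,249 ms), while the Claude medians are 1,245 ms synthetic and 1,837 ms real.
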

| sample | files sampled | function_call | function_call_output | tool_use | tool_result |
|---|---|---|---|---|---|
| Codex real | 200 | 67,653 | 33,814 | 255 | 18 |
| Codex synthetic | 200 | 0 | 0 | 0 | 0 |
| Claude real | 200 | 0 | 0 | 136 | 65 |
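
A count like the one in the table above can be approximated by scanning each JSONL line for objects whose `type` field matches one of the four event names. The sketch below assumes that convention and looks at every nesting depth, since the exact record layout of Codex and Claude session files is not described in this note; `count_events` is a hypothetical helper and may not match how the reported numbers were produced.

```python
import json
from collections import Counter

TRACKED = ("function_call", "function_call_output", "tool_use", "tool_result")


def _walk(node, counts):
    """Recursively count dicts whose "type" value is one of the tracked names."""
    if isinstance(node, dict):
        if node.get("type") in TRACKED:
            counts[node["type"]] += 1
        for value in node.values():
            _walk(value, counts)
    elif isinstance(node, list):
        for value in node:
            _walk(value, counts)


def count_events(paths):
    """Count tracked event types across JSONL files, skipping unparsable lines."""
    counts = Counter({name: 0 for name in TRACKED})
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate truncated or corrupt lines
                _walk(record, counts)
    return dict(counts)
```

Running this over the same sample of files from a real store and a synthetic store gives a direct check on whether a future fixture upgrade has closed the tool-event gap.
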
The current report should be read as validation for synthetic storage pressure and startup measurement. The next research upgrade is to add structured Codex function-call events and Claude tool-use sidecars to the fixture, then rerun this comparison.