release radar → benchmark queue
verified Claude/Codex releases now queue benchmark work only after install evidence exists.
OVM tracks Codex, Claude Code, and Pi releases, installs exact versions, watches upstream updates, and turns benchmark runs into publishable Atlas research evidence.
Codex pays a one-time cold reindex on large stores, then launches faster warm than Claude in this run. Claude currently wins the trivial one-token response prompt, with sample size caveats.
curl -fsSL https://ovm.sh/install | sh
brew tap ovm-sh/ovm && brew install ovm
npm install -g @ovm-sh/ovm
OVM is part product, part instrument. Release Radar, benchmark runs, and migration audits should produce publishable evidence, not disappear into terminal scrollback.
Running old and new Codex builds side by side exposed a silent shared state migration failure. OVM turns that kind of boundary into a visible warning and a research artifact.
verified Claude/Codex releases now queue benchmark work only after install evidence exists.
Release Radar detects package updates, verifies OVM can install them, and keeps command evidence.
Benchmarks run where real auth and session state live, then append durable history for reports.
Clean public commits carry curated code, release notes, benchmark data, and research write-ups.