v2.6 · Same prompt, three AIs, real receipts · Read the comparison →

// compare any AI · keep the receipts ↓

LOCAL [WAR] ROOM_

Run any AI on your actual task — see which one solved it cheapest and best, with receipts. One command across Claude, Codex, Gemini, Grok, MiniMax, or any of 20+ supported runtimes. They run the same prompt in one shared session, call real tools (read_file, grep, git_log) to verify claims in your repo, and produce a signed audit trail with cost, tokens, and tool-call receipts. Drive it from a GUI, a CLI, or your coding agent over MCP — same data, same audit trail. Local-first. MIT. Bring your own keys.

N
LLMs per session
20+
Runtimes
Signed
Audit trail
MIT
Open source

You can’t get N LLMs to argue with receipts

🔄

Three browser tabs, no shared context

You paste the same question into Claude, GPT, and Gemini, one tab at a time. Each starts from zero. None of them sees what the others said. The disagreement that should be the signal is buried in your clipboard history.

⛓️

Existing tools just shuffle text

Most multi-LLM debate tools can’t read your repo, can’t grep, can’t verify a single claim before stitching the answers together. They’re vibes-as-a-service — clever, but unverifiable.

👁️

The session disappears

You get an answer, you read it, you move on. No record of which LLM made which claim, no way to cite “confirmed by GPT, disputed by Claude,” no markdown you can paste into a PR. The receipt is the artifact, and it’s missing.

Real sessions, real receipts

Every multi-LLM dispatch lands in your local SQLite as a session you can scroll through later. Each row carries an auto-generated summary, the runtimes that spoke, the personas (when you used --agent), tags, and a session id you can pass to ato sessions get from your terminal. No accounts, no cloud round-trip — everything stays on your machine. The store is plain SQLite, so you can also read it directly; a sketch follows the screenshot below.

ATO desktop Sessions tab showing two closed code-review war-rooms: 'Code review: stray dogfood retry comment' (Minimax + Google reviewers, 4 turns) and 'Code Review of Usage Poller Providers' (Minimax + Google reviewers, 4 turns). Each row shows runtime badges, an auto-generated summary describing the disagreement and resolution, topic tags (code-review, dogfood-comment, usage-poller, consensus), and a session UUID.
Sessions tab · two closed multi-LLM code reviews with auto-generated summaries, runtime badges, tags, and session ids
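Because the store is ordinary SQLite, any client can inspect it. A minimal sketch using better-sqlite3 — the database path, table, and column names below are all assumptions for illustration; check your local install for the real schema.

// Read recent war-room sessions straight out of the local SQLite store.
// Path, table, and column names are assumed, not ATO's documented schema.
import Database from "better-sqlite3";

interface SessionRow {
  id: string;       // the UUID you can pass to `ato sessions get`
  summary: string;  // auto-generated summary
  runtimes: string; // e.g. "minimax,google"
  tags: string;     // e.g. "code-review,consensus"
}

const db = new Database(`${process.env.HOME}/.ato/ato.db`, { readonly: true });

const recent = db
  .prepare("SELECT id, summary, runtimes, tags FROM sessions ORDER BY created_at DESC LIMIT 10")
  .all() as SessionRow[];

for (const s of recent) {
  console.log(`${s.id} [${s.runtimes}] ${s.summary}`);
}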

The war-room engine, plus everything you’d want around it

New in v2.6 — Compare any AI on your actual task

  • ato review --reviewer @security-specialist --reviewer @perf-reviewer --reviewer claude --reviewer minimax
  • Function-calling tools (read_file, grep, git_log)
  • Persistent specialist agents with system prompts
  • Per-turn audit trail in the GUI — verified-via-N-tool-calls vs prompt-only badges
  • Lean mode forces the LLMs to walk the live repo

  • Live runs panel — See every in-flight dispatch with agent slug, runtime, workspace, and elapsed time. Kill stuck dispatches with one click — no more reading every terminal buffer to find the runaway. Shows up the moment you fire something via Quick Test, the chat pane, scheduled cron, or MCP run_agent.
  • File attribution per dispatch — Every run captures the list of files touched in the project root via mtime-snapshot diff. Works across every runtime since it’s filesystem-level, not stream-parsing. Click any file in the dashboard to see every dispatch that ever touched it — agent, runtime, timestamp, prompt summary, sibling files. A sketch of the mtime-diff idea follows this list.
  • Cross-runtime regression detection — Switch @reviewer from Sonnet 4.6 to Opus 4.7 and the dashboard flags “success rate dropped 17pp across 412 conversations.” Joins the configuration-change ledger with trace windows automatically. Severity-tagged: regressions first, improvements second, neutral hidden by default.
  • Honest concurrent attribution — When two agents dispatch into the same workspace, the OS gives us mtimes, not PIDs. Instead of pretending we can disambiguate, ATO tags the run as “ambiguous × N” with peer agent slugs. Truth over false confidence.
  • External agents — Build customer-facing chatbots in the same IDE you use for daily ops. Bundle generators for Cloudflare Worker, Vercel Edge, Docker, and standalone Node. 9 chat-LLM providers. Embed widget bundled with every deploy. Customer’s API key, customer’s infra — ATO never holds inference compute.
  • Dynamic prompts that adapt at fire time — Reference {user_name}, {project_root}, {recent_orders} in your system prompt. Resolvers: static, env, project path, file, database query, MCP call, computed JS. A sketch of the resolution step follows the Live runs panel below.
  • Sequential automation pipelines — One prompt fires the whole workflow. Each child runs on its own runtime, so Claude → Codex → Gemini chains work natively. Routed groups + visual graph editor for specialist routing.
  • 15+ providers, 6 native runtimes — native: Claude Code, Codex, Gemini CLI, OpenClaw, Hermes, Ollama; via API key: Anthropic, OpenAI, Google AI, Mistral, Groq, xAI, Together, Fireworks, DeepSeek, Qwen, MiniMax, Kimi, GLM, Yi.
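A minimal sketch of the mtime-snapshot idea behind the file-attribution item above: record modification times before a dispatch, diff them after. The directory walk and ignore rules here are simplified assumptions; ATO's actual implementation may differ.

// Snapshot mtimes in the project root, then diff after the dispatch finishes.
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

function snapshot(dir: string, acc = new Map<string, number>()): Map<string, number> {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === ".git" || entry.name === "node_modules") continue; // assumed ignore list
    const path = join(dir, entry.name);
    if (entry.isDirectory()) snapshot(path, acc);
    else acc.set(path, statSync(path).mtimeMs);
  }
  return acc;
}

// A file counts as "touched" if it is new or its mtime changed.
function touchedFiles(before: Map<string, number>, after: Map<string, number>): string[] {
  return [...after].filter(([path, mtime]) => before.get(path) !== mtime).map(([path]) => path);
}

const before = snapshot(process.cwd());
// ... dispatch runs here ...
const after = snapshot(process.cwd());
console.log(touchedFiles(before, after));

Because the OS only reports mtimes, two dispatches sharing a workspace genuinely cannot be disambiguated at this level, which is why concurrent runs are tagged as ambiguous rather than guessed.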
Insights · Live runs · 3 in flight
@code-writer CLAUDE 14s
📁 ato/repo-a
@security-reviewer CODEX 8s
📁 ato/repo-a · ⚠ ambiguous ×1
@docs-summarizer GEMINI 2m 04s
📁 ato/docs-site
3 dispatches across 3 runtimes · 2 sharing repo-a · click any file in trace history for cross-run lineage
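To make the dynamic-prompts item concrete, here is a minimal sketch of fire-time placeholder resolution. Only the {placeholder} syntax and the resolver kinds come from the card above; the resolver map, names, and regex are illustrative assumptions, not ATO's actual API.

// Resolve {placeholder} tokens in a system prompt at dispatch time.
import { readFileSync } from "node:fs";

type Resolver = () => string;

const resolvers: Record<string, Resolver> = {
  user_name: () => "Ada",                                        // static
  project_root: () => process.env.PROJECT_ROOT ?? process.cwd(), // env / project path
  recent_orders: () => readFileSync("orders.txt", "utf8"),       // file
  today: () => new Date().toISOString().slice(0, 10),            // computed JS
};

function resolvePrompt(template: string): string {
  return template.replace(/\{(\w+)\}/g, (token, key: string) =>
    key in resolvers ? resolvers[key]() : token // unknown tokens pass through
  );
}

console.log(resolvePrompt(
  "You assist {user_name} in {project_root}. Recent orders:\n{recent_orders}"
));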

Cross-runtime A/B — replay any prompt

Pick any past trace. Click Replay. Re-run the original prompt against a different runtime. See source vs replay side-by-side with duration + estimated cost delta. Would Codex have answered correctly on those failing prompts? Now you can find out.

  • Replay — from any cloud trace, pick a target runtime and model. Re-dispatch happens via prompt_agent_inner so the replay is itself killable + appears in Live runs. Status pill ticks pending → running → done; result panel renders both responses + duration delta. Source prompts come from your local execution log — ATO never sends prompt content to a server you don’t already use.
  • Compare workbench — Insights → Compare. Diff any two cloud traces of the same agent: duration, cost (estimated, with an “est.” badge so the precision is honest), files only-in-baseline / only-in-comparison, ok-status change. Kind-agnostic — works for chat dispatches, deployed bundles, group stages, anything.
  • Cost recommendations — @code-writer · claude → codex · −59% per call · projected $1.01/mo at this volume. Surfaces concrete swaps when you have multi-runtime history on the same agent and the alternative is meaningfully cheaper at preserved quality. Quality guards: ≥30% cheaper, ok-rate within 10pp, eval-score within 5pp. Renders nothing if no rec qualifies — better than fake confidence. The guard logic is sketched after the Compare panel below.
  • Pipelines sub-tab — multi-stage dispatches (sequential groups, routed groups, anything that fans out across runtimes) grouped by parent_run_id. One row per pipeline; click into the per-stage flow with handoff arrows + per-stage timing + files touched per stage.
  • Workspace-wide ⌘K — one keystroke to search agents, groups, schedules, secrets, MCPs, projects, plus your chat history (matches against thread titles AND message bodies, with snippets). The Quick Actions list jumps to any Insights sub-tab in one keystroke.
Insights · Compare · Replay claude → codex
Source · CLAUDE
Replay · CODEX
Binary search finds a target by repeatedly halving a sorted range. O(log n) instead of O(n).
Repeatedly compare with the middle, discard the half that can’t contain the target. Halving the search space cuts complexity to O(log n).
Duration
−1842ms
Cost (est.)
−$0.0084
Runtime
claude → codex
Source prompts read from local execution log · never leave the device unless you click replay
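A minimal sketch of those quality guards, using the thresholds stated above (at least 30% cheaper, ok-rate within 10 percentage points, eval score within 5 points). The field names and the example numbers are illustrative assumptions.

// Decide whether a cheaper runtime qualifies as a swap recommendation.
interface RuntimeStats {
  costPerCall: number; // USD, estimated
  okRate: number;      // 0..1
  evalScore: number;   // 0..100
}

function qualifiesForSwap(current: RuntimeStats, candidate: RuntimeStats): boolean {
  const cheaperBy = 1 - candidate.costPerCall / current.costPerCall;
  return (
    cheaperBy >= 0.3 &&                          // >=30% cheaper
    current.okRate - candidate.okRate <= 0.1 &&  // ok-rate within 10pp
    current.evalScore - candidate.evalScore <= 5 // eval score within 5pp
  );
}

// Illustrative numbers that reproduce the -59% claude -> codex example above.
const claude = { costPerCall: 0.0142, okRate: 0.97, evalScore: 91 };
const codex = { costPerCall: 0.0058, okRate: 0.95, evalScore: 89 };
console.log(qualifiesForSwap(claude, codex)); // true: recommend the swap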

Multi-Runtime Context

Per-runtime context breakdown. Switch between Claude, Codex, OpenClaw, and Hermes to see what each agent has loaded. Skills shown as on-demand — not counted in the total.

  • Runtime tabs: Claude / Codex / OpenClaw / Hermes
  • "Not connected" state for uninstalled runtimes
  • Color warnings at 75% and 90% usage (the thresholds are sketched below the meter)
Context Usage 67,234 / 200,000 tokens · 33.6%
System (30K) Skills (12K) MCP (8K) CLAUDE.md (5.2K) Conversation (12K) Free (132.8K)
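A minimal sketch of those warning thresholds; the function name and level labels are assumptions.

// Map context usage to the documented warning bands at 75% and 90%.
type WarningLevel = "ok" | "warn" | "critical";

function contextWarningLevel(usedTokens: number, windowTokens: number): WarningLevel {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.9) return "critical";
  if (ratio >= 0.75) return "warn";
  return "ok";
}

console.log(contextWarningLevel(67_234, 200_000)); // "ok" at 33.6%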

Skills Manager + Marketplace

Manage skills across all runtimes with per-runtime tabs. Browse the marketplace, install community skills, or ask AI to create one for you.

  • Per-runtime tabs: Claude / Codex / OpenClaw / Hermes
  • AI skill creation: describe what you want, AI writes it
  • In-app approval dialog for file saves
code-review.md
2,340 tokens
testing-patterns.md
1,876 tokens
api-conventions.md
3,102 tokens
⚠ legacy-rules.md
conflict

Automation Builder

Visual workflow editor that auto-detects flows from your installed skills. Any skill with Step or Phase headers becomes a visual automation; the detection idea is sketched after the list below.

  • Auto-generates flows from skill content
  • Per-node runtime selection (mix agents)
  • Run workflows with one click
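A minimal sketch of that auto-detection, assuming the Step/Phase headers are markdown headings. The heading pattern and node shape are assumptions about the mechanism, not ATO's actual detector.

// Turn "Step"/"Phase" headings in a skill file into automation nodes.
interface FlowNode {
  title: string;
  body: string;
}

function detectFlow(skillMarkdown: string): FlowNode[] {
  const heading = /^#{1,6}\s*(?:Step|Phase)\b.*$/gim;
  const matches = [...skillMarkdown.matchAll(heading)];
  return matches.map((m, i) => {
    const start = m.index! + m[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index! : skillMarkdown.length;
    return {
      title: m[0].replace(/^#+\s*/, ""),
      body: skillMarkdown.slice(start, end).trim(),
    };
  });
}

const skill = "## Step 1: Lint\nRun the linter.\n\n## Step 2: Test\nRun the suite.";
console.log(detectFlow(skill).map((n) => n.title)); // ["Step 1: Lint", "Step 2: Test"]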
Today
45,230
$0.68 estimated
Burn Rate
12.4K/hr
~6.2h to limit
This Week
312K
$4.68 total
This Month
1.2M
$18.40 total

Scheduled jobs

Pick an agent (or a routed/sequential group) and a schedule. The agent’s system prompt, variables, hooks, memory, and skills all fire on every run — not just a raw prompt.

  • Agent / Group / Raw dispatch — agent-based by default
  • Friendly schedule presets (every weekday 9am, hourly, every 15 min…) or a full cron expression (e.g. 0 9 * * 1-5 for every weekday at 9am)
  • Wake-from-sleep on every desktop OS — launchd on macOS, systemd --user timers on Linux, Task Scheduler on Windows. Jobs fire even when ATO is closed.
  • Calendar view: click a day to see output or error; smart silent-failure detection
filesystem
stdio 12 tools 23ms
github
stdio 8 tools 45ms
postgres
stdio 5 tools 120ms
slack
sse timeout

Production-ready for teams and companies

LLM API Key Management

Centralized dashboard to store, rotate, and scope API keys for every major LLM provider. Keys are encrypted locally — never sent to any server.

  • Anthropic, OpenAI, Google, Mistral, Groq, Cohere, Together, Fireworks
  • Plus the Chinese providers: DeepSeek, Qwen, MiniMax, Kimi, GLM, Yi (OpenAI-compatible base URLs surfaced in-app)
  • One-click key rotation with masked preview, per-runtime scoping
  • Usage tracking: see which keys are active and how often
A Anthropic Production
sk-a...4f2x
O OpenAI GPT-4
sk-p...9k3m
G Google AI Staging
AI...7xq2
G Groq Fast
gsk...r4p1

Real-time Agent Monitor

Live dashboard showing active agent sessions, token consumption rates, runtime health, and smart alerts — across all your AI coding tools at once.

  • Live session tracking with 3-second refresh (Pro)
  • Token usage timeline charts and burn rate
  • Smart alerts: error spikes, high token usage, offline runtimes
  • Basic stats and recent sessions free for everyone
Tokens/hr
24.5K
Sessions
18
Avg Duration
4.2s
Errors
0
claude code-review session 2.1K tok · 3.4s
codex test generation 1.8K tok · 2.1s
hermes documentation update 956 tok · 1.8s

Audit Log

Complete audit trail of every action across your agentic systems. Filter by action type, resource, and time range. Export to JSON for compliance.

  • Track skill changes, key rotations, config updates, cron triggers
  • Filterable by action type and resource
  • Stats dashboard: today, this week, top actions
  • One-click JSON export
skill.create — code-review.md 2m ago
config.update — claude runtime 5m ago
cron.trigger — daily-backup 1h ago
secret.delete — old-api-key 3h ago
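For a sense of what the JSON export could contain, here is a sketch based on the entries shown above. The field names and timestamps are assumptions, not ATO's actual export format.

// A plausible shape for exported audit entries; fields are assumed.
interface AuditEntry {
  action: string;   // e.g. "skill.create", "cron.trigger"
  resource: string; // e.g. "code-review.md", "daily-backup"
  at: string;       // ISO-8601 timestamp
}

const entries: AuditEntry[] = [
  { action: "skill.create", resource: "code-review.md", at: "2025-01-15T09:12:00Z" },
  { action: "config.update", resource: "claude runtime", at: "2025-01-15T09:09:00Z" },
  { action: "cron.trigger", resource: "daily-backup", at: "2025-01-15T08:12:00Z" },
];

// Filter by action type before export, as the dashboard's filters do.
const skillEvents = entries.filter((e) => e.action.startsWith("skill."));
console.log(JSON.stringify(skillEvents, null, 2));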

SSO & Enterprise Auth FREE WITH SIGN-UP

Connect your company's identity provider. Google Workspace, Okta, Microsoft Entra, or any OIDC provider — with domain restriction and auto-provisioning.

  • Google Workspace, Okta, Microsoft Entra built-in
  • Any custom OIDC provider via URL config
  • Domain restriction: only @company.com can join
  • Auto-provision users on first SSO login
SSO Providers
G
Google
Active
M
Microsoft
Configure
O
Okta
Configure

Cross-runtime, by protocol — 17 tools

Every ATO agent is exposed over MCP. Any MCP-aware runtime — Claude Code, Codex, Cursor, others — can dispatch to any ATO agent regardless of which runtime owns it.

$ npx ato-mcp

# Add to ~/.claude/settings.json:
{ "mcpServers": { "ato": { "command": "npx", "args": ["ato-mcp"] } } }

# Agent dispatch (cross-runtime)
list_agents — All ATO agents + groups
run_agent — Dispatch to any agent or group, transparently

# Context & Usage
get_context_usage — Context window breakdown
get_usage_stats — Token and cost analytics
get_mcp_status — MCP server health

# Skills Management
list_skills — All skills with token counts
toggle_skill — Enable/disable skills
get_skill_index_stats — Index & watcher status
rescan_skills — Force full rescan

# Runtime Health
get_runtime_status — Check any runtime
get_all_runtime_statuses — All runtimes at once
get_agent_logs — Execution logs / traces
get_runtime_path_cache — Cached CLI paths
refresh_runtime_paths — Re-discover CLIs
set_runtime_path — Manual CLI path

# Cache Management
get_cache_stats — Cache statistics
clear_cache — Flush cache
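If your runtime isn't on that list, any MCP client can drive the same tools. A minimal sketch with the official TypeScript MCP SDK; the run_agent argument names are assumptions, so list the tool schemas first.

// Connect to the ato-mcp server over stdio and dispatch to an agent.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({ command: "npx", args: ["ato-mcp"] });
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Discover the 17 tools, including list_agents and run_agent.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

const result = await client.callTool({
  name: "run_agent",
  arguments: { agent: "@security-reviewer", prompt: "Audit the auth module" }, // assumed shape
});
console.log(result.content);

await client.close();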

Built for developers

Desktop

offline-first · free
  • Tauri 2.x (Rust + React)
  • SQLite local database
  • LLM API key management
  • Audit logging
  • Agent monitor (basic)
  • Skills, automation, cron, MCP
Sync

Cloud

free with sign-up · early access
  • 7 microservices on Railway
  • PostgreSQL + SSO (OIDC)
  • Real-time agent monitoring
  • Smart alerts & token charts
  • Cloud trace retention + observability
  • Cloud sync of agents across devices

Available in English, Português, and Español


Download ATO

Free, open source, and ready for your platform.

> Early access: every feature free with a cloud sign-up — replay, compare, regression detection, cost recommendations, cloud sync, trace retention, evaluators. No payment, no credit card — just an email.

# Install via Homebrew (macOS)
$ brew tap WillNigri/ato
$ brew install --cask ato

# SDK — only for ATO-authored agents deployed externally
$ npm install @ato-sdk/js

# Or install just the MCP server
$ npx ato-mcp

Complementary, not competing. ATO is your local war room for humans and LLMs — the developer side of multi-runtime AI work. For SDK-based production observability across your deployed app stack, use Langfuse, Helicone, or LangSmith. Most production teams run one from each camp — they cover different sides of the same agent. More on how they fit together →