Your coding agent just fired forty model calls to answer one question, and the bill is hiding somewhere you are not looking. Claude Code buries it in JSONL transcripts that undercount; Codex writes cumulative counters; Cursor stuffs it into a SQLite blob; Aider prints it once and forgets it. The open-source tracker worth installing is the one that matches the trail your agent already leaves — so picking between ccusage, codex-usage-tracker, CodeBurn, and a LiteLLM proxy is really a choice about telemetry shape, not feature checklists. Get it right and you also unlock the levers that cut the bill: prompt-cache hits, model routing, and ruthless context resets.
At a glance
Four open-source tools, four different answers to one question — where did your agent's tokens go. The table sets the basics; the bar chart and matrix below it show how unevenly these tools are adopted and which agent each one was actually built for.
| Tracker | License | Primary agent | Trail it reads |
|---|---|---|---|
| ccusage | MIT | Claude Code (and Codex) | On-disk JSONL transcripts |
| codex-usage-tracker | MIT | Codex CLI | Codex JSONL → SQLite index + MCP |
| CodeBurn | MIT | Cursor (and 24 more) | Per-agent local stores (e.g. state.vscdb) |
| LiteLLM proxy | MIT* | Aider (and anything) | Live API requests (base-URL override) |
Snapshot: 2026-06-09. Star counts and tool lists move fast — verify against each repo. *LiteLLM is MIT except its enterprise/ directory, which carries a separate commercial license.
The thesis: four agents, four trails
Every coding agent records what it spends, but no two record it the same way — and that single fact decides which tool can read it. Claude Code streams a JSONL transcript to ~/.claude/projects/; a parser can read it in milliseconds, but the file is lossy (more on that below). Codex writes token_count events to ~/.codex/sessions/ that carry a running cumulative total plus a per-turn delta, so a tracker just has to follow the counter. Cursor hides everything in a local SQLite database (state.vscdb, the cursorDiskKV table, bubbleId: keys) — undocumented, reverse-engineered, and prone to shifting between versions. And Aider leaves no structured ledger at all: it prints cost to the terminal and writes a prose Markdown history, which is unparseable as a usage feed.
So the four tools in this post are not "the best four token trackers." They are four tools chosen because each one demonstrates a different telemetry shape: a JSONL parser (ccusage), a per-agent SQLite indexer with an agent-queryable surface (codex-usage-tracker), a fan-out of per-agent disk readers (CodeBurn), and a proxy that gives up on disk entirely and meters live traffic (LiteLLM). Match the shape to your agent and the rest is detail.
ccusage — the JSONL parser standard
What it reads and how you run it
ccusage (ryoppippi/ccusage, ~15.8k stars, MIT) is the de-facto standard for local-first agent cost reporting, and it is a pure on-disk parser: it reads the JSONL logs your agent already writes and never touches the network. The canonical install is a single command — npx ccusage@latest or the recommended bunx ccusage — with pnpm dlx and a Nix flake as alternatives. Its primary home is Claude Code's ~/.claude/projects/ transcripts, but a focused ccusage codex command reads Codex too, and the project advertises support across roughly fifteen agent CLIs, so for most people it is the only tracker they need to install.
The reports it gives you
The output is the reason it won: daily, weekly, monthly, session, and blocks (Claude Code's 5-hour billing windows) views, each splitting input, output, cache-creation, and cache-read tokens separately, with a --breakdown flag for per-model cost. A beta statusline mode keeps a live total in front of you. For the everyday question — "what did this week cost, and on which model" — it answers in one command without a config file.
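To make the four-way token split concrete, here is a minimal sketch of the kind of aggregation a JSONL parser performs. It is not ccusage's code. The sample lines are invented, and the nesting of `usage` under `message` is an assumption about the transcript layout; only the four counter names follow Anthropic's usage fields.

```python
import json
from collections import defaultdict

# The four token categories, named after Anthropic's usage fields.
FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)

# Invented sample lines. The "message" -> "usage" nesting is an assumption.
SAMPLE_LINES = [
    '{"message": {"model": "model-a", "usage": {"input_tokens": 1, "output_tokens": 120, "cache_creation_input_tokens": 2000, "cache_read_input_tokens": 0}}}',
    '{"message": {"model": "model-a", "usage": {"input_tokens": 0, "output_tokens": 80, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 2000}}}',
    '{"message": {"model": "model-b", "usage": {"input_tokens": 1, "output_tokens": 40, "cache_creation_input_tokens": 500, "cache_read_input_tokens": 1500}}}',
    '{"type": "summary"}',
    '{"message": {"usage": ',
]


def summarize(lines):
    """Sum each token category per model, skipping lines with no usage data."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a truncated or partially written line
        message = record.get("message") if isinstance(record, dict) else None
        if not isinstance(message, dict):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue
        model = message.get("model", "unknown")
        for field in FIELDS:
            totals[model][field] += int(usage.get(field) or 0)
    return {model: dict(counts) for model, counts in totals.items()}


SUMMARY = summarize(SAMPLE_LINES)
for model, counts in sorted(SUMMARY.items()):
    print(model, counts)
```

Keeping the four categories separate matters because they are priced differently. A single "total tokens" figure would hide the difference between a cheap cache read and fresh input.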
The catch: it inherits a broken source
ccusage is exactly as accurate as the file it reads, and for Claude Code that file lies. Issue #866 documents it precisely: Claude Code's JSONL records input_tokens from a non-final streaming event, so roughly three-quarters of entries show 0 or 1 — an input undercount of 100–174× — and it omits thinking tokens from output_tokens, an output undercount of 10–17× on Opus. On one heavy day the reporter saw ccusage show ~$50 against a real API cost near ~$446. The cache numbers are fine; the rest is not. Crucially this is an upstream Claude Code data-quality bug, not a ccusage defect — the independent cccost project reaches the same verdict and works around it by intercepting fetch() for ground truth. Treat ccusage's Claude Code numbers as a floor, and read Reasoning Models for why thinking tokens are the part most likely to be missing.
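The heavy-day figures above are easier to reason about as a ratio. The short sketch below only restates the reporter's two approximate numbers (~$50 shown, ~$446 real); the function name is illustrative.

```python
def undercount_factor(reported_usd, actual_usd):
    """Return how many times larger the real cost is than the reported one."""
    if reported_usd <= 0:
        raise ValueError("reported cost must be positive")
    return actual_usd / reported_usd


REPORTED_USD = 50.0   # approximate figure shown by the disk parser
ACTUAL_USD = 446.0    # approximate real API cost from the same report

FACTOR = undercount_factor(REPORTED_USD, ACTUAL_USD)
SHARE_SEEN = REPORTED_USD / ACTUAL_USD

print(f"real cost is about {FACTOR:.1f}x the reported figure")
print(f"the transcript accounts for about {SHARE_SEEN:.0%} of spend")
```

On that day the transcript captured roughly a ninth of the real spend. The dollar gap is smaller than the raw 100–174× input undercount, which fits with the cache figures being accurate.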
codex-usage-tracker — the per-agent persistence layer
What it adds over a plain parser
codex-usage-tracker (douglasmonsky/codex-usage-tracker, MIT, small but actively maintained) is purpose-built for one agent and does two things ccusage does not. First, it parses the token_count events from Codex's session JSONL under ~/.codex/sessions/ into a persistent SQLite index at ~/.codex-usage-tracker/usage.sqlite3 — a real datastore you can query, not a one-shot report. (Codex makes this easy: each event carries both a cumulative running total and a pre-computed per-turn delta, so the tracker follows the counter rather than guessing.) It surfaces metrics a raw parser skips, including context use and cache ratio. Install is one line: pipx install codex-usage-tracking.
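The "follow the counter" idea can be sketched as follows. This is not the tracker's implementation: the event keys (`cumulative_total`, `turn_delta`), the table name, and the sample values are all hypothetical, and an in-memory database stands in for the on-disk index.

```python
import sqlite3

# Invented events. Each carries a running total and a per-turn delta.
EVENTS = [
    {"session": "s1", "turn": 1, "cumulative_total": 1200, "turn_delta": 1200},
    {"session": "s1", "turn": 2, "cumulative_total": 3100, "turn_delta": 1900},
    {"session": "s1", "turn": 3, "cumulative_total": 3650, "turn_delta": 550},
    {"session": "s2", "turn": 1, "cumulative_total": 800, "turn_delta": 800},
]


def build_index(events):
    """Store per-turn usage in SQLite, checking deltas against the counter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE turn_usage ("
        "session TEXT, turn INTEGER, tokens INTEGER, "
        "PRIMARY KEY (session, turn))"
    )
    last_total = {}
    for event in sorted(events, key=lambda e: (e["session"], e["turn"])):
        previous = last_total.get(event["session"], 0)
        derived = event["cumulative_total"] - previous
        if derived != event["turn_delta"]:
            raise ValueError(f"counter mismatch in {event['session']} turn {event['turn']}")
        # INSERT OR REPLACE keeps re-indexing the same turn idempotent.
        conn.execute(
            "INSERT OR REPLACE INTO turn_usage VALUES (?, ?, ?)",
            (event["session"], event["turn"], derived),
        )
        last_total[event["session"]] = event["cumulative_total"]
    conn.commit()
    return conn


def session_totals(conn):
    """Return total tokens per session from the index."""
    rows = conn.execute(
        "SELECT session, SUM(tokens) FROM turn_usage GROUP BY session ORDER BY session"
    )
    return dict(rows.fetchall())


INDEX = build_index(EVENTS)
TOTALS = session_totals(INDEX)
print(TOTALS)
```

Because the cumulative total and the delta are both present, the indexer can cross-check one against the other instead of trusting either alone. Once the rows are in SQLite, any later question is a query rather than a re-parse.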
The MCP twist — the agent reads its own bill
The second difference is the interesting one: codex-usage-tracker ships an MCP server, exposing tools like usage_summary, usage_query, session_usage, and usage_recommendations. That means the spend data is reachable from inside the agent — Codex can call a tool to ask "how much have I spent today, and where" mid-session. ccusage already covers Codex for plain reporting; reach for codex-usage-tracker when you want a durable per-agent datastore and want the numbers queryable from the agent itself rather than only from your shell.
CodeBurn — the cross-agent disk reader
One reader per agent, twenty-five of them
CodeBurn (getagentseal/codeburn, ~7.8k stars, MIT) is the broadest local-disk tracker, covering 25 coding tools with no proxy and no API keys — it reads each agent's session data directly from disk and prices every call with LiteLLM's rate tables. Its README is itself a tour of how differently agents store data: Cursor in state.vscdb, Gemini CLI in session JSON, Warp in warp.sqlite, Forge in ~/.forge/.forge.db, Copilot in VS Code workspaceStorage transcripts — plus Claude Code, Codex, Cline, Goose and more. Install is npm install -g codeburn and you get a terminal dashboard across whatever you have running.
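As a rough illustration of what a per-agent disk reader involves, the sketch below reads a stand-in for Cursor's store. The table name `cursorDiskKV` and the `bubbleId:` key prefix come from the description earlier in this post. The `key`/`value` column names and the JSON payload shape (`tokenCount`, `inputTokens`, `outputTokens`) are assumptions, and the real format is undocumented and may change between versions. This is not CodeBurn's code.

```python
import json
import sqlite3


def make_stand_in_db():
    """Build an in-memory database shaped like the assumed Cursor store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cursorDiskKV (key TEXT PRIMARY KEY, value TEXT)")
    rows = [
        ("bubbleId:chat-1:msg-1",
         json.dumps({"tokenCount": {"inputTokens": 900, "outputTokens": 150}})),
        ("bubbleId:chat-1:msg-2",
         json.dumps({"tokenCount": {"inputTokens": 1400, "outputTokens": 300}})),
        ("composerData:chat-1", json.dumps({"name": "unrelated row"})),
    ]
    conn.executemany("INSERT INTO cursorDiskKV VALUES (?, ?)", rows)
    return conn


def sum_bubble_tokens(conn):
    """Sum token counts across rows whose key starts with 'bubbleId:'."""
    totals = {"inputTokens": 0, "outputTokens": 0}
    query = "SELECT value FROM cursorDiskKV WHERE key LIKE 'bubbleId:%'"
    for (raw,) in conn.execute(query):
        try:
            payload = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            continue  # tolerate rows that are not JSON
        if not isinstance(payload, dict):
            continue
        counts = payload.get("tokenCount") or {}
        for name in totals:
            totals[name] += int(counts.get(name) or 0)
    return totals


BUBBLE_TOTALS = sum_bubble_tokens(make_stand_in_db())
print(BUBBLE_TOTALS)
```

The defensive parsing is deliberate. With a reverse-engineered store, a reader has to expect missing fields and unfamiliar rows rather than assume a fixed schema.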
Where it fits — and where it doesn't
CodeBurn is the cleanest answer for Cursor specifically, now that the popular cursor-stats extension (Dwtexe/cursor-stats, GPL-3.0) was archived in March 2026 and stopped receiving updates. Against ccusage the trade is reach versus depth: CodeBurn spans far more agents, while ccusage is older and battle-tested on Claude Code and Codex with richer per-model reporting. The one place CodeBurn cannot help is Aider — Aider keeps no structured store for a reader to parse, which is exactly the gap the next tool fills. And because it reads each agent's local store, CodeBurn inherits each agent's accuracy, the same way ccusage inherits Claude Code's.
LiteLLM proxy — meter the wire, not the disk
How it works
A LiteLLM proxy (BerriAI/litellm, ~49.8k stars, MIT core) gives up on disk parsing entirely. You run it as a self-hosted gateway and point your client at it with a base-URL override — ANTHROPIC_BASE_URL for Claude clients, OPENAI_BASE_URL for OpenAI ones (older clients use OPENAI_API_BASE). Every request now flows through the proxy, which logs token usage and computed cost per request to its LiteLLM_SpendLogs table, enforces per-key and per-team budgets, and can export to Langfuse or any OpenTelemetry/OpenLLMetry backend for dashboards.
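The base-URL override is just environment configuration, which the sketch below builds for a child process. The proxy address `http://localhost:4000` and the helper name `proxied_env` are assumptions for illustration; substitute wherever your gateway actually listens.

```python
import os

# Assumed address of a locally running proxy. Adjust to your deployment.
PROXY_URL = "http://localhost:4000"


def proxied_env(base_env, proxy_url=PROXY_URL):
    """Return a copy of base_env with client base URLs pointed at the proxy."""
    env = dict(base_env)  # copy so the caller's environment is untouched
    env["ANTHROPIC_BASE_URL"] = proxy_url  # Claude clients
    env["OPENAI_BASE_URL"] = proxy_url     # OpenAI clients
    env["OPENAI_API_BASE"] = proxy_url     # older OpenAI clients
    return env


ENV = proxied_env(os.environ)
for name in ("ANTHROPIC_BASE_URL", "OPENAI_BASE_URL", "OPENAI_API_BASE"):
    print(f"{name}={ENV[name]}")
```

Passing the returned mapping as the `env` argument of `subprocess.run` scopes the override to one agent process, which is a low-risk way to trial the proxy before exporting the variables globally.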
Why it's the right answer for Aider
Because it sits in the request path, the proxy reads the provider's actual usage object rather than re-deriving counts from a transcript — so it is the only approach here that sees ground truth, cache and reasoning tokens included. That makes it the natural fit for Aider, which leaves no structured trail to parse but already imports LiteLLM for its model calls and cost printing; pointing it at a proxy adds the persistent dashboard and per-team accounting it otherwise lacks, with no code change. The honest cost is operational: you now run a service in the hot path — an extra hop, extra latency, and a process to keep up. If you only want a quick local read for one agent, that is too much machinery; if you want one ground-truth ledger across every agent and team, it is the only option that delivers it.
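To show what "reading the usage object" buys you, here is a minimal sketch of pricing one request from provider-reported counts. The field names follow Anthropic-style usage objects. The per-million prices are illustrative placeholders, not a real rate card, and this is not LiteLLM's cost code.

```python
# Illustrative prices in USD per million tokens. These are assumptions.
PRICES_PER_MILLION = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
}


def request_cost(usage, prices=PRICES_PER_MILLION):
    """Price a single request from its reported token counts."""
    total = 0.0
    for field, price in prices.items():
        total += int(usage.get(field) or 0) * price / 1_000_000
    return total


# A made-up usage object of the kind a provider returns with a response.
USAGE = {
    "input_tokens": 1000,
    "output_tokens": 500,
    "cache_creation_input_tokens": 2000,
    "cache_read_input_tokens": 10000,
}

COST = request_cost(USAGE)
print(f"request cost: ${COST:.4f}")
```

The counts here arrive with the response itself, so nothing depends on what the agent later chooses to write to disk.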
Cross-cutting comparison
Accuracy — who actually sees the real number
This is where the four diverge most. Disk readers (ccusage and CodeBurn) are only as accurate as whatever the agent persisted, which is why ccusage's Claude Code totals can be off by an order of magnitude (issue #866) while its cache numbers stay correct. codex-usage-tracker fares a little better because Codex writes honest cumulative counters, so its token counts are solid even though the dollar figure depends on the price table you give it. The LiteLLM proxy is the only one that sees the provider's real usage object, cache and reasoning tokens included, because it reads the response on the wire rather than a log written after the fact. If "the number must be exactly right" is the requirement, that asymmetry decides it.
Coverage — which agents each tracker can read
Reach and focus pull against each other. codex-usage-tracker is the narrowest by design: it reads Codex only, and trades breadth for a persistent index and an MCP surface. ccusage treats Claude Code and Codex as first-class and advertises roughly fifteen CLIs in total, but it ignores Cursor and Aider. CodeBurn is the breadth champion at 25 agents, the obvious pick if you switch between Cursor, Gemini CLI, Warp, Copilot and friends, though it, too, has no reader for Aider. The LiteLLM proxy covers anything you can re-point at it, which is why it catches the agents the disk readers miss; the exception is Cursor, which only reaches an external proxy in bring-your-own-key mode.
Setup cost — from one-liner to running a service
Friction tracks accuracy almost perfectly — you pay for ground truth with setup. ccusage and CodeBurn are the cheapest: npx/bunx and npm i -g respectively, read-only, run on demand, nothing left running. codex-usage-tracker is barely more — a pipx install that builds a local SQLite index in the background. The LiteLLM proxy is a different category: a server you stand up, secure, keep online, and point every client at via environment variables. The three disk tools you can try in the next sixty seconds; the proxy is a small infrastructure decision.
When to pick which
| Your situation | Pick ccusage if… | Pick codex-usage-tracker if… | Pick CodeBurn if… | Pick a LiteLLM proxy if… |
|---|---|---|---|---|
| I mostly use Claude Code | Yes — one command reads your transcripts (just know the totals undercount). | No — it is Codex-only. | Works (Claude Code is one of its 25), but ccusage is deeper here. | Only if you need exact, ground-truth numbers and will run a service. |
| I use Codex and want the spend in-agent | ccusage codex gives quick reports. | Yes: SQLite index plus MCP tools the agent can call itself. | Covers Codex, but no MCP surface or persistent index. | Overkill unless you also want budgets across teams. |
| I bounce between Cursor, Gemini, Copilot, Warp… | Misses most of these. | No. | Yes — this is the whole pitch: 25 agents, one dashboard. | Only the ones you can re-point; Cursor needs BYOK. |
| I use Aider | No — Aider leaves no JSONL to read. | No. | No reader for Aider. | Yes — Aider already uses LiteLLM; a proxy adds the dashboard it lacks. |
| I need one ground-truth ledger across agents + teams | Local-only, per-agent. | Local-only, single agent. | Local-only, no team rollups. | Yes — spend logs, per-key/per-team budgets, exact counts. |
How to actually save tokens
A tracker tells you the bill; these are the levers that lower it. The three that matter most are universal — keep your prompt prefix stable so cache reads stay cheap, route cheap work to a cheap model, and reset context aggressively — but each agent exposes them differently. See Cost, Quality, Latency for why these three are the whole trade, and the Claude Code vs Codex vs Cursor vs Aider comparison for how each agent behaves.
Claude Code
Use /clear between unrelated tasks and /compact when a session balloons — every token in the context is re-sent on the next turn, so a bloated context is a recurring charge, not a one-off. Keep your system prompt and CLAUDE.md stable: cache reads bill at a fraction of fresh input (roughly a 10× discount), and changing the prefix throws the cache away. Push cheap subtasks to a cheaper model rather than running everything on Opus.
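The "recurring charge" point can be put in numbers. The sketch below compares re-sending a stable prefix every turn with and without cache reads. It takes the roughly 10× read discount from the text; the cache-write premium of 1.25×, the $3 per million input price, and the session shape are assumptions, and it models only the stable prefix, not the growing conversation.

```python
def session_input_cost(prefix_tokens, turns, price_per_million, cached,
                       read_discount=10.0, write_multiplier=1.25):
    """Cost of sending the same prefix on every turn of a session.

    read_discount: how much cheaper a cache read is than fresh input.
    write_multiplier: assumed premium for writing the cache on turn one.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    per_send = prefix_tokens * price_per_million / 1_000_000
    if not cached:
        return per_send * turns
    first_turn = per_send * write_multiplier
    later_turns = per_send / read_discount * (turns - 1)
    return first_turn + later_turns


# Assumed session: a 50,000-token prefix re-sent over 20 turns at $3/M input.
UNCACHED = session_input_cost(50_000, 20, 3.00, cached=False)
CACHED = session_input_cost(50_000, 20, 3.00, cached=True)

print(f"uncached prefix cost: ${UNCACHED:.4f}")
print(f"cached prefix cost:   ${CACHED:.4f}")
```

Under these assumptions the cached session costs a fraction of the uncached one. Any edit that changes the prefix resets the calculation to the expensive first turn.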
Codex CLI
Pick a cheaper reasoning model or a lower reasoning effort in ~/.codex/config.toml for routine work. Watch the cache ratio that codex-usage-tracker surfaces: a low ratio means you are churning the prompt prefix and paying full freight every turn. Use /clear to keep the context — and therefore the cumulative token_count — from creeping upward.
Cursor
Cursor moved off per-request pricing in mid-2025; it now bills against a dollar-denominated usage pool tied to real API token costs. So the lever is twofold: which model handles a request (set the model dropdown to the cheapest one that still handles the task) and how much context each request carries. Turn off MAX / long-context mode for small edits, since a bigger window means more input tokens billed against your pool.
Aider
Turn on --cache-prompts for Anthropic models so the system prompt, repo map, and read-only files are cached. Run an architect/editor split — a strong --architect model to plan, a cheap --editor-model to apply the diffs — so the expensive model only does the thinking. Use /tokens to inspect the current context before a big send, and /clear to drop it when you change tasks.
FAQ
Why does ccusage undercount Claude Code spend?
Because Claude Code's JSONL transcript is a lossy source, and ccusage faithfully reports what is in it. Per ccusage issue #866, the transcript records input tokens from a non-final streaming event (so most entries read 0 or 1) and omits thinking tokens from the output count — undercounting input by 100–174× and output by 10–17× on Opus, while cache figures stay accurate. It is an upstream Claude Code bug, not a ccusage flaw. For ground truth you need something that sees the actual API response: a fetch-interceptor like cccost, or a LiteLLM proxy.
Is CodeBurn just a rebranded ccusage?
No. Both read on-disk data, but they aim at different problems. ccusage is a focused JSONL parser, older and battle-tested on Claude Code and Codex, with rich per-model and per-window reporting across about fifteen CLIs. CodeBurn is a breadth play: a separate reader for each of 25 agents' local stores — Cursor's SQLite, Gemini's JSON, Warp's database and so on — priced through LiteLLM and shown in one dashboard. Use ccusage for depth on Claude Code/Codex; use CodeBurn when you switch between many agents, especially Cursor.
Does any tracker actually work for Aider?
Not as a disk reader — Aider keeps no structured usage ledger. It prints per-turn and session token/cost to the terminal, has a /tokens command for the current context, and writes a human-readable .aider.chat.history.md, but none of that is a parseable feed. If you want a persistent dashboard for Aider, route it through a LiteLLM proxy — which it already uses internally — and read the spend from the proxy's logs.
Do I need a tracker if my agent already shows token counts?
The in-session counter answers "this turn"; a tracker answers "this week, by model, across every session." It aggregates history, breaks cost down per model and per window, and — through a proxy — gives you numbers you can trust for billing. And for Claude Code specifically, the in-app and JSONL figures can be wrong (issue #866), so a ground-truth source is the only way to know the real number.
Is the LiteLLM-proxy approach worth the extra moving part?
It depends on what you are optimizing. If you want exact, ground-truth spend and one dashboard across multiple agents and a team — and you can run and secure a service — yes, the proxy is the only option here that delivers it. If you just want a quick local read of one agent's recent cost, a disk reader (ccusage, codex-usage-tracker, or CodeBurn) is far lower friction and good enough. Don't run a proxy to answer a question a one-liner already answers.
Further reading
On this wiki:
- Claude Code vs Codex CLI vs Cursor Agent vs Aider — the agents these trackers watch, and how each one behaves with tokens.
- LangSmith vs Braintrust vs Helicone vs Arize Phoenix — the app-layer analogue: the same proxy-vs-SDK split for production LLM observability and cost.
- AFK Coding — when you run several agents in parallel, the bill compounds, and a tracker is how you keep it honest.
- Cost, Quality, Latency — the three-way trade every token-saving lever is really negotiating.
- Tokens & Tokenization — what these tools are counting in the first place.
- Choosing a Model — the model-routing lever, where the biggest savings usually hide.
Project sources:
- ccusage — the JSONL-parsing CLI (see issue #866 on Claude Code's undercount).
- codex-usage-tracker — Codex JSONL → SQLite index + MCP server.
- CodeBurn — per-agent disk readers across 25 coding tools.
- LiteLLM — the proxy/gateway whose spend logs and budgets meter live traffic.