AI Token Throughput Tracking Tools: The Complete Guide for Developers (2026)
TL;DR: Install CodexBar (brew install --cask codexbar) for always-on menu bar monitoring, then run Agentlytics (npx agentlytics) periodically for historical analysis. That two-tool stack covers real-time awareness and deep analytics across all major AI coding tools. For Claude Code specifically, ccusage (npx ccusage) is the most mature dedicated tracker with 12k GitHub stars.
Andrej Karpathy put it bluntly on a recent episode of No Priors:
"I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput."
He compared it to his PhD days: "You would feel nervous when your GPUs are not running... Now it's not about flops. It's about tokens." The goal, he says, is to "maximize your token throughput and not be in the loop" — remove yourself as the bottleneck so agents work autonomously on your behalf.
The same week, Jensen Huang said on the All-In Podcast that if a $500,000 engineer isn't consuming at least $250,000 worth of tokens, he'll be "deeply alarmed." Tokens, he said, are now "one of the recruiting tools in Silicon Valley."
The message is clear: tokens are the new developer currency. But most of us have no idea where they're going.
I use Claude Code, Cursor, and occasionally Copilot — sometimes all in the same day. My monthly AI spend had crept past $200 and I couldn't tell you which tool was delivering the most value, which projects were burning the most tokens, or whether I was hitting rate limits because of waste or genuine heavy use. One developer documented spending $15,000 in API-equivalent tokens over 8 months — 10 billion tokens — while paying $800 in subscriptions.
An entire ecosystem of tracking tools has emerged in 2025-2026 to address this. I cataloged 30+ tools across seven categories, tested the key ones, and compiled everything into this guide.
Table of Contents
- Why Token Tracking Matters Now
- What Developers Actually Spend
- The Two-Tool Stack (Quick Start)
- Multi-Platform CLI Token Trackers
- macOS Menu Bar Apps
- Claude Code-Specific Tools
- LLM Observability Platforms
- Proxy and Router Tools
- Engineering Team ROI Tools
- Python and JS Libraries
- Comparison Matrix
- Recommended Stacks by Use Case
Why Token Tracking Matters Now
Three things converged in 2025-2026 that made token tracking essential:
1. Multi-tool usage became the norm. Developers now routinely use Claude Code, OpenAI Codex CLI, Google Gemini CLI, Cursor, Antigravity, Windsurf, and GitHub Copilot — often simultaneously. Karpathy himself recommends it: "If you're running out of quota on Codex, you should switch to Claude." Without unified visibility, you can't compare ROI across tools.
2. Subscription costs added up fast. Claude Code Max is $100-200/mo. Cursor Pro is $20/mo (Pro+ is $60, Ultra is $200). Copilot ranges from $10-39/mo. A developer running two or three tools is spending $150-400/mo on AI assistance — $1,800-4,800/year. And the pricing models keep changing: Cursor switched from request-based to credit-based in June 2025. Windsurf moved from credits to daily quotas in March 2026. Nobody has landed on a stable model yet.
3. Rate limits became the bottleneck. Claude Code Max plans have rate limits tied to 5-hour rolling windows (~88K tokens for 5x, ~220K tokens for 20x). Hitting a rate limit mid-session kills your flow. Knowing how close you are — and whether you're spending tokens on high-value work or runaway prompts — is the difference between a productive day and a frustrating one.
Tokens are compute, and compute should be measured. You wouldn't run production servers without monitoring. You shouldn't run AI-assisted development without monitoring either.
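As a back-of-the-envelope illustration of the rolling-window math (the ~88K figure above is approximate; Anthropic does not publish exact limits), you can estimate how long a 5-hour window will last at your current burn rate:

```python
# Estimate minutes until an assumed rolling-window rate limit is hit,
# given tokens already used and an observed burn rate.
def minutes_until_limit(tokens_used: int, window_budget: int,
                        tokens_per_minute: float) -> float:
    """Naive estimate: remaining budget divided by burn rate."""
    remaining = max(window_budget - tokens_used, 0)
    if tokens_per_minute <= 0:
        return float("inf")
    return remaining / tokens_per_minute

# Example: 40K of an assumed ~88K budget used, burning 500 tokens/min.
print(round(minutes_until_limit(40_000, 88_000, 500)))  # 96
```

Menu bar tools like CodexBar effectively run this calculation continuously; the value of doing it by hand is seeing how quickly a heavy agent session (tens of thousands of tokens per minute) collapses the remaining window.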
What Developers Actually Spend
Before choosing a tracking tool, it helps to understand the landscape of what people are paying.
Current Pricing (March 2026)
| Tool | Free | Base | Mid | Top | Model |
|---|---|---|---|---|---|
| Claude Code | - | $20/mo (Pro) | $100/mo (Max 5x) | $200/mo (Max 20x) | Rolling window |
| Cursor | Limited | $20/mo (Pro) | $60/mo (Pro+) | $200/mo (Ultra) | Credit pool |
| GitHub Copilot | 2K completions | $10/mo (Pro) | $39/mo (Pro+) | $39/user (Enterprise) | Premium requests |
| Windsurf | Yes | $20/mo (Pro) | - | $200/mo (Max) | Daily/weekly quotas |
| Codex CLI | Free (OSS) | $20/mo (Plus) | - | $200/mo (Pro) | Message limits |
Real-World Usage Data
Anthropic reports average Claude Code usage at ~$6/developer/day in API-equivalent spend, with the 90th percentile under $12/day. That's roughly $180/mo average — which is why the Max 5x plan ($100/mo) is a bargain for active users.
But the extremes are wild. Community-reported numbers include:
- 170 million tokens in 2 days (one Cursor user)
- $150 in 48 hours on a mid-size repo (Claude Code)
- $607.70 in 3.5 days beyond a $25/month plan (Replit)
- 28 million tokens to generate 149 lines of code (worst case debugging spiral)
- $4,000 in two weeks (one CTO's report)
The typical range across community forums is $40-120/mo for developers using 1-2 tools. Power users with multiple subscriptions land at $200-500/mo.
The Subscription vs API Breakeven
| Usage Level | Monthly Tokens | API Cost | Best Plan |
|---|---|---|---|
| Light | Under 50M | Under $100 | Pro ($20/mo) |
| Medium | 50-200M | $100-400 | Max 5x ($100/mo) |
| Heavy | 200M-1B | $400-2,000 | Max 20x ($200/mo) |
| Power | 1B+ | $2,000+ | Max 20x (massive savings) |
One power user documented 10 billion tokens over 8 months — $15,000+ in API-equivalent value — for $800 in subscription fees. That's roughly a 95% saving on the Max plan.
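The breakeven logic in the table above is simple enough to compute yourself. A sketch, assuming a blended API price of $2 per million tokens (illustrative only; real prices vary by model, input/output split, and cache hits):

```python
def api_equivalent_cost(monthly_tokens: int, price_per_million: float = 2.0) -> float:
    """Blended API-equivalent cost in dollars (illustrative pricing)."""
    return monthly_tokens / 1_000_000 * price_per_million

def best_plan(monthly_tokens: int) -> str:
    """Map usage onto the plan tiers from the breakeven table above."""
    cost = api_equivalent_cost(monthly_tokens)
    if cost < 100:
        return "Pro ($20/mo)"
    if cost < 400:
        return "Max 5x ($100/mo)"
    return "Max 20x ($200/mo)"

print(best_plan(150_000_000))  # Max 5x ($100/mo)
```

The $2/million blend is what makes the table's dollar columns line up (50M tokens ≈ $100, 200M ≈ $400); plug in your own model mix for a sharper answer.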
The Two-Tool Stack (Quick Start)
Before diving into 30+ tools, here's what I recommend for most developers. Get running in under 5 minutes:
1. Install CodexBar (menu bar — always on)
```bash
brew install --cask codexbar
```
CodexBar (9.4k stars) shows your current session spend and weekly limits directly in the macOS menu bar. Supports 15+ providers including Claude Code, Codex CLI, Cursor, Gemini, GitHub Copilot, and OpenRouter. You'll always know if you're approaching a rate limit.
2. Run Agentlytics (historical dashboard — on demand)
```bash
npx agentlytics
```
Agentlytics gives you the big picture: per-project costs, per-editor breakdowns, usage heatmaps, and side-by-side comparisons across 16 editors. Run it at the end of the week to understand where your tokens went.
3. Optional: Install tokentop (TUI — for budget tracking)
```bash
go install github.com/tokentopapp/tokentop@latest
```
If you want a dedicated terminal dashboard with spending limits and budget warnings, tokentop is described as "htop for AI costs." Connects to Anthropic, OpenAI, and Gemini via OAuth.
That's the stack. Real-time awareness + historical analysis + optional budget enforcement. Now let's go deep on every category.
Multi-Platform CLI Token Trackers
These tools aggregate token usage data from multiple AI coding harnesses by reading local session files. They require no API keys and keep all data local.
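Under the hood, most of these trackers do the same thing: walk a directory of local JSONL session logs and sum the usage fields. A minimal sketch, assuming a Claude Code-style layout under `~/.claude/projects/` with per-message `usage` objects (the directory layout and field names here are assumptions for illustration; each tool handles the real formats of its supported editors):

```python
import json
from pathlib import Path

def total_tokens(session_dir: Path) -> int:
    """Sum input + output tokens across all JSONL session files in a tree."""
    total = 0
    for path in session_dir.rglob("*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                usage = json.loads(line).get("message", {}).get("usage", {})
            except json.JSONDecodeError:
                continue  # skip malformed or partial lines
            total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total

# Assumed default location for Claude Code session logs.
print(total_tokens(Path.home() / ".claude" / "projects"))
```

This is also why these tools need no API keys: everything they report is already sitting on your disk.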
Agentlytics
The broadest editor coverage available. Supports Cursor, Windsurf, Claude Code, VS Code Copilot, Zed, Antigravity, OpenCode, and Command Code — 16 editors total.
```bash
npx agentlytics
```
- Unified dashboard with KPIs, heatmaps, usage streaks, per-project analytics, and side-by-side editor comparisons
- Team Relay mode for sharing usage across organizations
- All data stays completely local — zero configuration needed
GitHub: f/agentlytics (359 stars)
tokscale
Supports Claude Code, OpenCode, OpenClaw, Codex CLI, Gemini CLI, Cursor, AmpCode, Factory Droid, Kimi, and Pi.
```bash
npx tokscale
```
- Real-time pricing from LiteLLM, global leaderboard, and 2D/3D contribution graphs
- Platform-specific filters (e.g., `tokscale --claude`)
- Kardashev-scale gamification for token consumption
GitHub: junhoyeo/tokscale (1.4k stars) | tokscale.ai
toktrack
Built in Rust with SIMD acceleration. Scans 3,500+ session files in approximately 40ms — the fastest scanner in this category.
```bash
cargo install toktrack
```
- Supports Claude Code, Codex CLI, Gemini CLI, and OpenCode
- Daily/weekly/monthly views and model-level breakdowns
- TUI dashboard with 3 tabs and shareable SVG receipts
- Preserves cost history even after CLI tools delete session files
GitHub: mag123c/toktrack (71 stars)
ccusage
The most mature and popular CLI tracker. Supports Claude Code and Codex CLI with daily, monthly, and session-based usage with cost breakdowns.
```bash
npx ccusage
```
- Reads local JSONL files, offline mode with cached pricing
- MCP integration and multi-instance/project support
- Live monitoring mode for real-time tracking
GitHub: ryoppippi/ccusage (12k stars) | ccusage.com
tokenusage (tu)
Rust-based CLI/TUI/GUI tool supporting Claude Code, Codex CLI, and Antigravity.
```bash
cargo install tokenusage
```
- Claims to be 214x faster than ccusage for Claude and 138x faster for Codex
- Live monitoring mode and image generation for shareable usage cards
tokentop
A real-time terminal TUI described as "htop for AI costs." Connects to Anthropic, OpenAI, and Gemini via OAuth.
```bash
go install github.com/tokentopapp/tokentop@latest
```
- Tracks per-request cost, model, tokens, and duration with daily/weekly/monthly budgets
- Spending limits with visual warnings, cache leverage analysis, and ASCII step charts
- 4 different dashboard views
GitHub: tokentopapp/tokentop (40 stars) | tokentop.app
Platform Coverage Comparison
| Tool | Claude Code | Codex CLI | Gemini CLI | Cursor | Antigravity | Language | Stars |
|---|---|---|---|---|---|---|---|
| Agentlytics | Yes | Yes (VS Code) | Yes (Zed) | Yes | Yes | JS (npx) | 359 |
| tokscale | Yes | Yes | Yes | Yes | Yes (AmpCode) | JS | 1.4k |
| toktrack | Yes | Yes | Yes | No | No | Rust | 71 |
| tokentop | Yes (OAuth) | Yes (OAuth) | Yes (OAuth) | No | No | Go | 40 |
| tokenusage | Yes | Yes | No | No | Yes | Rust | 8 |
| ccusage | Yes | Yes | No | No | No | JS (npx) | 12k |
My take: If you use multiple editors, Agentlytics is the easy choice — broadest coverage, zero config. If you only use Claude Code + Codex, ccusage is more mature, more popular (12k stars), and focused. If raw speed matters (scanning thousands of session files), toktrack's Rust+SIMD approach is unmatched.
macOS Menu Bar Apps
Always-visible monitoring apps that sit in the macOS menu bar. These provide glanceable, real-time awareness of token spend and rate limits without requiring you to run a command.
CodexBar
Supports 15+ providers including OpenAI Codex, Claude Code, Cursor, Gemini, GitHub Copilot, and OpenRouter.
```bash
brew install --cask codexbar
```
- Dual-meter format showing session and weekly limits, credits, and countdown timers
- Reads local CLI logs and browser cookies — privacy-first with no external data transmission
- Free, open source, and installable via Homebrew
- Includes a bundled CLI for terminal-only users
GitHub: steipete/CodexBar (9.4k stars) | codexbar.app
TokenBar
Supports OpenAI, Claude, Cursor, OpenRouter, Perplexity, Vertex AI, DeepSeek, and Mistral — 20+ providers total.
- Key differentiator: runaway-prompt detection — catches retries and background agent activity before costs spike
- Pace indicators and incident detection alerts when spending snowballs
- All data stays local; $4.99 one-time purchase
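Runaway-prompt detection like TokenBar's can be approximated with a simple baseline comparison: flag any short window whose spend rate far exceeds the recent average. A minimal sketch (the 5x multiplier is my assumption for illustration, not TokenBar's actual heuristic):

```python
def is_runaway(recent_costs: list[float], window_cost: float,
               multiplier: float = 5.0) -> bool:
    """Flag a window whose cost exceeds `multiplier` times the recent mean."""
    if not recent_costs:
        return False  # no baseline yet, nothing to compare against
    baseline = sum(recent_costs) / len(recent_costs)
    return window_cost > multiplier * baseline

# Typical 5-minute windows cost ~$0.10; a $2.00 window trips the alarm.
print(is_runaway([0.08, 0.12, 0.10, 0.09], 2.00))  # True
```

Even this crude check would catch the classic failure mode: an agent stuck in a retry loop quietly multiplying your burn rate while you're looking elsewhere.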
Tokemon
Focused on Claude Code with deep analytics. Polls usage every 30 seconds.
- Burn rate per hour, per-project token breakdowns, and team analytics via Admin API
- 24h/7d usage charts and estimates when you'll hit limits
- Organization-wide analytics and Raycast integration; open source (MIT)
SessionWatcher
Lightweight tool for Claude Code and Codex CLI. Tracks token usage, costs, and rate limits.
- Rolling 5-hour window tracking and countdown
- No API keys needed; $1.99 one-time purchase
ClaudeBar
Supports Claude, Codex, Antigravity, and Gemini.
- Quota health with color-coded progress bars, auto-refresh, system notifications
- Keyboard shortcuts; open source
My take: CodexBar wins on breadth (15+ providers, Homebrew install, 9.4k stars, free). TokenBar is the pick if runaway-prompt detection matters — it catches those cases where an agent gets stuck in a retry loop and your costs spike before you notice. Community reports of single Explore() operations consuming 90k tokens and MCP tool descriptions filling context before real work starts make this a real concern. Tokemon is best for Claude Code power users who want per-project granularity and burn rate predictions.
Claude Code-Specific Tools
Purpose-built tools focused specifically on monitoring and analyzing Claude Code usage. These offer the deepest integration with Claude Code's session data and telemetry.
| Tool | Description | Key Feature |
|---|---|---|
| claude-code-otel | Full observability stack (OTEL + Prometheus + Loki + Grafana) | Cost/tokens/sessions/tool usage/DAU/WAU/MAU tracking |
| claude_telemetry | Drop-in 'claudia' command replacement | Exports to Logfire, Sentry, Honeycomb, Datadog |
| claude-code-metrics-stack | Local Grafana dashboard | Cost, tokens, sessions, productivity metrics |
| claude-code-usage-analyzer | Detailed breakdowns by model and token type | Statistical insights (mean/median/P95) |
| cccost | Instruments Claude Code for actual token tracking | Outputs .usage.json for statusline scripts |
| Claude-Code-Usage-Monitor | Real-time monitoring with predictions | Configurable plan support (Pro/Max x5/x20) |
The official Anthropic monitoring guide at github.com/anthropics/claude-code-monitoring-guide provides the canonical approach to setting up OpenTelemetry-based telemetry for Claude Code.
Integration Notes
- claude-code-otel is the most comprehensive — full Grafana stack with pre-built dashboards
- claude_telemetry is best if you already use Datadog/Sentry/Honeycomb and want to send Claude Code data there
- cccost is the lightest option — just outputs JSON for statusline integration
- The official claude-code-monitoring-guide should be your starting point for any enterprise deployment
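A statusline consumer of cccost's output can be a few lines. A hypothetical sketch, assuming `.usage.json` contains flat `total_cost` and `total_tokens` fields (the actual schema may differ; check cccost's documentation):

```python
import json
from pathlib import Path

def statusline(usage_file: Path) -> str:
    """Render a one-line usage summary for a shell prompt or statusbar."""
    if not usage_file.exists():
        return "usage: n/a"
    data = json.loads(usage_file.read_text())
    cost = data.get("total_cost", 0.0)    # assumed field name
    tokens = data.get("total_tokens", 0)  # assumed field name
    return f"${cost:.2f} | {tokens / 1_000_000:.1f}M tok"

print(statusline(Path(".usage.json")))
```

Pipe the result into whatever statusline framework you use (tmux, starship, a shell prompt); the point of the JSON-file approach is that the consumer stays trivially simple.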
LLM Observability Platforms
Enterprise-grade platforms that provide comprehensive monitoring, tracing, and cost tracking across multiple LLM providers. Best suited for teams running production AI applications or needing deep analytics beyond simple token counting.
Helicone
Open-source AI gateway (YC W23) supporting 100+ providers with one-line integration.
- Token usage, latency, and cost tracking with open-source cost data for 300+ models
- Caching, rate limiting, custom properties/tags, request monitoring
- Intelligent routing with failover
GitHub: Helicone/helicone (5.4k stars) | helicone.ai
Langfuse
Open-source observability platform (YC W23) with automatic cost calculation for supported models. The most popular open-source option in this category.
- Supports cached, audio, and image tokens
- Cost/latency breakdowns by user, session, feature, model, and prompt version
- OTEL-native SDK, self-hostable
GitHub: langfuse/langfuse (23.9k stars) | langfuse.com
LangSmith
Commercial platform by LangChain. Automatically records token usage and cost.
- Custom dashboards for P50/P99 latency, error rates, cost breakdowns, and feedback scores
- Trace trees and cost alerts/thresholds
Portkey
Open-source gateway supporting 1,600+ LLMs and providers.
- Tracks token usage, retries, and budget adherence as part of gateway observability
- Dynamic model switching, workload distribution, failover
- Real-time observability dashboard built on OpenTelemetry
GitHub: Portkey-AI/gateway | portkey.ai
Braintrust
Provides per-request cost breakdowns with tag-based attribution across prompt, cached, completion, and reasoning tokens.
- Automatic token/cost capture with zero config, budget alerts
- Identifies which 5% of requests consume 50% of tokens
Observability Platform Comparison
| Platform | Open Source | Self-Hostable | Provider Count | Key Differentiator | Stars |
|---|---|---|---|---|---|
| Helicone | Yes | Yes | 100+ | AI Gateway with caching + rate limiting | 5.4k |
| Langfuse | Yes | Yes | Many (OTEL) | Trace trees, prompt versioning | 23.9k |
| LangSmith | No | No | Many | LangChain ecosystem, custom dashboards | - |
| Portkey | Yes (gateway) | Yes | 1,600+ | Dynamic routing + failover | - |
| Braintrust | Partial | No | Major | Tag-based cost attribution | - |
Selection Guidance
- Need full self-hosted control? Helicone or Langfuse
- Already using LangChain? LangSmith integrates natively
- Maximum provider coverage? Portkey supports 1,600+ LLMs
- Cost attribution by feature/team? Braintrust's tag-based approach
- Gateway + observability in one? Helicone or Portkey
Proxy and Router Tools
These tools sit between your coding harness and the AI provider, routing requests while tracking usage. They offer the most granular control over spending and can enforce limits.
LiteLLM
Open-source proxy providing a unified OpenAI-compatible API across 100+ LLM providers.
```bash
pip install litellm
```
- Built-in spend tracking per key, team, and user with Prometheus metrics and PostgreSQL logging
- UI dashboard plus Grafana, Datadog, and SigNoz integrations
- Per-request spend and token logging with configurable budget limits
OpenRouter
Commercial service providing access to hundreds of models from multiple providers.
- Activity dashboard with CSV/PDF export, per-key credit limits with auto-reset
- Enterprise usage monitoring — Spend, Tokens, and Requests metrics with filtering by model, API key, and time period
9Router
Open-source smart routing tool supporting Claude Code, Cursor, Antigravity, Copilot, Codex, Gemini, OpenCode, Cline, and OpenClaw.
- 3-tier smart fallback routing
- Quota tracking for Claude Code/Codex/Gemini, spending limits per provider
- Auto token refresh
GitHub: decolua/9router | 9router.com
Bifrost (Maxim AI)
Open-source gateway (Apache 2.0) focused on Claude Code / Anthropic API, adding only ~11 microseconds of overhead at 5K RPS.
- Hierarchical budget management
- Built-in dashboard and integration with the Maxim observability platform
Engineering Team ROI Tools
These tools go beyond token counting to measure the business impact of AI coding tools — connecting token spend to engineering outcomes like deployment frequency, lead time, and code quality.
Jensen Huang's vision of giving every engineer a token budget equal to half their salary means these tools will matter a lot more soon. If you're giving a $200K engineer $100K in annual AI compute, you need to prove ROI.
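The simplest ROI starting point, before adopting any platform, is dividing token spend by shipped output. A hedged sketch (cost per merged PR is one illustrative proxy; tools like Faros AI correlate much richer signals such as DORA metrics):

```python
def cost_per_pr(monthly_token_spend: float, merged_prs: int) -> float:
    """Naive ROI proxy: AI spend per merged pull request."""
    if merged_prs == 0:
        return float("inf")  # spend with no shipped output
    return monthly_token_spend / merged_prs

# $400/mo in tokens across 50 merged PRs -> $8 per PR.
print(cost_per_pr(400, 50))  # 8.0
```

Tracked month over month per team, even a crude ratio like this surfaces the question executives will actually ask: is the spend producing more output, or just more tokens?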
Faros AI
Enterprise platform connecting AI tool usage from Claude Code, GitHub Copilot, and Cursor to engineering outcomes.
- Tracks token usage, costs by model, and output metrics (commits, PRs)
- Single pane of glass across all AI coding tools
- Correlation to lead time, deployment frequency, and change failure rate (DORA metrics)
Swarmia
Tracks adoption rates and usage patterns for GitHub Copilot and Cursor with direct integrations.
- Correlates AI tool usage with DORA metrics
- Measures license utilization vs. active usage to identify underutilized seats
GitClear
Measures code quality impact rather than raw tokens for GitHub Copilot, Cursor, and Claude Code.
- AI-attributed code tracking and code duplication analysis
- Churn analysis to determine if AI-generated code is actually sticking
ToolSpend
Connects AI service subscriptions with banking data to show actual spend across ChatGPT, Claude, Cursor, Perplexity, ElevenLabs, and more.
- Identifies underutilized seats, duplicate tools
- Pre-renewal alerts with team-level attribution
Python and JS Libraries
For developers building custom token tracking into their own applications, these libraries provide programmatic access to token counting and cost estimation.
| Library | Language | Providers | Key Feature |
|---|---|---|---|
| Tokenator | Python | OpenAI, Anthropic, Google | SDK wrappers with automatic cost tracking |
| TokenX | Python | Multiple | Decorators for cost/latency monitoring |
| llm-token-tracker | JS | OpenAI, Claude | MCP support for token tracking |
| tokencost | Python | 400+ LLMs | Price estimates across all major providers |
Usage Patterns
- Tokenator is best for wrapping existing OpenAI/Anthropic SDK calls — add one line to get cost tracking
- tokencost is best as a standalone price lookup library — covers 400+ models with regularly updated pricing
- TokenX uses Python decorators for minimal code changes
- llm-token-tracker is the only JS option and includes MCP server integration
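What these libraries do can be illustrated with a static price table. A sketch with made-up per-million-token prices (libraries like tokencost maintain real, regularly updated tables; do not rely on the numbers below):

```python
# Illustrative prices in USD per million tokens (input, output) -- NOT current.
PRICES = {
    "example-large": (3.00, 15.00),
    "example-small": (0.25, 1.25),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate a single request's cost from a static price table."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

print(round(estimate_cost("example-large", 10_000, 2_000), 4))  # 0.06
```

The hard part these libraries solve isn't the arithmetic; it's keeping the price table accurate across hundreds of models, cache tiers, and provider pricing changes.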
Comparison Matrix
Best tool by use case:
| Use Case | Top Pick | Runner-Up | Why |
|---|---|---|---|
| Track all coding CLIs | Agentlytics | tokscale | Broadest editor coverage (16), zero config |
| macOS menu bar | CodexBar | TokenBar | 15+ providers, Homebrew, 9.4k stars |
| Enterprise gateway | Helicone | Portkey | Open source, 100+ providers, caching |
| Full observability | Langfuse | LangSmith | Self-hostable, OTEL-native, 23.9k stars |
| Claude Code only | ccusage | toktrack | Most mature, 12k stars, npx ccusage |
| Budget alerts | TokenBar | tokentop | Runaway prompt detection |
| Team ROI | Faros AI | Swarmia | Connects tokens to DORA metrics |
| Smart routing | 9Router | LiteLLM | 3-tier fallback + quota tracking |
| Fastest scanner | toktrack | tokenusage | Rust + SIMD, 40ms for 3500 files |
Recommended Stacks by Use Case
Individual Developer (Free Stack)
A two-tool stack covers both real-time awareness and historical analysis:
- CodexBar — Install once, always visible. Answers "am I about to hit my rate limit right now?" Covers 15+ providers.
- Agentlytics — Run periodically to review per-project, per-editor, per-model cost breakdowns. Answers "where did my tokens go this month?"
- Optional: tokentop — Add if you want a dedicated TUI with budget alerts and cache analysis.
```bash
# Get running in 60 seconds
brew install --cask codexbar
npx agentlytics
```
Team / Organization
- LiteLLM as a centralized proxy — Unified API, per-team/per-user spend tracking, budget enforcement
- Helicone or Langfuse for observability — Cost/latency dashboards, alerting, self-hostable
- Faros AI for executive reporting — Connect token spend to engineering outcomes (DORA metrics)
Claude Code Power User
```bash
# Historical analysis (12k stars, most mature)
npx ccusage

# Fast scanning (Rust + SIMD)
cargo install toktrack

# Menu bar monitoring (9.4k stars, 15+ providers)
brew install --cask codexbar
```
Agentlytics vs CodexBar: Complementary, Not Competing
These tools answer different questions:
| Dimension | Agentlytics | CodexBar |
|---|---|---|
| When to use | Periodic review | Always on |
| What it shows | Historical trends, per-project costs | Current session, live spend |
| How it works | Run a command (npx agentlytics) | Menu bar icon |
| Key question | "Where did my tokens go this month?" | "Am I about to hit my limit?" |
| Install | Global npx (no install needed) | Homebrew / download |
Use both. They're complementary.
All Tool Links
CLI Token Trackers
- Agentlytics — npx agentlytics
- tokscale — tokscale.ai
- toktrack — cargo install
- tokentop — tokentop.app
- tokenusage — cargo install
- ccusage — ccusage.com
Menu Bar Apps
- CodexBar — codexbar.app
- TokenBar
- Tokemon
- SessionWatcher
- ClaudeBar
Observability Platforms
- Helicone — helicone.ai
- Langfuse — langfuse.com
- LangSmith
- Portkey — portkey.ai
- Braintrust
Proxies and Routers
- LiteLLM
- OpenRouter
- 9Router — 9router.com
- Bifrost / Maxim AI
Team ROI
- Faros AI
- Swarmia
- GitClear
- ToolSpend
Libraries
- Tokenator — Python
- TokenX — Python
- llm-token-tracker — JS
- tokencost — Python
Official
- claude-code-monitoring-guide — Anthropic's official OTEL guide
