
AI Token Throughput Tracking Tools: The Complete Guide for Developers (2026)

TL;DR: Install CodexBar (brew install --cask codexbar) for always-on menu bar monitoring, then run Agentlytics (npx agentlytics) periodically for historical analysis. That two-tool stack covers real-time awareness and deep analytics across all major AI coding tools. For Claude Code specifically, ccusage (npx ccusage) is the most mature dedicated tracker with 12k GitHub stars.

Andrej Karpathy put it bluntly on a recent episode of No Priors:

"I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput."

He compared it to his PhD days: "You would feel nervous when your GPUs are not running... Now it's not about flops. It's about tokens." The goal, he says, is to "maximize your token throughput and not be in the loop" — remove yourself as the bottleneck so agents work autonomously on your behalf.

The same week, Jensen Huang said on the All-In Podcast that if a $500,000 engineer isn't consuming at least $250,000 worth of tokens, he'll be "deeply alarmed." Tokens, he said, are now "one of the recruiting tools in Silicon Valley."

The message is clear: tokens are the new developer currency. But most of us have no idea where they're going.

I use Claude Code, Cursor, and occasionally Copilot — sometimes all in the same day. My monthly AI spend had crept past $200 and I couldn't tell you which tool was delivering the most value, which projects were burning the most tokens, or whether I was hitting rate limits because of waste or genuine heavy use. One developer documented spending $15,000 in API-equivalent tokens over 8 months — 10 billion tokens — while paying $800 in subscriptions.

An entire ecosystem of tracking tools has emerged in 2025-2026 to address this. I cataloged 30+ tools across seven categories, tested the key ones, and compiled everything into this guide.

Table of Contents

  • Why Token Tracking Matters Now
  • What Developers Actually Spend
  • The Two-Tool Stack (Quick Start)
  • Multi-Platform CLI Token Trackers
  • macOS Menu Bar Apps
  • Claude Code-Specific Tools
  • LLM Observability Platforms
  • Proxy and Router Tools
  • Engineering Team ROI Tools
  • Python and JS Libraries
  • Comparison Matrix

Why Token Tracking Matters Now

Three things converged in 2025-2026 that made token tracking essential:

1. Multi-tool usage became the norm. Developers now routinely use Claude Code, OpenAI Codex CLI, Google Gemini CLI, Cursor, Antigravity, Windsurf, and GitHub Copilot — often simultaneously. Karpathy himself recommends it: "If you're running out of quota on Codex, you should switch to Claude." Without unified visibility, you can't compare ROI across tools.

2. Subscription costs added up fast. Claude Code Max is $100-200/mo. Cursor Pro is $20/mo (Pro+ is $60, Ultra is $200). Copilot ranges from $10-39/mo. A developer running two or three tools is spending $150-400/mo on AI assistance — $1,800-4,800/year. And the pricing models keep changing: Cursor switched from request-based to credit-based in June 2025. Windsurf moved from credits to daily quotas in March 2026. Nobody has landed on a stable model yet.

3. Rate limits became the bottleneck. Claude Code Max plans have rate limits tied to 5-hour rolling windows (~88K tokens for 5x, ~220K tokens for 20x). Hitting a rate limit mid-session kills your flow. Knowing how close you are — and whether you're spending tokens on high-value work or runaway prompts — is the difference between a productive day and a frustrating one.
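To make the rolling window concrete, here is a minimal sketch (in Python, with made-up numbers) of how a tracker can decide how close you are to a 5-hour limit, given timestamped token counts from a session log:

```python
from datetime import datetime, timedelta

def tokens_in_window(events, now, hours=5):
    """Sum tokens recorded inside the trailing rolling window.
    `events` is a list of (timestamp, token_count) pairs -- schema assumed."""
    cutoff = now - timedelta(hours=hours)
    return sum(tokens for ts, tokens in events if ts > cutoff)

now = datetime(2026, 3, 1, 12, 0)
events = [
    (datetime(2026, 3, 1, 6, 30), 40_000),   # falls outside the 5h window
    (datetime(2026, 3, 1, 8, 15), 30_000),
    (datetime(2026, 3, 1, 11, 50), 25_000),
]
used = tokens_in_window(events, now)
print(f"{used:,} of ~88,000 tokens used")  # 55,000 -> ~33K of headroom on Max 5x
```

Old events age out of the sum as the window slides forward, which is why usage meters on these plans recover gradually rather than resetting at a fixed hour.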

Tokens are compute, and compute should be measured. You wouldn't run production servers without monitoring. You shouldn't run AI-assisted development without monitoring either.

What Developers Actually Spend

Before choosing a tracking tool, it helps to understand the landscape of what people are paying.

Current Pricing (March 2026)

| Tool | Free | Base | Mid | Top | Limit Model |
|---|---|---|---|---|---|
| Claude Code | - | $20/mo (Pro) | $100/mo (Max 5x) | $200/mo (Max 20x) | Rolling window |
| Cursor | Limited | $20/mo (Pro) | $60/mo (Pro+) | $200/mo (Ultra) | Credit pool |
| GitHub Copilot | 2K completions | $10/mo (Pro) | $39/mo (Pro+) | $39/user (Enterprise) | Premium requests |
| Windsurf | Yes | $20/mo (Pro) | - | $200/mo (Max) | Daily/weekly quotas |
| Codex CLI | Free (OSS) | $20/mo (Plus) | - | $200/mo (Pro) | Message limits |

Real-World Usage Data

Anthropic reports average Claude Code usage at ~$6/developer/day in API-equivalent spend, with the 90th percentile under $12/day. That's roughly $180/mo average — which is why the Max 5x plan ($100/mo) is a bargain for active users.

But the extremes are wild. Community-reported numbers include:

  • 170 million tokens in 2 days (one Cursor user)
  • $150 in 48 hours on a mid-size repo (Claude Code)
  • $607.70 in 3.5 days beyond a $25/month plan (Replit)
  • 28 million tokens to generate 149 lines of code (worst case debugging spiral)
  • $4,000 in two weeks (one CTO's report)

The typical range across community forums is $40-120/mo for developers using 1-2 tools. Power users with multiple subscriptions land at $200-500/mo.

The Subscription vs API Breakeven

| Usage Level | Monthly Tokens | API Cost | Best Plan |
|---|---|---|---|
| Light | Under 50M | Under $100 | Pro ($20/mo) |
| Medium | 50-200M | $100-400 | Max 5x ($100/mo) |
| Heavy | 200M-1B | $400-2,000 | Max 20x ($200/mo) |
| Power | 1B+ | $2,000+ | Max 20x (massive savings) |

One power user documented 10 billion tokens over 8 months — $15,000+ in API-equivalent value — for $800 in subscription fees. That's roughly a 95% savings on the Max plan.
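The table above reduces to a few lines of arithmetic. This sketch hardcodes the table's tiers and assumes a blended rate of $2 per million tokens, which is an illustrative figure, not official pricing:

```python
BLENDED_USD_PER_MTOK = 2.0  # assumed blended input/output/cache rate, for illustration

def recommend_plan(monthly_tokens: int) -> tuple[str, float]:
    """Return (suggested plan, API-equivalent monthly cost in USD)."""
    api_cost = monthly_tokens / 1e6 * BLENDED_USD_PER_MTOK
    if monthly_tokens < 50_000_000:
        plan = "Pro ($20/mo)"
    elif monthly_tokens < 200_000_000:
        plan = "Max 5x ($100/mo)"
    else:
        plan = "Max 20x ($200/mo)"
    return plan, api_cost

plan, api = recommend_plan(150_000_000)
print(f"{plan}, API-equivalent ~${api:,.0f}/mo")  # Max 5x, ~$300/mo
```

Plug in your own blended rate from a tracker's model-level breakdown; heavy cache reads pull the effective rate well below list price.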

The Two-Tool Stack (Quick Start)

Before diving into 30+ tools, here's what I recommend for most developers. Get running in under 5 minutes:

1. Install CodexBar (menu bar — always on)

brew install --cask codexbar

CodexBar (9.4k stars) shows your current session spend and weekly limits directly in the macOS menu bar. Supports 15+ providers including Claude Code, Codex CLI, Cursor, Gemini, GitHub Copilot, and OpenRouter. You'll always know if you're approaching a rate limit.

2. Run Agentlytics (historical dashboard — on demand)

npx agentlytics

Agentlytics gives you the big picture: per-project costs, per-editor breakdowns, usage heatmaps, and side-by-side comparisons across 16 editors. Run it at the end of the week to understand where your tokens went.

3. Optional: Install tokentop (TUI — for budget tracking)

go install github.com/tokentopapp/tokentop@latest

If you want a dedicated terminal dashboard with spending limits and budget warnings, tokentop is described as "htop for AI costs." Connects to Anthropic, OpenAI, and Gemini via OAuth.

That's the stack. Real-time awareness + historical analysis + optional budget enforcement. Now let's go deep on every category.

Multi-Platform CLI Token Trackers

These tools aggregate token usage data from multiple AI coding harnesses by reading local session files. They require no API keys and keep all data local.
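Under the hood these trackers all do roughly the same thing: walk a directory of JSONL session logs and total the per-message token counts. A toy version in Python (the directory layout and record schema are assumptions loosely modeled on Claude Code's logs; real tools handle many more formats and edge cases):

```python
import json
from pathlib import Path

def sum_usage(log_dir: Path) -> dict:
    """Walk *.jsonl session logs under log_dir and total the token counts."""
    totals = {"input": 0, "output": 0, "cache_read": 0}
    for path in log_dir.glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                usage = json.loads(line).get("message", {}).get("usage", {})
            except (json.JSONDecodeError, AttributeError):
                continue  # skip malformed or non-object records
            totals["input"] += usage.get("input_tokens", 0)
            totals["output"] += usage.get("output_tokens", 0)
            totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
    return totals

# Demo against a throwaway log file
demo = Path("demo_logs"); demo.mkdir(exist_ok=True)
(demo / "s1.jsonl").write_text(
    json.dumps({"message": {"usage": {"input_tokens": 100, "output_tokens": 40}}}) + "\n"
)
print(sum_usage(demo))  # {'input': 100, 'output': 40, 'cache_read': 0}
```

Because everything is read from disk, no API keys or network calls are involved, which is why these tools can honestly claim that all data stays local.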

Agentlytics

The broadest editor coverage available — 16 editors total, including Cursor, Windsurf, Claude Code, VS Code Copilot, Zed, Antigravity, OpenCode, and Command Code.

npx agentlytics
  • Unified dashboard with KPIs, heatmaps, usage streaks, per-project analytics, and side-by-side editor comparisons
  • Team Relay mode for sharing usage across organizations
  • All data stays completely local — zero configuration needed

GitHub: f/agentlytics (359 stars)

tokscale

Supports Claude Code, OpenCode, OpenClaw, Codex CLI, Gemini CLI, Cursor, AmpCode, Factory Droid, Kimi, and Pi.

npx tokscale
  • Real-time pricing from LiteLLM, global leaderboard, and 2D/3D contribution graphs
  • Platform-specific filters (e.g., tokscale --claude)
  • Kardashev-scale gamification for token consumption

GitHub: junhoyeo/tokscale (1.4k stars) | tokscale.ai

toktrack

Built in Rust with SIMD acceleration. Scans 3,500+ session files in approximately 40ms — the fastest scanner in this category.

cargo install toktrack
  • Supports Claude Code, Codex CLI, Gemini CLI, and OpenCode
  • Daily/weekly/monthly views and model-level breakdowns
  • TUI dashboard with 3 tabs and shareable SVG receipts
  • Preserves cost history even after CLI tools delete session files

GitHub: mag123c/toktrack (71 stars)

ccusage

The most mature and popular CLI tracker. Supports Claude Code and Codex CLI with daily, monthly, and session-based usage with cost breakdowns.

npx ccusage
  • Reads local JSONL files, offline mode with cached pricing
  • MCP integration and multi-instance/project support
  • Live monitoring mode for real-time tracking

GitHub: ryoppippi/ccusage (12k stars) | ccusage.com

tokenusage (tu)

Rust-based CLI/TUI/GUI tool supporting Claude Code, Codex CLI, and Antigravity.

cargo install tokenusage
  • Claims to be 214x faster than ccusage for Claude and 138x faster for Codex
  • Live monitoring mode and image generation for shareable usage cards

GitHub: hanbu97/tokenusage

tokentop

A real-time terminal TUI described as "htop for AI costs." Connects to Anthropic, OpenAI, and Gemini via OAuth.

go install github.com/tokentopapp/tokentop@latest
  • Tracks per-request cost, model, tokens, and duration with daily/weekly/monthly budgets
  • Spending limits with visual warnings, cache leverage analysis, and ASCII step charts
  • 4 different dashboard views

GitHub: tokentopapp/tokentop (40 stars) | tokentop.app

Platform Coverage Comparison

| Tool | Claude Code | Codex CLI | Gemini CLI | Cursor | Antigravity | Language | Stars |
|---|---|---|---|---|---|---|---|
| Agentlytics | Yes | Yes (VS Code) | Yes (Zed) | Yes | Yes | JS (npx) | 359 |
| tokscale | Yes | Yes | Yes | Yes | Yes (AmpCode) | JS | 1.4k |
| toktrack | Yes | Yes | Yes | No | No | Rust | 71 |
| tokentop | Yes (OAuth) | Yes (OAuth) | Yes (OAuth) | No | No | Go | 40 |
| tokenusage | Yes | Yes | No | No | Yes | Rust | 8 |
| ccusage | Yes | Yes | No | No | No | JS (npx) | 12k |

My take: If you use multiple editors, Agentlytics is the easy choice — broadest coverage, zero config. If you only use Claude Code + Codex, ccusage is more mature, more popular (12k stars), and focused. If raw speed matters (scanning thousands of session files), toktrack's Rust+SIMD approach is unmatched.

macOS Menu Bar Apps

Always-visible monitoring apps that sit in the macOS menu bar. These provide glanceable, real-time awareness of token spend and rate limits without requiring you to run a command.

CodexBar

Supports 15+ providers including OpenAI Codex, Claude Code, Cursor, Gemini, GitHub Copilot, and OpenRouter.

brew install --cask codexbar
  • Dual-meter format showing session and weekly limits, credits, and countdown timers
  • Reads local CLI logs and browser cookies — privacy-first with no external data transmission
  • Free, open source, and installable via Homebrew
  • Includes a bundled CLI for terminal-only users

GitHub: steipete/CodexBar (9.4k stars) | codexbar.app

TokenBar

Supports OpenAI, Claude, Cursor, OpenRouter, Perplexity, Vertex AI, DeepSeek, and Mistral — 20+ providers total.

  • Key differentiator: runaway-prompt detection — catches retries and background agent activity before costs spike
  • Pace indicators and incident detection alerts when spending snowballs
  • All data stays local; $4.99 one-time purchase

tokenbar.site

Tokemon

Focused on Claude Code with deep analytics. Polls usage every 30 seconds.

  • Burn rate per hour, per-project token breakdowns, and team analytics via Admin API
  • 24h/7d usage charts and estimates when you'll hit limits
  • Organization-wide analytics and Raycast integration; open source (MIT)

tokemon.ai

SessionWatcher

Lightweight tool for Claude Code and Codex CLI. Tracks token usage, costs, and rate limits.

  • Rolling 5-hour window tracking and countdown
  • No API keys needed; $1.99 one-time purchase

sessionwatcher.com

ClaudeBar

Supports Claude, Codex, Antigravity, and Gemini.

  • Quota health with color-coded progress bars, auto-refresh, system notifications
  • Keyboard shortcuts; open source

GitHub: tddworks/ClaudeBar

My take: CodexBar wins on breadth (15+ providers, Homebrew install, 9.4k stars, free). TokenBar is the pick if runaway-prompt detection matters — it catches those cases where an agent gets stuck in a retry loop and your costs spike before you notice. Community reports of single Explore() operations consuming 90k tokens and MCP tool descriptions filling context before real work starts make this a real concern. Tokemon is best for Claude Code power users who want per-project granularity and burn rate predictions.
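Runaway detection itself is simple in principle: compare the short-term burn rate against a recent baseline and alert when it spikes. A hypothetical sketch (the thresholds are invented for illustration; TokenBar's actual heuristics are not public):

```python
def is_runaway(per_minute_tokens, window=5, factor=4.0, floor=20_000):
    """Flag when the last `window` minutes burn tokens far faster than the
    earlier baseline. All thresholds here are made-up illustrations."""
    if len(per_minute_tokens) < 2 * window:
        return False  # not enough history to compare against
    recent = sum(per_minute_tokens[-window:]) / window
    earlier = per_minute_tokens[:-window]
    baseline = sum(earlier) / len(earlier)
    return recent > floor and recent > factor * max(baseline, 1)

calm = [3_000] * 20
spike = [3_000] * 15 + [60_000] * 5  # an agent stuck in a retry loop
print(is_runaway(calm), is_runaway(spike))  # False True
```

The absolute floor matters: without it, any quiet session would trip the relative check the moment you start a normal heavy prompt.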

Claude Code-Specific Tools

Purpose-built tools focused specifically on monitoring and analyzing Claude Code usage. These offer the deepest integration with Claude Code's session data and telemetry.

| Tool | Description | Key Feature |
|---|---|---|
| claude-code-otel | Full observability stack (OTEL + Prometheus + Loki + Grafana) | Cost/tokens/sessions/tool usage/DAU/WAU/MAU tracking |
| claude_telemetry | Drop-in 'claudia' command replacement | Exports to Logfire, Sentry, Honeycomb, Datadog |
| claude-code-metrics-stack | Local Grafana dashboard | Cost, tokens, sessions, productivity metrics |
| claude-code-usage-analyzer | Detailed breakdowns by model and token type | Statistical insights (mean/median/P95) |
| cccost | Instruments Claude Code for actual token tracking | Outputs .usage.json for statusline scripts |
| Claude-Code-Usage-Monitor | Real-time monitoring with predictions | Configurable plan support (Pro/Max x5/x20) |

The official Anthropic monitoring guide at github.com/anthropics/claude-code-monitoring-guide provides the canonical approach to setting up OpenTelemetry-based telemetry for Claude Code.

Integration Notes

  • claude-code-otel is the most comprehensive — full Grafana stack with pre-built dashboards
  • claude_telemetry is best if you already use Datadog/Sentry/Honeycomb and want to send Claude Code data there
  • cccost is the lightest option — just outputs JSON for statusline integration
  • The official claude-code-monitoring-guide should be your starting point for any enterprise deployment

LLM Observability Platforms

Enterprise-grade platforms that provide comprehensive monitoring, tracing, and cost tracking across multiple LLM providers. Best suited for teams running production AI applications or needing deep analytics beyond simple token counting.

Helicone

Open-source AI gateway (YC W23) supporting 100+ providers with one-line integration.

  • Token usage, latency, and cost tracking with open-source cost data for 300+ models
  • Caching, rate limiting, custom properties/tags, request monitoring
  • Intelligent routing with failover

GitHub: Helicone/helicone (5.4k stars) | helicone.ai

Langfuse

Open-source observability platform (YC W23) with automatic cost calculation for supported models. The most popular open-source option in this category.

  • Supports cached, audio, and image tokens
  • Cost/latency breakdowns by user, session, feature, model, and prompt version
  • OTEL-native SDK, self-hostable

GitHub: langfuse/langfuse (23.9k stars) | langfuse.com

LangSmith

Commercial platform by LangChain. Automatically records token usage and cost.

  • Custom dashboards for P50/P99 latency, error rates, cost breakdowns, and feedback scores
  • Trace trees and cost alerts/thresholds

langchain.com/langsmith

Portkey

Open-source gateway supporting 1,600+ LLMs and providers.

  • Tracks token usage, retries, and budget adherence as part of gateway observability
  • Dynamic model switching, workload distribution, failover
  • Real-time observability dashboard built on OpenTelemetry

GitHub: Portkey-AI/gateway | portkey.ai

Braintrust

Provides per-request cost breakdowns with tag-based attribution across prompt, cached, completion, and reasoning tokens.

  • Automatic token/cost capture with zero config, budget alerts
  • Identifies which 5% of requests consume 50% of tokens

braintrust.dev

Observability Platform Comparison

| Platform | Open Source | Self-Hostable | Provider Count | Key Differentiator | Stars |
|---|---|---|---|---|---|
| Helicone | Yes | Yes | 100+ | AI Gateway with caching + rate limiting | 5.4k |
| Langfuse | Yes | Yes | Many (OTEL) | Trace trees, prompt versioning | 23.9k |
| LangSmith | No | No | Many | LangChain ecosystem, custom dashboards | - |
| Portkey | Yes (gateway) | Yes | 1,600+ | Dynamic routing + failover | - |
| Braintrust | Partial | No | Major | Tag-based cost attribution | - |

Selection Guidance

  • Need full self-hosted control? Helicone or Langfuse
  • Already using LangChain? LangSmith integrates natively
  • Maximum provider coverage? Portkey supports 1,600+ LLMs
  • Cost attribution by feature/team? Braintrust's tag-based approach
  • Gateway + observability in one? Helicone or Portkey

Proxy and Router Tools

These tools sit between your coding harness and the AI provider, routing requests while tracking usage. They offer the most granular control over spending and can enforce limits.

LiteLLM

Open-source proxy providing a unified OpenAI-compatible API across 100+ LLM providers.

pip install litellm
  • Built-in spend tracking per key, team, and user with Prometheus metrics and PostgreSQL logging
  • UI dashboard plus Grafana, Datadog, and SigNoz integrations
  • Per-request spend and token logging with configurable budget limits

GitHub: BerriAI/litellm

OpenRouter

Commercial service providing access to hundreds of models from multiple providers.

  • Activity dashboard with CSV/PDF export, per-key credit limits with auto-reset
  • Enterprise usage monitoring — Spend, Tokens, and Requests metrics with filtering by model, API key, and time period

openrouter.ai

9Router

Open-source smart routing tool supporting Claude Code, Cursor, Antigravity, Copilot, Codex, Gemini, OpenCode, Cline, and OpenClaw.

  • 3-tier smart fallback routing
  • Quota tracking for Claude Code/Codex/Gemini, spending limits per provider
  • Auto token refresh

GitHub: decolua/9router | 9router.com

Bifrost (Maxim AI)

Open-source gateway (Apache 2.0) focused on Claude Code and the Anthropic API. Adds only 11 microseconds of overhead at 5K RPS.

  • Hierarchical budget management
  • Built-in dashboard and integration with the Maxim observability platform

getmaxim.ai

Engineering Team ROI Tools

These tools go beyond token counting to measure the business impact of AI coding tools — connecting token spend to engineering outcomes like deployment frequency, lead time, and code quality.

Jensen Huang's vision of giving every engineer a token budget equal to half their salary means these tools will matter a lot more soon. If you're giving a $200K engineer $100K in annual AI compute, you need to prove ROI.

Faros AI

Enterprise platform connecting AI tool usage from Claude Code, GitHub Copilot, and Cursor to engineering outcomes.

  • Tracks token usage, costs by model, and output metrics (commits, PRs)
  • Single pane of glass across all AI coding tools
  • Correlation to lead time, deployment frequency, and change failure rate (DORA metrics)

faros.ai/ai-impact

Swarmia

Tracks adoption rates and usage patterns for GitHub Copilot and Cursor with direct integrations.

  • Correlates AI tool usage with DORA metrics
  • Measures license utilization vs. active usage to identify underutilized seats

swarmia.com/product/ai-impact

GitClear

Measures code quality impact rather than raw tokens for GitHub Copilot, Cursor, and Claude Code.

  • AI-attributed code tracking and code duplication analysis
  • Churn analysis to determine if AI-generated code is actually sticking

gitclear.com

ToolSpend

Connects AI service subscriptions with banking data to show actual spend across ChatGPT, Claude, Cursor, Perplexity, ElevenLabs, and more.

  • Identifies underutilized seats, duplicate tools
  • Pre-renewal alerts with team-level attribution

Python and JS Libraries

For developers building custom token tracking into their own applications, these libraries provide programmatic access to token counting and cost estimation.

| Library | Language | Providers | Key Feature |
|---|---|---|---|
| Tokenator | Python | OpenAI, Anthropic, Google | SDK wrappers with automatic cost tracking |
| TokenX | Python | Multiple | Decorators for cost/latency monitoring |
| llm-token-tracker | JS | OpenAI, Claude | MCP support for token tracking |
| tokencost | Python | 400+ LLMs | Price estimates across all major providers |

Usage Patterns

  • Tokenator is best for wrapping existing OpenAI/Anthropic SDK calls — add one line to get cost tracking
  • tokencost is best as a standalone price lookup library — covers 400+ models with regularly updated pricing
  • TokenX uses Python decorators for minimal code changes
  • llm-token-tracker is the only JS option and includes MCP server integration
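The decorator approach is easy to roll yourself if none of these libraries fit. Here's a generic sketch in the spirit of TokenX (the interface and the per-million-token prices are invented for illustration; this is not TokenX's real API):

```python
import functools

# Illustrative USD prices per million tokens (input, output) -- not authoritative
PRICE_PER_MTOK = {"claude-sonnet": (3.00, 15.00), "gpt-4o": (2.50, 10.00)}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    p_in, p_out = PRICE_PER_MTOK[model]
    return tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out

def track_cost(model: str):
    """Decorator: the wrapped call must return (text, tokens_in, tokens_out)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            text, t_in, t_out = fn(*args, **kwargs)
            inner.total_cost += estimate_cost(model, t_in, t_out)
            return text
        inner.total_cost = 0.0  # running total attached to the wrapper
        return inner
    return wrap

@track_cost("claude-sonnet")
def fake_completion(prompt):
    # Stand-in for a real SDK call returning (text, input_tokens, output_tokens)
    return ("ok", 1200, 300)

fake_completion("hello")
print(f"${fake_completion.total_cost:.4f}")  # $0.0081
```

The same pattern generalizes to latency tracking or per-project tags; the real libraries above add maintained price tables so you don't hardcode rates.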

Comparison Matrix

Best tool by use case:

| Use Case | Top Pick | Runner-Up | Why |
|---|---|---|---|
| Track all coding CLIs | Agentlytics | tokscale | Broadest editor coverage (16), zero config |
| macOS menu bar | CodexBar | TokenBar | 15+ providers, Homebrew, 9.4k stars |
| Enterprise gateway | Helicone | Portkey | Open source, 100+ providers, caching |
| Full observability | Langfuse | LangSmith | Self-hostable, OTEL-native, 23.9k stars |
| Claude Code only | ccusage | toktrack | Most mature, 12k stars, npx ccusage |
| Budget alerts | TokenBar | tokentop | Runaway prompt detection |
| Team ROI | Faros AI | Swarmia | Connects tokens to DORA metrics |
| Smart routing | 9Router | LiteLLM | 3-tier fallback + quota tracking |
| Fastest scanner | toktrack | tokenusage | Rust + SIMD, 40ms for 3500 files |

Individual Developer (Free Stack)

A two-tool stack covers both real-time awareness and historical analysis:

  1. CodexBar — Install once, always visible. Answers "am I about to hit my rate limit right now?" Covers 15+ providers.
  2. Agentlytics — Run periodically to review per-project, per-editor, per-model cost breakdowns. Answers "where did my tokens go this month?"
  3. Optional: tokentop — Add if you want a dedicated TUI with budget alerts and cache analysis.
# Get running in 60 seconds
brew install --cask codexbar
npx agentlytics

Team / Organization

  • LiteLLM as a centralized proxy — Unified API, per-team/per-user spend tracking, budget enforcement
  • Helicone or Langfuse for observability — Cost/latency dashboards, alerting, self-hostable
  • Faros AI for executive reporting — Connect token spend to engineering outcomes (DORA metrics)

Claude Code Power User

# Historical analysis (12k stars, most mature)
npx ccusage

# Fast scanning (Rust + SIMD)
cargo install toktrack

# Menu bar monitoring (9.4k stars, 15+ providers)
brew install --cask codexbar

Agentlytics vs CodexBar: Complementary, Not Competing

These tools answer different questions:

| Dimension | Agentlytics | CodexBar |
|---|---|---|
| When to use | Periodic review | Always on |
| What it shows | Historical trends, per-project costs | Current session, live spend |
| How it works | Run a command (npx agentlytics) | Menu bar icon |
| Key question | "Where did my tokens go this month?" | "Am I about to hit my limit?" |
| Install | Global npx (no install needed) | Homebrew / download |

Use both. They're complementary.

