How to Track AI Coding Agent Token Costs Across Claude Code, Cursor, Copilot, Codex and Gemini CLI

Start in 5 minutes

  1. Create a RutaAPI account.
  2. Add prepaid credits.
  3. Create an API key in the dashboard.
  4. Check available models with /v1/models.
  5. Use one returned model name as MODEL_NAME.
  6. Send your first chat/completions request.

When you use an AI coding tool like Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, or Cline, every request you send consumes tokens. Input tokens cover your prompts, file contents, and conversation history. Output tokens cover the model's responses. Cached tokens represent repeated context that has already been processed. Each of these carries a different price tag, and tracking them accurately is harder than it sounds.

This guide walks through how token costs accumulate across AI coding tools, how to estimate them from RutaAPI's free AI Coding Cost Monitor, and how to build a lightweight tracking workflow that works without any infrastructure changes.

Quick answer

To track AI coding agent token costs, export or normalize usage logs into CSV or JSON with fields such as tool, project, model, input_tokens, output_tokens and cached_tokens. Then estimate cost using per-million-token prices for each model. RutaAPI's AI Coding Cost Monitor provides a browser-based way to test this workflow without any signup.

1. Why AI coding costs are getting harder to track

Five years ago, a developer might have paid a flat monthly fee for an AI assistant. Today, costs are metered per-token across multiple providers, and most teams use several tools simultaneously. A single engineering team might run Claude Code for complex debugging, Cursor for IDE suggestions, GitHub Copilot for autocomplete, and Gemini CLI for script generation — all billed through different providers at different rates.

The challenge is that none of these tools present a unified cost view. Claude Code bills through Anthropic. Cursor might use its own API or route through OpenAI. Copilot has its own billing model entirely. Without a way to aggregate usage across all of them, you only see the total when the invoice arrives.

Beyond multi-tool complexity, there is the issue of context reuse. When an AI coding agent works on a large codebase, it typically resends the conversation history on every request. If you do not track how many tokens are being resent, and how many of them are billed at a cached rate, you cannot tell whether the provider's caching discount is actually keeping your costs down.

2. What input, output and cached tokens mean

Understanding the three token types is the foundation of any cost estimate.

Input tokens are the tokens sent to the model in a request. For an AI coding agent, this includes your current prompt, the surrounding code context, relevant file contents, system instructions, and the full conversation history the agent chooses to include. Input tokens are priced per million (per 1M tokens).

Output tokens are the tokens the model generates in response. These include code suggestions, explanations, refactored files, and any other text the model produces. Output tokens are almost always more expensive per token than input tokens for the same model, typically several times more (for example, $15 versus $3 per 1M tokens for Claude 3.5 Sonnet, a 5× difference).

Cached tokens are input tokens that the model provider has already processed and stored. When a request reuses the same prompt prefix (for example, the same system instructions or shared codebase context), the provider bills the cached portion at a reduced rate that varies by provider and model: Anthropic bills cache reads at 10% of the base input price, while OpenAI bills GPT-4o cached input at 50%. You can only account for this discount in your own estimates if your usage logs report cached token counts separately.

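Token type | What it covers | Relative cost
---------- | -------------- | -------------
Input | Prompts, code context, history, system instructions | Base rate
Output | Model responses, generated code, explanations | Several times the input rate (5× for Claude 3.5 Sonnet, 4× for GPT-4o)
Cached | Repeated context already processed by the provider | A fraction of the input rate (10% for Anthropic cache reads, 50% for GPT-4o)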

3. Why repeated context burns money

AI coding agents work best with rich context. They read your project files, pull in related modules, and reference earlier decisions from the conversation. This works well for quality — but it means every request carries more tokens than a typical chat with an LLM.

The problem is that many agents send the full conversation history on every request, even when only a small portion of it is new. In a 30-minute debugging session with 20 back-and-forth exchanges, the agent may resend the same 30,000-token context 20 times. Without caching, that is 600,000 input tokens billed at the full input rate. With a 90% cache hit rate, only 60,000 tokens are billed at the full rate and the remaining 540,000 are billed at the cheaper cached rate. Your logs show this split only if the provider reports cached token counts.
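The session arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming Claude 3.5 Sonnet list prices ($3.00 per 1M input tokens, $0.30 per 1M cached tokens) and a flat 90% cache hit rate across all 20 turns; in a real session the first request cannot be a cache hit, so treat the result as an approximation. The function name session_input_cost is hypothetical.

```python
# Assumed Claude 3.5 Sonnet list prices, USD per 1M tokens.
INPUT_PRICE = 3.00
CACHED_PRICE = 0.30


def session_input_cost(context_tokens: int, turns: int, cache_hit_rate: float):
    """Return (full_price_tokens, cached_tokens, input_cost_usd) for a session
    that resends the same context on every turn."""
    total = context_tokens * turns
    cached = round(total * cache_hit_rate)  # tokens billed at the cached rate
    full_price = total - cached             # tokens billed at the full input rate
    cost = (full_price * INPUT_PRICE + cached * CACHED_PRICE) / 1_000_000
    return full_price, cached, cost


for rate in (0.0, 0.90):
    full, cached, cost = session_input_cost(30_000, 20, rate)
    print(f"cache hit {rate:.0%}: {full:,} full-price + {cached:,} cached tokens = ${cost:.3f}")
```

Without caching the input side of the session costs $1.800; at a 90% hit rate it drops to $0.342, because the 540,000 cached tokens are still billed, just at a tenth of the price.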

This is why the repeated context ratio matters: it is the percentage of all input tokens that were served from cache. A high ratio means each request carries a large amount of repeated context; even at the discounted cached rate, that context is paid for again on every turn, so costs stay high even when the new work is small. The AI Coding Cost Monitor calculates this ratio automatically from your logs.
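One way to compute the ratio from log rows is sketched below. It assumes the same convention as the additive cost formula in section 5: input_tokens counts only uncached input, so total input is input_tokens plus cached_tokens. This is an illustration, not the monitor's implementation, and the helper name repeated_context_ratio is hypothetical. If your source already includes cached tokens inside input_tokens, divide by input_tokens alone.

```python
def repeated_context_ratio(rows) -> float:
    """Fraction of all input tokens (uncached + cached) that were served from cache.

    Assumes input_tokens excludes cached_tokens.
    """
    rows = list(rows)  # accept any iterable, including generators
    cached = sum(int(r.get("cached_tokens") or 0) for r in rows)
    uncached = sum(int(r.get("input_tokens") or 0) for r in rows)
    total_input = cached + uncached
    return cached / total_input if total_input else 0.0


# Token counts from the sample CSV rows in section 4.
sample = [
    {"input_tokens": 45_000, "cached_tokens": 30_000},
    {"input_tokens": 32_000, "cached_tokens": 20_000},
    {"input_tokens": 15_000, "cached_tokens": 0},
]
print(f"{repeated_context_ratio(sample):.1%}")  # 35.2%
```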

4. A simple CSV format for AI coding cost tracking

Most AI coding tools do not yet have built-in cost export features. The practical workaround is to instrument your logging pipeline to capture the token fields that providers return with each API response. Here is the minimal CSV format that works with the AI Coding Cost Monitor:

tool,project,model,input_tokens,output_tokens,cached_tokens,cost_usd,timestamp,session_id
Claude Code,bugfix-auth,claude-3-5-sonnet,45000,12000,30000,0,2026-05-10T10:30:00Z,sess_001
Cursor,feat-api,claude-3-5-sonnet,32000,9000,20000,0,2026-05-10T12:00:00Z,
GitHub Copilot,docs-readme,gpt-4o-mini,15000,4200,0,0,2026-05-10T13:00:00Z,
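If you want to work with the same format in your own scripts, Python's standard csv module reads it directly. This is a minimal sketch: the function name load_usage is hypothetical, and treating empty token fields as 0 and empty cost_usd or session_id values as missing are assumptions, not documented monitor behavior.

```python
import csv
import io

SAMPLE = """tool,project,model,input_tokens,output_tokens,cached_tokens,cost_usd,timestamp,session_id
Claude Code,bugfix-auth,claude-3-5-sonnet,45000,12000,30000,0,2026-05-10T10:30:00Z,sess_001
Cursor,feat-api,claude-3-5-sonnet,32000,9000,20000,0,2026-05-10T12:00:00Z,
GitHub Copilot,docs-readme,gpt-4o-mini,15000,4200,0,0,2026-05-10T13:00:00Z,
"""

TOKEN_FIELDS = ("input_tokens", "output_tokens", "cached_tokens")


def load_usage(text: str) -> list:
    """Parse usage CSV text into dicts with int token counts and float (or None) cost."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for field in TOKEN_FIELDS:
            row[field] = int(row.get(field) or 0)  # missing or empty column -> 0
        cost = row.get("cost_usd")
        row["cost_usd"] = float(cost) if cost else None  # empty -> not logged
        row["session_id"] = row.get("session_id") or None
        rows.append(row)
    return rows


for row in load_usage(SAMPLE):
    print(row["tool"], row["model"], row["input_tokens"], row["session_id"])
```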

All fields except tool are optional — the monitor will infer what it can. If you export from a logging system that uses different field names, the monitor attempts to match common aliases automatically (for example, prompt_tokens, completion_tokens, usage.total_tokens).
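The sketch below shows what alias matching can look like for two common response shapes: OpenAI-style usage objects (prompt_tokens, completion_tokens, prompt_tokens_details.cached_tokens) and Anthropic-style usage objects (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens). It is not the monitor's alias table; normalize_usage is a hypothetical helper. Because OpenAI counts cached tokens inside prompt_tokens while Anthropic reports cache reads separately, the helper subtracts them in the first case so both shapes map onto one convention: input_tokens excludes cached tokens.

```python
def normalize_usage(usage: dict) -> dict:
    """Map a provider usage object onto input_tokens / output_tokens / cached_tokens.

    In the result, input_tokens excludes cached tokens.
    """
    if "prompt_tokens" in usage:
        # OpenAI-style: cached tokens are already included in prompt_tokens.
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens") or 0
        return {
            "input_tokens": usage["prompt_tokens"] - cached,
            "output_tokens": usage.get("completion_tokens", 0),
            "cached_tokens": cached,
        }
    # Anthropic-style: cache reads are reported separately from input_tokens.
    # Cache writes are folded into input_tokens here, which slightly understates
    # them because Anthropic bills cache writes above the base input rate.
    return {
        "input_tokens": usage.get("input_tokens", 0) + (usage.get("cache_creation_input_tokens") or 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cached_tokens": usage.get("cache_read_input_tokens") or 0,
    }


print(normalize_usage({"prompt_tokens": 45_000, "completion_tokens": 12_000,
                       "prompt_tokens_details": {"cached_tokens": 30_000}}))
```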

5. How to estimate cost from token usage

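Once you have token counts, cost estimation is straightforward arithmetic. The formula below assumes input_tokens counts only uncached input, so cached tokens are not billed twice. Providers report this differently: Anthropic's usage object lists cache reads (cache_read_input_tokens) separately from input_tokens, while OpenAI's prompt_tokens already include prompt_tokens_details.cached_tokens, so subtract those before applying the formula. The formula is: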

estimated cost =
  input_tokens  × (input price per 1M tokens)
+ output_tokens × (output price per 1M tokens)
+ cached_tokens × (cached price per 1M tokens)

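Here is an example using Claude 3.5 Sonnet list prices ($3.00 input, $15.00 output and $0.30 cached, per 1M tokens), applied to the first row of the sample CSV: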

Input:   45,000 tokens × $3.00 / 1M  = $0.135
Output:  12,000 tokens × $15.00 / 1M = $0.180
Cached:  30,000 tokens × $0.30 / 1M  = $0.009
                                        Total = $0.324
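The same calculation as a small function. The price table holds assumed Claude 3.5 Sonnet list prices in USD per 1M tokens; replace them with your own rates. estimate_cost is a hypothetical name and, like the formula, it assumes input_tokens excludes cached tokens.

```python
# Assumed list prices in USD per 1M tokens; edit to match your contract rates.
PRICES = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00, "cached": 0.30},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Apply the per-1M-token formula; input_tokens should exclude cached tokens."""
    p = PRICES[model]
    return (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cached_tokens * p["cached"]
    ) / 1_000_000


print(f"${estimate_cost('claude-3-5-sonnet', 45_000, 12_000, 30_000):.3f}")  # $0.324
```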

If your logs already include a cost_usd field, that value is used directly. Otherwise, the monitor applies the formula using the pricing you configure in the tool. You can adjust per-model prices in the sidebar to match your actual contract rates — for example, if you use RutaAPI pricing instead of standard provider list prices.
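A sketch of that precedence rule with editable prices follows. The prices are assumed list rates for Claude 3.5 Sonnet and GPT-4o mini (USD per 1M tokens), the discounted contract rates in the example are hypothetical, and treating a cost_usd of 0 as "not logged" is an assumption made because the sample CSV uses 0 as a placeholder. row_cost is a hypothetical helper.

```python
# Assumed list prices, USD per 1M tokens.
LIST_PRICES = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00, "cached": 0.30},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60, "cached": 0.075},
}


def row_cost(row: dict, prices: dict) -> float:
    """Use a logged cost when present and positive; otherwise apply the formula."""
    logged = row.get("cost_usd")
    if logged not in (None, "") and float(logged) > 0:
        return float(logged)
    p = prices[row["model"]]
    return (
        row.get("input_tokens", 0) * p["input"]
        + row.get("output_tokens", 0) * p["output"]
        + row.get("cached_tokens", 0) * p["cached"]
    ) / 1_000_000


# Hypothetical contract: 20% off list price for Claude 3.5 Sonnet.
contract = {**LIST_PRICES, "claude-3-5-sonnet": {"input": 2.40, "output": 12.00, "cached": 0.24}}

row = {"model": "claude-3-5-sonnet", "input_tokens": 45_000, "output_tokens": 12_000,
       "cached_tokens": 30_000, "cost_usd": "0"}
print(round(row_cost(row, LIST_PRICES), 4), round(row_cost(row, contract), 4))
```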

6. Common mistakes when comparing AI coding tools

Comparing tool costs fairly requires attention to a few subtle details that catch most people out.

Ignoring the model within the tool. Cursor can use Claude 3.5 Sonnet, GPT-4o, or GPT-4o Mini depending on your settings. The tool name alone does not tell you the cost — the model matters more.

Comparing list prices instead of effective prices. Standard per-token prices are published for reference, but negotiated enterprise rates, volume discounts, and prompt caching can change the effective cost significantly. Edit the pricing table to match your actual rates.

Not accounting for session length. A tool that feels cheap per-request can be expensive over a long session if it resends large amounts of context. Look at total cost per session, not cost per API call.

Assuming cached token data is always available. Not all logging systems expose cached token counts. If your logs omit them, the monitor will estimate costs using full input prices, which will be higher than reality. Some providers also do not expose cache data at all in their API responses.

Comparing tools with different context window sizes. A model with a 200K-token context window can carry far more tokens per call than one capped at 32K. Higher per-call counts are not necessarily a sign of waste; they may reflect fundamentally different use cases.
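To act on the advice about session length, group rows by session_id and compare totals with per-call averages. This is a minimal sketch, assuming each row already carries a cost_usd value (logged or estimated); summarize_sessions is a hypothetical helper, and the rows are hypothetical, chosen only to show how a session of many cheap calls can cost more than one with a few expensive calls.

```python
from collections import defaultdict


def summarize_sessions(rows):
    """Return {session_id: (calls, total_cost, mean_cost_per_call)}."""
    calls = defaultdict(int)
    totals = defaultdict(float)
    for row in rows:
        key = row.get("session_id") or "(no session)"
        calls[key] += 1
        totals[key] += float(row["cost_usd"])
    return {key: (calls[key], totals[key], totals[key] / calls[key]) for key in calls}


# Hypothetical rows: 20 cheap calls versus 2 expensive ones.
rows = [{"session_id": "long", "cost_usd": 0.02}] * 20 + [{"session_id": "short", "cost_usd": 0.10}] * 2

for session, (n, total, mean) in summarize_sessions(rows).items():
    print(f"{session}: {n} calls, ${total:.2f} total, ${mean:.3f} per call")
```

Here the "long" session looks cheaper per call but costs twice as much in total.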

Try the AI Coding Cost Monitor with a CSV or JSON example.

7. Tracking Claude Code token costs

Claude Code sends the full conversation context on every request by default. In a typical debugging session, this means each turn re-sends all previous exchanges, project files, and system instructions. Without monitoring, it is easy to underestimate how quickly token counts accumulate across a long session.

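To track Claude Code costs, capture per-request token usage, convert it to the CSV format above, and feed it into the AI Coding Cost Monitor. Inside a session, the /cost command shows the running total; for scripted runs, claude -p "<prompt>" --output-format json returns a JSON result that includes cost and usage information. Options vary between versions, so check claude --help for what yours supports. The monitor will split costs by project and session, and surface the repeated context ratio so you can see how much of the input is cached versus fresh.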

If you notice a high repeated context ratio in your Claude Code logs, consider breaking long sessions into shorter focused tasks. Each new session starts with minimal context, keeping per-session token counts lower.
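The effect of splitting a session can be estimated with a simple model. This sketch assumes, hypothetically, that every turn resends a fixed base context (system instructions and project files) plus all previous exchanges, and that each exchange adds the same number of tokens; real sessions vary. The base_context term also approximates the context a new session has to re-establish. cumulative_input_tokens is a hypothetical helper.

```python
def cumulative_input_tokens(turns: int, base_context: int, tokens_per_turn: int) -> int:
    """Total input tokens for a session where turn i (1-based) sends
    base_context + i * tokens_per_turn tokens, i.e. the full history so far."""
    return sum(base_context + i * tokens_per_turn for i in range(1, turns + 1))


# Hypothetical numbers: 5,000 tokens of base context, 1,500 tokens per exchange.
one_long = cumulative_input_tokens(20, 5_000, 1_500)
two_short = 2 * cumulative_input_tokens(10, 5_000, 1_500)
print(one_long, two_short)  # 415000 265000
```

Because the history grows with every turn, total input grows roughly with the square of the session length, so in this model two 10-turn sessions send about 36% fewer input tokens than one 20-turn session.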

8. Tracking Cursor AI coding costs

Cursor uses a different session model — it maintains a persistent context window within the IDE. Costs accumulate based on how much file context you include in each Composer or Chat request. Because Cursor can include entire project files, input token counts can spike unexpectedly during large refactors.

The key to keeping Cursor costs manageable is to be intentional about which files you attach. Attaching only the files directly related to the current task, rather than the full project, can reduce input token counts significantly without degrading output quality. If you export your Cursor usage logs, you can use the AI Coding Cost Monitor to see which projects have the highest token consumption and adjust your context strategy accordingly.
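Ranking projects by input consumption takes only a Counter. This is a sketch, assuming rows shaped like the CSV in section 4 and counting both uncached and cached input, since both are resent context. input_tokens_by_project is a hypothetical helper, and the rows reuse the token counts from the sample CSV.

```python
from collections import Counter


def input_tokens_by_project(rows):
    """Return (project, input + cached tokens) pairs, highest first."""
    totals = Counter()
    for row in rows:
        project = row.get("project") or "(no project)"
        totals[project] += row.get("input_tokens", 0) + row.get("cached_tokens", 0)
    return totals.most_common()


rows = [
    {"project": "bugfix-auth", "input_tokens": 45_000, "cached_tokens": 30_000},
    {"project": "feat-api", "input_tokens": 32_000, "cached_tokens": 20_000},
    {"project": "docs-readme", "input_tokens": 15_000, "cached_tokens": 0},
]
print(input_tokens_by_project(rows))
```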

9. Gemini CLI token usage and cost

When Gemini CLI runs against a paid Gemini API key, usage is billed per input and output token like most providers, but its long-context models (up to 1M tokens) mean a single large request can cost more than a dozen smaller ones. Gemini 2.5 Flash is significantly cheaper than Gemini 1.5 Pro for most coding tasks, and the price difference compounds when you process large codebases in a single call.

When working with Gemini CLI, watch the input token count on each request. Input is billed on the total token count regardless of whether it comes from a file or your prompt, so a 50,000-token codebase file plus a 500-token prompt is billed as 50,500 input tokens. Ten small requests that each resend the same file pay for it ten times, so batching related questions into fewer, larger calls is more cost-effective than sending many small requests with overlapping context.
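The difference between resending a file and sending it once can be estimated directly. This is a minimal sketch, assuming ten questions about the same 50,000-token file, 500-token prompts, no caching, and an input price of $0.30 per 1M tokens chosen for illustration; check current Gemini pricing for your model. Output tokens are left out on the assumption that they are roughly the same either way.

```python
FILE_TOKENS = 50_000
PROMPT_TOKENS = 500
QUESTIONS = 10
INPUT_PRICE_PER_M = 0.30  # assumed USD per 1M input tokens; check current pricing


def input_tokens_separate(questions: int) -> int:
    """Each question is its own request that resends the whole file."""
    return questions * (FILE_TOKENS + PROMPT_TOKENS)


def input_tokens_batched(questions: int) -> int:
    """One request carries the file once plus all prompts."""
    return FILE_TOKENS + questions * PROMPT_TOKENS


for label, tokens in (("separate", input_tokens_separate(QUESTIONS)),
                      ("batched", input_tokens_batched(QUESTIONS))):
    print(f"{label}: {tokens:,} input tokens, ${tokens * INPUT_PRICE_PER_M / 1_000_000:.4f}")
```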

10. Estimating Cline token cost

Cline, like Claude Code, sends full conversation history on each request. Sessions that span many tool-use cycles — where Cline reads a file, edits it, runs a test, and reads the result — accumulate input tokens rapidly. Each file read adds to the context, and the total grows with the size of the project being modified.

Cline logs can be captured by enabling verbose output in your terminal or by piping API responses through a logging proxy. The AI Coding Cost Monitor accepts these logs directly as CSV or JSON, and will break down costs by session and project. If you see Cline sessions consuming more tokens than expected, try breaking the task into smaller goals with clearer boundaries — this reduces the amount of context the agent needs to maintain between steps.
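If a logging proxy writes one JSON object per line, converting it to the CSV format from section 4 is a short script. The record shape below (session_id, project, model, timestamp and an Anthropic-style usage object) is a hypothetical proxy format, not something Cline produces by itself; adjust the field lookups to whatever your proxy records. jsonl_to_csv is a hypothetical helper, and cost_usd is left empty so the cost is estimated from tokens.

```python
import csv
import io
import json

FIELDS = ["tool", "project", "model", "input_tokens", "output_tokens",
          "cached_tokens", "cost_usd", "timestamp", "session_id"]


def jsonl_to_csv(lines, tool: str = "Cline") -> str:
    """Convert JSON Lines usage records into the CSV format from section 4."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        usage = record.get("usage", {})
        writer.writerow({
            "tool": tool,
            "project": record.get("project", ""),
            "model": record.get("model", ""),
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
            "cached_tokens": usage.get("cache_read_input_tokens", 0),
            "cost_usd": "",  # left empty so cost is estimated from tokens
            "timestamp": record.get("timestamp", ""),
            "session_id": record.get("session_id", ""),
        })
    return out.getvalue()


# Hypothetical proxy log line.
log = ['{"session_id": "task_42", "project": "refactor-db", "model": "claude-3-5-sonnet", '
       '"timestamp": "2026-05-10T14:00:00Z", '
       '"usage": {"input_tokens": 1200, "output_tokens": 300, "cache_read_input_tokens": 8000}}']
print(jsonl_to_csv(log))
```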

11. Try the free AI Coding Cost Monitor

If you have usage logs from any AI coding tool, you can get a full cost breakdown in under a minute. The AI Coding Cost Monitor is a browser-based tool — all processing happens locally. No data is uploaded, no account is required.

Upload a CSV or paste JSON usage data to see the total estimated cost, cost by tool, project and model, and the repeated context ratio.

The monitor works with any AI coding tool that can produce structured logs — from Claude Code to Cline to custom tooling. You can also browse all free RutaAPI tools or read the API documentation to understand how token tracking fits into a broader observability setup.

12. FAQ

Does this tool upload my logs?
No. All processing happens in your browser. Your data never leaves your device.

Do I need an API key?
No. You provide your own usage data as a CSV or JSON file. No authentication is required.

Which tools are supported?
Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, Cline, and any tool that can export token usage as CSV or JSON. The monitor attempts to infer the model from the tool name when the model field is missing.

How accurate are the estimates?
Estimates are as accurate as the pricing you enter. If your logs include a cost_usd field, that value is used directly. Otherwise, the monitor applies the per-token formula using your configured prices. You can edit the pricing table at any time.

Can I edit model prices?
Yes. Click "Edit pricing estimates" in the sidebar to update per-million-token prices for any model. Changes apply immediately to all results.

What are cached tokens?
Cached tokens are input tokens that the model provider has already processed and stored for reuse. Providers bill these at a reduced rate that varies by provider and model (for example, 10% of the standard input price for Anthropic cache reads and 50% for GPT-4o cached input). If your logs report cached token counts, the monitor separates them in all cost calculations.

Why does the total cost in the monitor differ from my provider bill?
Differences can arise from: custom contract pricing that differs from standard list rates; charges not captured by your input, output and cached counts (such as cache-write charges, or reasoning tokens that your log source does not report); rounding differences between the provider and your log source; or missing cached token data in your logs, which causes full input prices to be applied.

Can I use this for team reporting?
Yes. Export usage logs from each developer's tool, combine them into a single CSV, and load the aggregated data into the monitor. You can use the project field to break down costs by team or work stream.
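Combining the per-developer exports can be scripted with the csv module. This is a minimal sketch: combine_usage_csvs is a hypothetical helper that takes the CSV texts, reorders columns into the format from section 4, leaves missing columns empty, and drops columns the format does not define. Reading the files (for example with pathlib's Path.read_text) is left to the caller.

```python
import csv
import io

FIELDS = ["tool", "project", "model", "input_tokens", "output_tokens",
          "cached_tokens", "cost_usd", "timestamp", "session_id"]


def combine_usage_csvs(texts) -> str:
    """Merge several usage CSV texts, possibly with different column orders, into one."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for text in texts:
        for row in csv.DictReader(io.StringIO(text)):
            writer.writerow(row)  # missing columns are written as empty fields
    return out.getvalue()


# Usage with hypothetical file names:
#   merged = combine_usage_csvs(Path(p).read_text() for p in ["alice.csv", "bob.csv"])
alice = "tool,project,input_tokens\nCursor,team-a,100\n"
bob = "project,tool,output_tokens\nteam-b,Cline,50\n"
print(combine_usage_csvs([alice, bob]))
```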


Need transparent OpenAI-compatible API access?
RutaAPI provides a unified API gateway for OpenAI, Anthropic, Google, DeepSeek, and other providers. Create a free account to get started.

RutaAPI routes requests across multiple providers through a single OpenAI-compatible endpoint. Configure fallback chains, track usage, and manage costs in one place.

Ready to test RutaAPI? Use one OpenAI-compatible base URL, prepaid credits, and API keys from the dashboard.