Employees Are Burning the Whole Token Budget — Build AI Spend Governance

The Problem

A year ago this was simple. Anthropic and OpenAI sold flat subscriptions, so a company knew its monthly bill in advance. In 2026 both moved large parts of their service to token-based billing, and the ground shifted. Now every prompt and every automated workflow hits the invoice directly. The fallout was fast and painful. Uber burned its annual AI budget in four months and, starting in April, capped each employee at $1,500 per month per coding tool. Walmart throttled tokens on its internal “Code Puppy” vibe-coding platform after usage skyrocketed, and Amazon warned staff to stop using AI “for the sake of using AI” after engineers spun up agents just to climb internal leaderboards. The root issue is visibility. Cloud spend is something legacy FinOps tools can see; LLM APIs are external services, so no one can tell who, which team, or which task burned how many tokens. As Accenture’s agentic-AI lead put it, spend has become unpredictable, and CFOs, COOs, and CIOs still can’t say whether they’re getting value for what they pour into AI.

Why Now

The timing lines up two ways. First, the billing model just changed. The shift from flat subscriptions to token billing went mainstream in 2026. Before that, this tool wasn’t needed, because a flat, predictable bill creates no demand to control it. Second, the pain went public. When companies the size of Uber, Walmart, and Amazon openly start imposing caps, that’s a signal that a long tail of mid-size orgs is quietly suffering the same thing. The industry is flipping from an era of tokenmaxxing to one of token rationing. Yet legacy cloud FinOps tools catch EC2 and S3 costs and see nothing inside an external LLM API. The intent to buy already exists; the product to receive it is missing. Stepping into that gap is the whole opportunity.

How to Build It

Drop a proxy or gateway in the path of every LLM call. Route all API traffic through this layer and log who made the call (employee, team, API key), for which task, on which model, and how many tokens it used. The product has three core features. One, cost attribution: slice spend by employee, team, project, and task on a dashboard, so the CFO’s question finally gets a numeric answer. Two, budget guardrails: set per-team and per-person limits, alert at thresholds, and auto-block past them, automating in policy what Uber did by hand. Three, policy routing: automatically send simple jobs like classification and summarization to a cheaper model and reserve the expensive one for hard reasoning, so you don’t just block usage, you finish the same work for less. To cut adoption friction, wire it into the existing observability stack and support SSO and SCIM for identity.
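
A minimal sketch of how the three features could fit together in one gateway object. Everything named here is an assumption made for illustration: the model names, the per-million-token prices, the task labels, and the Gateway, Usage, and BudgetExceeded classes are hypothetical, and the token counts are passed in directly where a real gateway would read them from the provider's response.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real provider prices differ.
PRICE_PER_MTOK = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}
# Task labels the routing policy treats as "simple" (an assumption).
SIMPLE_TASKS = {"classification", "summarization"}


class BudgetExceeded(Exception):
    """Raised when a team has already spent its budget."""


@dataclass(frozen=True)
class Usage:
    employee: str
    team: str
    task: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_MTOK[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1_000_000


class Gateway:
    def __init__(self, team_budgets_usd, alert_fraction=0.8):
        self.team_budgets_usd = team_budgets_usd
        self.alert_fraction = alert_fraction
        self.log = []      # one Usage record per completed call
        self.alerts = []   # human-readable threshold warnings

    def spend_by(self, field):
        """Cost attribution: total spend grouped by a Usage field."""
        totals = defaultdict(float)
        for usage in self.log:
            totals[getattr(usage, field)] += usage.cost_usd
        return dict(totals)

    def route(self, task):
        """Policy routing: cheap model for simple tasks, expensive otherwise."""
        return "small-model" if task in SIMPLE_TASKS else "large-model"

    def handle(self, employee, team, task, input_tokens, output_tokens):
        budget = self.team_budgets_usd[team]
        # Guardrail: block before forwarding if the budget is already spent.
        if self.spend_by("team").get(team, 0.0) >= budget:
            raise BudgetExceeded(f"team {team!r} is over its ${budget:.2f} budget")
        model = self.route(task)
        # A real gateway would forward the request here and read the token
        # counts from the provider response; this sketch takes them as inputs.
        usage = Usage(employee, team, task, model, input_tokens, output_tokens)
        self.log.append(usage)
        spent = self.spend_by("team")[team]
        if spent >= self.alert_fraction * budget:
            self.alerts.append(f"team {team!r} has spent ${spent:.2f} of ${budget:.2f}")
        return usage
```

Under these made-up prices, a team with a $1.00 budget that sends a summarization job (100,000 input and 20,000 output tokens) is routed to the small model and charged $0.08. A reasoning job with 100,000 input and 40,000 output tokens goes to the large model and costs $1.10, which trips the alert, and the next call from that team raises BudgetExceeded. Calling spend_by("employee") or spend_by("task") gives the same log sliced a different way.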

Success Criteria

Stay a nice-to-have dashboard and you die. The savings have to be visible from day one: an immediate ROI story like “routing cut your spend 20–30% in the first month” has to be the sales line. You sell to the CFO and CIO, not the dev team; the budget holder has to feel that AI spend became controllable before they sign. Two risks. First, if Anthropic and OpenAI build this into their own consoles, your value shrinks inside any single provider, so move fast to own the position of neutral, cross-vendor cost governance. Second, a proxy in front of every call adds latency and becomes a single point of failure. Keep the gateway light and resilient, and don’t touch prompt data carelessly: your own governance has to be stricter than your customer’s to earn trust.
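
One possible way to limit the latency-and-failure risk is to give the governance check a hard deadline and fail open when it is slow or broken. This is a design choice, not something the pitch prescribes: failing open keeps customer traffic flowing but temporarily weakens hard caps, so skipped checks would need to be reconciled afterward. The sketch below uses Python's standard concurrent.futures; the function name check_with_deadline, the 50 ms default, and the policy_check callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for policy checks; the size is an arbitrary assumption.
_pool = ThreadPoolExecutor(max_workers=4)


def check_with_deadline(policy_check, request, deadline_s=0.05):
    """Run policy_check(request) under a deadline.

    Returns (allowed, reason). reason is "checked" when the policy ran in
    time, or "timeout" / "error" when the gateway failed open instead.
    """
    future = _pool.submit(policy_check, request)
    try:
        return bool(future.result(timeout=deadline_s)), "checked"
    except FutureTimeout:
        # Policy service is too slow: let the call through and flag it.
        return True, "timeout"
    except Exception:
        # Policy service crashed: same fail-open behavior, different reason.
        return True, "error"
```

A caller would forward the LLM request whenever allowed is true and log every "timeout" or "error" result, so that spend from unchecked calls still lands in the attribution data later.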