Brief IA

AI Tokens: Soaring Costs for Businesses

🤖 Models & LLM · Tom Levy

Key Takeaways
1. Companies are facing rising costs as token-based pricing for AI replaces the old fixed rates.
2. Tokens, the basic units of AI, standardize billing but conceal significant underlying complexity.
3. Despite a decrease in unit costs, the growing demand for tokens is leading to an explosion in overall spending.
💡 Why it matters: Companies need to adapt their financial management to navigate this new AI token economy, or risk seeing their costs skyrocket.

Full Analysis

AI Tokens and Billing

The rise of artificial intelligence (AI) has led to a radical transformation in pricing models, shifting from a fixed rate to token-based billing. This change is proving to be significantly more expensive for businesses. The challenge lies in measuring the actual value delivered by AI, a problem that remains unresolved.

A few months ago in San Diego, users enjoyed a fixed rate for accessing AI services. That era is now over. Today, AI is no longer offered at a loss. At the FinOps X 2026 conference, token pricing was presented as the central pillar of the generative AI economy, and as one that costs customers more than the pricing models it replaces. Copilot users, for instance, are voicing their dissatisfaction with the new rates.

For many companies, this situation recalls the early days of cloud pricing, marked by unpredictable bills and constantly evolving business models. Behind this apparent confusion, tokens are quietly standardizing how labs translate GPU capacity into billable units, how businesses measure AI usage, and how software publishers adjust their prices.

Tokens: The Atomic Units of AI

In this new paradigm, the token has become the fundamental unit of work in AI. J.R. Storment, executive director of the FinOps Foundation, describes it as "the atomic unit of AI." In his keynote at FinOps X, Storment emphasized that "tokens play a more central role in the modern economy than almost any other commodity in recent history, perhaps except for oil in the 20th century." Tokens are at once "the unit of production for all hardware, computing, and data centers," "the way labs price their inputs and outputs," and "the unit of value that companies seek to monetize."

This abstraction is particularly appealing to labs and hyperscalers. Rather than charging directly for types of GPUs, memory, and energy, they can offer a single unit, a price per million tokens, across a complex mix of architectures and deployment topologies. OpenAI, Anthropic, Google, and others now publish per-model pricing grids with distinct prices for input tokens (what you send to the model) and output tokens (what the model generates), typically expressed in dollars per million tokens.
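
As a rough illustration of how per-million-token pricing turns into a bill, the sketch below computes the cost of a single request. The function name `request_cost` and the prices used are hypothetical placeholders, not any provider's actual rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Return the dollar cost of one request under per-million-token pricing."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices: $2 per million input tokens, $10 per million output tokens.
cost = request_cost(input_tokens=12_000, output_tokens=1_500,
                    input_price_per_million=2.0,
                    output_price_per_million=10.0)
print(f"${cost:.4f}")
```

Because output tokens are usually priced higher than input tokens, a short prompt that triggers a long answer can cost more than a long prompt with a short answer.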

What is a Token?

An AI token, according to Storment, "is the smallest unit that a word or phrase can be broken down into when processed by a large language model (LLM)." Before a model can process text, it breaks it down into fragments, a process called tokenization. For English, a common rule of thumb is that "one token corresponds to about four characters, or roughly three-quarters of a word," so "100 tokens ≈ 75 words."
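
The rule of thumb above can be turned into a quick estimator. This is only a sketch of the heuristic for English text: the helper names are hypothetical, and real tokenizers differ by model and language, so actual counts will deviate from this estimate.

```python
CHARS_PER_TOKEN = 4      # rule of thumb: about four characters per token
WORDS_PER_TOKEN = 0.75   # rule of thumb: about three-quarters of a word per token


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its character count."""
    return round(len(text) / CHARS_PER_TOKEN)


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words a given number of tokens represents."""
    return round(tokens * WORDS_PER_TOKEN)


sample = "x" * 400  # stand-in for a 400-character passage
print(estimate_tokens(sample))   # 100
print(tokens_to_words(100))      # 75
```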

The token conceals enormous complexity. As the FinOps team at SAP pointed out, "You pay per token, and that little token hides a huge underlying complexity to predictability," ranging from model choice and quantization to aggressive use of caching or agents. This complexity is exactly what FinOps teams are now tasked with decoding.

The End of the Unlimited Token Era

While the period from 2023 to early 2025 was characterized by cheap experiments, the last 18 months have been a harsh awakening. Storment describes three distinct phases: the "old days of AI" before ChatGPT, the "good old days of AI" when chatbots "could write decent code," and then the post-November 2025 world when major model releases "moved AI from pretty good to really good."

In the good old days, the era of unlimited tokens and subscriptions, we went through a brief period of token maximization. Back then, everyone was excited about token leaderboards showing who had used the most tokens. Today, those rankings are painfully outdated, as no one can afford to waste tokens. As Amazon's senior vice president, Dave Treadwell, pleaded, "Please do not use AI just for the sake of using AI."

According to Storment, global token usage grew in a "linear" fashion between June and November of last year. Then the new models and agentic patterns arrived. Context windows "went from a few thousand or tens of thousands or hundreds of thousands to millions of tokens in a single conversation," and "agentic has exploded," adding "loops and retries and all that craziness."
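
One way to see why long conversations and agent loops consume so many tokens is a simplified model in which every call re-sends the full history as input. The sketch below makes that assumption explicitly and ignores caching and output tokens. The function name and the figures are hypothetical.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when each turn re-sends the whole history.

    Simplifying assumptions: every turn adds `tokens_per_turn` tokens to the
    history, the full history is sent as input on each call, and nothing is
    cached or truncated.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the new message joins the history
        total += history            # the full history is billed as input
    return total


print(cumulative_input_tokens(turns=10, tokens_per_turn=1_000))    # 55000
print(cumulative_input_tokens(turns=100, tokens_per_turn=1_000))   # 5050000
```

Under these assumptions, ten times as many turns costs roughly ninety times as many input tokens, which is how a single agentic session can reach millions of tokens.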

Scarcity Keeps Token Prices Rising

If Moore's Law and hyperscale competition were the only forces at play, one would expect token prices to continue to decline. To some extent, this is true. "Since 2023, token prices have dropped dramatically," Storment acknowledged. Internal data from SAP tells a similar story. "Here’s our cost per token over the same period," said SAP data scientist Maida Nazifi, showing their internal graph. "It is clearly trending downward, even with a slight flattening at the end. And honestly, that aligns with the narrative everyone wants to believe, right? Token prices continue to fall."

But both emphasize the caveat: the floor may be in sight. Storment notes that if "you look at the top labs and their prices, you go back to the Wayback Machine. Token prices have remained fairly stable since November 2025," which he directly links to hardware and energy constraints: "We can't get enough hardware, we can't get enough energy… we see delays, we see long engagement periods, and we see shortages."

He cites Intel's CEO saying he does not expect any real relief in the supply of GPUs and related components "before 2028." Nazifi and SAP vice president Frederik Pohl observe the same trends in their company: Pohl warned, "We have supply chain constraints, we have rising hardware prices, and the prices of new cutting-edge models are becoming increasingly expensive."

The net result is a classic Jevons Paradox: unit costs are falling, total expenditures are exploding. "Even with falling token prices, we find that our expenditures continue to increase, and that's the famous paradox," Pohl stated. "At our scale, we had falling unit costs, but we found that in some months, expenditures doubled."

Storment believes the paradox is just beginning. Goldman Sachs estimates that global usage will rise from "6 quadrillion tokens" today to "120 quadrillion tokens projected" in about 3.5 years, a 20-fold increase. Even if token prices drop again once supply eases, they are unlikely to fall by the 95% that would be needed to offset that growth in volume.
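
The arithmetic behind the paradox is simple: total spend is volume multiplied by unit price. The sketch below works through the Goldman Sachs figures quoted above. The price-halving scenario and the `spend_multiplier` helper are hypothetical illustrations, not forecasts.

```python
def spend_multiplier(volume_growth: float, price_ratio: float) -> float:
    """Change in total spend when volume and unit price both change.

    Total spend = volume x unit price, so the two multipliers multiply.
    """
    return volume_growth * price_ratio


current_tokens = 6 * 10**15       # "6 quadrillion tokens" today
projected_tokens = 120 * 10**15   # "120 quadrillion tokens projected"
years = 3.5

volume_growth = projected_tokens / current_tokens   # 20-fold increase
annual_growth = volume_growth ** (1 / years) - 1    # implied yearly growth rate
break_even_price_ratio = 1 / volume_growth          # price level that keeps spend flat

print(f"Volume growth: {volume_growth:.0f}x")
print(f"Implied annual growth: {annual_growth:.0%}")
print(f"Break-even price: {break_even_price_ratio:.0%} of today's price")

# Hypothetical scenario: unit prices halve while volume grows 20-fold.
print(f"Spend if prices halve: {spend_multiplier(volume_growth, 0.5):.0f}x today's")
```

Spending stays flat only if unit prices fall to one-twentieth of today's level over the same period. Any smaller price decline means a larger total bill.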

FinOps Discovers the Token Economy

For the FinOps community, which has gained experience in cloud sizing and reserved instances, token pricing is both familiar and completely foreign. The familiar part is that it is usage-based, that bills are high, and that forecasting is difficult. The foreign part? The unit is tied to language, not infrastructure, and it changes as quickly as model outputs, not as slowly as server depreciation schedules.

Pohl asserted that "AI is not just extending the cloud playbook; it is breaking it; AI is more different from the cloud than the cloud was from the data center." Unlike CPUs, "AI models do not resemble that at all… they have their unique strengths and weaknesses… They have different cost profiles, and replacing an LLM is not just a pricing decision. It is also a decision about output quality."

SAP's experience is a case study on how companies are retooling. Its Business AI platform, Pohl explained, operates on "multiple different LLMs," including "ChatGPT, Anthropic, Gemini… other open-source models," layered over "different hyperscalers."

When SAP first sought data on AI costs, "we immediately hit a wall," Nazifi recalled. "Existing [cloud] tools were very blind to the nuances of LLMs, so they could tell us that we spent this amount on [a provider], but not really which model, or how much for the model. It was really like trying to optimize your gold mining operation by looking at the total weight of the ore."

They then proceeded laboriously: "We manually extracted data, we merged data across spreadsheets, and then we had this first image in hand." This image, once it reached their global infrastructure lead and then the CTO, transformed the conversation. "In a few days, it went from 'OK, that's interesting, keep me posted' to… 'I need this regularly, I want more,'" Nazifi stated. Pohl added the FinOps lesson: "If you have a CTO asking for a number, it is not a question, it is a mandate."

This demand forced SAP to formalize an internal FinOps framework for AI based on three pillars:

  • Visibility of Spending: "What we consume, how we consume it, and where we consume it," across models, platforms, business units, and regions.

  • Economy: "How efficient are you in your use of tokens?"
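
To make the visibility and economy pillars concrete, here is a minimal sketch of the kind of breakdown Nazifi describes: spend split by model and business unit instead of a single provider total, plus cost per million tokens as an efficiency measure. The record layout, model names, and figures are hypothetical and do not reflect SAP's actual data or tooling.

```python
from collections import defaultdict

# Hypothetical usage records: (model, business_unit, tokens, cost_usd)
usage = [
    ("model-a", "finance", 4_000_000, 12.0),
    ("model-a", "hr", 1_000_000, 3.0),
    ("model-b", "finance", 500_000, 7.5),
]


def spend_by(records, key_index):
    """Sum cost_usd grouped by the field at `key_index` (0 = model, 1 = unit)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[3]
    return dict(totals)


def cost_per_million_tokens(records):
    """Return the blended cost per million tokens for each model."""
    tokens = defaultdict(int)
    cost = defaultdict(float)
    for model, _unit, n_tokens, cost_usd in records:
        tokens[model] += n_tokens
        cost[model] += cost_usd
    return {model: cost[model] / tokens[model] * 1_000_000 for model in tokens}


print(spend_by(usage, 0))               # spend per model
print(spend_by(usage, 1))               # spend per business unit
print(cost_per_million_tokens(usage))   # efficiency per model
```

In this made-up data, the provider-level total of $22.50 hides the fact that model-b costs five times as much per million tokens as model-a.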
