AI Agents: Unpredictable Costs Worry Users


Key Takeaways

1. The token costs of AI agents are unpredictable and can rise sharply, according to a recent study.

2. The study finds that AI agents can consume up to 3,500 times more tokens than simple prompts, which makes bills hard to anticipate.

3. Costs vary considerably between models and between tasks, and the researchers found no reliable way to predict them.

Why it matters: Companies need to understand these costs to avoid unexpected bills and to use AI agents efficiently.

Costs of AI Agents: A Concerning Variability

Using artificial intelligence (AI) agents comes with a steep increase in token costs. Because of the way these agents work, their total token consumption cannot be forecast precisely. This is driving users to demand greater price transparency and performance guarantees from providers.

Among the many challenges posed by agentic AI, cost remains one of the least understood. Although companies such as OpenAI, Google, and Anthropic publish price lists, these say little about what it will ultimately cost to solve a given problem.

A recent study led by the University of Michigan, in collaboration with other institutions, points to a potential price shock: the costs of AI agents can soar unpredictably, which makes budgeting for them risky.

Study Results: Soaring Token Consumption

The study, led by Longju Bai from the University of Michigan, with collaborators from Stanford, All Hands AI, Google DeepMind, Microsoft, and MIT, is titled "How AI Agents Spend Your Money? Analysis and Prediction of Token Consumption in Agentic Coding Tasks." It is presented as the first systematic study on this subject.

The researchers found that agents consume far more tokens than simple prompt-based chats: up to 3,500 times the number of tokens used by a series of prompts to ChatGPT.

A token represents the basic unit of information processed by an AI model. It can be part of a word, a whole word, or even a punctuation mark, depending on how the model segments the data.

Cost Variability Difficult to Anticipate

That agents consume more tokens is perhaps to be expected, but the study reveals more troubling findings. Two different models can incur very different token costs for the same task. Moreover, the same model's cost can vary from one run of an identical problem to the next, with one run using up to twice as many tokens as another.

Most concerning of all, these variations cannot be predicted. According to Bai and colleagues, agents are unable to reliably estimate how many tokens they will consume on a given task.

Challenges in Cost Forecasting

The authors describe agentic tasks as "uniquely costly" and note that using more tokens does not necessarily improve outcomes. "Simply increasing token usage may not lead to better execution performance," they write, adding that AI models systematically underestimate the number of tokens required.

Neither the rising costs nor the uncertainty about success is reflected in the current price lists from OpenAI and other providers. Users must therefore impose strict limits on agent usage, which can cause agents to stop before completing their tasks.
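The trade-off described above can be illustrated with a minimal Python sketch of a hard token cap. It assumes a hypothetical agent whose per-step token counts are only known after each step has run; `run_with_budget` and `RunResult` are illustrative names, not part of OpenHands or any provider's API. Because the cap can only be checked between steps, a run may both overshoot the limit and stop before finishing.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunResult:
    completed: bool   # did the agent finish every step?
    tokens_used: int  # tokens actually billed
    steps_run: int    # number of steps executed


def run_with_budget(step_costs: Sequence[int], max_tokens: int) -> RunResult:
    """Simulate an agent run under a hard token cap.

    step_costs holds hypothetical per-step token counts. A step's cost is only
    known after it runs, so the cap is checked before starting each new step.
    """
    used = 0
    for i, cost in enumerate(step_costs):
        if used >= max_tokens:
            # Budget exhausted: stop here, leaving the task unfinished.
            return RunResult(completed=False, tokens_used=used, steps_run=i)
        used += cost  # this step's tokens are billed once it has run
    return RunResult(completed=True, tokens_used=used, steps_run=len(step_costs))


# Hypothetical run in which the third step is unexpectedly expensive.
result = run_with_budget([1_000, 4_000, 9_000, 2_000], max_tokens=10_000)
print(result)
```

In this example the run stops after three of its four steps, having already spent 14,000 tokens against a 10,000-token cap: the user pays more than planned and still gets no finished result.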

Estimating Token Costs: A Technical Challenge

To analyze costs, Bai's team used OpenHands, an open-source framework for building AI agents developed by researchers from the University of Illinois at Urbana-Champaign and other institutions. They built agents with OpenHands and tested them on SWE-Bench, an open-source coding benchmark built from real GitHub issues.

The study also identified the relative strengths of the models. OpenAI's GPT-5 and GPT-5.2 models delivered good accuracy at low cost, though not the highest accuracy. Anthropic's Claude Sonnet-4.5 model achieved the highest accuracy, but at higher token costs. Google's Gemini-3-Pro model fell somewhere in between.

Conclusion: Towards Better Cost Management

The study's results confirm what users of coding agents have reported anecdotally: costs accumulate with no clear forecast of the total. The authors do not propose concrete solutions, but suggest that even if agents cannot predict their exact token usage, they could provide high-level estimates that serve as early budget alerts before costly runs are launched.
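One way such an early alert might work is sketched below in Python. It is not the authors' method: it assumes you have token counts from past runs of similar tasks, and it applies a safety factor of 2 because the study reports the same model using up to twice as many tokens across runs of the same problem. `budget_alert` and its parameters are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def budget_alert(
    past_runs: Sequence[int], budget: int, safety_factor: float = 2.0
) -> Optional[str]:
    """Return a warning if a conservative token estimate exceeds the budget.

    past_runs: token counts from earlier runs of similar tasks (hypothetical data).
    safety_factor: multiplier on the median to cover run-to-run variation.
    """
    if not past_runs:
        return "No history for this kind of task: cost cannot be estimated."
    typical = statistics.median(past_runs)
    conservative = typical * safety_factor
    if conservative > budget:
        return (
            f"Warning: this task may need up to {conservative:,.0f} tokens, "
            f"above the {budget:,}-token budget."
        )
    return None  # the conservative estimate fits within the budget


print(budget_alert([100_000, 150_000, 200_000], budget=250_000))
```

The estimate is built from observed history rather than from the agent's own forecast, since the study found that models systematically underestimate the tokens they will need.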

Users should pay close attention to what they can control on the input side, since input tokens make up the largest share of costs. The industry as a whole must address cost transparency, because users need to be able to plan their software investments.
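Why input tokens tend to dominate can be seen with a little arithmetic. The Python sketch below assumes a simple agent loop in which every turn re-sends the full history (system prompt, earlier model outputs, and tool results) as input, with no prompt caching; the function names, token counts, and prices are all hypothetical. Under that assumption, cumulative input grows roughly with the square of the number of turns, while output grows only linearly.

```python
from typing import Tuple


def agent_token_totals(
    system_prompt: int, turns: int, output_per_turn: int, tool_result_per_turn: int
) -> Tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for a simple agent loop.

    Assumes each turn re-sends the whole history as input (no prompt caching).
    """
    history = system_prompt  # tokens in the context window at the start
    total_in = 0
    total_out = 0
    for _ in range(turns):
        total_in += history  # the entire context is billed as input again
        total_out += output_per_turn
        # The model's reply and the tool's result are appended to the history.
        history += output_per_turn + tool_result_per_turn
    return total_in, total_out


def estimate_cost(
    total_in: int, total_out: int, price_in_per_m: float, price_out_per_m: float
) -> float:
    """Dollar cost from per-million-token prices (the prices used are hypothetical)."""
    return total_in / 1_000_000 * price_in_per_m + total_out / 1_000_000 * price_out_per_m


tokens_in, tokens_out = agent_token_totals(
    system_prompt=2_000, turns=50, output_per_turn=500, tool_result_per_turn=1_500
)
print(tokens_in, tokens_out)
print(estimate_cost(tokens_in, tokens_out, price_in_per_m=1.0, price_out_per_m=10.0))
```

With these numbers, 50 turns come to 2,550,000 input tokens against 25,000 output tokens. Even with output priced ten times higher per token, input accounts for about $2.55 of a roughly $2.80 total, which is why trimming the context sent to the agent is one of the few levers users directly control.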