Anthropic in Crisis: Power Shortage for Claude


Key Takeaways

1. Anthropic, in full expansion, is struggling to provide enough computing power for its AI tools, particularly Claude.

2. The startup relies on the data centers of Google and Amazon, creating competition for resources with its own investors.

3. A smart rationing strategy has been implemented to manage demand spikes, but this could affect service stability.

💡 Why it matters: Anthropic's ability to meet demand is crucial for its competitiveness against giants like OpenAI.

Anthropic and the Growing Demand for Its AI Tools

Anthropic, a company specializing in artificial intelligence, is facing a significant challenge: meeting the growing demand for its tools, which require considerable computing power. This situation arises from the increasing popularity of its products, particularly Claude, which has captivated many users and distinguished itself in the industry. However, this popularity is putting a strain on the company's computing capabilities.

In the highly competitive AI sector, Claude has become a household name. Users are flocking to Anthropic, drawn by its high-performing tools that have even managed to stand out against the Pentagon. This popularity has prompted OpenAI, a major competitor, to reassess its strategy to better compete with Anthropic in the enterprise market.

The Battle for Computing Resources

Anthropic does not own its own data centers and must rely on those of Google and Amazon, two of its investors. This dependency creates a situation of direct competition for access to resources, as these giants naturally prioritize their own artificial intelligence projects. This leads to a bottleneck, where each AI chip becomes a valuable and contested resource.

Anthropic's systems, such as Claude Code and Cowork, are particularly compute-intensive. They rely on complex feedback loops that consume between 10 and 100 times more tokens per interaction than other systems, increasing the pressure on available resources.

Anthropic's Adaptation Strategies

These constraints have already led to service interruptions at Anthropic, including a major outage on March 2. Since then, users have noticed higher latency. To mitigate these issues, the company has implemented an intelligent rationing system.

This system imposes stricter usage limits during peak demand periods, even requiring paying subscribers to moderate their requests during working hours. Conversely, Anthropic has increased available quotas at night and on weekends, hoping to shift consumption to less busy times.

This situation poses a significant problem for Anthropic, as more and more companies consider adopting Claude. If the service becomes unstable during demand spikes, it could deter new clients. Additionally, the company's gross margins are under pressure, as subsidizing computing to attract customers becomes less viable if infrastructure costs exceed the revenue generated from subscriptions.

Why Are AI Chips So Hotly Contested?

Generative AI models rely on hardware accelerators, such as GPUs and TPUs, to perform massive calculations quickly, both during training and when serving user requests. The availability of these resources depends on several factors, including the number of chips, power supply, cooling, and the physical capacity of data centers.

When multiple companies compete for the same resources from a cloud provider, allocation becomes the limiting factor: however fast user demand grows, a company can only serve as much as the capacity it has been granted. This can lead to latency, queues, or even interruptions if the platform can no longer meet quality expectations.

Understanding Token Consumption

A token represents a unit of text used to measure the inputs and outputs of a model, as well as the amount of computation required. The more tokens an interaction uses, the more it monopolizes resources such as chips, memory, and bandwidth, thereby reducing the number of requests that can be processed simultaneously.

Multi-step reasoning or reflection modes significantly increase the number of tokens generated or manipulated before producing a response. On a large scale, this translates into higher computing costs and an increased risk of saturation during peak hours.
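To make the arithmetic concrete, here is a minimal sketch under invented assumptions: the cluster throughput, the time window, and the 1,000-token baseline are illustrative figures, not Anthropic data. Only the 10x and 100x multipliers echo the range cited earlier in the article.

```python
def requests_served(cluster_tokens_per_sec: int,
                    tokens_per_request: int,
                    window_seconds: int) -> int:
    """Number of requests a fixed token budget can complete in a time window.

    All parameters are hypothetical; real serving capacity also depends on
    memory, batching, and bandwidth, which this sketch ignores.
    """
    token_budget = cluster_tokens_per_sec * window_seconds
    return token_budget // tokens_per_request


# Assumed figures: 1,000,000 tokens/s of capacity over a 10-second window.
CAPACITY = 1_000_000
WINDOW = 10

simple_chat = requests_served(CAPACITY, 1_000, WINDOW)      # baseline exchange
agentic_10x = requests_served(CAPACITY, 10_000, WINDOW)     # 10x more tokens
agentic_100x = requests_served(CAPACITY, 100_000, WINDOW)   # 100x more tokens

print(simple_chat, agentic_10x, agentic_100x)  # 10000 1000 100
```

With the same hardware, a workload that uses 100 times more tokens per interaction leaves room for 100 times fewer simultaneous interactions, which is why token-heavy tools weigh so heavily during peak hours.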

The Concept of Intelligent Rationing

Intelligent rationing involves applying usage limits that vary based on server load. This includes stricter quotas when servers are under pressure and more generous quotas when demand decreases.
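The idea can be sketched in a few lines. The thresholds, multipliers, and the function name `quota_for_load` below are hypothetical choices made for illustration; Anthropic has not published how its limits are actually computed.

```python
def quota_for_load(base_quota: int, load: float) -> int:
    """Scale a per-user quota according to server load.

    `load` is a fraction between 0.0 (idle) and 1.0 (saturated).
    The thresholds and multipliers are assumptions, not real values.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    if load >= 0.9:        # servers under pressure: stricter quota
        multiplier = 0.5
    elif load >= 0.7:      # normal operation: baseline quota
        multiplier = 1.0
    else:                  # quiet period (nights, weekends): more generous
        multiplier = 2.0
    return int(base_quota * multiplier)


print(quota_for_load(100, 0.95))  # 50
print(quota_for_load(100, 0.75))  # 100
print(quota_for_load(100, 0.20))  # 200
```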

Technically, this relies on traffic management mechanisms, prioritization based on subscription type, request type, or target latency, and sometimes queuing. The goal is to prevent a spike in requests from crashing the entire service by distributing the available capacity more evenly. However, this can make the user experience less predictable, especially if heavy tasks are slowed down or postponed.
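One possible shape for such a mechanism is a priority queue that admits requests by subscription tier until a per-cycle token budget is spent. This is a simplified sketch, not a description of Anthropic's system: the tier names, the `AdmissionQueue` class, and the token estimates are all hypothetical.

```python
import heapq
from itertools import count

# Hypothetical tiers: a lower number means a higher priority.
TIER_PRIORITY = {"enterprise": 0, "pro": 1, "free": 2}


class AdmissionQueue:
    """Admit queued requests by tier until a token budget is exhausted."""

    def __init__(self, capacity_tokens: int) -> None:
        self.capacity_tokens = capacity_tokens  # budget per admission cycle
        self._heap = []
        self._order = count()  # tie-breaker: first come, first served

    def submit(self, tier: str, request_id: str, est_tokens: int) -> None:
        entry = (TIER_PRIORITY[tier], next(self._order), request_id, est_tokens)
        heapq.heappush(self._heap, entry)

    def admit(self) -> list:
        """Run one cycle; requests that do not fit stay queued for the next."""
        remaining = self.capacity_tokens
        admitted = []
        # Stop at the first request that does not fit, so a heavy task waits
        # for a later cycle instead of being skipped by lower-priority ones.
        while self._heap and self._heap[0][3] <= remaining:
            _, _, request_id, est_tokens = heapq.heappop(self._heap)
            remaining -= est_tokens
            admitted.append(request_id)
        return admitted


queue = AdmissionQueue(capacity_tokens=1_000)
queue.submit("free", "a", 300)
queue.submit("pro", "b", 500)
queue.submit("enterprise", "c", 400)

print(queue.admit())  # ['c', 'b']
print(queue.admit())  # ['a']
```

The second cycle shows the trade-off described above: the service stays up, but the lowest-priority request is postponed, which is exactly what makes the experience less predictable for some users.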