Claude Sonnet 5: Anthropic Challenges Opus 4.8 with Autonomy

⚡

Key Takeaways

1Claude Sonnet 5, launched by Anthropic, stands out for its autonomy and reduced cost, replacing Sonnet 4.6.

2Despite its performance, Sonnet 5 consumes more tokens, which can increase costs for complex tasks.

3Anthropic has limited Sonnet 5's capabilities in cybersecurity to minimize the risks of misuse.

💡Why it matters — Claude Sonnet 5 could transform everyday AI usage, but its adoption requires a precise assessment of costs and capabilities.

Claude Sonnet 5: A New Model from Anthropic Promising Autonomy and Efficiency

Anthropic's latest model, Claude Sonnet 5, was unveiled on June 30 and is presented as a significant advancement in long-term reasoning. Designed to replace its predecessor Sonnet 4.6, this model aims to be more autonomous, more agentic, and above all, more economical. This evolution comes at a time when Anthropic seeks to demonstrate the safety of its technologies, particularly following the Fable 5 incident with the White House. The question arises whether Claude Sonnet 5 can truly establish itself as the model of choice for everyday applications.

Enhanced Autonomy for Varied Tasks

Claude Sonnet 5 has been integrated as the default model across all of Anthropic's platforms, such as Claude Code, Cowork, and Claude.ai. It is designed to efficiently handle tasks ranging from simple to moderately complex. However, it has not yet completely dethroned Claude Opus 4.8. For instance, on the SWE-bench Pro benchmark, which evaluates code in agentic mode, Sonnet 5 scores 63.2%, while Opus achieves 69.2%. The gap narrows on Terminal-Bench 2.1, where Sonnet 5 shows 80.4% compared to 82.7% for Opus, and on OSWorld-Verified, which measures autonomy on a computer, with Sonnet at 81.2% against 83.4% for Opus.

Despite these differences, Sonnet 5 is close enough to Opus's performance to cover the majority of daily tasks. It even stands out in certain long-context reasoning tasks, albeit with some caveats.

Competitive Pricing but Hidden Costs

Anthropic offers Sonnet 5 at an initially attractive rate of $2 per million tokens for input and $10 for output, valid until August 31, 2026. After that, prices will rise to $3 for input and $15 for output, aligning Sonnet 5 with the usual pricing of Sonnet models. On paper, Sonnet 5 is therefore more affordable than Opus 4.8 while offering comparable performance across several benchmarks. However, the reality is more nuanced.

Challenges with the New Tokenizer

Although Sonnet 5 appears promising, it has notable drawbacks. Its new tokenizer leads to increased token consumption, up to 30% more than Sonnet 4.6. This means that, despite a capacity of one million tokens, Sonnet 5 can process less text, documents, or code than its predecessor. This increase in consumption can inflate costs, especially for complex tasks requiring deep reasoning. In some cases, the total cost of a request could even exceed that of Claude Opus 4.8, despite a lower unit price.

Additionally, Anthropic has deliberately restricted certain capabilities of Sonnet 5 in terms of cybersecurity to limit the risks of malicious use. As a result, the model performs less effectively in cyber tasks and may even be outperformed by Sonnet 4.6 in this area.

The Time to Decide: To Migrate or Not to Sonnet 5?

For most users, migrating to Claude Sonnet 5 seems to be a wise decision, particularly for daily tasks in Claude Code, where its agentic capabilities and autonomy make it a powerful tool. However, for tasks requiring greater depth of reasoning, Opus 4.8 remains indispensable. Beyond code development, Sonnet 5 also excels in knowledge work tasks, especially in Cowork.

For developers considering migration, several checks are necessary:

Compare requests with Sonnet 4.6 and Sonnet 5 via the token counting API to assess the increase in consumption.
Reevaluate max_tokens limits to ensure they cover both internal reasoning and visible responses.
Note that Sonnet 5 activates adaptive reasoning by default; old fixed budgets are no longer accepted.
Remove the temperature, top_p, and top_k variables from requests, as Anthropic recommends managing tone and variability directly within the system prompt.

Anthropic is proceeding cautiously with Sonnet 5, integrating numerous safety measures and prioritizing a more adaptive and autonomous intelligence. While it may not be a revolution, Sonnet 5 represents a significant evolution, offering a balance between innovation and safety for everyday users.