Anthropic's Mythos: A Breakthrough That Surpasses GPT-5.5


Key Takeaways

1. The Mythos model from Anthropic, launched a month ago, shows rapid and surprising progress, surpassing GPT-5.5.

2. The UK AI Security Institute found that Mythos has solved complex cyber tasks for the first time.

3. The capabilities of Mythos raise questions about how quickly AI models are evolving and what that means for cybersecurity.

💡 Why it matters: The rapid evolution of Mythos could transform software vulnerability detection capabilities, impacting global cybersecurity.

Mythos by Anthropic: Performance Beyond Expectations

The Mythos artificial intelligence model from Anthropic, unveiled just a month ago, has already reached significant milestones in performance testing. External researchers report that the model achieved several firsts during evaluations, improving faster than expected.

According to a report from the UK AI Security Institute (AISI), a recent version of Mythos has surpassed not only its own earlier performance but also that of OpenAI's GPT-5.5 model. The result came just a month after the initial launch of Mythos.

Unprecedented Cybersecurity Skills

The Claude Mythos model, which Anthropic deemed too powerful for widespread release, has exhibited new skills. In a blog post published on Wednesday, the AISI revealed that Mythos Preview successfully completed two series of cybersecurity tests: it solved "The Last Ones" in 6 out of 10 attempts and "Cooling Tower," which had never been solved before, in 3 out of 10 attempts.

This is the first time any model has succeeded on the second series, "Cooling Tower," a significant advance over previous models.
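To give a sense of how much uncertainty 10 attempts leave, the short sketch below computes a 95% Wilson score interval for the two reported success counts. The choice of interval is an assumption made here for illustration; the article does not say how, or whether, the AISI quantified uncertainty on these rates, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


# Success counts as reported in the article (out of 10 attempts each).
for name, solved in (("The Last Ones", 6), ("Cooling Tower", 3)):
    low, high = wilson_interval(solved, 10)
    print(f"{name}: {solved}/10 solved, 95% interval {low:.0%} to {high:.0%}")
```

Under this assumption, 6 out of 10 is consistent with a true success rate anywhere from roughly 31% to 83%, and 3 out of 10 with roughly 11% to 60%, so the headline figures are best read as rough indications rather than precise rates.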

Rapid Evolution and Its Implications

The AISI noted that AI models are rapidly getting better at cybersecurity tasks, which could have significant implications for the field. In February 2026, it estimated that the length of cybersecurity tasks AI models can complete had been doubling every 4.7 months since late 2024, faster than the 8-month doubling time it had estimated in November 2025.
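The gap between the two doubling times is easier to appreciate as growth per year. The sketch below is plain arithmetic on the two figures quoted above, assuming a constant doubling time over the whole period; the function name is hypothetical and this is not the AISI's own calculation.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after months_elapsed, assuming a constant doubling time."""
    return 2.0 ** (months_elapsed / doubling_months)


# Doubling times quoted in the article: 8 months (November 2025 estimate)
# and 4.7 months (February 2026 estimate).
for doubling_months in (8.0, 4.7):
    per_year = growth_factor(12.0, doubling_months)
    print(f"doubling every {doubling_months} months: about x{per_year:.1f} per year")
```

Under that assumption, an 8-month doubling time means task length grows about 2.8 times per year, while a 4.7-month doubling time means about 5.9 times per year.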

However, it remains uncertain whether this trend will continue. The results of Mythos and GPT-5.5 could be notable exceptions to the general trend. The AISI also emphasized that current tests, limited to 2.5 million tokens, underestimate the actual capabilities of the models.

Testing Limitations and Future Perspectives

The estimates for Mythos Preview and GPT-5.5 carry wide margins of error because both models succeed close to 100% of the time on the longest tasks in the tests, even with the token limit in place. With so few failures to measure, it is difficult to assess how reliable the models would be on longer tasks.

The blog authors added that a limit of 2.5 million tokens is relatively low. In experiments using up to 100 million tokens, it was found that model performance could continue to improve beyond this threshold, particularly for recent models that benefit from higher token limits.

The Glasswing Project and Vulnerability Detection

Anthropic announced Mythos Preview last month alongside the Glasswing Project, a cybersecurity testing alliance formed with competing tech companies and AI labs, to which Anthropic has granted limited access to Mythos. The collaboration aims to evaluate Mythos's capabilities in a secure framework.

Mythos's ability to detect software vulnerabilities is particularly noteworthy. The AISI highlighted that this capability could transform how cyber threats are managed, making systems more resilient against potential attacks.

Unknowns and Upcoming Challenges

Despite these advancements, several unknowns remain. The tests could not account for all variables, and the current token limits underestimate what cutting-edge models can truly achieve. Success rates could be much higher without these restrictions, which makes time horizons difficult to calculate.

In conclusion, while Mythos and GPT-5.5 have shown impressive performances, it remains to be seen whether these results represent a sustainable trend or merely notable exceptions in the evolution of AI models.