Claude Mythos: Anthropic's AI that Challenges Human Experts

Key Takeaways

1. Claude Mythos, developed by Anthropic, solves 82.6% of the bioinformatics problems that human experts can solve.

2. It also answered 29.6% of the questions that the expert panel could not solve.

3. The release of Claude Mythos remains limited because its cybersecurity capabilities pose potential risks.

💡 Why it matters: Claude Mythos could change how scientific research is done, but its cybersecurity risks are holding back its deployment.

Claude Mythos, an artificial intelligence developed by Anthropic, continues to draw attention. While its potential in cybersecurity has been widely publicized, a recent study highlights its skills in bioinformatics: this general-purpose generative AI solved complex problems that even human experts struggled with.

Anthropic designed a dedicated benchmark, BioMysteryBench, to evaluate the performance of its AI models in bioinformatics. The benchmark contains 99 complex questions, some of which human experts could not solve. A typical example is identifying the viral species infecting a patient from RNA sequencing data: the answer can be verified by PCR testing, but it is difficult to derive from the sequencing data.
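To make the viral identification task more concrete, here is a minimal Python sketch of one naive approach: assigning each sequencing read to the reference genome with which it shares the most k-mers (short substrings of length k). It is not the method used in BioMysteryBench or by Claude. The names `virus_A` and `virus_B`, their sequences, and the reads are all hypothetical toy data, and the sketch ignores reverse complements, sequencing errors, and host reads that real pipelines must handle.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify_species(reads: list[str], references: dict[str, str], k: int = 8) -> Counter:
    """Count how many reads are best explained by each reference genome.

    A read votes for the reference with which it shares the most k-mers.
    Reads with no shared k-mer, or with a tie for first place, are skipped.
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    votes: Counter = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        scores = {name: len(read_kmers & ks) for name, ks in ref_kmers.items()}
        best_name, best_score = max(scores.items(), key=lambda item: item[1])
        is_unique = list(scores.values()).count(best_score) == 1
        if best_score > 0 and is_unique:
            votes[best_name] += 1
    return votes


# Hypothetical toy references (not real viral genomes).
references = {
    "virus_A": "ATGCGTACGTTAGCCGATAGCTAGGCTAACGT",
    "virus_B": "TTGACCGGTAAACCCGGGTTTAAACCGGATCC",
}

# Hypothetical reads: two drawn from virus_B, one from virus_A, one unrelated.
reads = [
    "ACCGGTAAACCCGG",
    "GGGTTTAAACCGGA",
    "CGTACGTTAGCCGA",
    "CCCCCCCCCCCCCC",
]

votes = identify_species(reads, references)
print(votes.most_common())
```

On real data the difficulty lies in everything this toy omits: most reads come from the patient rather than the virus, related species share long stretches of sequence, and the reference database may be incomplete.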

Of the 99 questions, 76 were answered by a panel of human experts. Claude Mythos solved 82.6% of these human-solvable questions, while Claude Opus 4.7, a publicly available model, reached 78.9%. Claude Mythos also answered 29.6% of the 23 questions that the human experts could not solve; for Claude Opus 4.7, this rate was 27%.

An AI with Vast Knowledge

Anthropic attributes part of this performance to the breadth of knowledge Claude can draw on, which lets the model combine different methods and pieces of evidence to reach a conclusion.

Anthropic traced the reasoning of Claude Opus 4.6, which also managed to solve a number of problems that the experts could not. According to this analysis, the gap is partly explained by one property of the model: the breadth of its knowledge. “The vast underlying knowledge base of Claude contains information on structural biology, molecular profiles, and meta-analyses derived from hundreds of thousands of articles,” states the AI lab. Humans, on the other hand, would have had to run meta-analyses or combine numerous databases.

The AI is also said to have developed new problem-solving techniques that scientists could draw inspiration from. In particular, when Claude is unsure of an answer, it applies several methods and combines the evidence from each to reach a conclusion.

Skills Beyond Cybersecurity

Despite its impressive capabilities, the release of Claude Mythos remains limited. Its cybersecurity skills can be beneficial: Mozilla, for example, used Mythos to identify 271 security vulnerabilities in Firefox, which it subsequently fixed. The same skills, however, could be exploited maliciously.

Anthropic plans to deploy Claude Mythos or a similar model once cybersecurity risks are managed. The necessary security measures are currently being tested on Claude Opus 4.7. This caution underscores the delicate balance between leveraging the advanced capabilities of AI and managing the risks it entails.

Promising Scientific Potential

Anthropic's study repositions Claude Mythos not only as a cybersecurity tool but also as a potential major player in the scientific field. Its skills could lead to significant discoveries, provided that security challenges are overcome.

Brief IA: AI news in French