Claude Mythos: Anthropic's AI that Challenges Human Experts
Claude Mythos, an artificial intelligence model developed by Anthropic, continues to draw attention for its capabilities. Its potential in cybersecurity has been widely publicized, and a recent study now highlights its skills in bioinformatics. This general-purpose generative AI solved complex problems that even human experts struggle with.
Anthropic designed a dedicated benchmark, BioMysteryBench, to evaluate how its AI models perform in bioinformatics. The benchmark comprises 99 complex questions, some of which are considered unsolvable by humans. A typical example is identifying the viral species infecting a patient from RNA sequencing data: the answer can be verified by PCR testing, but it is hard to work out from the sequencing data.
Of the 99 questions posed, 73 were answered by a panel of human experts. Claude Mythos solved 82.6% of these human-solvable questions, while Claude Opus 4.7, a publicly accessible model, reached 78.9%. More striking still, Claude Mythos answered 29.6% of the 23 questions that the human experts could not solve. For Claude Opus 4.7, that rate was 27%.
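To make these percentages more concrete, here is a minimal sketch that converts the reported rates into approximate question counts. The figures are taken from the article as stated; the helper name `implied_count` is hypothetical, and the article does not say how the percentages were computed, so the implied counts are rough estimates only.

```python
# Figures as reported in the article. The helper name is hypothetical.
TOTAL_QUESTIONS = 99
HUMAN_SOLVED = 73     # questions the expert panel answered
HUMAN_UNSOLVED = 23   # questions the experts could not solve

REPORTED_RATES = {
    "Claude Mythos": {"human_solved": 0.826, "human_unsolved": 0.296},
    "Claude Opus 4.7": {"human_solved": 0.789, "human_unsolved": 0.27},
}


def implied_count(rate: float, n_questions: int) -> float:
    """Approximate number of questions a success rate corresponds to."""
    return rate * n_questions


for model, rates in REPORTED_RATES.items():
    solved = implied_count(rates["human_solved"], HUMAN_SOLVED)
    beyond = implied_count(rates["human_unsolved"], HUMAN_UNSOLVED)
    print(f"{model}: ~{solved:.1f} of {HUMAN_SOLVED} human-solved questions, "
          f"~{beyond:.1f} of {HUMAN_UNSOLVED} human-unsolved questions")

# Questions in the benchmark that fall in neither reported group.
unaccounted = TOTAL_QUESTIONS - HUMAN_SOLVED - HUMAN_UNSOLVED
print(f"Questions not covered by either group: {unaccounted}")
```

The implied counts are not whole numbers, and the two reported groups add up to 96 of the 99 questions; the article does not explain either point.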
An AI with Vast Knowledge
Anthropic attributes this performance to the breadth of the model's knowledge. The company says the AI draws on information from hundreds of thousands of articles on structural biology and molecular profiles. That lets it combine different methods and lines of evidence to reach a conclusion, where humans would need to conduct complex meta-analyses.
Anthropic traced the reasoning of Claude Opus 4.6, which also solved a number of problems the experts could not. According to this analysis, the gap is partly explained by one particular advantage of the AI: the breadth of its knowledge. “The vast underlying knowledge base of Claude contains information on structural biology, molecular profiles, and meta-analyses derived from hundreds of thousands of articles,” states the AI lab. Humans, by contrast, would have had to run meta-analyses or combine numerous databases.
The AI is also said to have developed new problem-solving techniques that scientists could draw inspiration from. In short, when Claude is unsure of an answer, it applies several methods and weighs the pieces of evidence each one produces to arrive at a conclusion.
Skills Beyond Cybersecurity
Despite these capabilities, the release of Claude Mythos remains limited. Its cybersecurity skills cut both ways: Mozilla used Mythos to identify 271 security vulnerabilities in Firefox, which it subsequently fixed, but the same skills could be exploited maliciously.
Anthropic plans to deploy Claude Mythos or a similar model once the cybersecurity risks are under control. The necessary safeguards are currently being tested on Claude Opus 4.7. This caution reflects the delicate balance between putting advanced AI capabilities to use and managing the risks they entail.
Promising Scientific Potential
Anthropic's study positions Claude Mythos not only as a cybersecurity tool but also as a potential major player in scientific research. Its skills could lead to significant discoveries, provided the security challenges are overcome.