OpenAI Unveils GPT-5.6: A Breakthrough That Surpasses Mythos 5?

Key Takeaways

1. On June 26, OpenAI officially introduced GPT-5.6, with models named Sol, Terra, and Luna.

2. GPT-5.6 Sol stands out for its advancements in software development, cybersecurity, and quantitative biology.

3. Benchmarks show that Sol Ultra achieves 91.9% on TerminalBench 2.1, surpassing Anthropic's Mythos 5.

Why it matters: GPT-5.6 could redefine performance and security standards in AI, impacting various technology sectors.

On Friday, June 26, OpenAI unveiled GPT-5.6. The launch confirms rumors that had circulated in recent weeks and introduces a new range of models.

The GPT-5.6 series consists of three distinct models, each named after a body in our solar system: Sol, Terra, and Luna. These models are designed to excel in several key areas, such as reasoning, autonomous execution of complex tasks, interface creation, and energy efficiency.

OpenAI is also introducing a mode called “Ultra,” which distributes a task among multiple sub-agents to improve performance on the most demanding jobs.

For now, access to the GPT-5.6 range is limited to a small number of partners validated by the U.S. government.

What Does GPT-5.6 Promise?

OpenAI presents GPT-5.6 Sol as the most powerful model the company has ever created. It reports significant advances in several areas, including:

software development
complex workflows
quantitative biology
cybersecurity

OpenAI emphasizes that Sol is also the most secure model it has developed to date. The company says it has strengthened real-time security mechanisms to reduce the risks of cyberattacks and malicious use.

To support this claim, OpenAI spent several weeks on intensive testing. The tests included hacking exercises conducted by specialized teams and more than 700,000 hours of automated testing, a figure expressed as the equivalent of continuous use of a single A100 GPU.
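To put the testing figure in perspective, here is a minimal arithmetic sketch. It assumes the 700,000 hours can be read as single-A100 GPU-hours, which is an interpretation of the article's wording rather than something OpenAI has confirmed. The four-week window in the usage lines is purely hypothetical, since the article only says "several weeks".

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years
HOURS_PER_WEEK = 24 * 7


def single_gpu_years(gpu_hours: float) -> float:
    """Years one GPU would have to run nonstop to accumulate gpu_hours."""
    return gpu_hours / HOURS_PER_YEAR


def gpus_in_parallel(gpu_hours: float, wall_clock_weeks: float) -> float:
    """Average number of GPUs running nonstop to fit gpu_hours into the window."""
    return gpu_hours / (wall_clock_weeks * HOURS_PER_WEEK)


if __name__ == "__main__":
    total_hours = 700_000  # lower bound quoted in the article ("over 700,000 hours")
    assumed_weeks = 4      # hypothetical value, not from the article
    print(f"{single_gpu_years(total_hours):.1f} years on a single GPU")
    print(f"{gpus_in_parallel(total_hours, assumed_weeks):.0f} GPUs over {assumed_weeks} weeks")
```

Under these assumptions, the workload is far too large for one GPU within a few weeks, so the figure only makes sense as a total spread across many accelerators running in parallel.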

As for Terra, it is designed to offer a balance between performance and cost, while Luna is optimized for speed and cost reduction.

What Do the Benchmarks Say?

[Figure: GPT-5.6 results on TerminalBench 2.1]

The benchmarks released by OpenAI show Sol leading several evaluations. On TerminalBench 2.1, which assesses performance on complex command-line tasks, GPT-5.6 Sol scores 88.8%.

Running in Ultra mode, Sol goes further with a score of 91.9%. By comparison, Claude Mythos 5, Anthropic's most advanced model, scores 88%. On the same benchmark, Terra scores 84.3%, ahead of GPT-5.5 at 83.4%, while Luna achieves 82.5%.
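The size of the gaps is easier to judge as differences in percentage points. The sketch below uses only the TerminalBench 2.1 scores quoted above. The dictionary layout and the `margins` helper are illustrative names, not part of any published tool.

```python
# TerminalBench 2.1 scores (in %) as quoted in the article.
TERMINALBENCH_2_1 = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "Claude Mythos 5": 88.0,
    "GPT-5.6 Terra": 84.3,
    "GPT-5.5": 83.4,
    "GPT-5.6 Luna": 82.5,
}


def margins(scores: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each model's lead over the baseline, in percentage points."""
    base = scores[baseline]
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, delta in margins(TERMINALBENCH_2_1, "Claude Mythos 5").items():
        print(f"{name}: {delta:+.1f} points vs Claude Mythos 5")
```

On these numbers, Sol without Ultra mode leads Mythos 5 by less than one point, while the larger margin comes from the Ultra configuration.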

The improvements in GPT-5.6 are not limited to programming. On GeneBench v1, which focuses on workflows in genomics and quantitative biology, OpenAI claims that GPT-5.6 Sol outperforms GPT-5.5 while consuming fewer tokens. A similar trend appears on ExploitBench, a benchmark dedicated to cybersecurity, where Sol is competitive with Mythos Preview while generating roughly a third as many tokens.

[Figure: GPT-5.6 results on ExploitBench]

For its part, GPT-5.6 Terra offers performance comparable to GPT-5.5 at half the cost.

On GeneBench v1, GPT-5.6 Luna scores about 14 to 15% for a cost of less than one dollar via the API. By comparison, GPT-5.5 scores around 23% for nearly $1.20.

Terra reaches about 28% for a cost close to $1.70, while Sol achieves around 31% for a cost of approximately $1.90.
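One rough way to compare these GeneBench v1 figures is score per dollar. The sketch below uses the approximate values quoted above. Two inputs are assumptions: Luna's score is taken as 14.5, the midpoint of "14 to 15%", and its cost as $1.00, an upper bound for "less than one dollar", so Luna's ratio is a lower bound.

```python
# Approximate GeneBench v1 figures from the article: (score in %, API cost in USD).
# Luna's entry is an assumption: midpoint score and an upper-bound cost.
GENEBENCH_V1 = {
    "GPT-5.6 Luna": (14.5, 1.00),
    "GPT-5.5": (23.0, 1.20),
    "GPT-5.6 Terra": (28.0, 1.70),
    "GPT-5.6 Sol": (31.0, 1.90),
}


def points_per_dollar(score_pct: float, cost_usd: float) -> float:
    """Benchmark points obtained per dollar of API spend."""
    return score_pct / cost_usd


def summarize(results: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each model to its points-per-dollar ratio, rounded to one decimal."""
    return {
        name: round(points_per_dollar(score, cost), 1)
        for name, (score, cost) in results.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize(GENEBENCH_V1).items():
        print(f"{name}: {ratio} points per dollar")
```

Because every input is approximate, the ratios are only indicative. They also measure something different from absolute performance: Sol has the highest score in this set, but not the highest score per dollar.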


Brief IA: AI news in French