ChatGPT 5.5 Pro: A Mathematical Breakthrough Shaking Up Researchers

Key Takeaways

1. Timothy Gowers used ChatGPT 5.5 Pro to solve complex mathematical problems in less than two hours, without human assistance.

2. The OpenAI model improved on existing mathematical bounds, producing results of doctoral-level quality.

3. Isaac Rajagopal praised the originality of the ideas generated by the AI, which surpassed his own work in number theory.

💡 Why it matters: AI is redefining the role of researchers in mathematics, pushing the boundaries of innovation beyond traditional human capabilities.

ChatGPT 5.5 Pro: An Unprecedented Advancement in Mathematics

British mathematician Timothy Gowers, a Fields Medal laureate, has recently highlighted the impressive capabilities of OpenAI's ChatGPT 5.5 Pro model. Gowers, who holds the chair of combinatorics at the Collège de France and is a member of Trinity College, Cambridge, used this model to tackle open problems in number theory. In less than two hours, the artificial intelligence produced complete scientific papers with no human intervention. According to Gowers, the quality of the results reaches a doctoral level, with significant improvements over existing mathematical bounds.

Isaac Rajagopal, a young researcher involved in this initiative, described the model's key idea as "completely original," an achievement that a human mathematician would be proud of after weeks of contemplation. Gowers emphasized that his own mathematical contribution was zero, as the model did all the work in record time. "I didn't even do anything intelligent with the prompts," he writes on his blog.

Rapid Resolution of Complex Problems

Gowers provided ChatGPT 5.5 Pro with problems from a paper by number theorist Mel Nathanson. This paper examines the possible sizes of certain sets of integer sums and the efficiency of constructing sets with prescribed properties. Nathanson had proven an exponential bound for one of the problems and wondered if it could be improved. The AI solved this open mathematical problem in just 17 minutes and 5 seconds, improving an exponential bound to a quadratic bound. The central idea: the model replaced a component in Nathanson's proof with a more efficient variant, well-known in combinatorics, but whose application to this particular problem was not obvious.
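To give a feel for what "sizes of sets of integer sums" means, here is a minimal sketch. It assumes the objects in question are h-fold sumsets, that is, all sums of h elements of a set A with repetition allowed; the article does not spell out the exact definition, and the function names below are hypothetical illustrations, not taken from Nathanson's paper.

```python
from itertools import combinations_with_replacement


def sumset(a, h=2):
    """Return the h-fold sumset of a: every sum of h elements, repetition allowed.

    Assumption: this is the kind of "set of integer sums" the paper studies.
    """
    return {sum(combo) for combo in combinations_with_replacement(sorted(a), h)}


def sumset_size(a, h=2):
    """Number of distinct values in the h-fold sumset of a."""
    return len(sumset(a, h))


# Two sets with four elements each, but very different sumset sizes.
arithmetic = {0, 1, 2, 3}  # arithmetic progression: many sums coincide
spread_out = {1, 2, 4, 8}  # powers of two: all pairwise sums are distinct

print(sumset_size(arithmetic))  # 7
print(sumset_size(spread_out))  # 10
```

For four-element sets and h = 2, the size always lies between 7 and 10. Under the assumption above, the question of "efficiency" is how large the integers in a set must be to realize a prescribed size, which is where an exponential versus a quadratic bound makes a practical difference.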

When asked, ChatGPT rewrote the argument in LaTeX preprint form in 2 minutes and 23 seconds. Gowers verified its correctness and then asked the model to solve a related variant, which it did without issue. Both results are available as preprints.

An Evolving Human-Machine Collaboration

A generalized version of the problem proved to be much more challenging. Here, there was prior work by Isaac Rajagopal, a student at MIT who had proven an exponential dependency. Gowers provided ChatGPT with Rajagopal's paper and requested an improvement. What followed was a gradual escalation: after 16 minutes and 41 seconds, the model provided an initial improvement. Rajagopal deemed this step correct but labeled it a "routine modification" of his own work. Gowers then, as he puts it, became "greedy" and asked ChatGPT to try to achieve a much stronger bound.

After 13 minutes and 33 seconds, the model expressed optimism but indicated that two technical statements still needed verification. Another 9 minutes and 12 seconds later, the verification was complete. The final preprint was ready in 31 minutes and 40 seconds. The model had improved the bound from exponential to polynomial. According to Gowers, Rajagopal stated that the results were "almost certainly correct," both at the level of individual proof steps and the underlying ideas.

"The Kind of Idea I Would Be Very Proud to Have Had After a Week or Two of Reflection"

Rajagopal describes the model's key idea as "rather ingenious." It found a counterintuitive way to compress certain algebraic structures so that they fit into a much smaller range of numbers without losing their crucial combinatorial properties. "It's the kind of idea I would be very proud to have had after a week or two of reflection, and it took ChatGPT less than an hour to find and prove it, using methods similar to those in my own proof," writes Rajagopal. As far as he can tell, the idea was "completely original."

Gowers places the result at the level of "a perfectly reasonable chapter in a PhD in combinatorics," stating that it is not an "incredible result," as it heavily relies on Rajagopal's ideas, but it is "definitely a non-trivial extension." For a PhD student, it would have taken considerable time to go through Rajagopal's paper, identify weaknesses, and adapt techniques, Gowers says.

He draws far-reaching conclusions: "The bar for contributing to mathematics will now be to prove something that LLMs cannot prove, rather than simply proving something that no one has proven so far and that someone finds at least interesting." He clarifies, however, that PhD students could use LLMs as a tool. The real task will then be to create something in collaboration with LLMs that the models cannot do alone.

Gowers proposes a thought experiment: "Suppose a mathematician solves a major problem by having a long exchange with an LLM in which the mathematician plays a helpful guiding role but where the LLM did all the technical work and had the main ideas. Would we consider that a great achievement for the mathematician? I don't think so."

However, he sees value in the struggle to do mathematics oneself. Those who have solved difficult problems on their own gain insights into the problem-solving process that simply cannot be obtained by reading. "Just as very good coders are better at intuitive coding than less skilled coders," writes Gowers. His prediction: anyone starting a PhD today and finishing as early as 2029 will see mathematical research "transformed beyond recognition" by then.

This echoes the vision of prominent mathematician Terence Tao, who described "industrial-scale mathematics" powered by AI tools, where large AI-supported teams conduct broad research rather than solitary individuals working on narrow problems for years.

At the time, however, Tao compared AI models to "mediocre but not completely incompetent" research assistants. Gowers' experience with ChatGPT 5.5 Pro suggests that this assessment may already be outdated. Tao's latest comments have also been much more positive.

Generative AI Continues to Delve Deeper into Mathematics

An early example of AI in mathematical research was the use of GPT-5 as a research tool. OpenAI researchers claimed that a GPT model had "found" the solution to an Erdős problem. In reality, the AI had simply rediscovered an existing solution in the literature and had not developed its own proof.

A clear leap occurred when GPT-5.2 Pro solved Erdős problem number 728 "more or less autonomously," according to Tao. No corresponding solution could be found in the existing literature. Then, GPT-5.4 Pro went further, solving a long-standing open Erdős problem.

Progress has also been observed in other areas. In December 2025, a physicist published a paper whose central idea came from GPT-5. The author expects hybrid human-AI collaborations to become the norm in mathematics, physics, and other formal sciences before long. As language models become more precise, they could increasingly function as autonomous research agents.

Why Jumping to Conclusions Is Risky

Google DeepMind has experienced both breakthroughs and concerning failure rates with its AI agent Aletheia. The system, built on Gemini Deep Think, independently wrote a mathematical paper, refuted a decades-old hypothesis, and discovered an error in a cryptography paper. However, when researchers systematically tested it on 700 open mathematical problems, only 6.5% of its responses proved usable.

Tao has consistently made a similar point. Erdős problems vary in difficulty by "several orders of magnitude," he notes. When an AI solves a 50-year-old problem, that does not mean the problem withstood serious human effort for half a century. Often, no one has seriously tackled it.