GPT-5.4 and Maria: AI Redefines Medicinal Chemistry

Key Takeaways

1. GPT-5.4 and Maria improved the Chan–Lam coupling for more than 80% of the tested substrates.

2. The average reaction yield rose from 16.6% to 25.2%, with yields improving for 83% of the sulfonamides tested.

3. The project demonstrates how AI can become a valuable partner for scientists in medicinal chemistry.

💡 Why it matters: AI could accelerate drug discovery by optimizing crucial chemical reactions, expanding the range of molecules that chemists can make and test.

A Promising Advancement in Medicinal Chemistry Thanks to AI

OpenAI's GPT-5.4, working with Maria from Molecule.one, has improved the Chan–Lam coupling, a challenging reaction used in medicinal chemistry. The collaboration identified an unexpected additive that increased yields for more than 80% of the tested substrates.

OpenAI's goal is clear: to use artificial intelligence as a powerful ally for scientists, enabling them to explore new ideas, connect distant concepts, and accelerate scientific discoveries. This project aligns with that vision by applying AI to medicinal chemistry, a field where hypotheses must be validated through laboratory experiments with real molecules.

Progress in medicinal chemistry cannot be measured by reasoning alone: a hypothesis must work in the lab with real molecules, instruments, and experimental noise. In collaboration with Molecule.one, we connected GPT-5.4 to Maria, an AI integrated into a high-throughput laboratory for autonomous research, and gave it an open-ended goal: to improve one of several important classes of reactions.

The system generated research proposals, designed and executed experiments, analyzed the experimental data, and suggested follow-up experiments. Humans crafted the guiding and evaluative prompts, selected which proposals to test, and made limited corrections to the experimental plans.

The most promising proposal, OAI-M1-03, focused on a challenging yet useful version of the Chan–Lam coupling, a reaction used by chemists to form carbon-nitrogen bonds. Starting with the goal of improving the Chan–Lam coupling for process chemistry, GPT-5.4 independently identified primary sulfonamides as a challenging but high-value class of substrates and suggested that mild oxidants, including TEMPO, could enhance the reaction.

Over two cycles of experimentation in the Maria laboratory, this idea produced a significant improvement. Under optimized conditions, measured yields increased for 88% of the boronic acids and 83% of the sulfonamides tested. The average yield rose from 16.6% to 25.2%, and the share of reactions exceeding 30% yield increased from 15.6% to 37.5%. Human chemists then repeated representative reactions at bench scale. These experiments confirmed the microliter-scale results: yields were higher for 11 of 14 substrate pairs, and more than doubled for eight of them.
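The two headline metrics above (average yield, and the share of reactions exceeding 30% yield) can be put in perspective with a little arithmetic. The following is a minimal sketch, not the project's analysis code: the function names `summarize` and `compare` are hypothetical, and only the four percentages quoted in the text are taken from the source.

```python
def summarize(yields, threshold=30.0):
    """Return (mean yield, share of reactions strictly above threshold), in percent.

    Illustrative definition of the two metrics; `yields` is a list of
    per-reaction yields in percent.
    """
    mean = sum(yields) / len(yields)
    share = 100.0 * sum(1 for y in yields if y > threshold) / len(yields)
    return mean, share


def compare(before, after):
    """Return (absolute change, relative change as a fraction of `before`)."""
    return after - before, (after - before) / before


# Headline figures quoted in the text, in percent.
mean_gain_points, mean_gain_relative = compare(16.6, 25.2)
share_gain_points, share_gain_relative = compare(15.6, 37.5)

print(f"mean yield: +{mean_gain_points:.1f} points ({mean_gain_relative:.0%} relative)")
print(f"share above 30%: +{share_gain_points:.1f} points ({37.5 / 15.6:.1f}x the baseline)")
```

Read this way, the average yield gained 8.6 percentage points, about half again over the baseline, while the share of reactions clearing the 30% mark grew roughly 2.4-fold.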

Improvements in this area are particularly valuable because synthesis is often a major bottleneck in drug discovery: scientists can only test the molecules they can make or obtain. The sulfonamide group appears in drugs across a wide range of therapeutic areas, including anticancer, antimicrobial, and diuretic medications, but the Chan–Lam coupling of primary sulfonamides with boronic acids has historically given low yields. Making this reaction more reliable could give medicinal chemists a broader and more practical way to make and explore potentially useful molecules.

While this is still an early result, it provides a concrete example of the broader direction in which we are heading: AI systems that can become valuable partners for scientists throughout the research process. The model reviewed the literature, proposed an unexpected idea, helped design and analyze experiments, and led to a scientific discovery that human chemists could evaluate.

Why the Chemistry Problem is Important

Organic chemistry underpins all small molecule drugs, as well as products in agriculture, electronics, and materials science. A reaction is particularly useful when it can reliably create the same type of chemical bond from many different starting materials. When reactions produce low yields or too many undesirable byproducts, chemists may be forced to abandon promising molecules or spend a lot of time developing another pathway.

The Chan–Lam coupling is useful in medicinal chemistry because it forms carbon-nitrogen bonds, which are common in drugs. However, the reaction does not work equally well for every class of molecules. Sulfonamides are an important family of molecules found in drugs used in oncology and infectious diseases, yet the coupling of primary sulfonamides with boronic acids has historically produced low yields.

Connecting GPT‑5.4 to Maria AI and Lab

The combined system brought together complementary capabilities. Prompts crafted by scientists working with Maria AI were used with GPT‑5.4 to generate and rank thousands of possible research proposals. Human chemists reviewed the small subset of proposals that the system rated highest and selected four for laboratory testing. Maria AI then translated the selected high-level plans into detailed instructions for the lab, executed thousands of high-throughput experiments, analyzed the raw data, and returned structured results to GPT‑5.4.
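The division of labor described above (the model proposes and ranks, humans select, the lab executes and reports back) can be sketched as a simple loop. This is an illustrative sketch only: `Proposal`, `shortlist`, `run_cycle`, the scores, and the stub callbacks are all hypothetical names and values, and nothing here reflects the actual interfaces of GPT‑5.4 or Maria.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Proposal:
    proposal_id: str   # hypothetical identifier
    summary: str       # short description of the research idea
    score: float       # hypothetical rating assigned by the system


def shortlist(proposals: List[Proposal], top_n: int) -> List[Proposal]:
    """Keep the top_n highest-rated proposals for human review."""
    return sorted(proposals, key=lambda p: p.score, reverse=True)[:top_n]


def run_cycle(
    proposals: List[Proposal],
    top_n: int,
    human_select: Callable[[List[Proposal]], List[Proposal]],
    run_in_lab: Callable[[Proposal], str],
) -> Dict[str, str]:
    """One pass: rank, let humans choose, execute, return structured results."""
    candidates = shortlist(proposals, top_n)
    chosen = human_select(candidates)  # the human decision point
    return {p.proposal_id: run_in_lab(p) for p in chosen}


# Stub usage with made-up proposals; the callbacks stand in for people and the lab.
demo = [
    Proposal("P-1", "idea one", 0.2),
    Proposal("P-2", "idea two", 0.9),
    Proposal("P-3", "idea three", 0.5),
]
results = run_cycle(
    demo,
    top_n=2,
    human_select=lambda candidates: candidates[:1],
    run_in_lab=lambda p: f"results for {p.proposal_id}",
)
print(results)
```

Passing the human selection step in as a callback keeps the point of human judgment explicit in the structure, rather than hiding it inside the automated part of the loop.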

One of the four selected proposals, OAI-M1-03, suggested using mild oxidants such as TEMPO to improve the Chan–Lam coupling of sulfonamides. Chemists found the suggestion both surprising and interesting. We share the detailed results of OAI-M1-03 in this article and in the accompanying document.

Maria then used the final research proposal to generate experimental grids, with slight corrections made by humans. The largest human correction was to avoid dimethyl sulfoxide (DMSO) as a solvent, because chemists feared it might react with the stronger oxidants included for comparison.

The entire process took three months, from the first prompt on March 4 to sharing the results of OAI-M1-03 with independent experts on June 4.

We describe this workflow as quasi-autonomous rather than fully autonomous, because human chemists made important decisions throughout the process. The model proposed key research ideas, while human chemists provided high-level direction and judgment, corrected experimental details, helped prepare laboratory consumables and reagents, and manually repeated key experiments.

What We Found

OAI-M1-03 identified TEMPO as a useful additive for the Chan-Lam coupling of the primary sulfonamides studied here. Under optimized conditions, the reaction improved in two ways: the average yield increased, and more substrate combinations achieved practically useful yields.

Over two cycles, Maria executed a total of 10,080 reactions, roughly a decade of work for a chemist running three reactions per day. This scale is important because results in chemistry can be misleading when tested on only a few examples. A reaction may seem promising on one pair of starting materials but fail on a broader set of molecules. Running thousands of reactions made it possible to identify TEMPO among ten tested oxidants, to see the effect repeat across many substrate combinations, and to find its limits.
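The comparison with a single chemist depends on how many days per year that chemist spends at the bench. A quick check, assuming either every calendar day or 250 working days per year (the 250-day figure is an assumption for illustration, not from the source):

```python
TOTAL_REACTIONS = 10_080       # from the text
REACTIONS_PER_DAY = 3          # from the text
WORKING_DAYS_PER_YEAR = 250    # assumption, not stated in the source


def years_needed(total, per_day, days_per_year):
    """Years required to run `total` reactions at `per_day` reactions per day."""
    return total / (per_day * days_per_year)


calendar_years = years_needed(TOTAL_REACTIONS, REACTIONS_PER_DAY, 365)
working_years = years_needed(TOTAL_REACTIONS, REACTIONS_PER_DAY, WORKING_DAYS_PER_YEAR)

print(f"every calendar day: {calendar_years:.1f} years")
print(f"250 working days per year: {working_years:.1f} years")
```

Under these assumptions the manual equivalent is about 9.2 years with no days off, or about 13.4 years on a working-day schedule.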

After analyzing the first round of data, the system proposed a second round of more targeted experiments to test follow-up hypotheses. A useful follow-up discovery was that TEMPO could be replaced by a much cheaper analogue, 4-hydroxy-TEMPO, with little loss of performance.

The result was also confirmed beyond the microliter-scale screening format of the Maria laboratory. Human chemists manually reproduced representative reactions at the bench scale and observed an increase in yields for 11 out of 14 substrate pairs; for eight pairs, the increase was greater than twofold. This replication is important because very small-scale experiments can sometimes introduce artifacts that disappear at a larger scale. Bench-scale validation is also common before research is published in a scientific journal.

Limitations

This work demonstrates that a model can make a useful contribution to organic chemistry. It did more than summarize the literature or suggest a single experiment: it proposed a surprising, specific hypothesis and presented it for human review, designed experiments, interpreted experimental data, and devised follow-up experiments.

However, this does not show that AI can act completely autonomously in this field. Human chemists played a crucial role in providing guidance, correcting experimental details, and validating results.

The future of chemical research could be transformed by AI systems that, while requiring human oversight, can accelerate discoveries and open new pathways for scientific innovation.