Brief IA

OpenAI and Tech Giants Launch Revolutionary AI Network MRC

💻 Code & Dev · Tom Levy

Key Takeaways
1. OpenAI partnered with AMD, Broadcom, Intel, Microsoft, and NVIDIA to create the MRC network protocol.
2. The MRC protocol improves the speed and resilience of data transfers between GPUs in AI supercomputers.
3. MRC is already running on OpenAI's NVIDIA GB200 supercomputers, which train the models behind ChatGPT and Codex.
💡 Why it matters: By cutting the number of switch tiers and components in the network, MRC could lower the cost and energy consumption of AI supercomputers.

Full Analysis

A Major Technological Partnership for OpenAI

OpenAI recently announced a collaboration with leaders in the tech industry, including AMD, Broadcom, Intel, Microsoft, and NVIDIA. Together, they have developed a new network protocol called MRC (Multipath Reliable Connection).

Enhancing AI Supercomputer Performance

The MRC protocol is designed to optimize data transfers between GPUs in supercomputers dedicated to artificial intelligence. By distributing the packets of each transfer across hundreds of simultaneous paths, MRC aims to make these transfers faster, more predictable, and more resilient. Because no single link has to carry an entire transfer, this approach also reduces congestion within the network.
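The effect of spreading packets over many paths can be illustrated with a toy model. The sketch below is not MRC itself: the function names, the random choice standing in for a flow hash, and the round-robin spraying policy are all assumptions made for illustration. It compares pinning each flow to one path with spraying every flow's packets across all paths.

```python
import random


def single_path_load(flow_sizes, num_paths, seed=0):
    """Pin each flow to one path (random choice stands in for a flow hash).

    Returns the number of packets carried by each path.
    """
    rng = random.Random(seed)
    load = [0] * num_paths
    for size in flow_sizes:
        load[rng.randrange(num_paths)] += size
    return load


def sprayed_load(flow_sizes, num_paths):
    """Spread every flow's packets round-robin over all paths (assumed policy).

    Returns the number of packets carried by each path.
    """
    load = [0] * num_paths
    next_path = 0
    for size in flow_sizes:
        for _ in range(size):
            load[next_path] += 1
            next_path = (next_path + 1) % num_paths
    return load


# Hypothetical workload: 16 flows of 1,000 packets each over 256 paths.
flows = [1000] * 16
pinned = single_path_load(flows, 256)
sprayed = sprayed_load(flows, 256)
print("busiest path, one path per flow:", max(pinned))
print("busiest path, sprayed:", max(sprayed))
```

In this model the busiest pinned path carries at least one whole flow (1,000 packets or more), while the busiest sprayed path carries 63 packets, which is the intuition behind the congestion claim above.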

When a path, link, or switch fails, MRC can detect the problem and route around it within microseconds. According to OpenAI, traditional networks may need several seconds to stabilize after a failure.
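The article does not describe how MRC detects or avoids failures. As a purely illustrative sketch, the hypothetical class below shows why a sender that already spreads traffic over many paths can react quickly: once a path is marked as failed, the sender simply stops selecting it and keeps using the rest. The class name, its methods, and the round-robin policy are assumptions, not part of the MRC specification.

```python
class MultipathSender:
    """Toy sender that rotates over healthy paths (hypothetical, not MRC)."""

    def __init__(self, num_paths):
        self.healthy = list(range(num_paths))
        self._cursor = 0

    def next_path(self):
        """Return the path for the next packet, skipping failed paths."""
        if not self.healthy:
            raise RuntimeError("no healthy paths left")
        path = self.healthy[self._cursor % len(self.healthy)]
        self._cursor += 1
        return path

    def mark_failed(self, path):
        """Stop using a path, for example after a loss signal (assumed trigger)."""
        if path in self.healthy:
            self.healthy.remove(path)


sender = MultipathSender(num_paths=4)
sender.mark_failed(2)
# Path 2 is never chosen again; traffic continues on paths 0, 1, and 3.
print([sender.next_path() for _ in range(6)])
```

The decision is local to the sender, so no network-wide reconvergence is needed in this model before traffic resumes on the surviving paths.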

Advantages of the MRC Protocol

Thanks to its multi-plane design, MRC can connect over 100,000 GPUs with just two tiers of Ethernet switches, compared with the three or four tiers a conventional 800 Gb/s network requires. Fewer tiers mean lower energy consumption, fewer components, and a lower overall network cost.
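The arithmetic behind the switch levels (tiers) can be sketched with the standard capacity formula for a non-blocking folded-Clos (fat-tree) network, in which t tiers of radix-k switches support 2 * (k/2)^t endpoints. The hardware figures below are assumptions chosen for illustration and do not come from the article: a 51.2 Tb/s switch used either as 64 ports of 800 Gb/s or as 512 ports of 100 Gb/s, with each 800 Gb/s network interface split into eight 100 Gb/s planes.

```python
def max_endpoints(radix, tiers):
    """Endpoints supported by a non-blocking fat tree of `tiers` switch levels."""
    return 2 * (radix // 2) ** tiers


def tiers_needed(radix, endpoints):
    """Smallest number of switch tiers that supports `endpoints`."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


# Assumed 51.2 Tb/s switch, two ways of carving up its ports.
conventional_radix = 64    # 64 ports x 800 Gb/s
multiplane_radix = 512     # 512 ports x 100 Gb/s, one plane of eight

print(max_endpoints(conventional_radix, 3))      # 65536
print(tiers_needed(conventional_radix, 100_000)) # 4
print(max_endpoints(multiplane_radix, 2))        # 131072
print(tiers_needed(multiplane_radix, 100_000))   # 2
```

Under these assumptions, three tiers of 64-port switches top out at 65,536 endpoints, so a fourth tier is needed to pass 100,000, while each 100 Gb/s plane built from 512-port switches reaches 131,072 endpoints in two tiers.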

Deployment and Immediate Impact

The MRC protocol is already operational on OpenAI's NVIDIA GB200 supercomputers, which are used for training advanced models. Among these installations are the Oracle Cloud Infrastructure site in Abilene, Texas, as well as Microsoft's Fairwater supercomputers.

During the training of a recent model for ChatGPT and Codex, MRC meant OpenAI did not have to coordinate the restart of four tier-1 switches with the training job, an operation that could otherwise have disrupted ongoing work.

Publication and Contributions

The MRC specification was made public today via the Open Compute Project (OCP), accompanied by a research paper detailing its features. Alongside OpenAI, AMD, Broadcom, Intel, Microsoft, and NVIDIA all contributed to the development of the protocol.
