ANYbotics and Physical AI: Two Visions to Dominate the Market

An Immersion in the World of Physical AI
In recent years, I have been deeply involved in the commercialization of autonomous robots, first within the Swiss scale-up ANYbotics, and then collaborating with various companies in the physical AI sector to develop go-to-market strategies. This immersion has allowed me to observe a recurring pattern: while the common goal is to enhance the capabilities of robots, the methods to achieve this diverge significantly.
Two main approaches emerge in this industry. The first, widely adopted, builds on advances in machine learning and emphasizes data exploitation. The second, less common, focuses on architecture from the outset, relying on real-world deployments and applied research. Both have their merits and are backed by competent teams and significant funding, but they rest on distinct assumptions, and that difference shapes their commercial impact.
The Data-Driven Approach: A Model Inspired by Successes in AI
The data-driven approach draws inspiration from breakthroughs in language and computer vision. This strategy is built on three pillars: massive data collection, training increasingly large models, and applying scaling laws to enhance performance.
This method is currently the most prevalent in the field of physical AI, largely due to the success of large language models. Professionals adopting this strategy often come from backgrounds specialized in language processing and computer vision, making its application to robotics logical and appealing.
Well-funded companies such as Physical Intelligence, Generalist, 1X (founded in Norway), and Flexion (based in Zurich) are betting on this approach. Their data collection techniques vary, ranging from video models and wearable devices to large-scale simulation and teleoperation.
The focus is generally on specific and localized skills, such as object manipulation on a table or locomotion. The standard method involves training a model on limited data and then refining it for a particular skill. These steps depend on structured data and are tailored to benchmarks, demonstrations, and academic publications.
However, a major challenge is that physical data is not as abundant as text or images, and the requirements for safety and reliability are stricter in the physical world. Unlike on a screen, there is no "undo" button once a force has been applied in the real world, which calls for a cautious approach.
Models are reliable only in environments similar to those encountered during training, a condition known as being "in distribution." As a result, this approach often begins in controlled laboratory conditions and postpones the complexities of the real world to a later stage.
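To make "in distribution" concrete, the sketch below flags an observation as novel when any of its features lies far from what was seen in training. It is illustrative only: the two features (floor friction and light level), the function names, and the z-score limit of 3 are assumptions chosen for the example, not a method attributed to any company named in this article.

```python
from statistics import mean, stdev


def fit_feature_stats(training_features):
    """Return (mean, standard deviation) for each feature dimension."""
    columns = list(zip(*training_features))
    return [(mean(column), stdev(column)) for column in columns]


def max_z_score(stats, observation):
    """Largest per-feature distance from the training mean, in standard deviations."""
    # max(s, 1e-9) avoids division by zero when a feature never varied in training.
    return max(abs(x - m) / max(s, 1e-9) for (m, s), x in zip(stats, observation))


def is_in_distribution(stats, observation, z_limit=3.0):
    """True when every feature stays within z_limit standard deviations."""
    return max_z_score(stats, observation) <= z_limit


# Hypothetical training conditions: (floor friction, light level).
lab_conditions = [(1.0, 10.0), (1.2, 11.0), (0.8, 9.0), (1.1, 10.5), (0.9, 9.5)]
stats = fit_feature_stats(lab_conditions)

familiar = (1.05, 10.2)   # close to the lab conditions
unfamiliar = (3.0, 10.0)  # friction far outside anything seen in training

print(is_in_distribution(stats, familiar))
print(is_in_distribution(stats, unfamiliar))
```

A per-feature z-score is about the simplest possible novelty test. Real systems would work on learned representations rather than two hand-picked numbers, but the principle is the same: the model is trusted only near the data it was trained on.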
The bet is that the capabilities of the models will expand over time, that skills will become more general, and that mechanisms will be developed to handle edge cases, gradually broadening what is considered "in distribution." This also depends on future research advancements that would make real-world deployment possible. Currently, hallucinations and the black-box reasoning of end-to-end models limit their use in complex and safety-critical environments.
The Architecture-Centric Approach: Embracing the Complexity of the Real World
In contrast to the data-driven approach, the architecture-centric approach rests on a different set of assumptions, influenced by applied research in field robotics and real-world deployments. This method accepts the complexity of the real world from the outset and adapts the model architecture to fit this reality, rather than modifying the world to suit the model.
Teams adopting this approach are fewer in number and generally come from backgrounds specialized in field robotics. For example, FieldAI, founded by veterans from NASA's Jet Propulsion Laboratory, Google DeepMind, and DARPA robotics challenges, combines Bayesian methods with modern machine learning in its Field Foundation Models. These models have been deployed at hundreds of sites across Europe, Asia, and North America.
Waymo has also embraced this architectural bet in the field of autonomous driving. It has explored pure end-to-end models, but its deployed Waymo Foundation Model retains a structured design with interpretable components and Bayesian processing of uncertainty, allowing it to verify and validate its decisions over millions of driverless miles on public roads.
The central idea is that the physical world requires a fundamentally different approach from the digital realm, and that accumulating data alone is insufficient. Large language models, although trained on trillions of tokens, continue to hallucinate. Unlike a chatbot, a robot that hallucinates can cause real physical harm.
Integrating AI into the physical world is a complex architectural challenge, requiring deep research and mathematical rigor. The system must be grounded in physics, quantify its uncertainty, and act accordingly, gathering more information when necessary and retreating when the risk is too high. For example, a robot on a construction site will slow down and wait for dust to settle before proceeding, mimicking the cautious behavior of a human.
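The behavior described above (proceed, slow down to gather information, or retreat) can be sketched as a small decision rule driven by uncertainty. This is a minimal illustration under stated assumptions: the risk estimates, the use of their spread as an uncertainty proxy, the thresholds, and all names are hypothetical, and the sketch does not represent how FieldAI, Waymo, or any other company implements this.

```python
from enum import Enum
from statistics import mean, pstdev


class Action(Enum):
    PROCEED = "proceed"
    SLOW_AND_OBSERVE = "slow_and_observe"
    RETREAT = "retreat"


def choose_action(risk_samples, observe_threshold=0.15, retreat_threshold=0.5):
    """Choose an action from several risk estimates of the same situation.

    risk_samples: hypothetical estimates in [0, 1] of the probability that
    moving forward causes harm, e.g. from different sensors or models.
    """
    expected_risk = mean(risk_samples)
    # Disagreement between the estimates serves as a crude uncertainty proxy.
    uncertainty = pstdev(risk_samples)
    # Act on a pessimistic bound: mean risk plus one standard deviation.
    pessimistic_risk = expected_risk + uncertainty

    if pessimistic_risk >= retreat_threshold:
        return Action.RETREAT
    if uncertainty >= observe_threshold:
        return Action.SLOW_AND_OBSERVE
    return Action.PROCEED


clear_path = [0.05, 0.06, 0.04]   # estimates agree, risk is low
dusty_site = [0.05, 0.45, 0.10]   # estimates disagree, e.g. a camera blinded by dust
open_trench = [0.70, 0.80, 0.75]  # estimates agree, risk is high

for name, samples in [("clear", clear_path), ("dust", dusty_site), ("trench", open_trench)]:
    print(name, choose_action(samples).value)
```

The design choice worth noting is that uncertainty is a first-class input: the dusty site is not judged dangerous, only unclear, so the rule asks for more information instead of either pushing ahead or giving up.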
The result is a data-efficient and resilient system, capable of understanding how the world works and recognizing its own limitations. It can adapt to dynamic and unforeseen conditions it has never encountered during training. Moreover, it can be deployed in a new environment without prior information or supporting infrastructure, and begin working like a new employee from day one.
This approach offers clients a level of operational intelligence that enables robots to perform complex tasks from start to finish, sequencing numerous actions in order and coordinating multiple robots working together. Individual skills remain crucial, but they are designed with the constraints and imperfections of real-world operations in mind from the outset, rather than being adapted afterward.
Business Implications for Robotics Companies
My intuition is that operational data is accumulating in a way that synthetic benchmarks and controlled demonstrations cannot match, and that the organizational know-how developed to operate robots in environments not designed for them is difficult to replicate in a lab.
An architecture-centric system, capable of managing uncertainty and adapting to unforeseen events, can be implemented immediately in real-world conditions. This is why teams adopting this approach tend to achieve greater commercial success.
This method also meets clients' needs for operational intelligence, adding value to their existing processes. Each deployment generates operational data, a rare and valuable resource in the field of physical AI.
It is paradoxical that the approach requiring the least data to start ultimately accumulates the greatest amount of high-quality data, reflecting the diversity of conditions and edge cases that only appear in the field. What seems like an architectural advantage translates into a deployment advantage and a self-reinforcing data cycle.
Physical AI is often described as a race. The question of who the final winners will be remains open, and I do not claim to know how this will resolve. But I do know which direction the evidence on the ground points.