Nvidia and AI: Autonomous Robots Achieve 99% Grasping Mastery


Key Takeaways

1. Nvidia is collaborating with Carnegie Mellon and UC Berkeley to develop autonomous robots using AI coding agents.

2. A fleet of eight robots has demonstrated grasping and manipulation skills with a success rate of up to 99%.

3. This advancement could transform industrial automation by making robots more adaptive and efficient.

💡 Why it matters: These advancements could revolutionize industrial robotics, increasing efficiency and reducing production costs.

Nvidia and AI: Autonomous Robots Master Grasping at 99%

Researchers from Nvidia, Carnegie Mellon University, and UC Berkeley are using AI coding agents to teach robots skilled grasping in the real world. A fleet of eight robots achieves up to 99% success on complex tasks.

Skilled grasping and manipulation remain hard for robots to learn. Humans must intervene at every step: collecting training data, resetting the scene after each attempt, and adjusting algorithms. This manual burden slows down the entire process. The ENPIRE research project, a joint effort by Nvidia, Carnegie Mellon University, and UC Berkeley, aims to remove this bottleneck by handing the work to AI coding agents.

The central idea relies on a feedback loop operating on real hardware: resetting the workspace, executing a strategy, checking the result, and improving the next attempt.
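As a rough illustration of that loop, here is a minimal Python sketch. It is not ENPIRE's code: the robot is replaced by a simulated trial, and every name in it (`run_trial`, `improvement_loop`, the single tunable `param`) is a hypothetical stand-in chosen for the example.

```python
import random


def run_trial(param: float, rng: random.Random) -> bool:
    """Hypothetical stand-in for one real robot attempt.

    Success becomes more likely as `param` approaches 0.7 (an arbitrary
    optimum invented for this sketch).
    """
    p_success = max(0.0, 1.0 - abs(param - 0.7) * 2)
    return rng.random() < p_success


def improvement_loop(rounds=20, trials_per_round=20, seed=0):
    """Reset, execute, check, improve: keep a candidate if it does at least as well."""
    rng = random.Random(seed)
    best_param, best_rate = 0.2, 0.0
    history = []
    for _ in range(rounds):
        # Improve: propose a small change to the current best strategy.
        candidate = min(1.0, max(0.0, best_param + rng.uniform(-0.15, 0.15)))
        # Reset + execute + check: a real system would reset the workspace
        # before each trial and use an automated success detector.
        successes = sum(run_trial(candidate, rng) for _ in range(trials_per_round))
        rate = successes / trials_per_round
        history.append((candidate, rate))
        if rate >= best_rate:
            best_param, best_rate = candidate, rate
    return best_param, best_rate, history


if __name__ == "__main__":
    param, rate, _ = improvement_loop()
    print(f"best param {param:.2f}, measured success rate {rate:.0%}")
```

The point of the sketch is only the shape of the loop: no human sits between one attempt and the next, so the number of iterations is limited by hardware time rather than by operator time.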

The Agent Builds Its Own Evaluation Tools

ENPIRE operates in two phases. In the first phase, the agent sets up a working environment with some human feedback. This includes safety limits, automatic resetting, and automated success monitoring. Instead of having each attempt evaluated by a human, the agent writes its own reward function to distinguish success from failure. It only needs a few minutes of video examples showing successful and failed attempts.

For pin insertion, for example, the agent developed a control system combining visual alignment, gripper height, and estimated force. To close a cable tie, it combined two camera angles to avoid false positives and reduced the reaction time to less than 150 milliseconds. These tools are built once and reused without modifications.
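A success check of this kind could look roughly like the Python sketch below. The article names the signals (visual alignment, gripper height, estimated force, two camera views) but not how they are combined, so the thresholds, units, field names, and the simple "all conditions must hold" rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    alignment_error_px: float  # visual offset between pin and hole (assumed unit)
    gripper_height_mm: float   # gripper height above the fixture (assumed unit)
    estimated_force_n: float   # estimated contact force (assumed unit)


def pin_inserted(obs: Observation,
                 max_error_px: float = 5.0,
                 max_height_mm: float = 12.0,
                 force_range_n: tuple = (1.0, 15.0)) -> bool:
    """Report success only if all three signals agree (hypothetical thresholds)."""
    aligned = obs.alignment_error_px <= max_error_px
    seated = obs.gripper_height_mm <= max_height_mm
    low, high = force_range_n
    in_contact = low <= obs.estimated_force_n <= high
    return aligned and seated and in_contact


def cable_tie_closed(camera_a_sees_closed: bool, camera_b_sees_closed: bool) -> bool:
    """Require both camera views to agree, which suppresses single-view false positives."""
    return camera_a_sees_closed and camera_b_sees_closed


if __name__ == "__main__":
    print(pin_inserted(Observation(2.0, 10.0, 6.0)))   # all checks pass
    print(pin_inserted(Observation(2.0, 10.0, 0.2)))   # no contact force, so not inserted
```

Requiring agreement between independent signals trades a few missed successes for fewer false positives, which matters when the detector's output is used directly as a training reward.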

In the second phase, the agent operates entirely autonomously. It reads research papers, forms hypotheses, and directly modifies the training code. It employs methods such as behavior cloning, where the strategy mimics human demonstrations, or reinforcement learning, where the strategy improves through trial and error. The agent chooses the method itself based on real-world success signals.

A Fleet of Robots Coordinating via Git

ENPIRE extends to a full fleet: eight dual-arm YAM robot stations, each with its own hardware, computer, and coding agent. The agents test different hypotheses simultaneously and share results solely through Git, the standard version control tool for software. They adopt successful training recipes from one another and autonomously discard bad ideas. A major discovery made in one station propagates throughout the entire fleet.
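One way to picture this exchange: each station writes its latest result to a file in a shared repository, and every station reads the others' files to decide which recipe to adopt. The Python sketch below models only that bookkeeping. The file layout, field names, and functions are hypothetical, and the actual Git steps (commit, push, pull) are left out and marked in comments.

```python
import json
import tempfile
from pathlib import Path


def publish_result(repo_dir: Path, station: str, recipe: str, success_rate: float) -> Path:
    """Write one station's result as JSON (hypothetical layout).

    In a real setup the file would then be committed and pushed.
    """
    path = repo_dir / "results" / f"{station}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"station": station, "recipe": recipe, "success_rate": success_rate}
    path.write_text(json.dumps(record))
    return path


def best_recipe(repo_dir: Path) -> dict:
    """Read all published results (after a pull, in a real setup) and pick the best."""
    files = sorted((repo_dir / "results").glob("*.json"))
    results = [json.loads(p.read_text()) for p in files]
    return max(results, key=lambda r: r["success_rate"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        publish_result(repo, "station-1", "behavior_cloning", 0.62)
        publish_result(repo, "station-2", "reinforcement_learning", 0.91)
        print(best_recipe(repo)["recipe"])
```

Using plain files under version control keeps every station's history inspectable and lets a weaker idea be dropped simply by not building on it.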

According to the study, the agents achieve up to 99% success on demanding tasks: the Push-T test, where the robot must slide a T-shaped block into a target position and orientation; pin sorting in a box; and cutting a cable tie with a wire cutter. For pin insertion, the strategy converged to a 100% success rate faster than a comparable method involving a human.

The scale also pays off in terms of time. For the Push-T test, moving from one to eight agents cut the total time to success from about five hours to about two hours. For pin insertion, it fell from over 90 minutes to around 40 minutes. The researchers tested three current coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6. Codex achieved the best results in most cases.
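To put those times in perspective, the short Python calculation below turns them into speedup and per-agent efficiency figures. The inputs are the article's rounded values ("about five hours", "over 90 minutes"), so the results are approximate; the function names are chosen for this example.

```python
def speedup(time_one_agent: float, time_fleet: float) -> float:
    """How many times faster the fleet reached the goal than a single agent."""
    return time_one_agent / time_fleet


def parallel_efficiency(speedup_factor: float, num_agents: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup_factor / num_agents


NUM_AGENTS = 8

push_t = speedup(5.0, 2.0)        # hours: about 5 h down to about 2 h
pin_insertion = speedup(90, 40)   # minutes: over 90 min down to around 40 min

print(f"Push-T: {push_t:.2f}x faster, "
      f"{parallel_efficiency(push_t, NUM_AGENTS):.0%} of linear scaling")
print(f"Pin insertion: {pin_insertion:.2f}x faster, "
      f"{parallel_efficiency(pin_insertion, NUM_AGENTS):.0%} of linear scaling")
```

With these rounded inputs, eight agents finish roughly 2.25 to 2.5 times faster than one, or about 30% of what perfectly linear scaling would give.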

The Real World Remains the Toughest Test

The results also show that the real world is still much more challenging than simulation. During the Push-T test, all three agents solved the task in simulation, but two out of three failed in the real environment. The researchers attribute this to unpredictable and variable conditions such as robot dynamics, friction, and object movement. In the RoboCasa simulation, ENPIRE outperformed both an end-to-end vision-language-action model (GR00T) and a tool-based approach without self-searching (CaP-X).

To measure efficiency, the researchers propose two metrics: Mean Robot Utilization (MRU), which tracks how much of the time the robot actually spends working on tasks, and Mean Token Utilization (MTU), which measures language model usage per minute. The acquired skills also transfer: experience from pin insertion helped the agents insert GPUs into a motherboard using robotic arms.
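The article does not give formulas for these metrics. The Python sketch below shows one plausible reading, offered as an assumption rather than the study's definition: MRU as the share of wall-clock time robots spend executing tasks, averaged over the fleet, and MTU as tokens consumed per minute.

```python
def mean_robot_utilization(busy_minutes_per_robot, wall_clock_minutes):
    """Assumed reading of MRU: average fraction of wall-clock time robots are working."""
    if wall_clock_minutes <= 0 or not busy_minutes_per_robot:
        raise ValueError("need a positive duration and at least one robot")
    total_available = len(busy_minutes_per_robot) * wall_clock_minutes
    return sum(busy_minutes_per_robot) / total_available


def mean_token_utilization(total_tokens, wall_clock_minutes):
    """Assumed reading of MTU: language model tokens consumed per minute."""
    if wall_clock_minutes <= 0:
        raise ValueError("need a positive duration")
    return total_tokens / wall_clock_minutes


if __name__ == "__main__":
    # Invented numbers for illustration only, not results from the study.
    print(mean_robot_utilization([30, 15], wall_clock_minutes=60))
    print(mean_token_utilization(120_000, wall_clock_minutes=60))
```

Under this reading, a robot that sits idle while its agent reads logs or writes code lowers MRU even if the run eventually succeeds.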

The study is clear about its limitations. Neither the robots nor the compute are fully utilized, because the agents spend a lot of time reading logs, writing code, and waiting. The more robots in the fleet, the lower the utilization per robot, as agents spend more time summarizing results from others. Token costs also rise faster than performance gains: larger fleets reach their goals more quickly but consume significantly more computational budget to do so. Nevertheless, the researchers view ENPIRE as a practical pathway toward robots capable of self-improvement in the real world.
