Brief IA

Tencent HY-World 2.0: the open-source AI redefining 3D

💻 Code & Dev · Tom Levy

Key Takeaways
1. Tencent unveils HY-World 2.0, an open-source AI that generates 3D worlds in about 12 minutes, challenging Google and World Labs.
2. The model uses 3D Gaussian Splatting to create interactive environments compatible with Unity and Unreal Engine.
3. HY-World 2.0 supports applications ranging from video game development to embodied robotics, thanks to its simulation capabilities.
💡 Why it matters: HY-World 2.0 gives developers a powerful, open alternative to proprietary solutions, expanding creative and industrial possibilities.
Full Analysis

Tencent Hunyuan recently unveiled HY-World 2.0, an open-source multimodal foundation model for spatial artificial intelligence that generates interactive 3D environments in about 12 minutes. This sets it apart from closed solutions such as Google Genie 3 or Marble from World Labs. HY-World 2.0 builds on 3D Gaussian Splatting (3DGS), a scene representation that serves both generative creation and physical reconstruction, allowing native export to game engines such as Unity and Unreal Engine.

Open-Source Deployment and Technical Performance

Tencent Hunyuan has released HY-World 2.0 as open source, including the weights, the code, and a detailed technical report. This strategic choice challenges proprietary platforms like Marble from World Labs and the video-based approach of Genie 3 from Google DeepMind. The model generates a complete 3D world in 712 seconds, just under 12 minutes, on NVIDIA H20 GPUs. The generation pipeline cuts the number of Gaussians by 73.7%, from 5.254 million to 1.383 million, while maintaining high visual fidelity with a PSNR of 25.017 dB.
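
The figures in this section are easy to sanity-check. The short sketch below recomputes the Gaussian reduction and converts the runtime to minutes. The `psnr` helper shows the standard definition of the metric; the mean squared error passed to it is a made-up value for illustration, not a number from the technical report.

```python
import math


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def psnr(mse: float, peak: float = 1.0) -> float:
    """Standard PSNR in dB for a mean squared error and a peak signal value."""
    return 10.0 * math.log10(peak ** 2 / mse)


# Gaussian counts quoted in the article
gaussians_before = 5.254e6
gaussians_after = 1.383e6

print(f"reduction: {reduction_percent(gaussians_before, gaussians_after):.1f}%")
print(f"runtime:   {712 / 60:.1f} min")

# Hypothetical MSE, chosen only to show how the formula behaves
print(f"PSNR at MSE=0.01: {psnr(0.01):.1f} dB")
```

Because PSNR is logarithmic, halving the mean squared error adds only about 3 dB.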

Four-Step Generation Process

HY-World 2.0 relies on a four-stage pipeline that turns 2D visual information into a coherent 3D scene. The process begins with panoramic initialization by HY-Pano 2.0, which converts text or an image into a complete visual sphere. This stage uses an MMDiT (Multi-Modal Diffusion Transformer) network to learn the mapping to equirectangular projection (ERP), and it relies on circular padding and pixel blending at the edges to produce a 360° environment without visible seams.

The next step maps this virtual environment. WorldNav uses MoGe2 to extract a global panoramic point cloud (Ppan), while Qwen3-VL identifies 3D semantic landmarks and SAM3 generates 2D masks. Recast Navigation then converts the space into a NavMesh, from which up to 35 distinct collision-free camera trajectories are computed to explore the scene.
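
The idea of a panoramic point cloud can be illustrated by unprojecting an equirectangular depth map into 3D points. The sketch below is a generic geometric example, not MoGe2's method: the axis convention (y up, z forward) and the rule of skipping non-positive depths are assumptions made for illustration.

```python
import math


def erp_pixel_to_point(u, v, depth, width, height):
    """Unproject one equirectangular pixel to a 3D point at the given depth."""
    # Pixel centre -> longitude in [-pi, pi) and latitude in (-pi/2, pi/2)
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    # Assumed convention: y points up, z points forward
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return (x, y, z)


def erp_depth_to_points(depth_map):
    """Convert a 2D list of depths into a list of 3D points.

    Pixels with non-positive depth are treated as invalid and skipped.
    """
    height = len(depth_map)
    width = len(depth_map[0])
    return [
        erp_pixel_to_point(u, v, depth_map[v][u], width, height)
        for v in range(height)
        for u in range(width)
        if depth_map[v][u] > 0
    ]


# A constant-depth panorama unprojects to points on a sphere of that radius
points = erp_depth_to_points([[2.0] * 8 for _ in range(4)])
print(len(points), points[0])
```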

To fill in blind spots, WorldStereo 2.0 generates new views along the 35 computed trajectories. Rather than compressing information across both space and time, the model uses a Keyframe-VAE with purely spatial compression. The Global-Geometric Memory (GGM) and a spatial stitching module named SSM++ keep the new images consistent with one another, so that each viewpoint adheres to the overall geometry.

The final step, "World Composition," is handled by WorldMirror 2.0. This network aggregates the generated images to build a 3D Gaussian Splatting (3DGS) scene. MaskGaussian adds probabilistic filtering via Gumbel-Softmax, which removes unnecessary points. This reconstruction module improves the AUC@30 score (geometric accuracy) from 66.29 to 86.89 on the RealEstate10K benchmark.
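
As a rough illustration of probabilistic filtering with Gumbel-Softmax, the sketch below samples a keep-or-drop decision for each Gaussian from a per-Gaussian logit. It covers only forward sampling, with none of the gradient handling a trainable implementation needs, and it is not the MaskGaussian code: the function names, the temperature `tau`, and the fixed drop logit of 0 are assumptions.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel distribution."""
    # Clamp away from 0 and 1 so both logarithms stay finite
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_keep_prob(keep_logit, drop_logit, tau, rng):
    """Two-class Gumbel-Softmax: return the soft weight of the 'keep' class."""
    a = (keep_logit + sample_gumbel(rng)) / tau
    b = (drop_logit + sample_gumbel(rng)) / tau
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def filter_gaussians(keep_logits, tau=0.5, seed=0):
    """Return indices of Gaussians whose sampled keep weight exceeds 0.5."""
    rng = random.Random(seed)
    return [
        i
        for i, logit in enumerate(keep_logits)
        if gumbel_softmax_keep_prob(logit, 0.0, tau, rng) > 0.5
    ]


# Strongly positive logits are kept, strongly negative ones are dropped
print(filter_gaussians([50.0, -50.0, 50.0]))
```

Lower temperatures push the soft weights toward 0 or 1, so the sampled mask behaves more like a hard binary decision.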

Industrial Applications of HY-World 2.0

The open-source release of the HY-World 2.0 architecture opens up practical applications for developers. In the video game sector, mesh extraction via the marching cubes algorithm lets developers, whether independent or at AAA studios, import generated levels directly into game engines. The environment is more than a simple texture: its lightweight polygonal topology allows immediate integration into Unity or Unreal Engine, with gravity and collision handling for player characters.

For robotics, HY-World 2.0 serves as a data generator for simulation. Robots can use the generated NavMesh and depth maps to learn to navigate complex spaces that are synthesized from simple text prompts.
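
To give a sense of how a walkable-area map supports navigation, here is a minimal path search on a boolean grid. A grid with breadth-first search is a deliberately simple stand-in for a real NavMesh, which stores walkable polygons rather than cells; it is not how Recast Navigation works, and the function name is hypothetical.

```python
from collections import deque


def shortest_walkable_path(grid, start, goal):
    """Breadth-first search over a grid where True marks a walkable cell.

    Returns the list of (row, col) cells from start to goal, or None if the
    goal cannot be reached without crossing a blocked cell.
    """
    rows, cols = len(grid), len(grid[0])
    if not (grid[start[0]][start[1]] and grid[goal[0]][goal[1]]):
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back through the parent links
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


T, F = True, False
room = [
    [T, T, T],
    [F, F, T],  # a wall with a gap on the right
    [T, T, T],
]
print(shortest_walkable_path(room, (0, 0), (2, 0)))
```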

Drawing on real-world data, the reconstruction capability of WorldMirror 2.0 can produce accurate digital twins for environment mapping. A dedicated depth mask prediction head lets the system handle invalid pixels robustly.

Comparison with AI Giants

HY-World 2.0 strategically positions itself against Google Genie 3 and Marble. While Genie 3 is limited to video generation, HY-World 2.0 creates a real 3D mesh and a physical 3DGS rendering, offering tangible geometry and physical collisions. In comparison with Marble, HY-World 2.0 maintains superior fidelity to the initial instructions, without blurring or geometric distortions during viewpoint changes.

Comparative Table of Models

| Technical Feature | Genie 3 (Google DeepMind) | Marble (World Labs) | HY-World 2.0 (Tencent) |
|-------------------|---------------------------|---------------------|------------------------|
| License and Access | Proprietary / Closed | Proprietary / Commercial | Open-Source (Weights & Code) |
| Output Format | Interactive Video Stream | Proprietary 3DGS Render | 3DGS, TSDF Mesh, Point Clouds |
| Tangible Geometry | No (2D Illusion) | Yes | Yes (NavMesh and Collisions) |
| Inference Time | Real-time (Adaptive Resolution) | Not disclosed | 712 s (NVIDIA H20 GPU) |
| Expansion Method | Latent Frame Prediction | Unknown | Keyframe-VAE + SSM++ Memory |

With HY-World 2.0, Tencent Hunyuan offers a powerful and accessible alternative, expanding the horizons of digital creation and providing developers with unprecedented flexibility against proprietary solutions.
