Google TPU v8: The New Chip That Threatens NVIDIA's Supremacy


Key Takeaways

1. Google has unveiled its TPU v8, a major step forward in AI hardware, in two versions: the v8t for training and the v8i for inference.

2. The TPU v8 promises significant performance gains from a redesigned architecture, with the goal of reducing Google's reliance on external processors.

3. With SRAM three times denser than the previous generation, the v8i handles long contexts better and halves latency for AI agents.

💡 Why it matters: Google aims to compete with NVIDIA by offering a high-performance, vertically integrated alternative for AI, which could reshape the market for advanced computing hardware.

Google TPU v8: A Major Technical Advancement

Google has launched its new AI chip, the TPU v8, in two versions built for distinct jobs: the v8t for training and the v8i for inference. With it, Google aims to set the pace against competing solutions, particularly those from NVIDIA.

Google announced the TPU v8 on April 22, 2026, presenting it as the start of what the company calls the agentic era: a period in which models do not merely converse but take concrete actions. The pursuit of raw compute often runs into memory constraints, and the new chip is designed to overcome them.

With a completely redesigned architecture, the TPU v8 delivers substantial gains in performance per watt. Google is betting on full vertical integration to reduce its dependence on external processors, so its competition with NVIDIA is not only about teraflops but also about the overall efficiency of the Boardfly network that links over a thousand chips together.

Google's Strategy Against NVIDIA

At Google Cloud Next 2026, Google made a clear strategic shift with the TPU v8: for the first time, it splits its AI architecture into two distinct chips, the v8t for training and the v8i for inference. The goal is to overcome the "memory wall" (the gap between how fast a chip can compute and how fast memory can feed it data) that limits current models, and to offer a credible alternative to NVIDIA's Blackwell GPUs.

Google's strategy rests on the idea of the agentic era, in which AI no longer just responds but acts autonomously. For these agents to run with low latency, custom infrastructure is essential. Google is therefore relying on full vertical integration, which gives it a significant advantage in total cost of ownership. This technological independence is its best weapon for keeping customers within its ecosystem.

Hardware Architecture: v8t and v8i

Google's new chip comes in two variants, each optimized for one stage of the AI lifecycle. The TPU v8t, nicknamed Sunfish, is dedicated to training, while the v8i, nicknamed Zebrafish, is designed for inference. The v8t delivers 12.6 PFLOPS of raw compute at FP4 precision, ahead of the v8i's 10.1 PFLOPS.

The v8i should not be overlooked, however. Its on-chip SRAM is three times denser than the previous generation's, and it carries 288 GB of HBM3e memory versus 216 GB for the training version, which lets it handle ultra-long contexts without slowing down. Google has also given it the Boardfly network, which can connect up to 1,152 Zebrafish chips. So while the v8t has more raw compute, the v8i is the chip that keeps AI agents interactive, halving their latency.
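To see why the larger HBM matters for long contexts, here is a back-of-the-envelope sketch of the key-value (KV) cache that a transformer keeps for every token in its context during inference. The model dimensions (80 layers, 8 KV heads, head size 128, 1-byte cache entries) and the 60% share of HBM reserved for the cache are hypothetical values chosen for illustration; they do not describe Gemini or any specific model, and the sketch ignores sharding the cache across chips.

```python
# Back-of-the-envelope KV-cache sizing for long-context inference.
# All model dimensions below are HYPOTHETICAL illustration values.

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: float) -> float:
    """Bytes one token occupies: a key vector and a value vector per KV head, per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_context_tokens(hbm_gb: float, cache_fraction: float, per_token_bytes: float) -> int:
    """Tokens that fit when `cache_fraction` of the chip's HBM is reserved for the KV cache."""
    budget_bytes = hbm_gb * 1e9 * cache_fraction
    return int(budget_bytes // per_token_bytes)


if __name__ == "__main__":
    # Hypothetical model: 80 layers, 8 KV heads (grouped-query attention),
    # head size 128, 8-bit cache entries.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_value=1)
    for name, hbm in (("v8t (216 GB)", 216), ("v8i (288 GB)", 288)):
        tokens = max_context_tokens(hbm, cache_fraction=0.6, per_token_bytes=per_token)
        print(f"{name}: ~{tokens:,} cached tokens on one chip")
```

Because the cache grows linearly with context length, usable context scales directly with HBM capacity: under these assumptions the 288 GB part holds a third more tokens than the 216 GB part.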

Software and Optimization for Agentic AI

The new chip does not rely on high-performance components alone; it is built on a fully integrated software stack. JAX and Pathways let developers manage thousands of chips as a single entity, and without this software layer, operating a Superpod would be a major technical challenge. Google designed this architecture for the agentic era, in which programs must make decisions in real time.
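The "single entity" idea can be illustrated with a toy sharding layout: one logical array is split into tiles across a grid (mesh) of chips, and every global index maps to exactly one owner. In JAX this role is played by `jax.sharding.Mesh` and `NamedSharding`; the pure-Python sketch below uses none of that API, and the helper names, the 4 x 2 mesh and the array shape are hypothetical.

```python
# Sketch: one logical 2D array split across a HYPOTHETICAL 4 x 2 mesh of chips.
from itertools import product


def shard_slices(global_shape, mesh_shape):
    """Map each device coordinate to the (row, col) slice of the global array it owns."""
    rows, cols = global_shape
    mx, my = mesh_shape
    if rows % mx or cols % my:
        raise ValueError("global shape must divide evenly across the mesh")
    r, c = rows // mx, cols // my
    return {
        (i, j): (slice(i * r, (i + 1) * r), slice(j * c, (j + 1) * c))
        for i, j in product(range(mx), range(my))
    }


def owner_of(index, global_shape, mesh_shape):
    """Return the device coordinate holding global element (row, col)."""
    rows, cols = global_shape
    mx, my = mesh_shape
    return (index[0] // (rows // mx), index[1] // (cols // my))


if __name__ == "__main__":
    layout = shard_slices((1024, 512), (4, 2))
    for device, (rs, cs) in sorted(layout.items()):
        print(device, f"rows {rs.start}:{rs.stop}", f"cols {cs.start}:{cs.stop}")
    print("element (700, 300) lives on device", owner_of((700, 300), (1024, 512), (4, 2)))
```

Code written against the global array never needs to know these offsets; the framework resolves them, which is what lets a program treat many chips as one.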

TPUDirect and RDMA (remote direct memory access) let data move between nodes without passing through the host CPU, which reduces communication latency. Developers report that JAX workflows run efficiently on this generation, making it easier to scale the most resource-intensive models. The Boardfly network works alongside the ICI (Inter-Chip Interconnect) links to streamline data exchanges.

The Google TPU v8t for Massive Training

Sunfish meets the demand for massive compute with a 3D torus topology that links the 9,600 chips of a Superpod. In a torus, each chip connects directly to its neighbors along three axes, with wraparound links at the edges, which lets the global reduction (all-reduce) operations required by data parallelism run smoothly. Google claims a 2.7x gain in performance per dollar over the Ironwood generation, making this chip its preferred tool for developing future Gemini models.
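Why a torus suits data-parallel training can be shown with a small simulation. On a 3D torus, a global all-reduce (summing every chip's gradients and giving each chip the result) can be decomposed into ring all-reduces along the x, y and z axes in turn, each using only neighbor-to-neighbor links. The sketch below is a pure-Python model of that idea with tiny, assumed dimensions; it is not Google's collective implementation, which the article does not describe, and the function names are illustrative.

```python
# Simulated all-reduce on a 3D torus: ring all-reduce along each axis in turn.
from itertools import product


def ring_allreduce(buffers):
    """Sum equal-length vectors held by n ring neighbours; every node ends with the total."""
    n = len(buffers)
    if n == 1:
        return [list(buffers[0])]
    length = len(buffers[0])
    bounds = [length * k // n for k in range(n + 1)]  # chunk k = [bounds[k], bounds[k+1])
    data = [list(b) for b in buffers]

    def chunk(node, k):
        return data[node][bounds[k]:bounds[k + 1]]  # slicing copies, so sends are snapshots

    # Phase 1, reduce-scatter: after n-1 steps node i owns the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunk(i, (i - step) % n)) for i in range(n)]
        for src, k, payload in sends:
            dst = (src + 1) % n  # wraparound link closes the ring
            for offset, value in enumerate(payload):
                data[dst][bounds[k] + offset] += value

    # Phase 2, all-gather: pass the completed chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunk(i, (i + 1 - step) % n)) for i in range(n)]
        for src, k, payload in sends:
            data[(src + 1) % n][bounds[k]:bounds[k + 1]] = payload
    return data


def torus_allreduce(grid, dims):
    """All-reduce a {(x, y, z): vector} grid by running ring all-reduce along x, then y, then z."""
    for axis in range(3):
        # Every line of chips parallel to `axis` forms an independent ring.
        others = [range(d) for a, d in enumerate(dims) if a != axis]
        for fixed in product(*others):
            coords = []
            for pos in range(dims[axis]):
                c = list(fixed)
                c.insert(axis, pos)
                coords.append(tuple(c))
            reduced = ring_allreduce([grid[c] for c in coords])
            for c, vec in zip(coords, reduced):
                grid[c] = vec
    return grid


if __name__ == "__main__":
    dims = (4, 4, 2)  # hypothetical tiny torus; a real Superpod is far larger
    grid = {c: [float(sum(c)), 1.0, 2.0] for c in product(*map(range, dims))}
    result = torus_allreduce(grid, dims)
    print(result[(0, 0, 0)])  # every chip now holds the same summed vector
```

Summing along x, then y, then z leaves every chip with the global total, yet no message ever travels farther than one hop, which is the property that makes the torus attractive for gradient synchronization.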

The chip's key strength is native support for the FP4 format. Moving matrix calculations from 8 bits to 4 bits doubles the throughput per cycle and halves the size of working tensors, freeing up memory bandwidth. The SparseCore, a specialized unit, efficiently handles irregular memory accesses such as embedding lookups, avoiding the bottlenecks often seen on less specialized processors.
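The storage half of that argument is easy to check. Below is a minimal sketch that rounds values to the 4-bit E2M1 format (1 sign bit, 2 exponent bits, 1 mantissa bit, the element encoding used by the OCP Microscaling FP4 type) and packs two values per byte, so a tensor takes half the bytes of an 8-bit one. The single per-tensor scale is a simplification chosen here; the article does not say which scaling scheme the TPU v8 uses, and this is not its hardware datapath.

```python
# Sketch: quantize floats to 4-bit E2M1 ("FP4") codes and pack two codes per byte.
# Uses ONE scale per tensor for simplicity; real FP4 schemes typically use block scales.

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 values; index = 3-bit code


def quantize_fp4(values):
    """Return (4-bit codes, scale) so that value ~= sign * FP4_MAGNITUDES[code & 7] * scale."""
    peak = max(abs(v) for v in values)
    scale = peak / 6.0 if peak else 1.0  # map the largest magnitude onto 6.0
    codes = []
    for v in values:
        target = abs(v) / scale
        idx = min(range(8), key=lambda k: abs(FP4_MAGNITUDES[k] - target))  # round to nearest
        sign = 1 if v < 0 else 0
        codes.append((sign << 3) | idx)
    return codes, scale


def pack_nibbles(codes):
    """Pack two 4-bit codes into each byte (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def dequantize_fp4(packed, scale, count):
    """Unpack bytes back into approximate float values."""
    out = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            mag = FP4_MAGNITUDES[code & 7] * scale
            out.append(-mag if code & 8 else mag)
    return out[:count]


if __name__ == "__main__":
    weights = [0.12, -0.7, 1.9, -3.0, 0.05, 2.4, -1.1, 0.6]
    codes, scale = quantize_fp4(weights)
    packed = pack_nibbles(codes)
    print(f"8-bit storage: {len(weights)} bytes, FP4 storage: {len(packed)} bytes")
    print([round(x, 2) for x in dequantize_fp4(packed, scale, len(weights))])
```

The round trip is lossy: with one scale for the whole tensor, small values such as 0.12 round to zero, which is why production FP4 schemes share a scale over small blocks rather than an entire tensor.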

Performance of the Google TPU v8i for Inference

The Zebrafish version is built for interactive services, where its very high memory bandwidth cuts wait times. The TPU v8i carries 288 GB of HBM3e memory, a major improvement over previous generations. It targets agents that must manage very long contexts, and Google's official measurements show latency halved, which should noticeably improve the user experience in chat tools.

The on-chip SRAM, now three times denser, holds temporary data without stalling the data flow. The Boardfly network connects over a thousand chips, which Google says lets it overcome the memory wall that often limits traditional processors. NVIDIA offers powerful solutions, but the TPU v8i's integration into the Google Cloud ecosystem provides a fluidity that is hard to match, making models like Gemini more cost-effective for businesses to deploy.
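The memory wall can be made concrete with the roofline model: a kernel's attainable throughput is the lower of the chip's peak compute and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). Only the 10.1 PFLOPS FP4 peak comes from the article; the 8 TB/s HBM bandwidth is a placeholder assumption, since no bandwidth figure is given, and the helper names are illustrative.

```python
# Roofline sketch of the "memory wall".
PEAK_FLOPS = 10.1e15    # v8i FP4 peak quoted in the article
HBM_BYTES_PER_S = 8e12  # ASSUMPTION: placeholder bandwidth, not a published spec


def attainable_flops(intensity, peak=PEAK_FLOPS, bandwidth=HBM_BYTES_PER_S):
    """Roofline: throughput is capped by compute or by how fast memory feeds data."""
    return min(peak, bandwidth * intensity)


def ridge_point(peak=PEAK_FLOPS, bandwidth=HBM_BYTES_PER_S):
    """Arithmetic intensity (FLOPs/byte) above which a kernel stops being memory-bound."""
    return peak / bandwidth


def matvec_intensity(n, bytes_per_weight=0.5):
    """n x n matrix-vector product, typical of token-by-token decoding:
    2*n*n FLOPs while reading n*n weights (FP4 = 0.5 byte each); vector reads ignored."""
    return (2 * n * n) / (n * n * bytes_per_weight)


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.0f} FLOPs/byte")
    ai = matvec_intensity(8192)
    print(f"decode mat-vec: {ai:.0f} FLOPs/byte -> "
          f"{attainable_flops(ai) / PEAK_FLOPS:.2%} of peak")
```

Under these assumptions, unbatched decoding reaches well under 1% of peak compute, which illustrates why an inference chip leans on memory capacity, bandwidth and on-chip SRAM rather than on raw FLOPS alone.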

Comparison with NVIDIA Blackwell

Pitting Google's chip against NVIDIA's Blackwell is a duel between versatile raw power and narrow specialization. Blackwell often posts higher peak performance, but Google's strength lies in its swarm architecture: thanks to the Boardfly network and optical circuit switching (OCS), the TPU v8 can run 9,600 chips simultaneously.

The major difference lies in data precision. NVIDIA excels at flexibility across many number formats, whereas the TPU v8 focuses on native FP4 to maximize efficiency. Blackwell, for its part, remains the clear leader in multi-cloud availability, so choosing Google means accepting a more closed ecosystem in exchange for higher performance per watt. For professionals, the decision often comes down to total cost of ownership (TCO), and producing AI at scale is often estimated to cost 30% to 50% less on TPU infrastructure.
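Because a figure like 30% to 50% depends entirely on the inputs, it helps to see how such a comparison is computed. This sketch expresses TCO as cost per million tokens served; the `Deployment` class and its hourly prices, throughputs and utilization values are invented placeholders chosen only to show the arithmetic, not Google or NVIDIA pricing and not evidence for the article's range.

```python
# Cost-per-token comparison; every number here is a HYPOTHETICAL placeholder.
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    hourly_cost: float        # all-in $/hour per accelerator (hardware, power, hosting)
    tokens_per_second: float  # sustained throughput per accelerator
    utilization: float        # fraction of hours spent doing useful work

    def cost_per_million_tokens(self) -> float:
        useful_tokens_per_hour = self.tokens_per_second * 3600 * self.utilization
        return self.hourly_cost / useful_tokens_per_hour * 1e6


def savings(baseline: Deployment, candidate: Deployment) -> float:
    """Fractional cost reduction of `candidate` relative to `baseline`."""
    return 1 - candidate.cost_per_million_tokens() / baseline.cost_per_million_tokens()


if __name__ == "__main__":
    gpu = Deployment("GPU cluster (hypothetical)", hourly_cost=10.0, tokens_per_second=5000, utilization=0.6)
    tpu = Deployment("TPU cluster (hypothetical)", hourly_cost=7.0, tokens_per_second=4500, utilization=0.7)
    for d in (gpu, tpu):
        print(f"{d.name}: ${d.cost_per_million_tokens():.3f} per million tokens")
    print(f"savings: {savings(gpu, tpu):.0%}")
```

Note that a cheaper hourly rate and higher utilization can outweigh lower raw throughput; this is why TCO comparisons hinge on measured utilization rather than peak specs.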

The Future of Google TPU v8 in the Cloud

Google's new chip is positioned against rivals such as AWS Trainium and Microsoft's in-house solutions. Google is betting on hyperspecialization, as shown by its massive agreement with Anthropic, which plans to use over a million chips. This infrastructure will no longer be reserved for Google's own engineers: the company is beginning to offer its chips in "bare metal" mode, allowing businesses to install the TPU v8 in their own data centers.

Placing Arm-based Axion processors in the same racks further reduces overall power consumption, and Google cites an 80% gain in performance per dollar compared with the previous year. Google's strength also lies in its hybrid cloud, where users can switch between NVIDIA GPUs for flexibility and TPUs for scale. Together, these pieces form what Google calls its AI Hypercomputer, designed to handle agentic workloads of any size.