Google Gemini 3.5 Flash: An AI Model That Challenges the Giants

⚡

Key Takeaways

1Google introduces Gemini 3.5 Flash at I/O 2026, a high-performance and affordable AI model.

2Gemini 3.5 Flash outperforms Gemini 3.1 Pro with 76.2% on Terminal-Bench 2.1.

3The model offers a competitive price of $1.50 per million input tokens.

💡Why it matters — Google is redefining the AI market with a high-performance, low-cost model, impacting the competition.

Google Unveils Gemini 3.5 Flash at I/O 2026

At the I/O 2026 event, Google unveiled its latest artificial intelligence model, Gemini 3.5 Flash. This model, touted as a cost-effective and high-performance alternative, is designed to outperform existing models in agentic tasks. With a price reduced to one-third that of its main competitors, Gemini 3.5 Flash comes with an integrated agent platform and extensive distribution, promising to disrupt the market.

Performance of Gemini 3.5 Flash

Google's nomenclature for its models follows a well-established hierarchy: Pro for raw power, Flash for speed at a lower cost, and Flash-Lite for the lowest cost. However, the launch of Gemini 3.5 Flash on May 19 during the I/O 2026 keynote challenges this logic. In the coding benchmark test Terminal-Bench 2.1, the model achieved a score of 76.2%, surpassing the Gemini 3.1 Pro, which scored 70.3% at its launch in February. Furthermore, on MCP Atlas, which evaluates the ability to orchestrate external tools, Gemini 3.5 Flash scored 83.6%. On the composite index of Artificial Analysis, it recorded a score of 55, coming close to Claude Opus 4.7 with 57 and GPT-5.5 with 60. The gap between these models is narrowing to the point where the differences become almost imperceptible.

A Redefined Hierarchy

The benchmark results tell two distinct stories depending on the angle of analysis. In tests measuring pure knowledge and abstract reasoning, the traditional hierarchy remains intact. For instance, on ARC-AGI-2, which evaluates novel logical puzzles, Gemini 3.5 Flash scored 72.1%, while 3.1 Pro achieved 77.1% and GPT-5.5 scored 84.6%. Similarly, on Humanity's Last Exam, Flash scored 40.2%, compared to 44.4% for Pro. However, in tests simulating real-world tasks, the trend reverses. On Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, GDPval-AA (with 1,656 points, a benchmark for evaluating concrete tasks), and OSWorld-Verified (with 78.4%), Gemini 3.5 Flash leads or closely follows top models that cost three to five times more. In multimodal mode, it achieved a score of 83.6% on MMMU-Pro, the highest ever recorded on this test, ahead of GPT-5.5 (81.2%) and Claude Opus 4.7 (75.2%).

A Powerful Generation Model

Comparing models requires nuance that Google does not emphasize in its official communications. Gemini 3.5 Flash belongs to the Flash range, designed for speed and reduced cost. In contrast, Claude Opus 4.7 and GPT-5.5 are the flagship models of the Pro range at their respective publishers. The true comparison at an equivalent range will be with Gemini 3.5 Pro, set to launch next month. Comparing 3.5 Flash to heavyweight models is akin to measuring a Flash-Lite against a Pro: it’s not cheating, but rather a demonstration of efficiency, which Google strives to prove.

Pricing and Positioning

The pricing of Gemini 3.5 Flash confirms its strategic positioning. The model is offered at $1.50 per million tokens for input and $9 for output, with a 90% discount on cached tokens. In comparison, Claude Opus 4.7 is priced at $5 for input and $25 for output, while GPT-5.5 is at $5 for input and $30 for output. For a company deploying autonomous agents handling millions of requests, the cost-effectiveness becomes a major strategic argument rather than just a minor accounting detail.

The Real Launch: A Complete Package

The announcement of the model alone would have been enough to fuel technical discussions for several days. However, Google chose an approach that neither OpenAI nor Anthropic has adopted at this scale: launching the model, the agent platform, and public distribution simultaneously. Antigravity 2.0, Google's agent development environment, runs natively on 3.5 Flash. It allows for the orchestration of multiple agents in parallel, management of custom sub-agents, background task scheduling, and offers a CLI for users who prefer the terminal. Gemini Spark, a personal agent operating 24/7, is powered by 3.5 Flash and runs on dedicated virtual machines in Google Cloud. It integrates seamlessly with Gmail, Docs, Calendar, and the rest of Workspace. Third-party extensions via MCP are expected this summer.

Ambition and Scope

Google's installed base illustrates the extent of its ambition. The Gemini app claims 900 million monthly active users, while AI Mode in Google Search exceeds one billion users. Gemini 3.5 Flash becomes the default model in these two products starting today. Unlike its competitors, Google integrates its models into a much broader range of products. When Anthropic launches a model, it powers Claude, its API, and a few selected partners. When OpenAI launches a model, it powers ChatGPT, Codex, and its API. In contrast, when Google launches a model, it powers Search, the Gemini app, Workspace, Android, YouTube, and an agent development platform. The difference is not just qualitative; it is structural.

With the simultaneous launch of Gemini Omni Flash for multimodal video generation and the “Neural Expressive” redesign of the app interface, Google is transforming I/O 2026 into a showcase of vertical integration.

Conclusion

The question that 3.5 Flash poses to competitors is not “who has the best score on ARC-AGI-2,” but rather “who can offer a near-state-of-the-art model capable of processing 280 tokens per second, at one-third the price, in a billion daily search sessions, while providing the accompanying agent platform.” For now, the answer seems to be Google. And the 3.5 Pro model isn't even available yet.