Google Gemini 3.5 Flash: Record Speed, But Soaring Costs


Key Takeaways

1. Google's Gemini 3.5 Flash model generates over 280 tokens per second, surpassing its predecessors in speed.

2. The operating costs of Gemini 3.5 Flash are 5.5 times higher than those of the previous model, with token prices tripled.

3. Despite improvements in agentic tasks, Gemini 3.5 Flash lags behind competitors like GPT-5.5 in programming.

💡 Why it matters: The rising costs of Gemini 3.5 Flash raise questions about the economic viability of AI for businesses.

Google DeepMind and the Launch of Gemini 3.5 Flash

Google DeepMind has introduced Gemini 3.5 Flash, an AI model that generates over 280 output tokens per second, making it the fastest model in its category. That speed comes at a price: running Gemini 3.5 Flash costs 5.5 times as much as running its predecessor.

A Significant Increase in Costs

The operating cost of Gemini 3.5 Flash has risen on two fronts: token prices and token consumption. Google now charges $1.50 per million input tokens and $9.00 per million output tokens, compared to $0.50 and $3.00 for the previous model, Gemini 3 Flash, which is a threefold increase. On top of that, agent-based tasks consume so many more tokens that total costs end up 75% higher than those of Gemini 3.1 Pro, according to Artificial Analysis.
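The arithmetic behind these figures can be sketched in a few lines of Python. The prices and the 5.5x total are the ones quoted in this article; the model keys are hypothetical labels rather than API identifiers, and the calculation assumes the input/output token mix is the same for both models, which the article does not confirm.

```python
# Per-million-token prices in USD, as quoted in the article.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "gemini-3-flash": {"input": 0.50, "output": 3.00},
    "gemini-3.5-flash": {"input": 1.50, "output": 9.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def price_multiplier(old: str, new: str) -> float:
    """Ratio of the new output-token price to the old one."""
    return PRICES[new]["output"] / PRICES[old]["output"]


def implied_token_multiplier(total_multiplier: float, price_mult: float) -> float:
    """Token-usage growth needed to explain a total cost multiplier.

    Assumes the input/output mix is unchanged between models.
    """
    return total_multiplier / price_mult


price_mult = price_multiplier("gemini-3-flash", "gemini-3.5-flash")
token_mult = implied_token_multiplier(5.5, price_mult)
print(f"price multiplier: {price_mult:.1f}x")
print(f"implied token multiplier: {token_mult:.2f}x")
```

Under these assumptions, the tripled prices account for a factor of 3, and the rest of the 5.5-fold increase would correspond to the model using a bit less than twice as many tokens for the same work.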

Performance and Limitations

Gemini 3.5 Flash shows significant improvements in agentic and multimodal tasks. It scores 55 on the Artificial Analysis Intelligence Index, nine points higher than Gemini 3 Flash, which places it ahead of Grok 4.3 and Claude Sonnet 4.6. On AA Omniscience, which evaluates knowledge accuracy and the tendency to hallucinate, the model improves by 11 points; its hallucination rate is 61%, down 31 percentage points from its predecessor.

A Leap in Agentic Tasks

Historically, agentic tasks have been a weak point for the Gemini series. Gemini 3.5 Flash shows considerable improvement in this area. On GDPval-AA, which tests real-world agent tasks with web and shell access, it achieves an Elo score of 1,656, a significant jump from Gemini 3 Flash (1,204) and Gemini 3.1 Pro (1,314). However, this performance takes an average of 49 interactions per task, more than any other model tested.
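For a rough sense of what these Elo gaps mean, the sketch below applies the standard logistic Elo formula with a 400-point scale. Whether GDPval-AA uses exactly this scale is an assumption, since the article does not describe how the ratings are computed, so the resulting probabilities are indicative only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    A value of 0.5 means an even matchup; values near 1.0 mean A is
    expected to be preferred in almost every head-to-head comparison.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# GDPval-AA Elo scores as quoted in the article.
RATINGS = {
    "Gemini 3.5 Flash": 1656,
    "Gemini 3.1 Pro": 1314,
    "Gemini 3 Flash": 1204,
}

new_model = "Gemini 3.5 Flash"
for other in ("Gemini 3.1 Pro", "Gemini 3 Flash"):
    p = elo_expected_score(RATINGS[new_model], RATINGS[other])
    print(f"{new_model} vs {other}: expected score {p:.2f}")
```

If the standard scale applies, a gap of several hundred points implies that the newer model would be preferred in the large majority of pairwise comparisons.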

Programming, a Persistent Weakness

In programming, where fast and cost-effective models are most in demand, Gemini 3.5 Flash fails to stand out. It scores 45 on the Artificial Analysis Coding Index, well below Gemini 3.1 Pro Preview (55) and far behind models like GPT-5.5 (59) and GPT-5.4 (57).

Unmatched Speed

Gemini 3.5 Flash reaches a speed of over 280 output tokens per second, about 70% faster than Gemini 3 Flash. It also accepts video and audio inputs in addition to text and images, whereas other models like Claude Opus 4.7, Grok 4.3, and GPT-5.5 are limited to image input.
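As a back-of-the-envelope illustration of what this throughput means in practice, the sketch below estimates generation time for a response of a given length. It reads "about 70% faster" as 1.7 times the throughput, treats 280 tokens per second as the rate (the article says "over 280"), and ignores time to first token and network latency; the 2,000-token response length is purely illustrative.

```python
NEW_TOKENS_PER_SECOND = 280.0   # lower bound quoted in the article
SPEEDUP = 1.7                   # "about 70% faster", read as 1.7x throughput

# Implied throughput of the predecessor under that reading.
old_tokens_per_second = NEW_TOKENS_PER_SECOND / SPEEDUP


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response, ignoring time to first token."""
    return output_tokens / tokens_per_second


response_tokens = 2_000  # illustrative response length
new_time = generation_seconds(response_tokens, NEW_TOKENS_PER_SECOND)
old_time = generation_seconds(response_tokens, old_tokens_per_second)
print(f"implied predecessor throughput: {old_tokens_per_second:.0f} tokens/s")
print(f"{response_tokens} tokens: {new_time:.1f} s vs {old_time:.1f} s")
```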

Rising Costs and Unclear ROI

Unless the cost of inference on the underlying hardware falls as quickly as the compute required per task rises, prices for more powerful models will keep climbing. For simpler use cases, older models or smaller options like Gemini 3.1 Flash-Lite will remain available. For businesses, the return on investment for AI is becoming increasingly difficult to assess. Isolated tasks like code generation or translation are easier to measure, but even there, the situation is more complex than it appears.