Deepseek Challenges OpenAI with Affordable AI Models

Key Takeaways

1. Deepseek has unveiled the V4-Pro and V4-Flash models, with up to 1.6 trillion parameters, at competitive prices.

2. Deepseek's new architecture reduces the compute needed for long contexts, allowing aggressive pricing against OpenAI and Google.

3. V4-Pro, the largest open-weight model, outperforms Kimi K2.6 and GLM-5.1 on the GDPval-AA benchmark.

💡 Why it matters: Deepseek could disrupt the AI market with high-performing, low-cost models, forcing competitors to rethink their pricing strategies.

Launch of V4-Pro and V4-Flash Models by Deepseek

Deepseek, a Chinese artificial intelligence lab, has launched two new open-weight models, V4-Pro and V4-Flash, notable for their size and efficiency. The larger of the two reaches 1.6 trillion parameters, and both feature a context window of one million tokens. This is made possible by a new architecture that significantly reduces the compute required to process long contexts, which lets Deepseek price the models well below those of competitors such as OpenAI, Google, and Anthropic.

The models were trained on a corpus of up to 33 trillion tokens. They were then fine-tuned through distillation from internal specialized models, which makes them particularly suited to agentic tasks. They run on Nvidia GPUs as well as Huawei's Ascend chips, broadening their hardware compatibility.

Technical Details of the Models

Deepseek has released preview versions of V4-Pro and V4-Flash under the MIT license, making them accessible to a wide audience. With 1.6 trillion parameters, V4-Pro is now the largest open-weight model available, considerably larger than Kimi K2.6 and GLM-5.1. V4-Flash has 284 billion parameters. Both are mixture-of-experts (MoE) models: only a portion of the parameters is activated for each token, which keeps the compute per token well below what the total parameter count would suggest.
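The sparse activation described above can be illustrated with a toy top-k router. This is a minimal sketch of generic mixture-of-experts gating, not Deepseek's actual routing scheme, which the article does not describe. The function names, the four scalar "experts", and the choice of k=2 are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the k selected experts for one token and mix their outputs."""
    selected = route_top_k(router_logits, k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for a single token.
output, used = moe_forward(1.0, experts, router_logits=[0.1, 2.0, -1.0, 1.5], k=2)
print(used, round(output, 3))
```

Only two of the four experts run for this token, so half of the expert parameters stay idle. In a production model the same idea applies per token and per layer, with far more experts than shown here.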

Architectural Innovations

The key innovation of these models lies in a new hybrid attention architecture. This architecture combines token compression with sparse attention, allowing for reduced resource requirements. According to Deepseek's technical report, V4-Pro requires only 27% of the FLOPs and 10% of the KV cache compared to the previous version, V3.2, to process a context of one million tokens. V4-Flash goes even further, reducing these figures to 10% of the FLOPs and 7% of the KV cache.
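To put these percentages in perspective, the short sketch below converts them into reduction factors and applies them to a baseline. The percentages come from the figures quoted above. The 100 GB baseline for the V3.2 KV cache is a made-up number used only to show the arithmetic, and the helper names are hypothetical.

```python
# Cost of processing a one-million-token context, relative to V3.2 (= 1.0),
# as quoted from Deepseek's technical report.
RELATIVE_COST = {
    "V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}

def reduction_factor(fraction):
    """Turn 'needs only X of the baseline' into 'is N times cheaper'."""
    return 1.0 / fraction

def scaled_cost(baseline, model, resource):
    """Scale a baseline V3.2 cost by the relative figure for a V4 model."""
    return baseline * RELATIVE_COST[model][resource]

# Hypothetical baseline: assume V3.2 needed 100 GB of KV cache.
ASSUMED_V32_KV_CACHE_GB = 100.0

for model, costs in RELATIVE_COST.items():
    kv_gb = scaled_cost(ASSUMED_V32_KV_CACHE_GB, model, "kv_cache")
    print(
        f"{model}: {reduction_factor(costs['flops']):.1f}x fewer FLOPs, "
        f"{reduction_factor(costs['kv_cache']):.1f}x smaller KV cache "
        f"({kv_gb:.0f} GB under the assumed baseline)"
    )
```

Under these figures, V4-Pro needs roughly 3.7 times fewer FLOPs and a KV cache 10 times smaller than V3.2 at this context length, while V4-Flash reaches roughly 10 times and 14 times respectively.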

On the GDPval-AA benchmark from Artificial Analysis, V4-Pro leads all other open-weight models with a score of 1,554 Elo points, ahead of GLM-5.1 and Kimi K2.6. This is roughly 355 Elo points more than V3.2. However, Deepseek acknowledges that V4-Pro still trails leading models such as GPT-5.4 and Gemini-3.1-Pro, by an estimated three to six months.
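An Elo gap can be translated into an expected head-to-head preference rate. The sketch below uses the standard logistic Elo formula with a scale of 400. Whether GDPval-AA uses exactly this scale is an assumption, so the result should be read as a rough interpretation rather than a figure from the benchmark.

```python
def elo_expected_score(rating_gap, scale=400.0):
    """Expected score of the higher-rated side under the standard Elo formula.

    Assumption: the benchmark follows the conventional logistic curve
    with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

V4_PRO_ELO = 1554      # score reported in the article
GAP_OVER_V32 = 355     # approximate gap reported in the article

implied_v32_elo = V4_PRO_ELO - GAP_OVER_V32
expected = elo_expected_score(GAP_OVER_V32)
print(f"Implied V3.2 rating: about {implied_v32_elo}")
print(f"Expected V4-Pro preference rate over V3.2: {expected:.1%}")
```

Under this assumption, a 355-point gap means V4-Pro's output would be preferred over V3.2's in roughly 89% of pairwise comparisons.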

Aggressive Pricing

These efficiency gains explain Deepseek's aggressive pricing strategy. V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens, making it cheaper than OpenAI's GPT-5.4 Nano. V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens, well below Gemini 3.1 Pro, GPT-5.5, and Claude Sonnet 4.6.
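The per-token prices can be turned into a cost estimate for a given workload. The following sketch uses only the prices quoted above. The function name and the example workload (one million input tokens, 50,000 output tokens) are assumptions chosen for illustration, and any caching discounts or other pricing tiers are ignored.

```python
# List prices in USD per million tokens, as quoted in the article.
PRICES_USD_PER_MILLION = {
    "V4-Flash": {"input": 0.14, "output": 0.28},
    "V4-Pro": {"input": 1.74, "output": 3.48},
}

def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_USD_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

# Hypothetical workload: a full one-million-token context, 50,000 tokens out.
for model in PRICES_USD_PER_MILLION:
    cost = request_cost(model, input_tokens=1_000_000, output_tokens=50_000)
    print(f"{model}: ${cost:.3f}")
```

For this workload the estimate comes to about $0.15 on V4-Flash and about $1.91 on V4-Pro, a ratio of roughly 12 to 1 that holds for any mix of input and output tokens, since both prices differ by the same factor.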

Training Based on Massive Data

The Deepseek team reveals little about the pre-training corpus. V4-Flash was trained on 32 trillion tokens, and V4-Pro on 33 trillion. The focus was on multilingual data, carefully selected scientific articles, and agentic data during mid-training. Web data was filtered to remove automatically generated and repetitive content.

The report names neither specific datasets nor licensed sources. It also does not confirm suspicions that Deepseek may be distilling directly from GPT or Claude.

Distillation and Optimization

Distillation plays a central role in the post-training of the models. Deepseek has completely replaced its mixed reinforcement learning phase with policy distillation. According to the report, the lab first trains more than ten internal specialized models for domains such as mathematics, coding, agents, and instruction following, using supervised fine-tuning and GRPO. A single student model then learns from all these internal teachers.
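The idea of one student learning from several domain teachers can be sketched as a loss that compares the student's output distribution with that of the teacher responsible for each example. This is a generic multi-teacher distillation sketch, not the objective from Deepseek's report, which the article does not spell out. The KL direction, the toy three-token vocabulary, and all names below are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def distillation_loss(batch, teachers, student):
    """Average KL(teacher || student), using each example's domain teacher."""
    total = 0.0
    for domain, prompt in batch:
        teacher_probs = teachers[domain](prompt)
        student_probs = student(prompt)
        total += kl_divergence(teacher_probs, student_probs)
    return total / len(batch)

# Hypothetical teachers: each returns a fixed next-token distribution
# over a toy three-token vocabulary.
teachers = {
    "math": lambda prompt: [0.7, 0.2, 0.1],
    "coding": lambda prompt: [0.1, 0.1, 0.8],
}

# Hypothetical untrained student: uniform over the vocabulary.
def uniform_student(prompt):
    return [1 / 3, 1 / 3, 1 / 3]

batch = [("math", "2 + 2 ="), ("coding", "def add(a, b):")]
loss = distillation_loss(batch, teachers, uniform_student)
print(f"distillation loss: {loss:.4f}")
```

Training would lower this loss by moving the student's distributions toward whichever teacher covers each domain. The sketch leaves out how the prompts and responses are sampled, which is the part that would make the procedure policy-based.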

Models Optimized for Agentic Tasks

Deepseek has designed the V4 models specifically for agentic workflows. The company says the models integrate with tools such as Claude Code, OpenClaw, and OpenCode, and that they are already used internally for agentic coding. The API offers both OpenAI-compatible and Anthropic-compatible interfaces.

The report is more precise regarding the hardware: the expert parallelism scheme has been validated on Nvidia GPUs and Huawei's Ascend NPUs. The open-source mega-kernel MegaMoE is based on CUDA, and Deepseek has replaced Nvidia's cuBLAS library with its own DeepGEMM.

Additionally, Huawei has announced that its Ascend Supernode, built on Ascend 950 AI chips, fully supports the V4 models.