OpenAI and Anthropic: Rising but Limited AI Productivity

⚡ Key Takeaways

1. Since April 2026, OpenAI and Anthropic have integrated AI tools, increasing engineering productivity by 1.6x.

2. The overall acceleration of AI progress is more modest, between 1.15x and 1.2x, as engineering is just a subset of the work.

3. AIs succeed in simple tasks but often fail on complex projects requiring creativity and structure.

💡 Why it matters: AI improves productivity but still has significant limitations, and both facts shape the strategy of tech companies.

The Rise of AI Tools in Businesses

Starting in early April 2026, AI companies such as OpenAI and Anthropic began integrating AI tools into their workflows at scale. The resulting productivity gains are notable but not dramatic. At the beginning of 2026, the acceleration in research and engineering was about 1.4 times; it has since reached approximately 1.6 times at OpenAI and Anthropic, thanks to more efficient models, better tools, improved adaptation, and wider adoption.

With these tools, engineering productivity rises as if employees were working 1.6 times faster. The gain is not limited to coding: it also covers activities such as deciding which features to implement and coordinating with other engineers. Many specific engineering and research tasks can now be completed with far less human time, for instance three to ten times less, while other tasks see much more modest gains.
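One way to see how per-task gains of three to ten times can coexist with an aggregate figure near 1.6 is to weight each speedup by the share of time the task originally took. The sketch below does that arithmetic. The task categories, time shares, and per-task speedups are hypothetical values chosen for illustration, not figures reported by either company.

```python
def aggregate_speedup(tasks):
    """Overall speedup for a mix of tasks.

    `tasks` is a list of (share_of_pre_AI_time, speedup) pairs.
    Each share of the original time shrinks by its own speedup factor,
    so the overall figure is a time-weighted harmonic mean.
    """
    total_share = sum(share for share, _ in tasks)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    new_time = sum(share / speedup for share, speedup in tasks)
    return 1.0 / new_time


# Hypothetical mix (assumption, not data from the article).
HYPOTHETICAL_MIX = [
    (0.25, 5.0),   # well-scoped implementation work
    (0.15, 3.0),   # debugging and small experiments
    (0.60, 1.15),  # design decisions, coordination, review
]

print(f"aggregate speedup: {aggregate_speedup(HYPOTHETICAL_MIX):.2f}x")
```

Under these assumed shares, the slowly improving 60% of the work dominates the result, which is why large gains on individual tasks translate into a much smaller aggregate number.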

New Work Dynamics

Employees are shifting their work towards two types of tasks: lower-value tasks where AIs are particularly useful, and tasks they could not have accomplished without AI assistance because they lacked the skills or knowledge. A natural way to gauge the impact of AI is to ask how long the work would have taken without it. That question does not capture reality, however, because workflows have been adapted around the tools. A more relevant question is: "How much would we need to speed you up, without AI tools, before you became indifferent between that speedup and using AI tools?" The current answer is about 1.6 times.

The acceleration is also less significant than it appears, as the resulting code is generally less polished, less reliable, and less well understood than if it had been written solely by human engineers. It is common for no one, including the AIs themselves, to fully understand how a piece of code works or how it fits into a larger system, which makes certain issues more frequent.

Impact on Research and Development

For much of the research and development in AI, low reliability and poor understanding are not catastrophic. Moreover, experiments are generally conducted in small, relatively autonomous projects where AIs and humans can achieve a decent understanding of what is happening. This acceleration in engineering is not evenly distributed across companies. I expect Anthropic to achieve greater acceleration than OpenAI, which in turn achieves substantially more acceleration than Google DeepMind (GDM).

Although the acceleration in engineering is 1.6 times, the overall acceleration of AI progress is much lower, around 1.15 times or 1.2 times, as engineering is just a subset of the relevant work.
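The gap between the two figures can be illustrated with an Amdahl-style calculation, in which only a fraction of the relevant work is accelerated. This is a minimal sketch under that assumption; the article does not say how the 1.15 to 1.2 range was derived, and other bottlenecks may be involved. The function names are hypothetical.

```python
def overall_speedup(fraction, speedup):
    """Amdahl-style estimate: only `fraction` of the work runs `speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


def implied_fraction(overall, speedup):
    """Invert the formula: which accelerated fraction yields `overall`?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / speedup)


ENGINEERING_SPEEDUP = 1.6  # figure quoted in the article

for overall in (1.15, 1.2):  # range quoted in the article
    share = implied_fraction(overall, ENGINEERING_SPEEDUP)
    # The share is an output of the assumed model, not a reported number.
    print(f"overall {overall}x -> engineering share of about {share:.0%}")
```

Read this way, an overall figure of 1.15 to 1.2 would correspond to engineering making up somewhat more than a third to somewhat less than half of the work that determines AI progress, if nothing else were accelerated.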

Capabilities and Limitations of Current AIs

AIs are capable of automating increasingly large and complex tasks. The historical METR benchmark is now largely saturated as a measure of the time horizon at 50% reliability. At 80% reliability, however, the best publicly deployed models reach just over one hour, while the best internal models reach just under two hours.
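To make the difference between the 50% and 80% figures concrete, the sketch below uses a logistic success curve in the logarithm of task length, in the spirit of METR's published time-horizon methodology. The 50% horizon and the slope are hypothetical fit parameters picked for illustration; they are not measurements of any model.

```python
import math


def success_probability(task_hours, horizon_50_hours, slope):
    """Logistic curve in log2(task length); equals 0.5 at the 50% horizon."""
    x = math.log2(horizon_50_hours) - math.log2(task_hours)
    return 1.0 / (1.0 + math.exp(-slope * x))


def horizon_at(reliability, horizon_50_hours, slope):
    """Task length in hours at which predicted success equals `reliability`."""
    logit = math.log(reliability / (1.0 - reliability))
    return horizon_50_hours * 2.0 ** (-logit / slope)


# Hypothetical fit parameters (assumptions, not measured values).
H50_HOURS = 10.0
SLOPE = 0.6

h80 = horizon_at(0.8, H50_HOURS, SLOPE)
print(f"50% horizon: {H50_HOURS:.1f} h, 80% horizon: {h80:.1f} h")
```

Because the curve falls off gradually with task length, demanding 80% reliability instead of 50% shortens the horizon by a large factor, which is why a benchmark can be saturated at one threshold and still informative at the other.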

On tasks that are easy and inexpensive to verify, AIs can often accomplish difficult work that would take the best human experts several months or even years. This requires some custom scaffolding and large amounts of inference compute, but remains less costly than human labor for the same task. However, AIs often fail when a task requires ideation or very complex methods that cannot be built up incrementally through trial and error. The more a task resembles a relatively straightforward but extremely large engineering project, the better AIs perform.

Often, they fail simply because they do not invest enough effort or give up too early. Adding a human, even with minimal context, can significantly help by noticing and correcting some of these issues.

Compared with benchmarks and easy-to-verify tasks, AIs perform worse on randomly sampled engineering tasks within AI companies. If we randomly sample internal engineering tasks weighted by value, the task duration at which AIs match an engineer from an AI company is about five hours.

AIs have not made much progress on tasks that are very difficult to verify or conceptually tricky, and they tend to be messy in their reasoning and results. A new generation of much more capable AIs is being developed, such as Mythos at Anthropic and Spud at OpenAI. I expect these gains to be driven largely by scaling up and/or improving pre-training.
