Devin Desktop: Cognition Merges Windsurf to Challenge Codex

Key Takeaways

1. Cognition has integrated Windsurf and Devin into a single interface, Devin Desktop, to manage multiple code agents.

2. This new platform promises increased autonomy for developers but must compete with established tools like Claude Code and Codex.

3. JDN tested Devin Desktop, highlighting its potential but also the challenges posed by the competition.

Why it matters: Devin Desktop could transform the way code agents are managed and influence developer productivity, even as it faces well-established competing solutions.

Cognition has merged Windsurf and Devin into a new interface designed to manage multiple code agents simultaneously. JDN tested this promise of autonomy: appealing, but up against serious competition.

Windsurf is dead, long live Devin Desktop. Since the beginning of June, Windsurf, Cognition's IDE for agent-based coding, has been merged with Devin under the name Devin Desktop. The application, available on Mac, Windows, and Linux, lets users create and manage autonomous code agents from a kanban interface. We had already tested Devin in April: at the time, it positioned itself as an even more autonomous alternative to the market's classic CLI code agents, such as Claude Code, Codex, and Gemini CLI (the "/goal" command did not yet exist). Does this new version of Windsurf carry over those capabilities? Autonomy, user experience, pricing: JDN tested it for you.

What Really Sets Devin Desktop Apart

What sets Devin Desktop apart is not its code generation, which is now close to that of the best agents on the market, but the way agents are managed. The application replaces the traditional conversation panel with a genuine command center laid out as a kanban board. Each task assigned to an agent sits in a column according to its status:

running (in progress)
blocked
ready (completed)

This allows users to launch multiple tasks in parallel, track their progress, and intervene only when human validation is necessary.
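To make the board logic concrete, here is a minimal Python sketch of tasks sorted into the three columns. It is not Devin Desktop's actual data model, which Cognition does not document here: the `Task` and `Status` types and the `board` and `needs_human` helpers are hypothetical, and only the running/blocked/ready statuses come from the application.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The three kanban columns shown by Devin Desktop.
    RUNNING = "running"  # agent is working on the task
    BLOCKED = "blocked"  # agent cannot proceed on its own
    READY = "ready"      # task completed


@dataclass
class Task:
    # Hypothetical task record: a title, the agent assigned to it, its column.
    title: str
    agent: str
    status: Status = Status.RUNNING


def board(tasks: list[Task]) -> dict[Status, list[str]]:
    """Group task titles by status, one list per kanban column."""
    columns: dict[Status, list[str]] = {status: [] for status in Status}
    for task in tasks:
        columns[task.status].append(task.title)
    return columns


def needs_human(tasks: list[Task]) -> list[str]:
    """Tasks the user has to look at: only the blocked ones."""
    return [task.title for task in tasks if task.status is Status.BLOCKED]


tasks = [
    Task("Add geocoding", "Devin Local"),
    Task("Write README", "Devin Cloud", Status.READY),
    Task("Pick Infoclimat token strategy", "Devin Cloud", Status.BLOCKED),
]
print(board(tasks)[Status.RUNNING])  # ['Add geocoding']
print(needs_human(tasks))            # ['Pick Infoclimat token strategy']
```

Keeping the status as the single source of truth means the user only has to check the blocked column rather than follow every agent's conversation.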

Cognition takes this logic a step further with the Agent Client Protocol (ACP), an open protocol for communication between Devin Desktop and code agents. In practice, the agent command center is not limited to Devin: it can also host Codex CLI, Claude, Gemini CLI, OpenCode, or even your own agent, all launched and monitored from the same interface. Devin Desktop is almost more a universal code agent orchestrator than a true IDE.
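As we understand its public specification, ACP is built on JSON-RPC 2.0 messages exchanged with the agent process over standard input/output. The Python sketch below only shows that framing: one JSON object per line, with an incrementing request id. The method names (`initialize`, `session/new`, `session/prompt`) and their parameters reflect our reading of the spec and may differ between protocol versions; the `JsonRpcWriter` class is hypothetical and is not part of Devin Desktop or of any official ACP SDK.

```python
import json
from itertools import count


class JsonRpcWriter:
    """Hypothetical helper that serializes JSON-RPC 2.0 requests, one per line."""

    def __init__(self) -> None:
        self._ids = count(1)  # request ids start at 1 and increase

    def request(self, method: str, params: dict) -> str:
        message = {
            "jsonrpc": "2.0",
            "id": next(self._ids),
            "method": method,
            "params": params,
        }
        # Newline-delimited framing: each message is a single line of JSON.
        return json.dumps(message) + "\n"


writer = JsonRpcWriter()
# Assumed handshake sequence; check the ACP spec for exact method names and fields.
handshake = [
    writer.request("initialize", {"protocolVersion": 1}),
    writer.request("session/new", {"cwd": "/path/to/project", "mcpServers": []}),
    writer.request(
        "session/prompt",
        {"sessionId": "<id returned by session/new>",
         "prompt": [{"type": "text", "text": "Add a README"}]},
    ),
]
# A host would write these lines to the agent subprocess's stdin.
print("".join(handshake), end="")
```

Because this framing does not depend on which agent sits on the other end, the same host code can drive any agent that speaks the protocol, which is what makes the orchestrator approach possible.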

By default, the application offers a choice of three code agents:

Cascade: Windsurf's former code agent, still built into the IDE. It works sequentially, with a single context window. Users can pick the model that drives it from the list supported by Devin. It will likely be removed in the coming months.
Devin Local: the new default agent in Devin Desktop. Unlike Cascade, it can launch sub-agents, and its harness is much better optimized for tasks requiring high autonomy. It runs locally on your computer, in a sandbox.
Devin Cloud: the most autonomous agent. It picks the best available model at each turn and builds your application entirely on its own in a virtualized cloud environment (Ubuntu or Windows). The code is hosted in a GitHub repo. In our opinion, it is the agent best suited to software engineering tasks.

Testing Devin Cloud

To quickly test Devin's capabilities, we ask it to build a weather chatbot connected in real time to weather station data. The goal: to be able to ask the chatbot any weather question and get the most accurate possible answer, based on raw output from Météo-France's numerical models (notably AROME and ARPEGE). We use Devin Cloud for this project because of its autonomy.

We simply give the following prompt to Devin Cloud:

Build a web application for an "AI meteorologist" chatbot in French.

Central criterion: the bot must NEVER limit itself to providing a raw forecast ("tomorrow 22°C, chance of rain"). It must reason like a professional forecaster: cross-reference multiple numerical models, assess their agreement or divergence, integrate real-time observations, explain the underlying synoptic situation, and always accompany its response with a confidence level. If two models diverge, it must state this and explain why.

Backend: Python + FastAPI.
Frontend: simple and clean web chat interface (one page is sufficient).
LLM: OpenAI API, key provided via the environment variable OPENAI_API_KEY.
Model to use: GPT-5.5. Never hardcode the key; read it from the environment.

Data Sources

Forecasts — Open-Meteo, Météo-France endpoint (AROME and ARPEGE models).
- Use at least arome_france_hd (high resolution ~1.3 km, short-term) AND arpege_europe (medium-term).
- Retrieve hourly forecasts: temperature at 2m, precipitation + probability, wind and gusts at 10m, cloud cover, sea-level pressure, relative humidity, CAPE if exposed.
- No key required.
- City geocoding -> lat/lon via Open-Meteo's free geocoding API.
Observations — Infoclimat, real-time station network.
- Read the documentation for the exact format and token management. If a token is required, expose it as an environment variable.
- Used for nowcasting and to compare models with observed reality.

Core of the Product: the "Meteorologist" Layer

The pipeline for each user question:

Identify the location and the requested timeframe.
Retrieve AROME, ARPEGE (forecasts), and Infoclimat (recent obs) in parallel.
Construct a structured data block (JSON) to pass to the LLM.
The system prompt of the LLM must impose the role of forecaster and require:
- explicit comparison AROME vs ARPEGE (agreement = high confidence; divergence = signal uncertainty and range);
- consideration of Infoclimat observations to set the starting point;
- explanation of the "why" of the weather (flux, front, anticyclone, instability…) when relevant;
- reminder of the limitations of the models (AROME is particularly relevant for short-term, ARPEGE for longer; reliability decreases with the timeframe);
- explicit confidence level (high / moderate / low);
- strict prohibition against inventing data: if data is missing, state it.

Acceptance Criteria

"Will it rain tomorrow afternoon in Serris?" -> nuanced response citing both models and a confidence level.
"Risk of thunderstorms this week in Île-de-France?" -> utilizes CAPE and signals uncertainty related to the timeframe.
"What’s the weather like right now in Lille?" -> relies on Infoclimat observation, not on the forecast.
"Are AROME and ARPEGE in agreement for Lyon tomorrow?" -> direct comparison.

GitHub repo with README: installation, expected environment variables (OPENAI_API_KEY, Infoclimat token if required), launch command. The app must run locally without code modification after entering the keys.

The agent works for about 30 minutes in total and delivers a production-ready interface. Although very minimalist, it is fully functional. The retrieved data is reliable and supports accurate weather forecasts. The session consumed about 46% of our Pro plan's daily quota. Consumption this high rules out a relaxed, hours-long work session on a project: for anything beyond exploration, higher plans are unavoidable.

For the more curious, the code is available on GitHub: https://github.com/BenjaminPolge/meteorologue-ia
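For readers who want to see what the prompt's pipeline involves, here is a minimal Python sketch of three of its steps: building the Open-Meteo geocoding and Météo-France forecast requests, running the data fetches in parallel, and turning AROME/ARPEGE agreement into a confidence level for the JSON block passed to the LLM. This is not the code Devin produced (that code is in the repository above). The endpoint URLs and the `models`/`hourly` parameters follow Open-Meteo's public documentation as we understand it; the helper names, the `tolerance` threshold and the high/moderate/low cut-offs are hypothetical choices of ours. The Infoclimat observation call is omitted, and `fetch_all` takes plain functions so it can run offline.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable
from urllib.parse import urlencode

# Open-Meteo endpoints (no API key required).
GEOCODING_URL = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST_URL = "https://api.open-meteo.com/v1/meteofrance"

MODELS = ("arome_france_hd", "arpege_europe")
# A subset of the variables listed in the prompt.
HOURLY_VARS = (
    "temperature_2m", "precipitation", "wind_speed_10m", "wind_gusts_10m",
    "cloud_cover", "pressure_msl", "relative_humidity_2m", "cape",
)


def geocoding_url(city: str) -> str:
    """Request that resolves a city name to latitude/longitude (best match only)."""
    return f"{GEOCODING_URL}?{urlencode({'name': city, 'count': 1, 'language': 'fr'})}"


def forecast_url(lat: float, lon: float) -> str:
    """Request for hourly AROME and ARPEGE forecasts in a single call."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "hourly": ",".join(HOURLY_VARS),
        "models": ",".join(MODELS),
        "timezone": "Europe/Paris",
    }
    return f"{FORECAST_URL}?{urlencode(params)}"


def fetch_all(fetchers: dict[str, Callable[[], dict]]) -> dict[str, dict]:
    """Run independent fetches (forecasts, observations) concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


def confidence(arome: list[float], arpege: list[float], tolerance: float) -> str:
    """Map the mean absolute AROME/ARPEGE gap to a confidence label.

    `tolerance` is in the variable's unit (e.g. °C); the cut-offs are hypothetical.
    """
    gap = mean(abs(a - b) for a, b in zip(arome, arpege))
    if gap <= tolerance:
        return "high"
    if gap <= 2 * tolerance:
        return "moderate"
    return "low"


def llm_context(city: str, variable: str, arome: list[float],
                arpege: list[float], tolerance: float) -> str:
    """Structured JSON block handed to the LLM along with the system prompt."""
    return json.dumps({
        "location": city,
        "variable": variable,
        "arome_france_hd": arome,
        "arpege_europe": arpege,
        "model_confidence": confidence(arome, arpege, tolerance),
    }, ensure_ascii=False)


# Offline usage: stand-in fetchers return canned series instead of calling the APIs.
data = fetch_all({
    "arome": lambda: {"temperature_2m": [21.0, 22.5, 23.0]},
    "arpege": lambda: {"temperature_2m": [20.0, 24.0, 25.0]},
})
print(llm_context("Lyon", "temperature_2m",
                  data["arome"]["temperature_2m"],
                  data["arpege"]["temperature_2m"], tolerance=1.0))
```

Keeping the agreement score in the JSON block, rather than asking the LLM to compute it, leaves the model to explain the divergence instead of estimating it.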

Attractive Pricing, Goodbye ACUs

On paper, Devin Desktop is particularly attractively priced. The Pro plan starts at just $20 per month and includes access to the most advanced models from OpenAI, Anthropic, and Google, as well as Devin Cloud. Notably, Cognition has abandoned ACUs, the opaque and hard-to-predict compute units we criticized in our previous test. The daily and weekly quota system is much clearer: users can see at a glance how much of their allocation they have consumed, without having to convert each action into abstract credits.

This apparent accessibility should, however, be put into perspective. Our single test, completed in about thirty minutes, consumed nearly half of the Pro plan's daily quota. At $20 per month, Devin Desktop will therefore mainly suit occasional personal projects, such as building a small application over a weekend. For daily work with multiple agents, fixing an existing product, or sessions lasting several hours, the limit will be reached far too quickly. In our opinion, genuine professional use by an individual requires jumping straight to the Max plan, priced at $200 per month.
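The arithmetic behind this verdict is simple. The short Python sketch below extrapolates our single measurement (about 30 minutes of agent work for roughly 46% of the daily quota), assuming, which will not hold for every task, that consumption scales linearly with working time. The `quota_budget` helper is purely illustrative.

```python
def quota_budget(session_minutes: float, share_of_daily_quota: float) -> tuple[float, int]:
    """Extrapolate one measured session to a full daily quota (linear assumption)."""
    minutes_per_day = session_minutes / share_of_daily_quota
    full_sessions_per_day = int(1 // share_of_daily_quota)
    return minutes_per_day, full_sessions_per_day


minutes, sessions = quota_budget(30, 0.46)
print(f"~{minutes:.0f} min of agent work per day, {sessions} full sessions")
# ~65 min of agent work per day, 2 full sessions
```

At the rate we observed, the Pro plan therefore covers roughly an hour of comparable autonomous work per day, far from a full working day with several agents.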

Nonetheless, Devin Desktop remains an excellent code agent, able to deliver functional applications quickly with very little human intervention. But Devin's traditional home turf is starting to shrink significantly. With the arrival of advanced autonomy loops such as /goal in Codex and Claude Code, Devin Cloud stands out much less than it did just a few months ago. Its kanban interface and multi-agent approach remain appealing, but on their own they no longer justify its positioning and price. It remains to be seen how Cognition will evolve the product to regain a real competitive edge. After all, the merger of Windsurf and Devin is only a few weeks old.
