Brief IA

Claude: The State of Managed Agents, More Crucial Than Calculation

🔬 Research·Tom Levy·

Claude: The State of Managed Agents, More Crucial Than Calculation

Claude: The State of Managed Agents, More Crucial Than Calculation
Key Takeaways
1A research agent lost six hours of nighttime work, highlighting the importance of persistent state.
2Managed agents like Claude require a self-hosted environment to maintain state between sessions.
3Cloudflare and Daytona failed to preserve state, but Tensorlake provided a viable solution.
💡Why it mattersState management is crucial for maximizing the efficiency of AI agents, preventing the loss of time and resources.
Le brief IA que lisent les pros

Le brief IA que les pros lisent chaque soir

Les 7 actus IA du jour, décryptées en 5 min. Gratuit.

Inclus dès l'inscription : notre sélection des meilleurs guides & comparatifs IA.

Choisis ton rythme

Gratuit · Pas de spam · Désabonnement en 1 clic

📄
Full Analysis

The Discovery of an Unexpected Loss

On a Tuesday morning at 9:03 AM, my research agent greeted me, but it was to note a troubling void in the work directory. The six hours of analysis conducted the previous night had vanished. Everything, from the cloned repository to the installed packages, along with the meticulously written notes, had evaporated. I had naively believed that an agent could simply pick up its work where it left off after a night’s break. This assumption turned out to be incorrect. To understand what had happened, I had to reconstruct the same workflow across three different platforms: Tensorlake, Cloudflare, and Daytona. What proved most complex in managing Claude agents was not the model itself, but everything beneath it.

Here is the exact code I executed, the elements that failed, and the error that cost me two weeks to understand.

Understanding Managed Claude Agents

For those unfamiliar with managed Claude agents, it is crucial to understand their architecture. If you are already familiar with this system, you can skip this section. Anthropic is responsible for reasoning, while you manage execution. The agent loop, session state, work queue, and retry logic are all hosted on Anthropic's infrastructure. You set up a self-hosted environment via the Claude console. Once your application is launched, Anthropic queues the work, your orchestrator picks it up, creates a sandbox, and the model begins to issue tool calls within that sandbox. Each command, whether it be bash, read, write, grep, or edit, executes in an environment that you fully control. Anthropic never intervenes directly. You define the appearance of this environment, its access, and what happens between sessions. Anthropic's intelligence is constant. Your engineering determines whether this intelligence operates in a stable and persistent environment or on a blank slate that forgets everything as soon as it becomes inactive.

My Project and Its Importance

My goal was to create an agent capable of conducting in-depth research on a codebase: cloning a repository, analyzing the structure of modules, understanding how different parts fit together, writing notes, and proposing refactoring strategies. This type of work, which would take an entire day for a senior engineer, can be accomplished in about six hours by an AI agent. The main constraint was that the agent could not do everything at once. Sometimes, I would start a session at 8 PM, let it run until midnight, and then pick it up again the next morning. The file system created during that first session—with analysis notes, installed tools, and partially read source files—needed to be available when the next session began. Rebuilding each time was not a viable option. This constraint guided every vendor decision I made.

Unforeseen Needs

Initially, I thought a simple Linux environment would suffice to run the managed Claude agents. Ultimately, I realized I needed three essential elements. I found them all in one place, but only after exploring two other options.

  • A file system that survives between work sessions.
  • Near-zero cost while the agent is idle.
  • The ability to branch from an already completed analysis state.

These requirements did not reveal themselves immediately. I discovered them through a series of errors.

Starting a Session: Code Before the Sandbox

To launch a session, you use the reference orchestrator with a simple command:

[make](/outil/make) session [PROMPT](/glossaire/prompt)="Clone the repository at github.com/tensorlakeai/tensorlake. \Read through the module structure. Write a summary to /workspace/analysis.md. \Note any components that look like they could be simplified."

The orchestrator sends this prompt to Anthropic as a new session. Anthropic takes it on, starts the agent loop, and immediately begins issuing tool calls. These tool calls arrive in your sandbox. The agent reads files, executes bash commands, writes notes. The session continues until the task is completed or you stop it. The agent's flow looks something like this while it operates:

  • [thinking] The repository appears to be a Python SDK for…
  • [bash] git clone https://github.com/tensorlakeai/tensorlake
  • [bash] ls -la /workspace/tensorlake/
  • [read] /workspace/tensorlake/tensorlake/sandbox.py
  • [write] /workspace/analysis.md
  • [thinking] The Sandbox class manages…

Each event in brackets is a tool call entering your sandbox. The session accumulates a state inside /workspace/ through all these calls. At the end of a six-hour session, this directory contains the cloned repository, installed packages, analysis files, and interim notes. It is this state that must survive the night.

First Attempt: Cloudflare

My first hypothesis was that I needed a platform capable of efficiently running managed Claude agents. Cloudflare is optimized for high-concurrency execution. My problem turned out to be different. The agent I was building accumulated hours of file system state between work periods. Notes, cloned repositories, installed dependencies, and interim analyses needed to survive the night. Cloudflare's execution model was not designed around this requirement. It was the first time I realized that I was not looking for computation. I was looking for persistent state.

Second Attempt: Daytona

The second build solved part of the problem. The agent could accumulate state throughout a session, which initially seemed like progress. Then, I wanted to test three different refactoring strategies starting from the same six-hour analysis. Instead of branching from that state, I found myself repeating the setup work each time: rebuilding the context, reinstalling dependencies, and rerunning the analysis before I could start the actual experiment. It was at that moment that I discovered my second requirement. Preserving state was not enough. I also needed a way to branch from an existing state without repeating hours of work.

Third Attempt: Tensorlake

The first thing that caught my attention was not a feature. It was an architectural decision. Most platforms preserve state by keeping computation active. This one treated computation and state as distinct problems. The documentation described a suspended sandbox capable of preserving its state and resuming in about 0.6 seconds. It was the first time I saw a design that directly addressed the problem I was facing. I wanted to know if it actually worked. I started with…

Brief IA — L'actualité IA en français

L'essentiel de l'actualité de l'intelligence artificielle, décrypté et expliqué chaque jour.