Building Long-Running Claude Managed Agents: Why State Matters More Than Compute

Author(s): Divy Yadav. Originally published on Towards AI.

At 9:03 am on a Tuesday, my research agent said hello and stared at an empty /workspace/. Six hours of analysis from the night before. Gone. The cloned repository. The installed packages. The notes it had spent hours writing. Gone.

I had assumed that if an agent stopped working for the night, it could simply continue the next morning. That was wrong. Over the next three weeks, I rebuilt the same workflow on Cloudflare, Daytona, and Tensorlake to figure out what had happened.

The hardest part of running Claude Managed Agents isn’t the model. It’s everything underneath it. This is the exact code I ran, the things that broke, and the mistake that cost me two weeks to understand.

What Claude Managed Agents is, before anything else

If you’ve never built with Claude Managed Agents, the architecture takes a minute to explain. Skip this if you already know it.

Anthropic runs the reasoning. You run the execution. The agent loop, session state, work queue, and retry logic all live on Anthropic’s infrastructure. You configure a Self-hosted Environment in the Claude Console. When your application starts a session, Anthropic queues the work, your orchestrator picks it up, spins up a sandbox, and the model starts issuing tool calls into that sandbox.

Every bash, read, write, grep, and edit call executes inside an environment you own. Anthropic never touches it. You decide what that environment looks like, what it can access, and what happens between sessions.

Anthropic’s intelligence is fixed. Your engineering determines whether that intelligence has a stable, stateful environment to work in, or a clean slate that forgets everything the moment it goes idle.
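To make "every tool call executes inside an environment you own" concrete, here is a minimal sketch of what the receiving side of a tool call could look like. It is not the reference orchestrator and not Anthropic's API: the WorkspaceSandbox class, its handle method, and the shape of the arguments are all hypothetical, and only three of the five tools are covered. The point is simply that each call ends up as an ordinary filesystem or process operation in a directory you control.

```python
import subprocess
import tempfile
from pathlib import Path


class WorkspaceSandbox:
    """Hypothetical stand-in for a sandbox: runs tool calls inside one directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, path):
        # Map an agent-supplied path into the workspace and refuse escapes.
        candidate = (self.root / str(path).lstrip("/")).resolve()
        if candidate != self.root and self.root not in candidate.parents:
            raise ValueError(f"path escapes workspace: {path}")
        return candidate

    def handle(self, tool, **args):
        # Dispatch one tool call. The tool names follow the article's list;
        # the argument names are assumptions made for this sketch.
        if tool == "bash":
            done = subprocess.run(
                args["command"], shell=True, cwd=self.root,
                capture_output=True, text=True, timeout=60,
            )
            return {"exit_code": done.returncode, "stdout": done.stdout}
        if tool == "write":
            target = self._resolve(args["path"])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(args["content"], encoding="utf-8")
            return {"bytes": len(args["content"].encode("utf-8"))}
        if tool == "read":
            content = self._resolve(args["path"]).read_text(encoding="utf-8")
            return {"content": content}
        raise ValueError(f"unsupported tool: {tool}")


if __name__ == "__main__":
    sandbox = WorkspaceSandbox(tempfile.mkdtemp())
    sandbox.handle("write", path="analysis.md", content="first notes\n")
    print(sandbox.handle("bash", command="ls"))
```

Everything the agent produces lands under the sandbox root, which is why the lifetime of that directory, rather than the model, decides what is still there the next morning.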
What I was building and why it mattered

I needed an agent that could do real deep-work research on a codebase: clone a repository, read through the module structure, build an understanding of how the pieces fit together, write notes, and propose refactoring strategies. The kind of work that takes a senior engineer a full day and an AI agent about six hours.

The key constraint: the agent couldn’t do this all at once. Sometimes I’d kick off a session at 8pm, let it run until midnight, and pick it back up the next morning. The filesystem it had built during that first session (the analysis notes, the installed tools, the half-read source files) had to be there when the next session started. Rebuilding from scratch each time wasn’t viable. That constraint is what drove every provider decision I made.

The requirements I didn’t know I had

At the start, I thought I needed a Linux environment that could run Claude Managed Agents. By the end, I realized I actually needed three things. I found them all in one place, but not until I had looked in two others first.

1. A filesystem that survived between work sessions.
2. Near-zero cost while the agent was idle.
3. The ability to branch from an already-completed analysis state.

I did not discover all three requirements on day one. I discovered them one mistake at a time.

How a session actually starts: the code before the sandbox

You drive a session through the reference orchestrator using a simple command:

    make session PROMPT="Clone the repository at github.com/tensorlakeai/tensorlake. \
    Read through the module structure. Write a summary to /workspace/analysis.md. \
    Note any components that look like they could be simplified."

The orchestrator sends this prompt to Anthropic as a new session. Anthropic picks it up, starts the agent loop, and immediately begins issuing tool calls. Those tool calls arrive at your sandbox. The agent reads files, runs bash commands, writes notes.
The session runs until the task is complete or you stop it. The agent stream looks roughly like this as it runs:

    [thinking] The repository appears to be a Python SDK for…
    [bash] git clone https://github.com/tensorlakeai/tensorlake
    [bash] ls -la /workspace/tensorlake/
    [read] /workspace/tensorlake/tensorlake/sandbox.py
    [write] /workspace/analysis.md
    [thinking] The Sandbox class handles…

Apart from the [thinking] lines, each bracketed event is a tool call going into your sandbox. The session accumulates state inside /workspace/ across all those calls. By the end of a six-hour session, that directory contains the cloned repo, installed packages, analysis files, and intermediate notes. That’s the state that needs to survive overnight.

Build 1: Cloudflare

My first assumption was that I needed a platform that could efficiently run Claude Managed Agents. Cloudflare is optimized for high-concurrency execution. My problem turned out to be different.

The agent I was building accumulated hours of filesystem state between bursts of work. Notes, cloned repositories, installed dependencies, and intermediate analysis all needed to survive overnight. Cloudflare’s execution model wasn’t designed around that requirement.

That was the first time I realized I wasn’t looking for compute. I was looking for persistent state.

Build 2: Daytona

The second build solved part of the problem. The agent could accumulate state throughout a session, which initially felt like progress.

Then I wanted to test three different refactoring strategies starting from the same six-hour analysis. Instead of branching from that state, I found myself repeating the setup work each time: rebuilding context, reinstalling dependencies, and re-running analysis before I could begin the actual experiment.

That was when I discovered my second requirement. Preserving state wasn’t enough. I also needed a way to branch from an existing state without repeating hours of work.
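The branching requirement is easier to see in miniature. The sketch below assumes nothing about any provider's API: it freezes a local directory with shutil.copytree and then copies that frozen state once per experiment. The snapshot and branch helpers, the directory names, and the strategy labels are all hypothetical, and a real platform would do this at the sandbox level rather than with a plain recursive copy.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(workspace, snapshot_dir):
    """Freeze the current workspace by copying it to a new directory."""
    # copytree raises FileExistsError if snapshot_dir already exists,
    # so an existing snapshot is never silently overwritten.
    return Path(shutil.copytree(workspace, snapshot_dir))


def branch(snapshot_dir, branch_dir):
    """Create an independent working copy that starts from a snapshot."""
    return Path(shutil.copytree(snapshot_dir, branch_dir))


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())

    # Stand-in for the state a long analysis session leaves behind.
    workspace = base / "workspace"
    workspace.mkdir()
    (workspace / "analysis.md").write_text("module summary\n", encoding="utf-8")

    frozen = snapshot(workspace, base / "analysis-snapshot")

    # Three experiments start from the same analysis without redoing it.
    for name in ("strategy-a", "strategy-b", "strategy-c"):
        copy = branch(frozen, base / name)
        with open(copy / "analysis.md", "a", encoding="utf-8") as notes:
            notes.write(f"plan for {name}\n")

    # The snapshot itself is untouched by any of the branches.
    print((frozen / "analysis.md").read_text(encoding="utf-8"))
```

Each branch can diverge freely because it owns its copy, while the snapshot stays available as the starting point for the next experiment.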
Build 3: Tensorlake

The first thing that caught my attention was not a feature. It was an architectural decision. Most platforms preserve state by keeping compute alive. This one treated compute and state as separate problems.

The docs described a suspended sandbox that could preserve its state and resume in approximately 0.6 seconds. That was the first time I saw a design that directly addressed the problem I’d been running into. I wanted to know whether it actually worked.

I started with […]