Persistent agent sandboxes on Firecracker, no 24-hour cap

April 29, 2026 • By Jeet

Superserve is a sandbox platform for running agents in isolated cloud environments, built on Firecracker microVMs. The category is getting crowded (Daytona, E2B, Modal, and others), and the pitches sound similar enough that picking one can feel like a coin flip.

It isn't. The differences are real, and most of ours flow from one foundational choice. Here's what we built, why we built it that way, and where we fit.

Why Firecracker

Firecracker is a virtual machine monitor written by AWS in Rust. It's what powers Lambda and Fargate under the hood. Each microVM boots a real Linux kernel on hardware virtualization, which is a much stronger boundary than container namespaces sharing a kernel with the host. The codebase is deliberately small, around 50K lines, with a minimal device model. It's the closest thing to "real VM isolation with container-like overhead" that exists today.

We picked it for one reason: if an agent is going to run code generated from a user's prompt, the host should not have to trust that code. Firecracker gives us that, with a per-sandbox memory overhead measured in single-digit megabytes.

A few things flow naturally from this choice:

Snapshotting actually works. Firecracker can snapshot the full state of a microVM (memory, CPU, devices) and restore it later. This is the mechanism behind everything stateful we offer. Pause and resume isn't a metaphor; it's a memory snapshot. Templates that boot with a process already running aren't a clever build trick; they're a snapshot taken after the process is healthy.

Boots are fast. Not as fast as containers (we're around 200ms cold, vs Daytona's 60 to 90ms), but fast enough that a microVM-per-sandbox model is the right default.

The blast radius of one sandbox is one sandbox. Kernel exploits, escape attempts, runaway processes: they stay inside the VM.

We also actively contribute back to Firecracker upstream. Running on it isn't enough for us; we want to help shape where it goes.

What we built on top

Persistent sandboxes, pause and resume

A Superserve sandbox lives until you delete it. There's no 24-hour cap and no eviction. sandbox.pause() snapshots memory, processes, and disk. sandbox.resume() restores in milliseconds with the same processes still running and the same variables in scope.

A few details that matter in practice:

  • Trying to exec on a paused sandbox auto-resumes it first. You don't have to track state in your own code.
  • We bill for running time, not paused time. A long-lived agent can sit paused between turns and pay nothing for the idle gap.
  • A sandbox is active or paused, and eventually deleted. That's it. No "stopped vs archived vs paused." We chose simplicity here on purpose.

Per-sandbox egress firewalling

Each sandbox declares what it's allowed to talk to. CIDRs and domain names both work. A sandbox can reach api.openai.com and *.github.com, and nothing else. This is one of the strongest mitigations for prompt injection: even if an agent is convinced to exfiltrate data, the network layer says no. Private ranges are blocked by default.

```tsx
const sandbox = await Sandbox.create({
  template: "superserve/base",
  network: {
    allow_out: ["api.openai.com", "*.github.com"],
    deny_out: ["10.0.0.0/8"],
  },
});
```
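Conceptually, each outbound destination is matched against the declared rules. A minimal sketch of wildcard-domain and IPv4 CIDR matching, with assumed semantics for illustration; this is not our data-plane code:

```tsx
// Sketch of allowlist matching: wildcard domains and IPv4 CIDRs.
function matchesDomain(host: string, pattern: string): boolean {
  if (pattern.startsWith("*.")) {
    const suffix = pattern.slice(1); // "*.github.com" -> ".github.com"
    return host.endsWith(suffix) && host.length > suffix.length;
  }
  return host === pattern;
}

function ipToInt(ip: string): number {
  // Pack four octets into an unsigned 32-bit integer.
  return ip.split(".").reduce((acc, o) => ((acc << 8) | parseInt(o, 10)) >>> 0, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [net, bitsStr] = cidr.split("/");
  const bits = parseInt(bitsStr, 10);
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(net) & mask) >>> 0);
}

console.log(matchesDomain("api.github.com", "*.github.com")); // true
console.log(matchesDomain("github.com", "*.github.com"));     // false
console.log(inCidr("10.1.2.3", "10.0.0.0/8"));                // true
console.log(inCidr("8.8.8.8", "10.0.0.0/8"));                 // false
```

Note that `*.github.com` does not match the bare apex `github.com` in this sketch; whether it should is a policy decision.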

Per-sandbox HMAC tokens

Data-plane operations (terminal, file upload, file download) use sandbox-scoped tokens, not your team API key. If a sandbox token leaks, the blast radius is one sandbox. Tokens rotate on resume.
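One simple way to get both properties is an HMAC over the sandbox ID plus a rotation counter: verifying is scoped to one sandbox, and bumping the counter on resume invalidates older tokens. A sketch with Node's crypto module; the scheme and names are illustrative, not our actual token format:

```tsx
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative sandbox-scoped token scheme, not Superserve's actual format.
// Token = HMAC(key, sandboxId + rotation). The key never enters the sandbox.
function mintToken(key: Buffer, sandboxId: string, rotation: number): string {
  return createHmac("sha256", key).update(`${sandboxId}:${rotation}`).digest("hex");
}

function verifyToken(
  key: Buffer,
  sandboxId: string,
  rotation: number,
  token: string,
): boolean {
  const expected = mintToken(key, sandboxId, rotation);
  return (
    token.length === expected.length &&
    timingSafeEqual(Buffer.from(token, "hex"), Buffer.from(expected, "hex"))
  );
}

const key = Buffer.from("server-side secret");
const t0 = mintToken(key, "sbx_123", 0);
console.log(verifyToken(key, "sbx_123", 0, t0)); // true
console.log(verifyToken(key, "sbx_456", 0, t0)); // false: scoped to one sandbox
console.log(verifyToken(key, "sbx_123", 1, t0)); // false: rotation bumped on resume
```

A leaked token proves possession for exactly one (sandbox, rotation) pair, which is the blast-radius property described above.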

Templates that snapshot a running process

A template can declare a start_cmd (the long-running process to launch) and a ready_cmd (how to know it's healthy). The build system boots the template, waits for ready, then snapshots the full memory and disk state. Sandboxes booted from that template come up with the process already running. Same variables in scope. Same in-memory caches warm.

```tsx
const template = await Template.create({
  alias: "vllm-llama-8b",
  base: "superserve/base",
  steps: [
    { run: "pip install vllm" },
    { workdir: "/srv" },
  ],
  start_cmd: "vllm serve meta-llama/Llama-3.1-8B --port 8000",
  ready_cmd: "curl -fs http://localhost:8000/health",
});

// Boot a sandbox from the snapshot.
// vLLM is already serving on :8000 when it comes up.
const sandbox = await Sandbox.create({
  template: "vllm-llama-8b",
});
```

The cost of starting that sandbox is roughly the cost of restoring a snapshot. The model server is already loaded.
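The build loop behind this is easy to sketch: launch start_cmd, poll ready_cmd until it exits zero, then snapshot. A mock in which runInVm and snapshotVm stand in for the real build machinery:

```tsx
// Sketch of the template build loop: boot, poll ready_cmd, snapshot.
// runInVm and snapshotVm are stand-ins for the real build machinery.
async function buildTemplate(
  startCmd: string,
  readyCmd: string,
  runInVm: (cmd: string) => Promise<number>, // resolves to an exit code
  snapshotVm: () => Promise<string>,         // resolves to a snapshot id
  timeoutMs = 60_000,
): Promise<string> {
  void runInVm(startCmd); // launch the long-running process; don't await it
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await runInVm(readyCmd)) === 0) {
      // Process is healthy: freeze full memory + disk state.
      return snapshotVm();
    }
    await new Promise((r) => setTimeout(r, 500));
  }
  throw new Error("ready_cmd never succeeded before timeout");
}

(async () => {
  // Mock VM: the health check fails once, then succeeds.
  let checks = 0;
  const runInVm = async (cmd: string) =>
    cmd.includes("health") ? (++checks >= 2 ? 0 : 1) : 0;
  const snapshotVm = async () => "snap_01";
  const id = await buildTemplate(
    "vllm serve meta-llama/Llama-3.1-8B --port 8000",
    "curl -fs http://localhost:8000/health",
    runInVm,
    snapshotVm,
  );
  console.log(id); // "snap_01"
})();
```

Everything after the successful ready check is snapshot state, which is why sandboxes booted from the template skip the whole warm-up.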

A console

Templates list with build history and a log viewer. Sandbox list with metadata filtering. In-browser terminal. File browser. Audit logs. We use it ourselves every day, which has been the best forcing function for making it actually usable.

Workspaces that outlive the sandbox

A filesystem built for how agents work. Today their files live on an ext4 mount that dies with the sandbox. We're integrating a virtual filesystem layer that makes every workspace persistent, branchable, and versioned. Spin up five sandboxes from one branch, pick the winner. Roll back ten minutes of an agent's edits without restarting it. Mount the same workspace into a fresh sandbox a week later. Pause and resume saves the agent's mind. This saves its work. More soon.

Open source, top to bottom

The sandbox runtime, the SDKs (TypeScript and Python), and the console are all open source under Apache 2.0. Self-host it, fork it, audit it, send a PR. We're not a black box you have to trust.

How we stack up

| | Superserve | E2B | Daytona |
| --- | --- | --- | --- |
| Isolation | Firecracker microVM | Firecracker microVM | Containers |
| Cold start | ~200ms | ~200ms | 60-90ms |
| Pause + resume (full memory) | ✓ | beta | ✗ |
| Persistent (no runtime cap) | ✓ | ✗ (24h on Pro) | ✓ |
| Templates with snapshotted running processes | ✓ | ✗ | ✗ |
| Per-sandbox egress firewall (CIDR + domain) | ✓ | ✗ (CIDR only) | ✗ |
| Computer Use / desktop sandboxes | ✗ | ✓ | ✓ |
| GPU support | ✗ | ✗ | ✓ |
| Jupyter-style code interpreter | ✗ | ✓ | ✗ |
| Open source | ✓ (Apache-2.0) | ✓ (Apache-2.0) | ✓ (AGPL-3.0) |

Where the others are better

We'd rather you pick the right tool than the most-marketed one.

Daytona has faster cold starts (containers don't boot a kernel), the most mature Computer Use story across Linux, macOS, and Windows, and a real GPU offering. If you need a desktop or GPU-heavy workloads right now, look there.

E2B has the most polished code-interpreter experience in the category and a Desktop Sandbox SDK if you need computer use. They use Firecracker too, so the isolation story is similar to ours. Worth knowing: their sandboxes cap at 24 hours on the Pro plan, and persistence is currently in beta.

Modal and Sprites.dev each have their own wedge. Worth scanning if your workload is unusual.

What's next

A few things we're working on that we'll have more to share on soon: a virtual filesystem layer, auto-pause on idle, and a secrets proxy. All are aimed at the same thing: making long-lived sandboxes cheaper, more secure, and easier to reason about.

Where we fit

If you care about isolation as much as we do, Superserve is built for you. Firecracker microVMs, snapshot-based pause and resume, per-sandbox egress control, persistent sandboxes you only pay for when they're running, and a console we use every day. The foundation is solid and we're shipping fast on top of it.

Quickstart is at docs.superserve.ai/quickstart. If you have a workload you're not sure how to model, email us.