
OpenAI Agents SDK Sandboxes: Which Provider Should You Actually Use?

May 19, 2026 · Pavitra Bhalla

“Supported” isn’t “best.” OpenAI ships SandboxClient classes for seven hosted providers (plus two local clients) in the Agents SDK. That’s an integration shim, not an endorsement - OpenAI doesn’t benchmark, rank, or vouch for any of them. The rest of this post is what we found when we actually looked.

A working developer's guide to picking the right execution backend - without the marketing.

Disclosure: this post is written by the team at superserve.ai. We build persistent sandboxes for AI agents, so we have a horse in this race - specifically the “persistent-by-default workspace” gap we discuss below. We don’t ship a SandboxClient in the OpenAI Agents SDK yet - this post is about the seven providers that do.

TL;DR

The Agents SDK ships with two local sandbox clients (Unix-local, Docker) and seven hosted ones: E2B, Modal, Daytona, Runloop, Vercel, Cloudflare, and Blaxel - the natively integrated list as of May 2026 (OpenAI sandbox clients documentation). The list can change between SDK minor releases.
All seven hosted providers do roughly the same job (spin up a Linux workspace, run commands, and in most cases mount storage), but they’re optimized for different things: cold start, persistence, GPU, edge proximity, or developer-environment fidelity.
The native list has gaps: persistent-by-default workspaces and EU data residency aren’t universally covered, and pricing models are wildly inconsistent.
If you’re picking once and never re-evaluating, don’t. The Agents SDK is designed so the SandboxAgent definition stays stable and only the client swaps.

Why sandbox choice matters for the Agents SDK

“Native” ≠ “endorsed.” Being on the OpenAI Agents SDK’s sandbox client list means someone wrote and shipped a SandboxClient wrapper. It does not mean OpenAI tested, benchmarked, or recommends that provider. Treat the list as “compatible,” not “curated.”

The Agents SDK’s sandbox-agents feature is built on a clean split: the harness (model calls, tool routing, approvals, tracing, recovery) stays in your trusted infrastructure, and the sandbox is the execution plane where the model writes files, runs commands, and exposes ports.

That means your sandbox provider is the only thing standing between an LLM and arbitrary code execution on real infrastructure. Get it wrong and you get one of three failure modes:

Security: the agent writes a curl-pipe-bash and your sandbox isn’t isolated enough.
Cost: your sandbox bills for idle wall-clock, your agent thinks for 90 seconds between turns, and your compute bill triples.
Latency: cold starts on every turn can dominate end-to-end task time - if your sandbox takes seconds to spin up and your task only runs for seconds, the cold start is the task.
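The cost and latency failure modes are back-of-envelope arithmetic. A minimal sketch, using assumed turn timings (45 s of command execution and 90 s of model thinking per turn, and a 3 s cold start in front of a 3 s task) rather than measurements of any provider:

```python
"""Back-of-envelope arithmetic for the cost and latency failure modes.

All timings are illustrative assumptions, not measurements of any provider.
"""


def billed_seconds_per_turn(active_s: float, idle_s: float, bills_idle: bool) -> float:
    """Seconds billed for one agent turn.

    active_s: time the sandbox spends running commands.
    idle_s: time the sandbox sits up but idle while the model thinks.
    bills_idle: True if the provider bills wall-clock while the sandbox is up.
    """
    return active_s + idle_s if bills_idle else active_s


def cold_start_share(cold_start_s: float, task_s: float) -> float:
    """Fraction of end-to-end time spent waiting for the sandbox to start."""
    return cold_start_s / (cold_start_s + task_s)


# Assumed turn shape: 45 s of command execution, 90 s of model "thinking".
wall_clock = billed_seconds_per_turn(45, 90, bills_idle=True)
active_only = billed_seconds_per_turn(45, 90, bills_idle=False)
print(f"wall-clock / active-only = {wall_clock / active_only:.1f}x")  # 3.0x

# Assumed: a 3 s cold start in front of a 3 s task.
print(f"cold-start share = {cold_start_share(3, 3):.0%}")  # 50%
```

Under these assumptions, billing idle wall-clock triples the compute bill, and the cold start is half of every run.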

The natively supported list is a useful shortlist precisely because the integration is done for you. But “supported” ≠ “best.” Here’s the actual rundown.

At a glance

If you only read one section, read this. (Full side-by-side table below; per-provider deep-dives after that.)

Already on Vercel or Cloudflare? Use the matching native client. The cheapest decision is the one that doesn't add a vendor.
Need GPUs? Modal first, Daytona second.
Running untrusted code at scale? E2B (Firecracker microVM) is the safest starting bet. Vercel (also Firecracker) and Runloop also run sandboxes in microVMs.
Coding agent that needs the same workspace tomorrow? Runloop for maturity, Daytona for raw spin-up speed, or Blaxel for perpetual-standby economics.
Just prototyping? UnixLocalSandboxClient. Stop overthinking it.

Side-by-side comparison

Provider	Isolation	Persistence	GPU	Mounts	Pricing model	Best fit
UnixLocal	None (host process)	Local FS	Host’s	Local bind	Free	Dev loop
Docker	Container	Volumes	Host’s	S3/R2/GCS/Azure/Box/S3 Files via volumes	Free	Self-hosted prod
E2B	Firecracker microVM	Up to 24h session	No	S3/GCS/R2/Azure/Box (rclone)	Per-second + tiered plan	Untrusted code at scale
Modal	gVisor container	Ephemeral + Modal volumes	Yes	S3/R2/GCS (Modal cloud bucket)	Pure per-second	GPU-bearing agents
Daytona	Containers	Snapshots + clone/archive	H100, RTX PRO 6000	S3/GCS/R2/Azure/Box (rclone)	Per-hour, per-second metered	Stateful workspaces
Runloop	microVM	Suspend/Resume	No	S3/GCS/R2/Azure/Box (rclone)	$250/mo Pro + usage	Coding agents
Vercel	Firecracker microVM	Snapshots + persistent (beta)	No	None via SDK integration	Itemized: CPU/mem/creations/net/storage	Node-heavy code-gen
Cloudflare	Container (DO-backed)	Container lifetime	No	S3/R2/HMAC GCS	Workers Paid plan + container usage	Edge-native agents
Blaxel	Unikraft microVM	Perpetual (auto-standby)	No	S3/R2/GCS + Blaxel Drives	Per-second + tiered plan	Long-lived sessions

The natively supported list

UnixLocalSandboxClient (built-in)

What it is: Runs the sandbox as a local process tree on the host machine, with no container.

Strengths:

Zero install. Ships with openai-agents.
Fastest local iteration.
PTY support via a small Python 3 bridge.
Perfect for development on macOS/Linux.

Weaknesses:

No isolation. The agent can read your ~/.ssh/ if you let it.
Not a production option.

Best for: Local dev loop, examples, demos.

```python
from agents.run import RunConfig
from agents.sandbox import SandboxRunConfig
from agents.sandbox.sandboxes.local import UnixLocalSandboxClient

# Dev loop only: the sandbox is a plain process tree on this machine, with no isolation.
run_config = RunConfig(
    sandbox=SandboxRunConfig(client=UnixLocalSandboxClient()),
)
```

DockerSandboxClient (built-in, openai-agents[docker])

What it is: Runs sandbox work inside a local Docker container with a configurable image.

Strengths:

Container isolation boundary.
Image parity with prod.
Supports bind mounts and Docker volume mount strategies (S3, R2, GCS, Azure Blob, Box, S3 Files).
Good for CI.

Weaknesses:

It’s Docker. You own the host, the images, the cleanup, and the security posture.
Not a hosted service.
Containers share the host kernel, so isolation is weak for untrusted code execution.

Best for: Self-hosted production where you already operate container infra.

```python
from agents.run import RunConfig
from agents.sandbox import SandboxRunConfig
from agents.sandbox.sandboxes.docker import DockerSandboxClient, DockerSandboxClientOptions
from docker import from_env as docker_from_env

run_config = RunConfig(
    sandbox=SandboxRunConfig(
        # docker_from_env() connects to the local Docker daemon (honors DOCKER_HOST).
        client=DockerSandboxClient(docker_from_env()),
        # Pin the same image you ship to prod for image parity.
        options=DockerSandboxClientOptions(image="python:3.14-slim"),
    ),
)
```

E2BSandboxClient (openai-agents[e2b])

What it is: E2B runs sandboxes backed by Firecracker microVMs. Open-source core + hosted offering, used heavily by code-interpreter products.

Strengths:

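Firecracker microVM isolation: each sandbox gets its own guest kernel behind a hardware-virtualization boundary, rather than gVisor or a shared-kernel container.
Per-second billing. Hobby tier free, then $0.000014/vCPU/s (so $0.000014–$0.000112/s for a 1–8 vCPU sandbox) + $0.0000045/GiB/s memory. Pricing is transparent and granular.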
Mature SDK with code-interpreter and computer-use templates.
Supports rclone-backed mounts for S3/GCS/R2/Azure Blob/Box.

Weaknesses:

Hobby tier caps sandbox session length at 1 hour, Pro at 24 hours - not designed for persistent or long-running workloads.
Concurrency caps (20 on Hobby, 100 on Pro) are real.
Pricing math gets nontrivial once you scale concurrent sandboxes.
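Concurrency caps are easiest to respect on the client side. A minimal sketch, assuming a hypothetical `run_in_sandbox` coroutine that creates a sandbox, runs one task, and tears it down (simulated here with `asyncio.sleep`): an `asyncio.Semaphore` keeps the number of open sandboxes at or below the plan's cap.

```python
import asyncio
import random

# Concurrency caps quoted in this post; confirm against your own plan.
E2B_CONCURRENCY_CAP = {"hobby": 20, "pro": 100}


async def run_in_sandbox(task_id: int) -> int:
    """Hypothetical stand-in for: create a sandbox, run one task, tear it down."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated work
    return task_id


async def run_all(task_ids: list[int], cap: int) -> tuple[list[int], int]:
    """Run every task with at most `cap` sandboxes open; return results and peak concurrency."""
    sem = asyncio.Semaphore(cap)
    in_flight = 0
    peak = 0

    async def guarded(task_id: int) -> int:
        nonlocal in_flight, peak
        async with sem:  # waits here once `cap` sandboxes are already open
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_in_sandbox(task_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(t) for t in task_ids))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(list(range(50)), E2B_CONCURRENCY_CAP["hobby"]))
    print(f"{len(results)} tasks, peak concurrent sandboxes: {peak}")
```

Queuing on your side turns a hard cap into added latency instead of failed sandbox creations, which is usually the easier failure mode to reason about.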

Best for: Code-interpreter-style workloads, untrusted-code execution at scale, anyone who wants the strongest documented isolation in the list.

ModalSandboxClient (openai-agents[modal])

What it is: Modal’s Sandbox primitive - the same platform that runs serverless GPU jobs, exposed as ephemeral containers for agent code.

Strengths:

Pure per-second billing, no monthly minimum: $0.00003942/core/sec CPU, $0.00000672/GiB/sec memory. Scale resources up in bursts without pre-provisioning.
Heavyweight GPU story - Modal is unusually good if your agent needs to run a model, not just call one.
Supports Modal cloud bucket mounts (S3, R2, HMAC GCS) and Modal Secrets for credentials.
Polyglot SDKs (Python, JS/TS, Go).

Weaknesses:

Isolation is gVisor-based. That's a real syscall boundary - stronger than a plain container, weaker than a Firecracker microVM for hostile-code threat models. For most agent workloads it's fine; for arbitrary untrusted code at scale, microVM-based providers (E2B, Vercel, Runloop) have a defense-in-depth edge.
Cold starts can be longer than dedicated microVM platforms if your image is heavy.

Best for: Agents that need GPUs or that already share infra with Modal jobs.

DaytonaSandboxClient (openai-agents[daytona])

What it is: Daytona reframed itself from a dev-environment manager into a sandbox-for-AI-code provider. Open-source core (daytonaio/daytona).

Strengths:

Vendor-claimed sub-90ms sandbox creation (Daytona pricing page) - if accurate, among the fastest documented in this list. Measure on your workload before relying on it.
Stateful by design: snapshot, clone, archive, resume. Good fit for the Agents SDK’s snapshot/resume model.
Per-hour list pricing with per-second metering: $0.0504/vCPU/hr, $0.0162/GiB-memory/hr, $0.000108/GiB-storage/hr. H100 GPUs at $3.95/hr, RTX PRO 6000 at $3.03/hr.
$200 in free credits, no card required.
Rclone-backed mounts for S3/GCS/R2/Azure Blob/Box.
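Converting Daytona's hourly list prices to per-second units makes them directly comparable with E2B's. Using only the figures quoted in this post, the CPU and memory rates come out identical ($0.0504 / 3600 = $0.000014 per vCPU-second, $0.0162 / 3600 = $0.0000045 per GiB-second), so at list price the compute-and-memory line alone won't separate the two; plan fees, credits, session limits, and storage will.

```python
import math

SECONDS_PER_HOUR = 3600

# Daytona list prices quoted in this post, in dollars per hour.
DAYTONA_PER_HOUR = {"vcpu": 0.0504, "gib_memory": 0.0162, "gib_storage": 0.000108}

# E2B per-second rates quoted in this post, in dollars per second.
E2B_PER_SECOND = {"vcpu": 0.000014, "gib_memory": 0.0000045}

daytona_per_second = {k: v / SECONDS_PER_HOUR for k, v in DAYTONA_PER_HOUR.items()}

for resource, e2b_rate in E2B_PER_SECOND.items():
    daytona_rate = daytona_per_second[resource]
    # isclose, not ==, because the hourly-to-per-second division is floating point.
    same = math.isclose(daytona_rate, e2b_rate)
    print(f"{resource:>10}: Daytona ${daytona_rate:.7f}/s vs E2B ${e2b_rate:.7f}/s, equal={same}")
```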

Weaknesses:

Isolation model is documented less explicitly than E2B’s.
Newer in the AI-sandbox positioning, so the long-tail of integrations and templates is smaller than E2B/Modal.

Best for: Stateful, long-running agent workspaces where you want fast spin-up and snapshot/clone semantics.

RunloopSandboxClient (openai-agents[runloop])

What it is: Runloop’s “Devboxes” - long-lived microVM sandboxes aimed specifically at AI coding agents.

Strengths:

Built around the coding-agent use case: Blueprints (custom env images), Snapshots, Repo Connections, and Public Benchmarks for evaluating agents.
Suspend/Resume is a first-class Pro feature - aligns with the Agents SDK’s sandbox-resume pattern.
Free tier with $50 in credits; Pro at $250/mo + usage; Enterprise with VPC deploy.
Rclone-backed mounts across the usual storage providers.

Weaknesses:

$250/mo Pro floor is the highest fixed minimum in the list - you commit before you scale.
Public usage pricing is less granular than E2B/Modal/Daytona; you’ll likely talk to sales for real numbers.
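A fixed monthly floor matters most at low volume. A minimal sketch that spreads the $250/mo Pro fee over an assumed number of sandbox-hours per month; usage charges, which come on top, are left out because they aren't published in comparable form.

```python
PRO_FLOOR_PER_MONTH = 250.0  # Runloop Pro fixed fee quoted in this post, in dollars; usage is billed on top


def floor_per_sandbox_hour(sandbox_hours_per_month: float) -> float:
    """Fixed-fee overhead per sandbox-hour at a given monthly volume."""
    if sandbox_hours_per_month <= 0:
        raise ValueError("sandbox_hours_per_month must be positive")
    return PRO_FLOOR_PER_MONTH / sandbox_hours_per_month


# Assumed monthly volumes, from a small pilot to steady production traffic.
for hours in (50, 500, 5000):
    overhead = floor_per_sandbox_hour(hours)
    print(f"{hours:>5} sandbox-hours/month -> ${overhead:.2f}/sandbox-hour of fixed overhead")
```

For scale, the E2B and Daytona CPU and memory list rates quoted in this post come to about $0.17/hour for a 2 vCPU, 4 GiB sandbox, so at pilot volumes the floor can dwarf the usage bill.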

Best for: Teams building coding agents, especially ones doing reinforcement fine-tuning or benchmark-driven evaluation.

VercelSandboxClient (openai-agents[vercel])

What it is: Vercel Sandbox - ephemeral Firecracker microVMs on Amazon Linux 2023, marketed at Vercel’s web-dev audience.

Strengths:

Firecracker isolation. Default runtime is node24 (also node22/node26 and python3.13), making it the obvious choice if your agent generates and runs Node code.
Tight integration with Vercel OIDC for auth.
Snapshotting and a beta “persistent sandboxes” mode that auto-saves state on stop.
Pricing is line-itemized: Active CPU $0.128/hr, Provisioned Memory $0.0212/GB-hr, Sandbox Creations $0.60/M, Network $0.15/GB, Snapshot Storage $0.08/GB-month (Vercel Sandbox pricing).

Weaknesses:

No remote storage mount strategies are exposed in the Agents SDK integration - you have to materialize inputs through the manifest.
Max runtime 45 min on Hobby, 5 hours on Pro. Not for multi-day agent sessions.
The Vercel-OIDC-first auth model is friction if you don’t otherwise live in Vercel’s ecosystem.

Best for: Teams already on Vercel; Node-heavy code-generation agents; preview-URL-driven workflows.

CloudflareSandboxClient (openai-agents[cloudflare])

What it is: Cloudflare’s Sandbox SDK, built on Cloudflare Containers + Durable Objects, connected to the Agents SDK via a deployed “bridge Worker” (the integration speaks HTTP to your Worker).

Strengths:

Edge-native: your sandbox runs close to where your Worker code already runs.
Strong primitives: code interpreter, file watching, terminal WebSocket, preview URLs, WebSocket connections, git integration.
Cloud bucket mounts (S3, R2, HMAC GCS).
Pricing folds into the Workers Paid plan - predictable if you’re already a Cloudflare customer.

Weaknesses:

Beta. The Sandbox SDK itself flags “APIs may change before v1.0” (cloudflare/sandbox-sdk).
Bridge-Worker setup adds architectural friction vs a hosted-SDK provider.
Isolation is container-based (not microVM) inside Cloudflare’s infra.

Best for: Teams already on Workers; agents that want preview URLs at the edge; latency-sensitive workloads where running in Cloudflare’s network is a real advantage.

BlaxelSandboxClient (openai-agents[blaxel])

What it is: Blaxel - positioned as the “perpetual sandbox” platform: persistent-by-default workspaces with ~25ms resume from standby (provider claim) and $0 compute when idle.

Strengths:

Persistence model is genuinely different: sandboxes can sit on standby indefinitely, with memory and filesystem snapshotted, and resume quickly.
Supports cloud bucket mounts (S3, R2, GCS) and persistent Blaxel Drives (distributed FS shared across sandboxes/sessions).
Co-hosted agent loop + MCP servers + sandboxes on the same backbone - pitched as sub-1ms intra-stack latency.

Weaknesses:

Newest of the bunch; ecosystem and tooling depth are still growing.
Public pricing details are less transparent than E2B/Modal/Vercel.
“25ms resume” is a great claim - measure it against your real workload before designing around it.

Best for: Long-lived agent sessions (coding agents that need their workspace tomorrow), or any workflow where cold start is the dominant cost.

What’s missing from the native list

Three honest gaps:

Pricing transparency is uneven. E2B, Modal, and Vercel publish per-second or per-resource pricing you can plug into a spreadsheet, and Daytona publishes hourly list rates for compute, memory, storage, and GPUs. Runloop has a $250/mo Pro floor and routes the rest through sales. Blaxel partly publishes, partly doesn’t.
EU / data residency. None of the native integrations expose data-residency as a first-class option in the SandboxRunConfig. You can pick a provider that runs in the EU, but it’s outside the SDK’s contract.
Persistent-by-default workspaces. The SDK’s snapshot/resume model is great, but most integrated providers still default to ephemeral sandboxes. Blaxel and Daytona lean into persistence; the others treat it as a feature, not the default.

Persistent-by-default is genuinely hard, which is part of why the native list mostly punts on it. The provider has to keep a snapshot store hot, bill correctly for idle workspaces (or not bill, which means eating standby cost), and handle long-tail edge cases such as workspace drift, secret rotation, and half-written files on resume. Blaxel and Daytona lean hardest into this; Vercel ships a beta persistent mode. It's also, per our disclosure above, the gap superserve.ai is built to fill. The point of flagging it here isn't to plug ourselves - it's that if your agent needs a workspace that survives the turn, the native list narrows fast.
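On the first gap, pricing transparency: for the providers whose rates this post quotes, a first-pass comparison fits in a few lines. A rough sketch, assuming a 2-CPU, 4-GiB sandbox billed for every second of a full hour with CPU fully active. It ignores plan fees and credits, Daytona's storage line, Vercel's creation, network, and snapshot line items, and the fact that Modal quotes CPU per "core", which may not map one-to-one onto a vCPU. Runloop, Cloudflare, and Blaxel are left out because this post doesn't quote comparable per-resource rates for them.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass(frozen=True)
class Rates:
    """Dollars per hour for one CPU unit and for one GiB of memory."""

    cpu_per_hour: float
    mem_per_gib_hour: float


# Rates as quoted in this post, converted to per-hour where they were quoted per second.
RATES = {
    "E2B": Rates(0.000014 * SECONDS_PER_HOUR, 0.0000045 * SECONDS_PER_HOUR),
    "Modal": Rates(0.00003942 * SECONDS_PER_HOUR, 0.00000672 * SECONDS_PER_HOUR),  # per "core"
    "Daytona": Rates(0.0504, 0.0162),
    "Vercel": Rates(0.128, 0.0212),  # Active CPU treated as per vCPU-hour, fully active; memory per GB-hr
}


def hourly_cost(rates: Rates, cpus: float, gib: float) -> float:
    """Compute + memory cost of one sandbox running for a full hour."""
    return rates.cpu_per_hour * cpus + rates.mem_per_gib_hour * gib


# Assumed shape: 2 CPU units, 4 GiB of memory; cheapest first.
for name, rates in sorted(RATES.items(), key=lambda item: hourly_cost(item[1], 2, 4)):
    print(f"{name:<8} ${hourly_cost(rates, 2, 4):.4f}/hour")
```

The ignored line items are where real bills diverge, so treat the ranking as a starting point for your own spreadsheet, not a verdict.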

Closing

The Agents SDK did the right thing by making sandbox choice swappable. The same SandboxAgent definition runs on a UnixLocal client in dev, Docker in CI, and a hosted microVM in prod - that’s the whole point of the harness/compute split. The native list is a strong starting shortlist, not a verdict on which sandbox is “best.”

Have numbers? If you’ve benchmarked any of these on real agent workloads, we want to see them - especially the ones that contradict provider marketing. Send us:

Workload - what your agent actually does (e.g., “coding agent doing pip install + pytest”).
Provider(s) tested - and the region/image you used.
Cold start p50/p95 - milliseconds from create() to first command echo.
Cost per 1k runs - your actual billed amount, not the marketing rate.
Surprises - anything that broke, hung, or behaved differently than the docs said.
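For the cold-start row, a small timing harness keeps submissions comparable. It assumes you supply a hypothetical `create_and_echo` callable that creates a fresh sandbox with your provider's SDK, runs `echo ok`, returns once the output arrives, and hands back a cleanup function so teardown stays outside the timed window. Percentiles use the nearest-rank method.

```python
import math
import time
from typing import Callable


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_cold_starts(
    create_and_echo: Callable[[], Callable[[], None]], runs: int = 20
) -> dict[str, float]:
    """Time create() to first echo `runs` times; return p50/p95 in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        cleanup = create_and_echo()  # returns after the echo output has been read
        samples_ms.append((time.perf_counter() - start) * 1000)
        cleanup()  # teardown runs outside the timed window
    return {"p50_ms": percentile(samples_ms, 50), "p95_ms": percentile(samples_ms, 95)}
```

The same harness fits resume-from-standby claims such as Blaxel's: have the callable resume an existing sandbox instead of creating one.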

Send to engineering@superserve.ai. We’ll keep this post updated with the latest numbers as they come in - with attribution, unless you ask us not to.
