[release] 5 min · Apr 17, 2026

OpenAI Agents SDK — Solves Sandbox Pain, Deepens Lock-In

OpenAI shipped native sandbox execution and a model-native harness for its Agents SDK. It solves genuine infrastructure pain — on OpenAI's terms.

#ai-agents #openai #agentic-infrastructure #sandbox #lock-in

On April 15, OpenAI shipped the biggest Agents SDK update since the framework launched: native sandbox execution, a model-native harness covering the full production agent lifecycle, and integrations with seven sandbox providers out of the box. The pitch is “safer agents for enterprises.” The architecture is “everything runs closer to OpenAI’s models, on OpenAI’s terms.” If you are building production agents on GPT-5.x, this removes real pain. If you are building on anything else, it removes nothing.

TL;DR

  • What: OpenAI Agents SDK gets native sandbox execution (Python-only at launch) and a model-native harness with configurable memory, snapshotting, and Codex-style tools
  • Providers: Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel — plus a Manifest abstraction for S3, GCS, Azure Blob, and Cloudflare R2
  • Lock-in signal: The harness is optimized for frontier OpenAI models. Manifest is portable across sandbox providers, not across model providers
  • Action: If you are already OpenAI-native, evaluate immediately. If you are multi-model, this is not for you

What Shipped — The Architecture in Detail

The update has two core pieces: a sandbox execution layer and a harness that orchestrates agent runs across that layer.

The sandbox isolates model-generated code from your infrastructure. Credentials stay outside the execution environment by design — the harness and compute layers are architecturally separated. Python is the only supported runtime at launch; TypeScript is on the roadmap. Built-in snapshotting and rehydration mean agent runs survive container loss and resume from the last checkpoint, which eliminates a class of reliability problems that every team building production agents has encountered.
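
The launch materials do not pin down the resume API, so read the following as a conceptual sketch only: every name in it (start, wait, resume, last_snapshot_id, SandboxLostError) is hypothetical, and agent and harness are the objects configured in the example below.

# Hypothetical checkpoint-resume flow; names are illustrative, not SDK API
run = harness.start(agent, task="refactor the ingest pipeline")

try:
    result = run.wait()
except SandboxLostError:
    # Container evicted mid-run: rehydrate from the last snapshot
    # instead of restarting the task from zero.
    run = harness.resume(run.last_snapshot_id)
    result = run.wait()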

The harness is the more consequential piece. It provides configurable memory, sandbox-aware orchestration, and Codex-style filesystem tools — specifically the shell tool and apply_patch tool — that let agents interact with their execution environment the same way Codex interacts with codebases. The primitives include tool use via MCP, progressive disclosure via skills, and custom instructions via an AGENTS.md file.

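In code, that wiring looks roughly like this (illustrative: exact module paths and signatures may differ from the SDK reference):
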
# Manifest abstraction — mount local files, define outputs, connect cloud storage
# Import paths are illustrative; check the SDK reference for the real module layout.
from agents import Agent, Harness, Manifest
from agents.tools import apply_patch, shell

manifest = Manifest(
    inputs=["./src", "s3://my-bucket/data"],  # local directory plus an S3 mount
    outputs=["./results"],                    # written back when the run completes
    sandbox_provider="e2b",                   # any of the seven launch providers
)

agent = Agent(
    model="gpt-5",
    harness=Harness(manifest=manifest, memory="persistent"),
    tools=[shell, apply_patch],               # Codex-style filesystem tools
)

The Manifest abstraction is the portability layer — and understanding what it makes portable matters. You can swap sandbox providers freely: move from E2B to Modal to Cloudflare without rewriting your agent. You can mount storage from AWS S3, Google Cloud Storage, Azure Blob Storage, or Cloudflare R2 interchangeably. That is genuine infrastructure flexibility.
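
Concretely, a backend swap under this model is a one-field change (same illustrative names as the example above):

# Same agent and harness: only the execution backend changes
manifest = Manifest(
    inputs=["./src", "gs://my-bucket/data"],  # a GCS mount in place of S3
    outputs=["./results"],
    sandbox_provider="modal",                 # was "e2b"; nothing else changes
)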

But the Manifest is portable across sandbox providers, not across model providers. The harness itself — the orchestration logic, the memory model, the way tools are surfaced and invoked — is optimized for how frontier OpenAI models operate. Forrester analyst William McKeon-White flagged this directly: “For larger organizations that want to remain a bit more provider agnostic, this is a less relevant update.”

The Assistants API sunsets on August 26, 2026. If you are still on Assistants, this SDK update is not optional — it is your forced migration path. Plan accordingly.

Why This Matters

Building sandbox execution infrastructure for production agents is genuinely miserable work. You need container orchestration, credential isolation, checkpoint persistence, filesystem abstraction, and failure recovery — all before you write a single line of agent logic. Every team I have talked to that runs agents in production has either built this from scratch (painful, 2-4 engineer-months) or bolted together E2B or Modal with custom glue code (less painful, still fragile).
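
For a sense of what that glue looks like, here is a stripped-down sketch of the hand-rolled checkpoint loop those teams end up maintaining (generic Python, no particular sandbox SDK assumed):

import json
from pathlib import Path

CHECKPOINT = Path("run_state.json")

def run_with_checkpoints(steps, execute):
    """Resume a multi-step agent run after a container loss.

    `steps` is an ordered list of step names; `execute` runs one step and
    returns its JSON-serializable output. Completed steps are persisted so
    a restarted worker skips straight to the first unfinished step.
    """
    state = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    for step in steps:
        if step in state:
            continue                 # finished before the crash, skip it
        state[step] = execute(step)  # may raise if the sandbox dies mid-step
        CHECKPOINT.write_text(json.dumps(state))
    return state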

OpenAI just made that entire layer a configuration problem. Define a Manifest, pick a sandbox provider, and the harness handles orchestration, snapshotting, and tool routing. For teams already committed to the OpenAI ecosystem, this is a legitimate 10x reduction in infrastructure overhead for agent deployment.

The subagent model pushes this further. Subagents can already be routed to isolated sandboxes and parallelized, which means multi-agent workflows — a research agent feeding results to a coding agent feeding outputs to a review agent — run in isolated environments with proper credential boundaries. This is the multi-agent execution layer OpenAI has been building toward since Swarm, and it is the most architecturally coherent version yet.
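
Reusing the illustrative names from the example above, that pipeline might be wired like this (a sketch, not confirmed SDK usage):

# Each stage runs in its own sandbox with its own mounts, so the coding
# agent never sees the research agent's storage credentials.
research = Agent(model="gpt-5", harness=Harness(manifest=Manifest(
    inputs=["s3://corp/papers"], outputs=["./notes"], sandbox_provider="e2b")))

coder = Agent(model="gpt-5", harness=Harness(manifest=Manifest(
    inputs=["./notes", "./src"], outputs=["./patch"], sandbox_provider="modal")),
    tools=[shell, apply_patch])

reviewer = Agent(model="gpt-5", harness=Harness(manifest=Manifest(
    inputs=["./patch"], outputs=["./review"], sandbox_provider="e2b")))

# Stages hand off through their output/input mounts: notes, then patch, then review.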

But here is where I start to worry. The harness is not a thin abstraction over generic agent capabilities. It is a model-native harness, which means its memory model, its tool invocation patterns, and its orchestration assumptions are tuned for GPT-5.x behavior. If you are building on Claude, Gemini, or open-weight models, the harness does not just fail to help — it is architecturally irrelevant to you. The skills primitive, the AGENTS.md pattern, the way shell and apply_patch are surfaced — these map to how OpenAI’s models expect to interact with execution environments. Anthropic’s Claude Agent SDK has its own runtime model that is fundamentally incompatible with this harness, and the same goes for any framework built around model-agnostic orchestration.

Compare this to framework-level alternatives like LangGraph or CrewAI, which are model-agnostic by design. They are less polished, require more configuration, and lack native sandbox integration — but they do not bind your orchestration layer to a single model provider. The tradeoff is explicit: you get a worse out-of-the-box experience in exchange for the ability to swap models without rewriting your agent infrastructure.

If you are evaluating this update, ask one question first: will you still be running exclusively on OpenAI models in 18 months? If the answer is “yes” or “probably,” adopt aggressively. If the answer is “maybe not,” the Manifest’s sandbox portability will not save you from harness-level lock-in.

The pricing model reinforces the adoption case for OpenAI-committed teams. GA availability at standard token and tool-use pricing with no custom procurement requirement means a three-person startup and a Fortune 500 team get the same infrastructure. No enterprise sales cycle, no minimum spend. That is a smart distribution play — it lowers the barrier to adoption precisely at the moment when the Assistants API deprecation forces migration.

Code mode and full TypeScript support are on the roadmap, along with expanded subagent capabilities. These are not yet available, but they signal where OpenAI is heading: a complete execution platform where agents live and run, not just a model API you call from your own infrastructure. The agent production stack is increasingly something OpenAI wants to own end-to-end.

The Take

I want to be honest: this update solves a problem I have personally spent weeks on — getting sandboxed execution right for agents that need to run code, write files, and interact with external storage without leaking credentials or losing state. The snapshotting alone would have saved me days of debugging container restart failures. The security architecture, with its harness/compute separation, is well-designed and addresses real attack surfaces that most agent frameworks ignore entirely.

But I keep coming back to the same question: would I be comfortable with OpenAI controlling the execution contract for my agents in 12 months? The Manifest lets me swap E2B for Modal. It does not let me swap GPT-5 for Claude without rearchitecting the harness layer. Every production agent I wire into this system is one more agent that gets harder to migrate. OpenAI is making the right architectural bet for teams already deep in their ecosystem — the Symphony announcement pointed in exactly this direction — while quietly making the exit door harder to find.

If you are already OpenAI-native, adopt this. It is the best agent execution infrastructure available today, and the August 26 Assistants API sunset means you are migrating anyway. But if you are running multi-model or think you might be in a year, wire in an abstraction layer above the harness now. The sandbox portability is real. The model portability is not. Plan for that asymmetry before it plans for you.
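
What that abstraction layer can look like: a thin interface your application codes against, with the OpenAI harness as one adapter behind it. Every name below is your own code, not SDK API:

from typing import Protocol

class AgentRunner(Protocol):
    """The only agent surface the rest of the application may touch."""
    def run(self, task: str) -> str: ...

class OpenAIHarnessRunner:
    """Adapter over the Agents SDK harness (illustrative wiring)."""
    def __init__(self, agent):
        self._agent = agent

    def run(self, task: str) -> str:
        # The single place the SDK is invoked. Swapping model providers
        # later means writing one new adapter, not touching call sites.
        return self._agent.run(task)

One adapter per provider keeps the migration cost proportional to the number of providers, not the number of call sites.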