Your Containers Can't Hold an Agent: The Sandboxing Reality Every Architect Must Face
Here is a statement that should make every platform architect uncomfortable: standard Docker containers provide no meaningful isolation against AI agents that execute LLM-generated code.
Containers share the host kernel. Any kernel exploit, misconfigured capability, or privileged syscall available to the agent is a potential path across the container boundary. And according to HiddenLayer's 2026 AI Threat Landscape Report, 1 in 8 AI security breaches now involves an agentic system, and 31% of organizations don't even know whether they experienced one.
The evidence from real CVEs and vulnerability disclosures through the first half of 2026 makes the argument for us. Let's walk through what happened, why containers failed, and what isolation actually looks like.
Four Case Studies That Prove the Point
CVE-2026-5752 (CVSS 9.3): Cohere AI Terrarium Sandbox Escape
Cohere AI's Terrarium was a purpose-built Python sandbox designed to run untrusted code — exactly the kind of isolation layer you'd want for an AI agent executing generated code. It ran inside a Docker container on Pyodide, a Python distribution for WebAssembly.
Security researcher Jeremy Brown discovered that JavaScript prototype chain traversal in the Pyodide WebAssembly environment could reach back into the host Node.js process. From sandboxed Python code, an attacker could achieve root-level code execution within the container, read sensitive files like /etc/passwd, reach other services on the container's network, and potentially escape the container entirely.
The vulnerability scored 9.3 on CVSS. It required local access but no user interaction and no special privileges. And because the Terrarium project is no longer actively maintained, the vulnerability is unlikely ever to be patched.
The lesson: a sandbox that executes untrusted code inside a shared runtime, whether Node.js, Python, or a WebAssembly interpreter, exposes the runtime's internals as attack surface. Kernel-level isolation does not fix bugs in the runtime, but it keeps a compromised runtime from becoming a compromised host.
CVE-2025-59528 (CVSS 10.0): Flowise AI Agent Builder
Flowise's CustomMCP node parsed user-provided configuration strings and executed arbitrary JavaScript with direct access to Node.js child_process and fs modules. No sandbox. No capability restrictions. No process isolation.
The vulnerability had been public for six months before active exploitation began, and over 12,000 internet-facing Flowise instances were exposed when it did. A Starlink IP was documented in the attack traffic.
The root cause was disarmingly simple: user-provided code executed in the same process context as the application. A sandbox — even a basic one — would have contained the execution. But there wasn't one.
Google Antigravity: Secure Mode Bypass via Subprocess
Pillar Security disclosed a vulnerability in Google's Antigravity agentic IDE in January 2026. The find_by_name tool passed a Pattern parameter directly to the underlying fd binary without argument validation. An attacker who had first staged a script in the workspace could inject -Xsh as the pattern. fd parsed the value as an exec option rather than a search pattern, ran the staged script through sh, and arbitrary code ran on the host.
Google's "Secure Mode" — its highest security configuration — was bypassed because the native tool invocation executed before the agent's security restrictions were evaluated. Application-level security controls cannot govern subprocesses once execution transfers to a native binary. Only kernel-level isolation — applied at the process boundary — can contain this attack class.
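The injection works because a value meant as a search pattern is parsed as a command-line option. A minimal sketch of the missing argument validation follows. The helper name build_fd_argv is hypothetical and this is not Antigravity's code; it also assumes fd honors the conventional `--` end-of-options marker. A check like this narrows one bug class, but it does not replace isolation at the process boundary.

```python
from typing import List


def build_fd_argv(pattern: str, search_dir: str = ".") -> List[str]:
    """Build an argv list for `fd` that treats `pattern` strictly as data.

    Hypothetical helper for illustration; not Antigravity's implementation.
    """
    if not pattern:
        raise ValueError("empty pattern")
    # Reject anything that could be parsed as an option, such as -Xsh.
    if pattern.startswith("-"):
        raise ValueError(f"pattern may not start with '-': {pattern!r}")
    if "\x00" in pattern:
        raise ValueError("pattern contains a NUL byte")
    # `--` marks the end of options (assumed to be honored by fd), so
    # everything after it is read as a positional argument, never a flag.
    return ["fd", "--", pattern, search_dir]
```

The resulting list is meant to be passed to subprocess.run without shell=True, so the pattern never passes through a shell either.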
CVE-2025-59536 (CVSS 8.7): Claude Code Configuration Injection
Check Point Research disclosed that Anthropic's Claude Code CLI Hooks feature — shell commands that run at lifecycle events — could be exploited for configuration injection. A companion flaw (CVE-2026-21852, CVSS 5.3) allowed API key theft by redirecting Claude Code's API requests to an attacker-controlled proxy.
Both vulnerabilities exploited the execution context of an unsandboxed developer agent. Network egress controls would have blocked the API key theft. Configuration file write protection would have blocked the Hooks injection. Neither was present.
Why Standard Docker Containers Are Not Sandboxes
This is the point that bears repeating. Docker containers isolate the filesystem, the process namespace, and the network stack — but they share the host kernel with every other container and the host itself. A kernel vulnerability exploitable from inside the container is exploitable against the host.
For AI agents that execute LLM-generated code, this is not a theoretical concern. The agent may write files, spawn subprocesses, call system APIs, and interact with configuration files — all behaviors that container isolation alone does not constrain. As OWASP states in ASI05 (Unexpected Code Execution): "Never execute agent-generated code without strict sandboxing, input validation, and allowlisting."
The distinction between a container and a sandbox is not academic. It is the difference between process-level isolation (filesystem, network, PID namespace) and kernel-level isolation (separate kernel, separate memory, hardware-enforced boundaries).
The Three Isolation Technologies: What You Actually Need
Not every workload requires the same isolation guarantee. The choice depends on data classification, threat model, and performance requirements.
Firecracker microVMs: The Gold Standard
Firecracker, the AWS open-source technology written in Rust, creates lightweight virtual machines using KVM hardware virtualization. Each workload gets its own dedicated kernel — completely isolated from the host kernel. Escaping a Firecracker sandbox requires breaking out of both a guest kernel and the KVM hypervisor layer.
Performance is surprisingly good: approximately 125 ms boot time, less than 5 MiB of memory overhead per microVM, and creation rates of up to 150 microVMs per second per host. The attack surface is minimal by design: only five virtual device types are exposed.
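To put those figures in fleet terms, here is a back-of-the-envelope sketch. It uses only the 5 MiB and 150-per-second numbers quoted above as bounds; the function name and the fleet size of 1,000 are illustrative assumptions, and guest memory for the workloads themselves is not included.

```python
from typing import Tuple

OVERHEAD_MIB_PER_VM = 5    # upper bound on per-VM overhead quoted above
CREATE_RATE_PER_SEC = 150  # peak creation rate per host quoted above


def fleet_estimate(vm_count: int) -> Tuple[float, float]:
    """Return (max VMM overhead in GiB, min seconds to create all VMs)."""
    overhead_gib = vm_count * OVERHEAD_MIB_PER_VM / 1024
    create_seconds = vm_count / CREATE_RATE_PER_SEC
    return overhead_gib, create_seconds


overhead, seconds = fleet_estimate(1000)
print(f"{overhead:.2f} GiB overhead, {seconds:.1f} s to create")
# prints "4.88 GiB overhead, 6.7 s to create"
```

Under these assumptions, a thousand isolated agent sessions cost under 5 GiB of virtualization overhead on top of whatever memory the guests are given.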
Use Firecracker when: the agent executes LLM-generated code, handles regulated data (healthcare, financial services, EU AI Act high-risk systems), or operates in a multi-tenant environment where cross-tenant isolation is a contractual requirement.
gVisor: Syscall-Level Isolation
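gVisor's userspace kernel (the Sentry, written in Go) intercepts the application's syscalls and services them itself, implementing approximately 70–80% of the Linux syscall surface, so that only a narrow, filtered set of calls ever reaches the host kernel. Compromising the sandboxed application does not directly expose the host kernel.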
The trade-off: I/O-heavy workloads see 10–30% overhead; compute-heavy workloads see minimal impact. Startup speed is comparable to containers.
Use gVisor for: compute-intensive AI workloads in Kubernetes where Firecracker's hypervisor overhead is unacceptable, and where stronger-than-container isolation is required.
V8 Isolates: Lightweight, Latency-Critical
V8 Isolates run independent JavaScript contexts within a single process. Startup time is in the microsecond range, which is ideal for latency-critical workloads. But they only support JavaScript and WebAssembly, and the isolation boundary sits inside one shared process: it is enforced by the runtime, not by the kernel or by hardware.
Use V8 Isolates for: lightweight agent tasks that execute JavaScript functions and never touch the host filesystem or spawn subprocesses. Do not use for general-purpose agentic workloads.
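The three "use when" rules above can be condensed into a small decision function. This is a sketch under stated assumptions: the Workload fields and the choose_isolation name are hypothetical, and falling back to gVisor for everything else is a simplification of the guidance above, not a rule from any of the cited vendors.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical descriptor; field names are illustrative only.
    runs_llm_generated_code: bool = False
    handles_regulated_data: bool = False
    multi_tenant: bool = False
    javascript_or_wasm_only: bool = False
    touches_filesystem_or_subprocesses: bool = False


def choose_isolation(w: Workload) -> str:
    """Map a workload to the isolation tier suggested in this section."""
    # The strongest requirement wins: any of these calls for a dedicated kernel.
    if w.runs_llm_generated_code or w.handles_regulated_data or w.multi_tenant:
        return "firecracker"
    # Isolates only fit pure JS/Wasm tasks with no host filesystem or subprocess use.
    if w.javascript_or_wasm_only and not w.touches_filesystem_or_subprocesses:
        return "v8-isolate"
    # Assumed default: stronger-than-container isolation via gVisor.
    return "gvisor"
```

Checking the Firecracker conditions first is deliberate: a JavaScript-only task that still runs LLM-generated code should land in the microVM tier, not in an isolate.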
The Four Mandatory Isolation Layers
The 2026 consensus from OWASP, Microsoft (Agent Governance Toolkit), and NVIDIA converges on four independent isolation boundaries that must operate together:
1. Network Egress Control
An unsandboxed agent can call any endpoint its host can reach. A sandboxed agent operates under a tightly scoped allowlist. Define exactly which external APIs the agent is permitted to call. Enforce via an egress proxy or network policy. Alert on all other outbound traffic. This directly limits the impact of prompt injection attacks that attempt to exfiltrate data or call attacker-controlled endpoints.
The Claude Code CVE-2026-21852 API key theft relied on the agent redirecting requests to an attacker-controlled proxy. Network egress controls would have blocked that connection before it was established.
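As an illustration of what a tightly scoped allowlist means in practice, the sketch below shows the decision an egress proxy might make for each outbound request. The hostnames and the is_egress_allowed function are hypothetical. In a real deployment this logic belongs in the proxy or network policy outside the agent's process, since a check that runs inside the agent can be bypassed by the agent.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real entries depend on the agent's approved APIs.
ALLOWED_HOSTS = frozenset({"api.example.com", "models.example.com"})


def is_egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
        scheme = parts.scheme
        host = (parts.hostname or "").rstrip(".")
    except ValueError:  # malformed URL: fail closed
        return False
    if scheme != "https":
        return False
    # Exact match on the parsed hostname, so look-alikes such as
    # api.example.com.attacker.test or user-info tricks do not pass.
    return host in allowed_hosts
```

Exact matching is the conservative choice here. Suffix or wildcard matching is more convenient but widens the set of hosts an injected prompt can reach.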
2. Filesystem Boundaries with Write Protection
An unsandboxed agent with write access to the filesystem can modify configuration files that execute automatically — dotfiles, hooks, MCP configuration directories. NVIDIA's 2026 guidance explicitly flags these as write-protected zones because they execute at startup or by developer tools before any runtime security check is evaluated.
The Claude Code Hooks injection vector exploited exactly this. The agent wrote to a configuration file that executed at the next lifecycle event. Filesystem write protection at the sandbox boundary would have prevented the write.
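A minimal sketch of such a write policy is shown below. It assumes a hypothetical rule set: writes must stay inside the agent's workspace, and any dotfile, dot-directory, or directory named hooks is treated as a write-protected zone. The function and constant names are illustrative. Real enforcement should sit at the sandbox boundary (for example, read-only mounts), with a check like this serving only as defense in depth.

```python
from pathlib import Path

# Hypothetical set of directory names treated as write-protected.
PROTECTED_DIR_NAMES = {"hooks"}


def is_write_allowed(target: str, workspace: str) -> bool:
    """Allow writes only inside the workspace and outside protected zones."""
    root = Path(workspace).resolve()
    # Resolving collapses ".." segments and follows existing symlinks;
    # an absolute target replaces the workspace prefix entirely.
    path = (root / target).resolve()
    try:
        rel = path.relative_to(root)
    except ValueError:
        return False  # escaped the workspace
    for part in rel.parts:
        # Dotfiles and dot-directories (.git, .bashrc, tool config dirs)
        # often execute automatically, so they are denied wholesale.
        if part.startswith(".") or part in PROTECTED_DIR_NAMES:
            return False
    return True
```

Denying every dot-prefixed path is intentionally blunt. It will block some harmless writes, which is usually an acceptable trade for an agent that has no reason to edit developer tooling configuration.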
3. Process Isolation at the Kernel Level
When a sandboxed agent spawns a subprocess, that subprocess must remain inside the sandbox boundary. Application-level security policies cannot govern subprocesses spawned by native tool invocations. The Google Antigravity bypass demonstrated this directly: the fd binary executed before security checks were evaluated.
4. Secrets Scoping per Task
An unsandboxed agent that inherits the full host credential environment can access every API key, cloud role, and database connection string available to the process. A properly isolated agent receives only the credentials it needs for the specific task — provisioned at runtime, scoped to specific tools, and revoked when the task completes.
This is the operational translation of least privilege for agentic workloads. And it requires more than sandboxing — it requires credential management integrated with the agent runtime. Facio (the HITL-first agent runtime) addresses this by capturing every credential usage in its audit trail, making overprivileged access patterns visible at runtime rather than discoverable only after an incident.
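The simplest form of secrets scoping is to stop handing the agent's subprocesses the full host environment. The sketch below builds a per-task environment from a hypothetical tool-to-secret mapping. All names (TOOL_SECRETS, scoped_env, the variable names) are assumptions for illustration and are not Facio's API. Runtime provisioning and revocation of short-lived credentials, as described above, would need a secrets manager on top of this.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical mapping from tool name to the only secrets that tool needs.
TOOL_SECRETS = {
    "github": ["GITHUB_TOKEN"],
    "billing_db": ["BILLING_DB_URL"],
}

# Non-secret variables every task may inherit.
BASE_ENV_KEYS = ("PATH", "LANG")


def scoped_env(tools: Iterable[str], host_env: Mapping[str, str]) -> Dict[str, str]:
    """Build a subprocess environment containing only the task's secrets."""
    env = {k: host_env[k] for k in BASE_ENV_KEYS if k in host_env}
    for tool in tools:
        # Unknown tools or missing secrets raise KeyError: fail closed.
        for name in TOOL_SECRETS[tool]:
            env[name] = host_env[name]
    return env
```

A task that only needs the github tool would then be launched with something like subprocess.run(argv, env=scoped_env(["github"], os.environ)), and every other key in the host environment stays out of its reach.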
The Broader Architecture: Sandboxing Is One Layer
Sandboxing is necessary. It is not sufficient. An agent that runs inside a Firecracker microVM with network egress controls and filesystem write protection can still be tricked into exfiltrating data through an approved API, calling a permitted tool with malicious parameters, or corrupting a database it has legitimate access to.
This is why the runtime governance layer matters. The sandbox constrains what the agent can reach. The runtime constrains what the agent can do — with deterministic policy enforcement, audit trail capture, and human review at high-impact decision points.
Placet.io (the HITL inbox and messenger) provides the human-facing side of this equation: when the agent reaches a permission boundary — attempting a tool invocation it hasn't used before, accessing a new data source, or changing its own configuration — the request triggers a structured human approval workflow with a complete audit record.
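The permission boundaries listed above (a first-time tool, a new data source, a self-configuration change) amount to a deterministic policy check. The sketch below is one hypothetical way to express it; the types and the approval_reasons function are illustrative and do not reflect the Placet.io or Facio APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class AgentHistory:
    # What this agent has already been approved to use (hypothetical model).
    used_tools: Set[str] = field(default_factory=set)
    used_sources: Set[str] = field(default_factory=set)


@dataclass
class Action:
    tool: str
    data_source: Optional[str] = None
    modifies_own_config: bool = False


def approval_reasons(action: Action, history: AgentHistory) -> List[str]:
    """Return the reasons an action needs human review; empty means proceed."""
    reasons = []
    if action.tool not in history.used_tools:
        reasons.append("new tool")
    if action.data_source is not None and action.data_source not in history.used_sources:
        reasons.append("new data source")
    if action.modifies_own_config:
        reasons.append("config change")
    return reasons
```

Returning the list of reasons rather than a bare boolean lets the approval request and the audit record say why the agent was stopped.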
What to Do This Quarter
If your organization deploys AI agents that execute code, and HiddenLayer reports that 1 in 8 AI security breaches already involves an agentic system, here is the immediate action plan:
- Inventory every agent that executes code. LLM-generated, user-provided, configuration-driven: all of it. If you don't know which agents run untrusted code, you can't sandbox them.
- Upgrade from containers to sandboxes. Standard Docker is not sufficient for agents that execute LLM-generated code. Evaluate Firecracker microVMs for regulated workloads, gVisor for Kubernetes-native compute.
- Implement the four mandatory isolation layers. Network egress, filesystem boundaries, process isolation, and secrets scoping. These are not optional. OWASP, Microsoft, and NVIDIA all agree.
- Add runtime governance on top of infrastructure isolation. Sandboxing prevents escape. Runtime governance (audit trails, policy enforcement, human review) prevents the agent from doing damage within its sandbox.
The organizations that treat containers as sandboxes will learn the difference through an incident. The ones that treat sandboxing as infrastructure — designed in, not bolted on — will be the ones whose agents scale without compromising the systems they run on.