When Prompts Become Shells: The RCE Threat Hiding in AI Agent Frameworks
In May 2026, Microsoft's Security Response Center published research that should reframe how every enterprise thinks about AI agent security. The finding was stark: a single prompt can trigger host-level remote code execution in popular AI agent frameworks — with no browser exploit, no malicious attachment, no memory corruption bug required. The agent simply did what it was designed to do: interpret natural language, choose a tool, and pass parameters into code.
This is not a hypothetical. Microsoft disclosed two critical CVEs in its own Semantic Kernel framework — one enabling arbitrary code execution through vector store filtering, the other enabling sandbox escape through an exposed file-write tool. Both have been patched. But the attack pattern they reveal is systemic, and it applies to every framework that wires LLM output to system-level tool invocation.
The Fundamental Problem: LLMs Are Not Input Validators
AI agent frameworks — Semantic Kernel, LangChain, CrewAI, and others — act as the operating system for AI agents. They map natural-language intent to structured tool calls. The model is not the problem: it parses language into parameter schemas exactly as designed. The vulnerability lies in how the framework and tools trust the parsed data.
Consider what happens when an agent invokes a tool like search_hotels(city="Paris"). Behind the scenes, the framework constructs a filter function from the model-supplied parameter. If that parameter is not sanitized — and the framework evaluates it as executable code — then a prompt like:
' or __import__('os').system('malicious_command') or '
...can escalate from a simple search query to arbitrary shell execution. The LLM didn't decide to be malicious. It faithfully parsed what the attacker told it to parse. The framework then executed controlled input as code.
CVE-2026-26030: When Vector Search Becomes Code Execution
The first vulnerability Microsoft disclosed affected Semantic Kernel's In-Memory Vector Store. The attack chain was disarmingly simple:
Step 1: The agent exposes a search plugin backed by a vector store with a default filter function. That filter is implemented as a Python lambda expression passed to eval().
Step 2: An attacker injects a prompt that manipulates the agent into calling the search plugin with a crafted parameter. The framework formats the parameter directly into the lambda string — no sanitization.
Step 3: A blocklist-based validator checks the resulting expression for dangerous identifiers (eval, exec, open, __import__). But the attacker's payload avoids every blocked name by traversing Python's type system through unblocked attributes like __name__, __subclasses__, and BuiltinImporter.
Step 4: The lambda executes. The attacker's code runs on the host. calc.exe launches. Or worse.
The root cause wasn't a single oversight — it was a defense architecture problem. Blocklists in dynamic languages are inherently fragile. An allowlist-based AST validator, combined with eliminating eval() from the execution path entirely, would have closed the vector at the design stage.
CVE-2026-25592: Sandbox Escape Through Exposed Tool Metadata
The second vulnerability was even more instructive about how framework design choices create attack surface.
Semantic Kernel's SessionsPythonPlugin runs Python code in an isolated Azure Container Apps sandbox. The security boundary is the container. But the plugin also has helper functions — UploadFile and DownloadFile — that transfer files between the sandbox and the host.
In the .NET SDK, DownloadFileAsync was accidentally decorated with a [KernelFunction] attribute. This advertised it to the AI model as a callable tool, complete with a parameter schema that included localFilePath. Suddenly, the model could decide where files get written on the host. No path validation. No directory restriction. No sanitization.
The attack chain: prompt the agent to write a malicious payload to the Windows Startup folder via the exposed DownloadFile tool. Reboot. The attacker has persistence on the host — having never touched the sandbox's security boundary.
The fix was straightforward: remove the [KernelFunction] attribute from internal helper functions. But the lesson is broader: every function your framework exposes to the model is part of the attack surface. Exposing internal utilities as callable tools is the agent equivalent of leaving a debug endpoint open in production.
Tool Poisoning: The Attack Surface Beyond Prompt Injection
The RCE vulnerabilities in agent frameworks share a common ancestor with an emerging attack class: tool poisoning. While prompt injection targets the user-facing input channel, tool poisoning targets the metadata that describes what tools do — their names, descriptions, and parameter schemas.
Security researchers at OX Security disclosed in May 2026 that this attack surface spans more than 150 million MCP downloads and an estimated 200,000 vulnerable instances. The MCPTox benchmark, testing 45 live MCP servers against modern LLMs, found that tool poisoning success rates exceeded 60% across major agents — with some models compromised on 72% of attempts.
What makes tool poisoning especially dangerous is its persistence. A poisoned tool description ships inside a package, a configuration file, or a remote MCP server. It works on every invocation, silently, across every session, for every user — until somebody notices. The user sees a legitimate response. The data has already moved.
The attack classes form a hierarchy:
| Attack Class | Payload Location | Persistence | Primary Mitigation |
|---|---|---|---|
| Direct prompt injection | User input | Per-session | Input filtering, system prompt isolation |
| Indirect prompt injection | Retrieved content (docs, web) | Content lifetime | Provenance tracking, content sandboxing |
| Tool poisoning | Tool metadata, descriptions | Until tool removed | Tool allowlisting, signed manifests, runtime monitoring |
| Framework RCE | Framework internals (eval, file ops) | Until patched | Framework hardening, least-privilege execution |
What Enterprise Teams Should Do Now
The CVEs are patched — Semantic Kernel 1.39.4 and later include fixes for both vulnerabilities. But patching answers only the immediate question. The harder questions are architectural:
1. Audit Your Agent's Execution Path
If your agent uses eval(), exec(), or any form of dynamic code evaluation with model-supplied parameters, you have a latent RCE risk. Replace dynamic evaluation with deterministic dispatch — the agent should select from a finite set of pre-approved operations, not construct executable code at runtime.
2. Treat Tool Exposure as Privilege Escalation Surface
Every function decorated with [KernelFunction], @tool, or equivalent is a potential attack vector. Internal utility functions — file operations, network calls, process management — must never be exposed to the model. Maintain a strict allowlist of callable tools, and audit it regularly.
3. Implement Runtime Monitoring — Not Just Pre-Deployment Scanning
Both CVEs required the attacker to influence agent behavior at runtime. Pre-deployment code review would have caught the exposed DownloadFile function, but the vector store RCE required understanding the runtime interaction between prompt parsing and filter evaluation. Facio (the HITL-first agent runtime) addresses this directly: its audit trail captures every tool invocation with full parameter traceability, making anomalous tool calls visible — not just the ones you anticipated.
4. Add Human Review for High-Impact Tool Calls
The sandbox escape CVE succeeded because no human reviewed the file-write operation before it executed. File writes to system directories, outbound network connections from agent processes, and execution of dynamically constructed commands should require approval. Placet.io (the HITL inbox and messenger) provides the human-facing side of this equation — delivering structured approval requests to the right reviewers through the channels they already use.
5. Hunt for Past Exploitation
As Microsoft's advisory notes: patching closes the bug, but it doesn't answer whether your agent was exploited before you upgraded. Define your vulnerable window — from the moment a vulnerable framework version was deployed until the moment it was patched. Hunt for host-level post-exploitation signals during that window: suspicious child processes spawned by the agent host, unexpected outbound connections, persistence artifacts in startup directories.
The Bigger Picture: Governance in the Execution Path
The RCE vulnerabilities in agent frameworks are not anomalies — they are the first wave of a new vulnerability class. As agents gain more tools, more autonomy, and more privileged access, the attack surface expands with every integration. The OWASP Agentic AI Top 10 published in December 2025 gave us the taxonomy. The CVEs of May 2026 gave us the proof.
The organizations that will manage this risk are not the ones with the most sophisticated prompt filters. They are the ones that govern agent actions in the execution path — with deterministic policy enforcement, comprehensive audit trails, and human review checkpoints at high-impact decision points. The alternative is learning about a compromise from a regulator, not from a detection system.
Further reading: