Back to blog

Security · Jun 12, 2026

Zero Trust for AI Agents: The Three-Tier Framework Microsoft and Anthropic Both Converged On

Microsoft added "AI" as a fifth pillar to its Zero Trust Workshop with 700 controls. Anthropic published a three-tier framework the same week. CISA's ATF from CSA, the NSA's MCP guidance, and the OWASP Top 10 all point to the same answer. The convergence is the story.

Zero TrustAI SecurityMicrosoft ZT4AIAgentic Trust FrameworkSecurity Architecture

Zero Trust for AI Agents: The Three-Tier Framework Microsoft and Anthropic Both Converged On

Within ten days in March 2026, Microsoft and Anthropic independently published Zero Trust frameworks for AI agents. Microsoft announced Zero Trust for AI (ZT4AI) with a new AI pillar added to its Zero Trust Workshop, expanding the framework to 700 security controls across 116 logical groups. Anthropic released a three-tier Zero Trust framework (Foundation, Advanced, Optimized) for autonomous AI agents in the enterprise.

They were not coordinating. The convergence tells you something more important than either announcement alone: the security community has reached consensus on how to extend Zero Trust to autonomous systems.

The Cloud Security Alliance's Agentic Trust Framework reached the same conclusion from a different starting point. The NSA's MCP Security guidance arrived at structurally identical recommendations. The OWASP Agentic AI Top 10 names the same risks. NIST's AI Risk Management Framework prescribes the same controls. The frameworks differ in vocabulary, not architecture.

This post synthesizes the convergence: what Zero Trust for AI agents actually means, what the three maturity tiers look like, and what concrete patterns every deployment should implement now.

Why Zero Trust Principles Need New Shape for Agents

Zero Trust — trust nothing, verify everything, assume breach — was formulated for networks, identities, and workloads with predictable behavior. AI agents have properties that strain the original framework:

Autonomy at machine velocity. A human user authenticates, performs a small set of actions, and disconnects. An agent authenticates once, then performs thousands of actions in a session, including tool invocations, file operations, and cross-system calls. The unit of trust is no longer the session — it is the action.

Natural language as the instruction channel. Traditional Zero Trust evaluates structured requests with predictable formats. An agent processes natural language instructions, which can contain hidden payloads. A request that looks legitimate to the gateway may carry a prompt injection in the body. Authentication and request validation are no longer sufficient — the content must be evaluated against the action it is requesting.

Persisted context across sessions. A user's session context is short-lived. An agent's context can persist for hours or days, with memory, retrieved documents, and accumulated reasoning. Memory poisoning — covered in the Facio analysis from May 2026 — means that yesterday's trusted context may be today's attack vector.

Tool composition is dynamic. A human user has a fixed set of permissions. An agent dynamically composes tools into action sequences. The same agent may read a public document, summarize it, and forward the summary to a privileged tool. Static permission models cannot evaluate the resulting trust implications.

Multi-agent delegation. When an orchestrator delegates to specialized agents, each hop can propagate ambient permissions. The trust boundary shifts at every delegation, but the original session token often remains. Zero Trust must evaluate the delegation chain, not just the originating identity.

Microsoft's ZT4AI paper captures the implication: AI systems introduce "new trust boundaries — between users and agents, models and data, and humans and automated decision-making." What changes is the shape of the boundary, not the principle.

The Three Principles, Applied to AI

Microsoft's framing names three foundational principles that translate the original Zero Trust maxims to AI environments:

Verify explicitly — Continuously evaluate the identity and behavior of AI agents, workloads, and users. Not at session establishment. Continuously. The agent's authorization context must be re-evaluated for every tool invocation, not just every session.

Apply least privilege — Restrict access to models, prompts, plugins, and data sources to only what is needed. For agents, this means per-task credential scoping, not per-role. An agent that needs read access to a database for one task should not carry that access into the next.

Assume breach — Design AI systems to be resilient to prompt injection, data poisoning, and lateral movement. The agent may be compromised. The agent's input may be poisoned. The agent's tools may be malicious. The architecture must remain safe under all of these conditions.

These are the same principles Zero Trust has always asserted. What changes is the granularity at which they are applied: per-action verification, per-task privilege scoping, and per-input taint tracking.

The Three-Tier Framework: Foundation, Advanced, Optimized

Anthropic's framework, structured as a three-tier maturity model, gives organizations a practical roadmap. Each tier represents a coherent set of capabilities; most enterprises should aim to reach Advanced within 12 months and Optimized within 24.

Foundation Tier: The Minimum Viable Zero Trust for AI

The Foundation tier establishes the structural invariants that any AI agent deployment must have. Without these, the system is not Zero Trust at all — it is perimeter trust extended to an autonomous actor.

Cryptographic identity per agent. Every agent has a distinct, cryptographically verifiable identity. The identity is not shared with humans, not shared with other agents, and not derived from a static API key. Use workload identity standards (SPIFFE, mTLS, OAuth client credentials) for issuance.

Tool allowlisting with per-tool scope. The agent can only invoke tools that have been explicitly registered in its configuration. Each tool has a defined scope: read, write, specific operation types, specific resource types. Tools outside the allowlist are not reachable from the agent's execution context.

Egress controls with allowlisted destinations. The agent can only make network calls to destinations on an explicit allowlist. Egress is enforced at the network layer, not at the agent prompt. An agent that needs to call a new API must have the destination added to the allowlist by a human or by an automated policy that has been previously approved.

Per-session audit trail. Every tool invocation, every network call, every model call is recorded in an immutable audit trail. The trail is queryable for incident response, retained per regulatory requirements, and accessible to the security operations team.

Sandboxing for code execution. If the agent executes LLM-generated code, the execution must happen inside a sandbox that constrains the operation set. Use Firecracker microVMs, gVisor, or equivalent — not standard Docker containers.

Input taint marking. Content from untrusted sources (public web pages, GitHub issues, email bodies, document attachments) is marked as untrusted. The mark propagates through any transformation the agent performs. An untrusted input that reaches a privileged tool call triggers a policy decision, not a silent authorization.

This tier eliminates the most common attack classes: credential theft (the agent has no static credentials), unauthorized tool access (the allowlist is enforced), data exfiltration via unauthorized destinations (egress controls), and persistent compromise (the sandbox contains code execution).

Advanced Tier: Runtime Governance and Continuous Verification

The Advanced tier adds continuous verification, runtime policy enforcement, and human review at high-blast-radius decision points. Most enterprises should reach this tier before deploying agents that handle regulated data or operate across multiple systems of record.

Runtime policy engine. Every tool invocation passes through a policy evaluation that considers: the current task, the agent's identity, the resource being accessed, the data classification, the time of day, the user's authorization, and the agent's recent behavior. Policies are expressed as ABAC rules, not as static allowlists. The engine evaluates the policy in the execution path, not as a pre-flight check.

Per-task credential issuance. The agent receives credentials scoped to the specific task, the specific resources, and the specific time window. When the task ends, the credentials expire. The blast radius of a compromised credential shrinks from "everything the agent has ever been authorized to do" to "one task, one scope, one time window."

Circuit breakers with configurable thresholds. Volume, cost, and blast-radius thresholds trigger automatic throttling or human review. An agent that makes more than N database calls per minute, or spends more than $X per task, or accesses resources classified above its current task's data sensitivity, triggers a human approval workflow. Placet.io (the HITL inbox and messenger) delivers these approval requests to the right reviewers with full context.

Behavioral anomaly detection. Baseline the agent's normal behavior — tool usage patterns, data access patterns, response latencies, token consumption per task. Detect deviations that may indicate prompt injection, memory poisoning, or compromise. Anomaly detection runs continuously, not only at model update boundaries.

Prompt injection defense in depth. Layered defenses: input validation at the model gateway, content provenance tracking, output sanitization, and tool-level checks for instruction-like content. No single layer is sufficient; the defense is the layering.

Per-agent memory isolation. Each agent's persistent memory (long-term context, learned preferences, retrieved knowledge) is isolated from other agents. Memory poisoning in one agent does not propagate to other agents. Memory contents are versioned, and any read access is logged.

The Advanced tier addresses the attack classes that escape Foundation-tier controls: sophisticated prompt injection that bypasses input validation, gradual drift in agent behavior that signals compromise, and runtime policy violations that static allowlists cannot anticipate.

Optimized Tier: AI-Native Defensive Operations

The Optimized tier acknowledges that defenders and attackers are both using AI, and that defensive operations must run at the speed of autonomous attackers. This is the Agentic SOAR vision: the SOC itself becomes agentic, correlating signals, triaging alerts, and executing containment actions at machine velocity.

Agentic security operations (Agentic SOAR). Autonomous tier-1 triage, automated correlation of agent audit signals with broader SIEM telemetry, and machine-speed response to detected threats. The security operations center uses AI agents to investigate, contain, and remediate — operating at the same speed as the AI-accelerated attackers.

Continuous red-teaming of deployed agents. Automated adversarial testing of production agents, scheduled at the cadence of the deployment pipeline. Each model update, tool integration, or configuration change triggers a new round of attacks. The agent is hardened continuously, not at release boundaries.

Federated governance across agent fleets. A central governance plane enforces policies across all agent deployments. New tools, new data sources, and new agent types are registered centrally. Policy updates propagate automatically. Audit trails are aggregated for compliance and forensic purposes.

Hardware-rooted identity for high-trust agents. For agents that operate on the most sensitive data or take the most consequential actions, identity is rooted in hardware (TPM, secure enclaves) rather than in software credentials. The trust assumption is that the identity cannot be exfiltrated because it never leaves the hardware.

Continuous proof-of-compliance. Compliance evidence is generated continuously from the audit trail, not assembled periodically for audits. Every regulatory requirement maps to a set of audit events, and the system can demonstrate compliance to a regulator in real time, not retrospectively.

The Optimized tier is achievable in 2026 for the most mature organizations — the financial services leaders, the defense contractors, the hyperscale cloud providers. For most enterprises, the target is Advanced tier for the next 12–24 months, with Optimized as the architectural direction.

The Eight-Phase Implementation Workflow

Anthropic's framework includes an eight-phase workflow that translates the maturity tiers into a sequence of implementation steps:

  1. Identity — Establish cryptographic identity for every agent. Phase 1 of Foundation tier.
  2. Access scoping — Define per-task permission scopes. Phase 2 of Foundation tier.
  3. Sandboxing — Constrain code execution. Phase 3 of Foundation tier.
  4. Input and output controls — Implement prompt injection defense in depth. Phase 4 of Foundation tier.
  5. Memory safeguards — Isolate, version, and audit agent memory. Phase 5 of Foundation tier.
  6. Egress and tool allowlisting — Restrict destinations and operations. Phase 6 of Foundation tier.
  7. Runtime policy engine — Implement ABAC evaluation in the execution path. Phase 1 of Advanced tier.
  8. Behavioral monitoring and human review — Add continuous verification and HITL approval at decision boundaries. Phase 2 of Advanced tier.

This sequence is not arbitrary. Earlier phases establish the structural invariants; later phases assume those invariants are in place. Implementing monitoring (phase 8) before identity (phase 1) produces a monitoring layer that cannot reliably attribute actions to specific agents. Implementing the policy engine (phase 7) before access scoping (phase 2) produces a policy engine that has no meaningful scopes to evaluate.

The CSA's Agentic Trust Framework reaches the same sequence, structured as a governance specification rather than an implementation workflow. NIST's AI RMF names the same controls. The OWASP Agentic AI Top 10 lists the same risks.

The Convergence Is the Signal

The frameworks differ in vocabulary: Microsoft uses "AI pillar" and "verify explicitly," Anthropic uses "three tiers" and "Agentic SOAR," CSA uses "ATF components," NSA uses "five-layer threat model." The terminology varies. The architecture does not.

When independent organizations arrive at structurally identical recommendations from different starting points and within the same time window, the convergence is the message. The agentic security community is not divided on what to do. It is divided only on naming conventions.

This matters for enterprise security leaders. The risk of choosing the "wrong" framework is lower than it appears. Whichever framework an organization adopts, the implementation will be largely the same. The decision is not architectural — it is which vocabulary to use internally and which vendor to align with operationally.

Where Facio Fits

The framework patterns described here are not theoretical. Facio (the HITL-first agent runtime) implements the Foundation and Advanced tier capabilities at the platform level:

  • Per-agent cryptographic identity with workload identity standards, distinct from human users
  • Tool allowlisting and per-tool scope enforced at the execution boundary
  • Egress controls with destination allowlists applied at the network layer
  • Runtime audit trail with full provenance, tamper-evident by design
  • Runtime policy engine with ABAC evaluation in the execution path — every tool invocation is policy-checked before execution
  • Per-task credential issuance scoped to specific resources, operations, and time windows
  • Circuit breakers with configurable thresholds for volume, cost, and blast radius
  • HITL approval workflows delivered through Placet.io when an action crosses a configurable boundary

The convergence of Microsoft's ZT4AI, Anthropic's framework, CSA's ATF, and the NSA's MCP guidance provides the architectural direction. Facio provides the runtime that implements it.

The Bottom Line

Zero Trust for AI agents is not a new security discipline. It is Zero Trust applied to a new class of actor — one that authenticates once and acts thousands of times, that processes natural language as both instruction and data, and that operates across system boundaries without the friction of human workflows. The principles are the same: verify explicitly, apply least privilege, assume breach. The granularity is finer: per-action, per-task, per-input.

The organizations that will operate AI agents securely are the ones that treat the convergence of Microsoft's, Anthropic's, CSA's, and NSA's frameworks as a clear signal of where the architecture is going — and start building toward Foundation tier now, Advanced tier within 12 months, and Optimized tier as the long-term direction.

The frameworks are aligned. The vendors are aligned. The only question is whether your deployment is.


Further reading: