From Prototype to Production: Facio's Readiness Checklist for AI Agents That Actually Ship
A working AI agent prototype is not a production AI agent. The gap between "it works on my machine" and "it works for 247 enterprise customers on a Monday morning" is where most agent projects die. The prototype runs the happy path; production runs everything else.
Facio's architecture is built around a production readiness checklist — six pillars that turn a clever demo into a reliable system. The pillars aren't aspirational features. They're operational requirements that production agents either have or don't. Here's what they are, why they matter, and how Facio addresses each.
Pillar 1: Audit Trails
The requirement: Every agent action is recorded with enough context to reconstruct what happened, when, and why. The audit trail is queryable, immutable, and survives across sessions.
Why it matters: Compliance auditors ask "what did your agent do?" Your CSM asks "why did the customer get that response?" Your developer asks "what changed since yesterday?" Without an audit trail, every answer is a guess. With one, every answer is a query.
How Facio addresses it:
- Tool-level logging. Every exec call, every manage_mcp operation, every credential access, and every file change is logged with timestamp, caller, parameters, and result.
- HITL decision records. Every approval, rejection, and form submission is captured with the human's identity and the decision rationale.
- Read access to logs. The agent itself can query its own audit trail via read_logs, supporting self-diagnosis and incident reconstruction.
- Append-only architecture. Audit entries are never modified or deleted. The integrity of the record is structural, not procedural.
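A minimal Python sketch of the append-only property, assuming an in-memory store. AuditLog, AuditEntry, and verify are hypothetical names for illustration, not Facio's API. The hash chain is a common tamper-evidence technique added here; the checklist itself only asks for append-only storage. The class exposes append and query but no update or delete, which is what "structural, not procedural" integrity looks like in code.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


def _digest(timestamp: str, caller: str, tool: str, params: dict, result: str, prev_hash: str) -> str:
    """Hash an entry's content together with the previous entry's hash."""
    payload = json.dumps([timestamp, caller, tool, params, result, prev_hash], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass(frozen=True)  # fields cannot be reassigned after creation
class AuditEntry:
    timestamp: str
    caller: str
    tool: str
    params: dict
    result: str
    prev_hash: str
    entry_hash: str


class AuditLog:
    """Hypothetical log: append and query only, deliberately no update or delete."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def append(self, caller: str, tool: str, params: dict, result: str) -> AuditEntry:
        prev_hash = self._entries[-1].entry_hash if self._entries else "genesis"
        timestamp = datetime.now(timezone.utc).isoformat()
        params = dict(params)  # copy so later changes by the caller don't leak in
        entry_hash = _digest(timestamp, caller, tool, params, result, prev_hash)
        entry = AuditEntry(timestamp, caller, tool, params, result, prev_hash, entry_hash)
        self._entries.append(entry)
        return entry

    def query(self, predicate: Callable[[AuditEntry], bool]) -> List[AuditEntry]:
        return [entry for entry in self._entries if predicate(entry)]

    def verify(self) -> bool:
        """Recompute the chain; an edited entry, or one removed or reordered mid-chain, breaks it."""
        prev_hash = "genesis"
        for e in self._entries:
            expected = _digest(e.timestamp, e.caller, e.tool, e.params, e.result, prev_hash)
            if e.prev_hash != prev_hash or e.entry_hash != expected:
                return False
            prev_hash = e.entry_hash
        return True
```

Answering "what did your agent do?" then becomes a query such as log.query(lambda e: e.tool == "exec").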
The audit trail isn't a feature you bolt on later. It's a property of the runtime that has to be designed in from day one. Agents that don't have it can't ship to regulated industries — full stop.
Pillar 2: HITL Gating
The requirement: Destructive actions require explicit human approval. Non-destructive actions run immediately. The agent knows the difference and asks when it should.
Why it matters: An agent that runs rm -rf on production data because the prompt said "clean up" is a liability. An agent that requires a human to approve every file read is a usability disaster. The gating has to match the risk — and the human has to trust the agent's autonomy within the gated boundaries.
How Facio addresses it:
- Tiered risk model. Read-only tools (read_file, web_search) run immediately. Local modifications (write_file, edit_file) run with workspace-scope checks. System effects (exec, message) require context-appropriate approval. Infrastructure changes (manage_mcp, manage_credentials) gate destructive operations behind ask_approval.
- Structured approval requests. ask_approval delivers a card with a title, a description, and custom options. ask_form captures structured input. ask_selection presents alternatives. The human reviews the request in Placet, their normal messaging interface.
- Timeout-aware gating. If the human doesn't respond within the timeout, the agent is told so and decides whether to proceed, retry, or abort based on the request's risk profile.
- Audit trail integration. Every approval, rejection, and timeout is recorded alongside the action it gated.
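The tier model above can be sketched as a lookup table plus a gate decision. The tool-to-tier assignments come from the list; the function names, the fail-closed default for unknown tools, and the timeout policy (keyed on a simple "low"/"medium"/"high" risk label) are assumptions for illustration, not Facio's implementation.

```python
from enum import IntEnum


class Tier(IntEnum):
    READ_ONLY = 0       # run immediately
    LOCAL_WRITE = 1     # run after a workspace-scope check
    SYSTEM_EFFECT = 2   # context-appropriate approval
    INFRASTRUCTURE = 3  # destructive operations go through ask_approval


TOOL_TIERS = {
    "read_file": Tier.READ_ONLY,
    "web_search": Tier.READ_ONLY,
    "write_file": Tier.LOCAL_WRITE,
    "edit_file": Tier.LOCAL_WRITE,
    "exec": Tier.SYSTEM_EFFECT,
    "message": Tier.SYSTEM_EFFECT,
    "manage_mcp": Tier.INFRASTRUCTURE,
    "manage_credentials": Tier.INFRASTRUCTURE,
}


def needs_approval(tool: str, destructive: bool, context_requires_approval: bool = True) -> bool:
    """Hypothetical gate: decide whether a call must wait for a human before it runs."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return True  # unknown tool: fail closed and ask
    if tier in (Tier.READ_ONLY, Tier.LOCAL_WRITE):
        return False  # local writes are bounded by the workspace-scope check instead
    if tier == Tier.SYSTEM_EFFECT:
        return context_requires_approval  # the caller decides what the context demands
    return destructive  # infrastructure: only destructive operations are gated


def on_timeout(risk: str, attempts: int, max_attempts: int = 2) -> str:
    """Hypothetical policy for an approval request that went unanswered."""
    if risk == "high":
        return "abort"  # never make a high-risk change without a human decision
    if attempts < max_attempts:
        return "retry"  # re-send the approval request
    return "proceed" if risk == "low" else "abort"
```

Failing closed on unknown tools means a newly added tool starts out gated until someone assigns it a tier.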
HITL gating isn't about slowing the agent down. It's about ensuring the agent's autonomy operates within human-acceptable boundaries. The production-ready agent knows when to ask and when to act.
Pillar 3: Observability
The requirement: The agent's operation is visible in real time. Tool calls, decisions, errors, and outputs are queryable. When something goes wrong, the operator can see exactly where.
Why it matters: Production agents run 24/7. They interact with real users, real systems, real money. When the system slows down, when an error rate climbs, when a user complains, the operator needs answers in minutes — not hours. Without observability, the operator is guessing. With it, they're investigating.
How Facio addresses it:
- Structured logs. Every tool call produces a log entry with severity, timestamp, and context. The read_logs tool gives the agent itself access; the same data is queryable by operators.
- Token usage tracking. Every session tracks token consumption against the budget. The agent sees its remaining iterations; operators see the trend over time.
- Heartbeat-based health checks. A scheduled heartbeat task can probe agent health, verify MCP server connectivity, and report system status without human intervention.
- Error pattern detection. Operators can query the log for error patterns (for example, grep for "ECONNREFUSED" over 7 days) and identify systemic issues.
Observability is the difference between "the agent is broken" and "the agent is failing on 3% of mcp.weather.com calls because the upstream API is rate-limiting us." The first is a complaint. The second is a fix.
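To make the mcp.weather.com example concrete, here is a hedged sketch that computes per-host failure rates from structured log entries over a 7-day window. The LogEntry fields (severity, host, message) are an assumed schema, not the actual read_logs output format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


@dataclass
class LogEntry:
    """Assumed structured log schema (hypothetical field names)."""
    timestamp: datetime
    severity: str  # e.g. "info" or "error"
    host: str      # upstream the tool call targeted
    message: str


def failure_rates(
    entries: Iterable[LogEntry],
    now: datetime,
    window: timedelta = timedelta(days=7),
    pattern: Optional[str] = None,
) -> Dict[str, float]:
    """Fraction of calls per host inside the window that failed, optionally matching a pattern."""
    cutoff = now - window
    total: Counter = Counter()
    failed: Counter = Counter()
    for entry in entries:
        if entry.timestamp < cutoff:
            continue  # outside the lookback window
        total[entry.host] += 1
        if entry.severity == "error" and (pattern is None or pattern in entry.message):
            failed[entry.host] += 1
    return {host: failed[host] / total[host] for host in total}
```

A result like {"mcp.weather.com": 0.03} is the "failing on 3% of calls" statement, in a form an operator can alert on.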
Pillar 4: Secret Management
The requirement: API keys, database passwords, OAuth tokens, and other secrets never appear in agent context, log entries, or written files. The agent operates on secrets without being able to read them.
Why it matters: A leaked secret in a log entry is a production incident. A secret in the agent's context window is a prompt injection target. A secret in a committed file is a breach. Production agents handle secrets in a way that makes leakage architecturally impossible, not just procedurally discouraged.
How Facio addresses it:
- Credential store isolation. API keys are stored in a credential store that the agent can use but cannot read. The agent references ${credentials.OPENAI_API_KEY} in configurations; the runtime resolves the placeholder at use time.
- No raw secrets in context. Even when the agent configures a tool that uses a secret, the secret value never appears in the agent's input or output. The runtime injects it at the right layer.
- No raw secrets in files. write_file and edit_file resolve placeholders at write time. The file is correct, the secret never appears in the agent's reasoning, and the audit trail shows "agent created .env with credential references", not the key value.
- Scoped access. Some secrets can be scoped to specific tools or workflows. The agent can have access to a database password for a specific migration without having access to the same password for arbitrary queries.
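A sketch of placeholder resolution at the runtime layer, assuming a simple in-memory store. The regular expression matches the ${credentials.KEY} syntax from the text; CredentialStore, its per-tool scopes, and runtime_write are hypothetical names. The shape is what matters: only the runtime calls resolve, the file receives the real value, and the audit record keeps the unresolved template.

```python
import re
from typing import Dict, List, Mapping, Optional, Set

PLACEHOLDER = re.compile(r"\$\{credentials\.([A-Za-z0-9_]+)\}")


class CredentialStore:
    """Hypothetical store: the runtime can resolve placeholders; there is no read method for the agent."""

    def __init__(self, secrets: Mapping[str, str],
                 scopes: Optional[Mapping[str, Set[str]]] = None) -> None:
        self._secrets = dict(secrets)
        # key -> tools allowed to use it; keys without an entry are usable by any tool
        self._scopes = {key: set(tools) for key, tools in (scopes or {}).items()}

    def resolve(self, template: str, tool: str) -> str:
        """Replace every ${credentials.KEY} with its value, enforcing per-tool scopes."""

        def substitute(match: "re.Match") -> str:
            key = match.group(1)
            allowed = self._scopes.get(key)
            if allowed is not None and tool not in allowed:
                raise PermissionError(f"{tool} is not scoped to use {key}")
            return self._secrets[key]  # KeyError if the secret was never stored

        return PLACEHOLDER.sub(substitute, template)


def runtime_write(files: Dict[str, str], audit: List[str], path: str,
                  template: str, store: CredentialStore) -> None:
    """The file (a dict standing in for disk) gets real values; the audit log sees only the template."""
    files[path] = store.resolve(template, tool="write_file")
    audit.append(f"agent created {path} with credential references: {template.strip()}")
```

Because the agent only ever handles the template string, a prompt injection that convinces it to "print the key" has nothing to print.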
The secret management pillar is the difference between "our agent worked with a test API key" and "our agent works with production credentials in a way that keeps any human or attacker from extracting them."
Pillar 5: Error Recovery
The requirement: When a tool fails, when an MCP server goes down, when an external API returns 500, the agent detects the failure, classifies it, and recovers. The user experience degrades gracefully — not catastrophically.
Why it matters: Production systems fail. Production agents that don't recover from failures are outages. Production agents that recover become the operations team — they detect, diagnose, and fix, often before a human notices.
How Facio addresses it:
- Structured error responses. Every tool returns errors in a structured format the agent can parse. The agent reads the error type, message, and remediation suggestions.
- Self-diagnosis via read_logs. The agent queries its own logs to understand failure patterns, cross-references them with previous incidents, and applies lessons from past recoveries.
- Fallback chains. When a tool fails, the agent can attempt an alternative: a different MCP server, a different model, a different approach. The fallback chain is defined by the agent, not hardcoded.
- Heartbeat-driven retry. A failed job scheduled in HEARTBEAT.md persists in the task list, and the next heartbeat tick re-attempts the work. The agent self-heals without human intervention.
- HITL for novel failures. When the agent can't recover automatically, it escalates via ask_approval: "I've tried three approaches and they're all failing. How should I proceed?"
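The fallback chain and escalation path can be sketched as an ordered list of alternatives. ToolError, its kind field ("transient" or "permanent"), and the escalate callback (standing in for ask_approval) are assumptions. Retrying a transient error once on the same path before moving on is one possible policy, not Facio's.

```python
from typing import Callable, List, Sequence, Tuple


class ToolError(Exception):
    """Hypothetical structured tool error: a kind, a message, and a remediation hint."""

    def __init__(self, kind: str, message: str, remediation: str = "") -> None:
        super().__init__(message)
        self.kind = kind  # "transient" (e.g. timeouts, 5xx) or "permanent" (e.g. bad credentials)
        self.remediation = remediation


def run_with_fallbacks(
    alternatives: Sequence[Tuple[str, Callable[[], str]]],
    escalate: Callable[[str], str],
    transient_retries: int = 1,
) -> str:
    """Try each alternative in order; escalate to a human only when all of them fail."""
    failures: List[str] = []
    for name, attempt in alternatives:
        for _ in range(1 + transient_retries):
            try:
                return attempt()
            except ToolError as err:
                failures.append(f"{name}: {err.kind}: {err}")
                if err.kind != "transient":
                    break  # a permanent error won't clear on retry; move to the next alternative
    question = (
        f"I've tried {len(alternatives)} approaches and they're all failing. "
        "How should I proceed?"
    )
    return escalate(question + "\n" + "\n".join(failures))
```

Passing the collected failure lines along with the question gives the human the same context the agent had when it gave up.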
Error recovery is what separates an agent that fails occasionally from an agent that fails gracefully. The first is unreliable. The second is mature.
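The heartbeat-driven retry listed above amounts to a task list that outlives any single attempt. This sketch persists tasks as JSON rather than in HEARTBEAT.md, whose format isn't specified here; load_tasks, save_tasks, heartbeat_tick, the status values, and the max_attempts cutoff that hands a task to a human are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_tasks(path: Path) -> List[dict]:
    """Read the persisted task list; a missing file means no pending work."""
    return json.loads(path.read_text()) if path.exists() else []


def save_tasks(path: Path, tasks: List[dict]) -> None:
    path.write_text(json.dumps(tasks, indent=2))


def heartbeat_tick(
    tasks: List[dict], handlers: Dict[str, Callable[[], None]], max_attempts: int = 3
) -> List[dict]:
    """Re-attempt every unfinished task; failures stay in the list for the next tick."""
    for task in tasks:
        if task["status"] in ("done", "needs_human"):
            continue
        try:
            handlers[task["name"]]()
            task["status"] = "done"
        except Exception as err:  # any failure is recorded, not raised
            task["attempts"] = task.get("attempts", 0) + 1
            task["last_error"] = str(err)
            task["status"] = "needs_human" if task["attempts"] >= max_attempts else "failed"
    return tasks
```

A scheduler would call load_tasks, heartbeat_tick, and save_tasks on each tick; tasks that reach needs_human are where an ask_approval escalation would take over.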
Pillar 6: Cost Control
The requirement: Token usage, model costs, and external API expenses are bounded by design. The agent operates within a budget. Cost overruns are detected and prevented, not just observed.
Why it matters: An agent that accidentally loops 10,000 times on a stuck task can burn hundreds of dollars in tokens. An agent that uses a flagship model for every task can run a monthly bill into the thousands. Production agents have predictable costs — and cost overruns are operator-visible, not buried in a monthly bill.
How Facio addresses it:
- Per-session iteration budget. Each session has a configured limit on tool calls (default 50). The agent sees the budget remaining and learns to be efficient as it approaches the limit.
- Model routing for cost. switch_model lets the agent route simple tasks to cheap models and complex tasks to flagship models, so cost stays proportional to value.
- Output truncation. exec truncates output at 10,000 characters, and web_fetch caps its output at 50,000. A verbose command can't accidentally flood the agent's context window.
- Token-aware decisions. The agent sees token usage in its context. A task that's "almost done" gets wrapped up. A task that's "going to need many more iterations" gets escalated to a human or simplified.
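The budget, truncation, and routing mechanisms above fit in a few lines. The 50-call default and the 10,000 and 50,000 character limits come from the text; SessionBudget, the 20% wrap-up threshold, the truncation marker, and the model names handed to switch_model are placeholders for illustration.

```python
from dataclasses import dataclass

EXEC_OUTPUT_LIMIT = 10_000       # characters
WEB_FETCH_OUTPUT_LIMIT = 50_000  # characters


def truncate(output: str, limit: int) -> str:
    """Keep the first `limit` characters and note how much was dropped."""
    if len(output) <= limit:
        return output
    return output[:limit] + f"\n[truncated {len(output) - limit} characters]"


@dataclass
class SessionBudget:
    limit: int = 50  # per-session tool-call budget
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self) -> None:
        """Count one tool call, refusing once the budget is spent."""
        if self.used >= self.limit:
            raise RuntimeError("iteration budget exhausted: wrap up or escalate")
        self.used += 1

    def should_wrap_up(self, threshold: float = 0.2) -> bool:
        """True once the remaining calls fall to `threshold` of the limit."""
        return self.remaining <= self.limit * threshold


def pick_model(complexity: str) -> str:
    """Hypothetical routing table; the result would be the argument to switch_model."""
    routes = {"simple": "cheap-model", "complex": "flagship-model"}
    return routes.get(complexity, "flagship-model")  # unknown: favor quality over cost
```

Surfacing should_wrap_up to the agent is what turns the hard limit into the token-aware behavior described above: it starts wrapping up before charge starts refusing.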
Cost control is a discipline the agent practices, not just a constraint the runtime imposes. The agent that learns to be token-efficient is the agent that scales.
The Production Readiness Audit
Before shipping an agent to production, walk through this checklist:
[ ] Audit trail
    [ ] All tool calls logged
    [ ] HITL decisions recorded
    [ ] Agent can query its own logs
    [ ] Logs are queryable by operators
    [ ] Logs are append-only
[ ] HITL gating
    [ ] Destructive operations require approval
    [ ] Approval requests have clear titles and descriptions
    [ ] Timeouts are configured
    [ ] Agent knows when to ask vs. when to act
    [ ] Human escalation paths are tested
[ ] Observability
    [ ] Token usage tracked
    [ ] Error rates visible
    [ ] Heartbeat health checks scheduled
    [ ] Failure patterns detectable
    [ ] Operator dashboard exists (or is buildable from logs)
[ ] Secret management
    [ ] No secrets in agent context
    [ ] No secrets in committed files
    [ ] No secrets in log entries
    [ ] ${credentials.KEY} placeholders used for all secret references
    [ ] Credential store access is audited
[ ] Error recovery
    [ ] Agent detects and classifies failures
    [ ] Fallback chains defined for critical tools
    [ ] Heartbeat retries failed jobs
    [ ] Escalation path to human is configured
    [ ] Past incident learnings are in MEMORY.md
[ ] Cost control
    [ ] Per-session iteration budget set
    [ ] Model routing configured for cost
    [ ] Output truncation in place
    [ ] Token usage visible to operator
    [ ] Monthly cost projection is acceptable
If any item is unchecked, the agent isn't production-ready. The pillars aren't aspirational. They're operational requirements that production agents either have or don't.
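The checklist can also live as data, so a CI job or deploy script refuses to ship while any item is open. This is a hypothetical sketch with an abbreviated item list; the full checklist would replace the sample entries, and CHECKLIST, unchecked, and production_ready are illustrative names.

```python
from typing import Dict, List, Set

# Abbreviated: extend each pillar with the remaining items from the full checklist.
CHECKLIST: Dict[str, List[str]] = {
    "Audit trail": ["All tool calls logged", "Logs are append-only"],
    "HITL gating": ["Destructive operations require approval", "Timeouts are configured"],
    "Observability": ["Token usage tracked", "Heartbeat health checks scheduled"],
    "Secret management": ["No secrets in agent context", "No secrets in log entries"],
    "Error recovery": ["Fallback chains defined for critical tools",
                       "Escalation path to human is configured"],
    "Cost control": ["Per-session iteration budget set", "Output truncation in place"],
}


def unchecked(checked: Set[str], checklist: Dict[str, List[str]] = CHECKLIST) -> Dict[str, List[str]]:
    """Map each pillar to the items that have not been checked off."""
    gaps: Dict[str, List[str]] = {}
    for pillar, items in checklist.items():
        missing = [item for item in items if item not in checked]
        if missing:
            gaps[pillar] = missing
    return gaps


def production_ready(checked: Set[str]) -> bool:
    """A single open item anywhere means the agent isn't ready."""
    return not unchecked(checked)
```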
What "Production" Actually Means
Production isn't a destination — it's a state. The agent that works today may not work tomorrow if the upstream API changes, the model provider has an outage, or the user's expectations shift. Production readiness is the architecture's ability to absorb these changes without a full rewrite.
Facio's six pillars are designed to be the foundation that survives change:
- Audit trails survive model swaps — the log format is independent of which LLM is running.
- HITL gating survives workflow changes — new tools fit into the existing tier model.
- Observability survives failure modes — new error types show up in the same logs.
- Secret management survives team changes — new secrets follow the same placeholders.
- Error recovery survives unknown unknowns — the self-diagnosis loop generalizes.
- Cost control survives scale — the budget model holds whether you have 1 user or 1,000.
An agent built on these pillars is an agent that can ship — and that can keep shipping as the world around it changes.
Bottom Line
The distance between "AI agent prototype" and "AI agent in production" is the distance between "showed it works once" and "works reliably for real users on real systems." That distance is not crossed by better prompts or smarter models. It's crossed by operational maturity — audit trails, HITL gating, observability, secret management, error recovery, and cost control.
Facio's architecture is built around these six pillars from the ground up. The tools are designed to be logged, gated, observed, isolated, recovered, and budgeted. Agents built on Facio inherit the production properties of the runtime — without the agent developer having to reimplement them for every project.
Because a clever demo isn't a product. A reliable system is.
See the production readiness documentation for the full checklist, configuration guides, and case studies from production deployments.