Facio's Iteration Budget: How Bounded Reasoning Stops AI Agents From Spiraling Into Costly Loops
An AI agent without a budget is a financial accident waiting to happen. A clever agent that gets stuck in a retry loop, asks the same question 200 times, or follows a confused thread of reasoning for hours can burn thousands of dollars in tokens before a human notices. The agent doesn't know it's stuck. The reasoning still feels productive from the inside. The cost accumulates from the outside.
Facio's iteration budget is a runtime-enforced bound on reasoning length: the architectural pressure that turns "keep trying forever" into "be efficient and escalate when stuck." Here's how bounded reasoning works, why it matters, and how the agent uses the budget as a forcing function for good behavior.
What the Iteration Budget Is
Every Facio session has an iteration budget: a hard cap on the number of tool calls the agent can make before the session terminates. The default is 50 iterations per session. The remaining budget is shown in the agent's context and decrements with every tool call, and the agent adapts its behavior as the budget runs low.
The budget is enforced at the runtime level, not in the agent's prompt. The agent can't reason its way out of it. The agent can't "decide" to exceed it. The runtime terminates the session when the iteration counter hits zero.
This is the structural difference between "the agent should be efficient" (a prompt instruction, easy to ignore) and "the agent must be efficient" (a runtime constraint, impossible to bypass).
# Agent's context window
# Iteration Budget: 50 remaining
# Calls this session: 0
# ... agent does work ...
# Iteration Budget: 23 remaining
# Calls this session: 27
# ... agent approaches limit ...
# Iteration Budget: 4 remaining
# Calls this session: 46
# Agent: "I need to wrap up. Let me consolidate findings and deliver."
The agent doesn't just see the number. It learns to interpret the trajectory. Approaching zero means the agent shifts from "explore and try alternatives" to "consolidate and deliver." The behavior change is the architectural intent.
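A minimal sketch of what runtime-level enforcement could look like, assuming the counter lives in the runtime and every tool call passes through it. The names `IterationBudget`, `BudgetExhausted`, and `run_tool` are hypothetical and chosen for illustration; they are not Facio's actual API.

```python
class BudgetExhausted(Exception):
    """Raised by the (hypothetical) runtime when no iterations remain."""


class IterationBudget:
    """Counts tool calls against a hard per-session cap."""

    def __init__(self, limit: int = 50) -> None:
        self.limit = limit
        self.calls = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.calls

    def consume(self) -> None:
        """Charge one tool call; refuse once the cap is reached."""
        if self.remaining <= 0:
            raise BudgetExhausted("session terminated: budget is zero")
        self.calls += 1

    def context_header(self) -> str:
        """Text the agent would see in its context window."""
        return (
            f"Iteration Budget: {self.remaining} remaining\n"
            f"Calls this session: {self.calls}"
        )


def run_tool(budget: IterationBudget, tool, *args):
    """The runtime charges the budget before the tool runs, so the agent cannot skip it."""
    budget.consume()
    return tool(*args)
```

Because the check sits in `run_tool` rather than in the prompt, nothing the agent writes can raise the cap; it can only read the header and plan around it.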
Why Bounded Reasoning Matters
Production AI agents operate in an environment where unbounded reasoning is dangerous:
- Financial cost. Every iteration consumes tokens. A 50-iteration session on a flagship model can cost $5-50. A 1,000-iteration retry loop on the same model can cost $100-1,000. The budget is the upper bound on session cost.
- Latency cost. Each iteration adds latency. A 50-iteration session completes in minutes. A 500-iteration session can take the better part of an hour. The user is waiting.
- Context degradation. As the agent's context window fills with tool results, reasoning quality degrades. The agent loses the thread, repeats itself, contradicts earlier conclusions. The budget forces synthesis before degradation.
- Operational safety. An agent stuck in a loop is consuming compute resources, potentially blocking other users, and possibly making external API calls that the customer is paying for. The budget is an operational safety valve.
The bound isn't an arbitrary limit. It's the structural reason production agents behave differently from prototype agents. The prototype can "try a few more things." The production agent has to make every iteration count.
The Behavioral Pressure: How the Budget Changes Agent Behavior
An agent that knows it has 50 iterations behaves fundamentally differently from one that thinks it has unlimited iterations:
Iteration 0-30: Exploration mode. The agent explores the problem, gathers information, considers alternatives, tries different approaches. Token efficiency matters but isn't urgent. The agent is gathering data.
Iteration 30-45: Convergence mode. The agent has enough information to converge on an answer. It stops exploring, starts consolidating. It chooses between alternatives instead of enumerating them. The cost of a wrong choice is rising; the cost of another iteration is rising too.
Iteration 45-50: Delivery mode. The agent is in the final stretch. It delivers the result, summarizes the findings, and exits cleanly. No more exploration, no more alternatives, no more refinement. The next iteration might be the last; the agent treats it as such.
This progression isn't a feature the agent is told to implement. It emerges from the budget constraint. The agent that uses 45 iterations on exploration has 5 left for everything else. The agent that uses 10 iterations on exploration has 40 left. The math incentivizes efficiency without the agent being told to be efficient.
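The three phases can be expressed as a small labeling function, for example for logging or dashboards. This is a sketch only: the progression itself emerges from the constraint rather than from code like this, the function name `phase_for` is hypothetical, and scaling the 30 and 45 thresholds proportionally for budgets other than 50 is an assumption.

```python
def phase_for(calls_made: int, limit: int = 50) -> str:
    """Label the phase for a given number of tool calls already made.

    Thresholds follow the 30 and 45 marks for a budget of 50,
    scaled proportionally for other limits (an assumption).
    """
    if calls_made >= limit:
        return "terminated"
    converge_at = limit * 30 // 50
    deliver_at = limit * 45 // 50
    if calls_made < converge_at:
        return "exploration"
    if calls_made < deliver_at:
        return "convergence"
    return "delivery"
```

The boundaries are treated as half-open here, so call 30 counts as convergence and call 45 as delivery.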
The Escalation Pattern: When the Budget Isn't Enough
Sometimes 50 iterations genuinely isn't enough. A complex research task, a multi-stage deployment, a comprehensive analysis. The agent hits the budget and the work isn't done. What then?
Facio's design accommodates this through the heartbeat and cron pattern:
# Iteration Budget: 0 remaining
# Session terminated
# Heartbeat task in HEARTBEAT.md:
# [ ] Continue analysis (carry over from previous session)
The agent writes a heartbeat task that resumes the work on the next tick. The work continues in a new session with a fresh budget. The state is persisted to MEMORY.md and the workspace; the next session picks up where the previous one left off.
This is the architectural pattern for long-running work:
- Session 1 (budget: 50). Research phase. Produces 5 pages of analysis. Approaches budget, writes state to MEMORY.md.
- Session 2 (budget: 50). Continues from where session 1 stopped. Drafts the report. Approaches budget, schedules heartbeat to continue.
- Session 3 (budget: 50). Finalizes the report, gets human review via ask_approval. Delivers.
Three sessions, 150 total iterations, none of them exceeding the budget. The work completes because the budget is per-session, not per-project. The agent escalates to itself via heartbeat, not to a human.
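The hand-off between sessions might be sketched as follows. The file names MEMORY.md and HEARTBEAT.md come from the text above; the exact file layouts, and the helper names `hand_off` and `pending_tasks`, are assumptions made for illustration.

```python
from pathlib import Path


def hand_off(workspace: Path, summary: str, next_step: str) -> None:
    """Persist state and queue a follow-up task before the budget runs out."""
    # Assumption: MEMORY.md is free-form notes the next session reads back.
    memory = workspace / "MEMORY.md"
    with memory.open("a", encoding="utf-8") as f:
        f.write(f"## Carry-over\n{summary}\n")
    # Assumption: HEARTBEAT.md holds checkbox tasks, one per line.
    heartbeat = workspace / "HEARTBEAT.md"
    with heartbeat.open("a", encoding="utf-8") as f:
        f.write(f"[ ] {next_step}\n")


def pending_tasks(workspace: Path) -> list[str]:
    """Return unchecked heartbeat tasks for the next session to pick up."""
    heartbeat = workspace / "HEARTBEAT.md"
    if not heartbeat.exists():
        return []
    lines = heartbeat.read_text(encoding="utf-8").splitlines()
    return [line[4:] for line in lines if line.startswith("[ ] ")]
```

The important design point is ordering: the agent has to write the hand-off while it still has iterations left, since a terminated session cannot write anything.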
For work that genuinely requires a single very long session, the budget is configurable. A complex deployment pipeline might warrant 200 iterations in a single session. A real-time customer service interaction might warrant 10. The customer sets the budget based on the workload.
The Failure Mode: What the Budget Prevents
Unbounded reasoning has a recognizable failure mode: the agent gets stuck. The same reasoning loops back, tries the same approach, produces the same result, encounters the same error, and tries again. Without a budget, this loop runs until the user kills the session — or the customer's bill arrives.
Common stuck patterns the budget prevents:
The Retry Loop. A tool call returns an error. The agent retries. The error persists. The agent retries. The budget forces the agent to either fix the root cause, try a fundamentally different approach, or escalate to the human — within a bounded number of attempts.
The Over-Exploration. The agent keeps researching when it has enough information. The budget forces synthesis: at iteration 40, the agent has to commit to an answer, not enumerate the 51st source.
The Refinement Spiral. The agent keeps refining the same output, each version slightly different, each iteration consuming tokens for marginal improvement. The budget forces "good enough" at iteration 45.
The Branching Explosion. The agent enumerates alternatives without converging. The budget forces selection. At iteration 50, the agent has to pick one — not list twelve.
These failure modes aren't hypothetical. They're what every unbounded LLM agent does eventually. The budget is the structural reason Facio agents don't.
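To make the retry loop concrete, here is a sketch of a retry that cannot outlive its budget. The function `retry_until_budget` and its return shape are hypothetical; a real agent would ideally change approach well before the last attempt rather than repeat the same call.

```python
def retry_until_budget(tool, budget: int = 50):
    """Retry a failing tool, but never past the iteration budget.

    Returns ("ok", result, calls_used) or ("escalate", last_error, calls_used).
    """
    last_error = None
    for calls_used in range(1, budget + 1):
        try:
            return "ok", tool(), calls_used
        except Exception as exc:  # broad on purpose: any tool failure counts
            last_error = exc
    # Budget spent without success: stop and hand the problem upward.
    return "escalate", last_error, budget
```

Without the `range` bound this is the unbounded loop described above; with it, the worst case is a known number of calls followed by an escalation.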
The Tradeoff: When the Budget Is Too Tight
The budget isn't free. A 50-iteration session can be too short for genuinely complex work. The tradeoff is real:
- Too few iterations (e.g., 10). The agent can't complete complex tasks. It escalates to heartbeat for every workflow. The user experience degrades.
- Too many iterations (e.g., 500). The agent can get stuck. The cost and latency risk is real. The bounded-reasoning discipline is weakened.
- Just right (e.g., 50-100 for most workflows). The agent has room to explore, converge, and deliver. The cost is bounded. The latency is acceptable.
The right budget depends on the workload. Facio's default (50) is calibrated for typical agent workflows: research, content creation, deployment, monitoring. Workloads that consistently need more should either be split across sessions via heartbeat, or have their budget reconfigured upward with explicit justification.
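The choice between splitting work and raising the budget comes down to simple arithmetic. A sketch, assuming a hypothetical per-workload lookup table: the budget values 50, 200, and 10 are the ones mentioned in this article, while the key names and the `sessions_needed` helper are illustrative.

```python
import math

# Budgets mentioned in the text; the workload names are illustrative keys.
BUDGETS = {"default": 50, "deployment_pipeline": 200, "customer_service": 10}


def sessions_needed(total_iterations: int, workload: str = "default") -> int:
    """How many heartbeat-chained sessions a job needs at a given budget."""
    budget = BUDGETS.get(workload, BUDGETS["default"])
    return max(1, math.ceil(total_iterations / budget))
```

A job that needs about 150 iterations fits in three default sessions or in one 200-iteration session; at a budget of 10 it would need fifteen hand-offs, which is the degraded experience described above.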
Production Patterns Enabled by the Budget
Pattern 1: Cost-Visible Agent Operations
A team running Facio agents can predict session cost:
Session cost = iterations × tokens-per-iteration × cost-per-token
= 50 × 8,000 × $0.000015
= $6.00
This is a bound rather than a forecast, provided each iteration stays within the assumed 8,000 tokens. Under that assumption the session can cost at most $6.00, and in practice (since not all iterations are flagship-model calls) it usually costs less. The CFO can model monthly agent spend as a function of expected sessions, not as a function of agent behavior uncertainty.
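The same arithmetic as a small helper, using `Decimal` to avoid floating-point noise in currency values. The function names are hypothetical, and the result is only a ceiling if per-iteration token use really is capped at the figure passed in.

```python
from decimal import Decimal


def session_cost_bound(iterations: int, tokens_per_iteration: int,
                       cost_per_token: Decimal) -> Decimal:
    """Upper bound on session cost if every iteration uses the full token count."""
    return iterations * tokens_per_iteration * cost_per_token


def monthly_spend_bound(sessions_per_month: int, per_session: Decimal) -> Decimal:
    """Upper bound on monthly spend for an expected session volume."""
    return sessions_per_month * per_session


# The figures from the formula above: 50 iterations, 8,000 tokens, $0.000015 per token.
bound = session_cost_bound(50, 8_000, Decimal("0.000015"))
```

With a hypothetical volume of 1,000 sessions per month, `monthly_spend_bound(1000, bound)` gives a ceiling of $6,000.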
Pattern 2: Latency-Predictable Response Times
A 50-iteration session completes in 2-5 minutes for typical workloads. The user knows the agent will respond in that window. A 200-iteration session might take 20 minutes. The user knows that too. Latency becomes a function of the budget, not of the agent's internal reasoning.
Pattern 3: Predictable Escalation
When the agent hits the budget, the escalation pattern is consistent: heartbeat task, next session, fresh budget. The user doesn't have to wonder "what will the agent do when it runs out of time?" The answer is the same every time.
Pattern 4: Cost-Aware Agent Design
An agent built for a 50-iteration budget is designed to be efficient from the start. The system prompts are calibrated to deliver within budget. The tool choices favor fast, token-efficient operations. The reasoning style is concise. The budget shapes the agent's design.
How the Budget Composes with Other Facio Features
The iteration budget is one of several cost and reliability controls in Facio's architecture:
| Control | What it bounds | How it composes with the budget |
|---|---|---|
| Iteration budget | Tool calls per session | The outer bound on session length |
| Token budget | Tokens consumed per session | The outer bound on session cost (Facio's runtime exposes token usage to the agent) |
| exec timeout | Single command duration | Bounds per-iteration latency; long-running commands don't blow the budget |
| exec output truncation | Output size per call | Bounds per-iteration context; verbose output doesn't blow the budget |
| Heartbeat retries | Time between retries | Bounds retry latency; failed jobs don't block the next session |
| Model routing | Cost per token | The agent uses cheap models for simple iterations, saving budget for complex ones |
These controls compose into a complete cost and reliability envelope. The iteration budget is the outermost boundary; the other controls operate within it.
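One way to picture the composition is a single envelope object that checks several bounds on every call. This is a sketch covering three of the controls in the table (iteration budget, token budget, output truncation); the class name `SessionEnvelope`, its fields, and the default limits are assumptions, not Facio's configuration schema.

```python
from dataclasses import dataclass


@dataclass
class SessionEnvelope:
    """Hypothetical combination of three of the controls in the table."""
    iterations_left: int = 50
    tokens_left: int = 400_000      # assumed token budget (50 x 8,000)
    max_output_chars: int = 2_000   # assumed truncation limit

    def can_continue(self) -> bool:
        """The session runs only while every bound still has headroom."""
        return self.iterations_left > 0 and self.tokens_left > 0

    def record_call(self, tokens_used: int, output: str) -> str:
        """Charge one tool call and return the (possibly truncated) output."""
        if not self.can_continue():
            raise RuntimeError("session envelope exhausted")
        self.iterations_left -= 1
        self.tokens_left -= tokens_used
        if len(output) > self.max_output_chars:
            return output[: self.max_output_chars] + "\n[output truncated]"
        return output
```

Whichever bound is hit first ends the session, and truncation keeps a single verbose call from eating the context that later iterations need.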
Bottom Line
Unbounded reasoning feels powerful in a prototype and destructive in production. The agent that "tries a few more things" can turn into the agent that runs for hours, burns thousands of dollars, and delivers nothing. The user can't tell from the outside when the agent is making progress and when it's stuck. The cost accumulates silently.
Facio's iteration budget makes the invisible visible. The agent sees the budget remaining. The user sees the session length. The operator sees the cost. Everyone has the same information. The agent that uses its budget well produces results; the agent that doesn't, escalates via heartbeat. The system is bounded by design.
Because production agents aren't measured by how clever their reasoning is. They're measured by how reliably they deliver within constraints. The iteration budget is the constraint that makes the delivery reliable.
See the iteration budget documentation for configuration options, escalation patterns, and workload-specific tuning guides.