Product · Jun 28, 2026

Facio's Context Window Discipline: How AI Agents Stay Sharp When Conversations Run for Hours

An AI agent's context window is its working memory — and it's finite. Long sessions accumulate context: every tool call result, every message, every retrieved fact stays in the window until the session ends. The agent gets slower, more expensive, and eventually confused as the context fills. Facio's context window discipline gives agents the structural patterns to stay sharp across hours of work: selective loading, aggressive summarization, strategic forgetting, and checkpoint-driven compaction. Here's how the discipline works.

Context Window · Long Sessions · Token Efficiency · Compaction · Agent Performance

An AI agent's context window is its working memory. It's bounded — typically 32k, 64k, 128k, or 200k tokens depending on the model. Everything the agent knows right now lives in the window: the system prompt, the conversation history, the tool call results, the retrieved memories, the structured outputs. The window is what the model sees when it reasons.

That limit becomes a problem as a session runs, because the context accumulates. Every tool call adds its result to the history. Every user message stays in the buffer. Every recalled memory fills space. By the time the session has been running for an hour, the agent's context might be 80% full. By two hours, it's overflowing, and old content has to be dropped or truncated to make room for new.

The result: the agent gets slower, more expensive, and less accurate. The model spends more compute per token as the context grows. Latency creeps up. Costs climb. And the agent's reasoning quality degrades because the relevant facts are buried in noise.

Facio's context window discipline gives agents the structural patterns to stay sharp across hours of work. The discipline isn't a single feature — it's a set of patterns the runtime and the agent apply together: selective loading, aggressive summarization, strategic forgetting, and checkpoint-driven compaction. Here's how each works and why the combination matters.

The Context Window Pressure Curve

The agent's experience of a long session follows a predictable curve:

Tokens:    0 → 100% of context window
Quality:   high → medium → low → confused

Stage 1 (0-30%): Sharp. The agent has plenty of headroom. Reasoning is fast and accurate.
Stage 2 (30-60%): Functional. The agent is still working well, but starting to feel pressure on every retrieval.
Stage 3 (60-80%): Degraded. The agent's responses get longer as it cites more context. Latency increases.
Stage 4 (80-95%): Strained. Old content is being truncated to make room for new. Key facts may be lost.
Stage 5 (95-100%): Failing. The model can't fit the request and the system prompt together. Errors spike.

The agent that doesn't manage its context hits Stage 4 by minute 60 of a complex session. The agent that does manage its context stays in Stage 1 or 2 throughout, because the discipline keeps the active context small even as the work accumulates.
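
To make the curve concrete, here is a minimal sketch that maps token usage to the stage names above. The function name pressure_stage and the decision to count a value sitting exactly on a boundary toward the lower stage are assumptions for illustration, not part of Facio's runtime.

```python
from bisect import bisect_left

# Upper bounds (fraction of the window) for Stages 1-4; anything above is Stage 5.
STAGE_BOUNDS = [0.30, 0.60, 0.80, 0.95]
STAGE_NAMES = ["Sharp", "Functional", "Degraded", "Strained", "Failing"]


def pressure_stage(used_tokens: int, window_tokens: int) -> str:
    """Return the pressure-curve stage name for the current utilization."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    utilization = used_tokens / window_tokens
    # Assumption: bisect_left puts a value exactly on a boundary in the lower stage.
    return STAGE_NAMES[bisect_left(STAGE_BOUNDS, utilization)]
```

A runtime could call something like this after every tool call and log the result, which makes it easy to see when a session starts drifting out of the first two stages.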

The Discipline: Four Pillars

Facio's context window discipline rests on four pillars. Each addresses a specific aspect of context pressure.

Pillar 1: Selective Loading

The agent doesn't load everything into the context. It loads only what it needs for the current step:

# Wrong: Load everything into context
exec(command="cat large-documentation-set.md")
# Result: 50,000 tokens of documentation now in context

# Right: Load only what's needed
exec(command="grep -A 5 'authentication' large-documentation-set.md | head -50")
# Result: 500 tokens of relevant content in context

The selective loading discipline applies to every tool:

  • read_file with offset/limit. Read only the section of the file the agent is investigating, not the whole file.
  • exec with head, tail, grep, awk. Extract the relevant slice of large outputs.
  • grep instead of read_file for content search. Find the lines that match; don't read the whole file.
  • recall with specific queries. Query memory for what the agent needs, not for "everything about X."

The discipline is: if a tool can return less, use that option. The context is too valuable to waste on data the agent won't immediately use.
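
The same idea can be sketched in plain Python for cases where the content is already in memory. The helper below is a hypothetical stand-in for the grep -A 5 ... | head -50 pipeline shown earlier: it keeps matching lines plus a few lines of trailing context and stops at a hard cap. The name grep_after and its parameters are illustrative, and unlike grep it does not print separators between match groups.

```python
from typing import Iterable, List


def grep_after(lines: Iterable[str], needle: str, after: int = 5, limit: int = 50) -> List[str]:
    """Return lines containing `needle` plus `after` trailing lines, capped at `limit` lines."""
    if limit <= 0:
        return []
    selected: List[str] = []
    remaining = 0  # trailing-context lines still owed after the latest match
    for line in lines:
        if needle in line:
            remaining = after
            selected.append(line)
        elif remaining > 0:
            remaining -= 1
            selected.append(line)
        if len(selected) >= limit:
            break  # stop reading as soon as the cap is reached
    return selected
```

Because the function stops iterating once the cap is hit, passing it an open file handle means the rest of a large file is never read at all.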

Pillar 2: Aggressive Summarization

When the agent retrieves a large body of information, it summarizes before passing to the next step:

# Wrong: Pass the full search results to the next reasoning step
web_search(query="EU AI Act compliance for healthcare")
# 50,000 characters of search results now in context

# Right: Summarize before continuing
web_search(query="EU AI Act compliance for healthcare")
# Summarize the top 5 results into a 1,500-character summary
# Continue with the summary, not the raw results

The summarization happens automatically through several patterns:

Tool result truncation. The runtime truncates large tool results and provides a note: "Result truncated; full output available in /tmp/agent/tool-output-X.txt if needed."

Auto-summarization of retrieved content. When recall returns multiple items, the agent summarizes them into a single composite answer before adding to context.

Conversation compaction. Periodically, the runtime offers to compact the conversation history — replacing the last N turns with a structured summary that preserves key facts.

The discipline is: raw data doesn't stay in context. Processed information does.
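
As a rough illustration of the tool result truncation pattern, the sketch below archives an oversized result to disk and returns only the head plus a pointer to the full output. The function name, the max_chars default, and the hash-based file naming are assumptions made for this example; the source only shows the shape of the note, not how the runtime produces it.

```python
import hashlib
from pathlib import Path


def truncate_tool_result(output: str, archive_dir: Path, max_chars: int = 2000) -> str:
    """Return `output` unchanged if small; otherwise archive it and return a truncated view."""
    if len(output) <= max_chars:
        return output
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: a short content hash gives a stable, unique file name.
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    archive_path = archive_dir / f"tool-output-{digest}.txt"
    archive_path.write_text(output, encoding="utf-8")
    note = f"\n[Result truncated; full output available in {archive_path} if needed.]"
    return output[:max_chars] + note
```

The design choice worth noting is that nothing is lost: the context holds the cheap view, and the full output stays one read_file call away.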

Pillar 3: Strategic Forgetting

Not all context is created equal. The agent identifies context that can be safely forgotten and forgets it:

# Temporary diagnostic information
exec(command="kubectl describe pod api-server-7d4f...")
# 2,000 tokens of pod description
# Used to diagnose, then... is it still relevant 10 minutes later?

# Right: Forget the diagnostic after use
# The agent extracted the relevant fact ("image pull error")
# The full description is dropped from active context
# Available in tool output archive if needed

Strategic forgetting applies to:

  • Diagnostic output. Once the agent has extracted the cause, the raw diagnostic goes.
  • Intermediate reasoning. The reasoning chain that produced a decision is summarized; the intermediate steps go.
  • Exploratory queries. Web searches and fetches that didn't lead anywhere are dropped after the conclusion is reached.
  • Old conversation turns. Once the conversation has moved past a topic, the early turns can be compacted.

The discipline is: the agent's context is working memory, not archive memory. Items move from working to archive once their utility passes.
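
One way to picture the working-versus-archive split is a small container with two stores. This is a sketch under stated assumptions: WorkingContext, distill, and the dictionary-backed archive are hypothetical names and structures, not Facio's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WorkingContext:
    """Active items the model sees, plus an archive it does not see."""

    active: Dict[str, str] = field(default_factory=dict)
    archive: Dict[str, str] = field(default_factory=dict)

    def add(self, key: str, raw: str) -> None:
        """Place a raw tool result into the active context."""
        self.active[key] = raw

    def distill(self, key: str, fact: str) -> None:
        """Move the raw item to the archive and keep only the extracted fact."""
        self.archive[key] = self.active[key]  # raises KeyError if the key was never added
        self.active[key] = fact

    def active_chars(self) -> int:
        """Total size of what the model would actually see."""
        return sum(len(value) for value in self.active.values())
```

In the kubectl example above, the pod description would go in through add, and once the cause is known, distill would replace it with the short fact ("image pull error") while the full text stays retrievable.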

Pillar 4: Checkpoint-Driven Compaction

When the session reaches a natural pause point (or a configured threshold), the agent compacts the context by writing a checkpoint and starting a fresh context window that loads only what's needed to continue:

# At a checkpoint point:
# 1. Write a structured checkpoint capturing:
#    - Mission
#    - Progress (what's done, what's pending)
#    - Current state
#    - Key facts that need to persist
#    - Decisions made
write_file(path="tmp/checkpoint.md", content=...)

# 2. Optionally: trigger a session break
# The next session loads:
#    - System prompt (~2k tokens)
#    - MEMORY.md (~5k tokens)
#    - Checkpoint (~3k tokens)
#    - Recent conversation (~2k tokens)
#    = ~12k tokens, leaving 80%+ headroom in a 64k-or-larger window

The checkpoint-driven compaction is the most aggressive form of context management. It resets the active context entirely, replacing it with a structured summary. The next session starts fresh but informed.
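
A checkpoint with the fields listed above could be modeled roughly as follows. The Checkpoint class, its Markdown layout, and the headroom helper are illustrative assumptions; the source specifies what a checkpoint captures, not its file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Checkpoint:
    mission: str
    done: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)
    current_state: str = ""
    key_facts: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the checkpoint as a small Markdown document (hypothetical layout)."""

        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        sections = [
            ("Mission", self.mission),
            ("Done", bullets(self.done)),
            ("Pending", bullets(self.pending)),
            ("Current state", self.current_state or "(not recorded)"),
            ("Key facts", bullets(self.key_facts)),
            ("Decisions", bullets(self.decisions)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


def headroom(window_tokens: int, *component_tokens: int) -> float:
    """Fraction of the window left free after loading the given components."""
    return 1 - sum(component_tokens) / window_tokens
```

With the token estimates from the breakdown above, headroom(64_000, 2_000, 5_000, 3_000, 2_000) evaluates to 0.8125, while the same load in a nominal 32k window leaves 0.625. The fresh-start budget is generous on larger windows and noticeably tighter on small ones.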

The Discipline in Practice: A Long Research Session

Consider a research session that runs for 90 minutes:

Minute 0-15: Setup and early research.

The agent starts with ~10k tokens of context (system prompt + memory). It does 5 web searches and 3 page fetches, adding maybe 8k tokens of raw results. It summarizes each result as it processes. Active context: ~14k tokens. Plenty of headroom.

Minute 15-45: Deep dive.

The agent reads 4 short source documents in full (12k tokens), cross-references them with grep searches, and starts synthesizing findings. When cross-referencing, it uses grep -A 20 to extract relevant passages instead of re-reading whole documents. Active context: ~32k tokens. Still comfortable.

Minute 45-75: Analysis and writing.

The agent drafts the report section by section. Each section uses edit_file to write to a file rather than keeping the draft in context. The agent verifies its draft against sources using targeted grep queries. Active context: ~38k tokens. Getting tighter.

Minute 75: First compaction trigger.

The runtime offers a compaction. The agent accepts. The runtime summarizes the last 30 turns into a 2k-token summary. Active context: ~22k tokens. Headroom restored.

Minute 75-90: Final synthesis.

The agent finishes the report, does a final review, and writes the final version. Active context: ~28k tokens. Well within the window.

Total work: 90 minutes, ~250 tool calls, ~120k tokens of raw data processed, ~38k tokens of peak active context (reached just before the compaction at minute 75).

Without discipline, the same session would have hit 80k+ tokens by minute 60, degraded reasoning quality, and possibly failed to complete. With discipline, the agent finishes the work with quality intact.

The Compaction Triggers

Facio's runtime triggers compaction at three thresholds:

Soft trigger (50% of window). The runtime begins suggesting summarization for large tool outputs. The agent can accept or decline.

Hard trigger (75% of window). The runtime requires the agent to compact before processing the next tool call. The agent must summarize or checkpoint to continue.

Emergency trigger (90% of window). The runtime forces a compaction. The agent's pending work is checkpointed; the context is reset to the checkpoint plus system essentials.

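The triggers are calibrated to keep the agent in the productive stages of the pressure curve. In normal operation the agent doesn't reach Stage 4 (Strained), because the hard trigger at 75% intervenes before the 80% mark. The emergency trigger at 90% is a backstop in case the context gets past that point anyway.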
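
The three thresholds reduce to a simple lookup. The sketch below assumes the thresholds are inclusive (exactly 50% counts as the soft trigger); the Trigger enum and the compaction_trigger function are hypothetical names, not the runtime's API.

```python
from enum import Enum


class Trigger(Enum):
    NONE = "none"
    SOFT = "soft"            # runtime suggests summarizing large tool outputs
    HARD = "hard"            # agent must compact before the next tool call
    EMERGENCY = "emergency"  # runtime forces a checkpoint and context reset


def compaction_trigger(used_tokens: int, window_tokens: int) -> Trigger:
    """Map current utilization to the trigger level described in the text."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    utilization = used_tokens / window_tokens
    # Check the highest threshold first so each level shadows the ones below it.
    if utilization >= 0.90:
        return Trigger.EMERGENCY
    if utilization >= 0.75:
        return Trigger.HARD
    if utilization >= 0.50:
        return Trigger.SOFT
    return Trigger.NONE
```

For example, ~38k tokens in a hypothetical 64k window is about 59% utilization, so this returns Trigger.SOFT: the runtime offers a compaction but does not require one.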

The Discipline Doesn't Help Every Case

Honest limitations:

  • It doesn't change the model's fundamental attention limits. Even with discipline, very long conversations lose coherence on subtle details. The model still has trouble tracking every fact in a long context; the discipline just reduces the volume.
  • It can lose information during compaction. A summary isn't a perfect replacement for the original. The agent may forget a nuance that mattered. The discipline preserves the most important facts but can't preserve everything.
  • It requires the agent to be a good summarizer. Bad summarization loses critical details. The discipline works because Facio's agents are trained and prompted to summarize well. With a less capable model, the discipline might introduce errors.
  • It adds overhead. Summarizing, compacting, and writing checkpoints consume tool calls and tokens. The discipline trades some efficiency for sustained quality.
  • It doesn't give perfect cross-session continuity. A session that resumes from a checkpoint isn't identical to the original session. The agent has the structured summary, not the full reasoning flow. Some continuity is lost.

The Performance Impact

The discipline affects four things you can measure:

Latency. Per-iteration latency stays roughly flat instead of climbing with context length. The agent's response time at minute 90 is similar to minute 5.

Cost. Per-iteration cost stays roughly flat too, apart from the overhead of the compaction steps themselves. The agent isn't paying to reprocess a huge context at every step.

Quality. Reasoning quality stays high. The agent isn't confused by the sheer volume of context. The model can focus on what matters.

Reliability. The agent completes long tasks. Without discipline, complex multi-hour workflows often fail to finish. With discipline, the completion rate improves dramatically.

The performance impact compounds. Faster, cheaper, more accurate, more reliable — the discipline delivers on all four metrics.
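
The cost argument is easy to check with back-of-the-envelope arithmetic. The sketch below sums the input tokens processed over a session when the context either grows unchecked or is held under a cap. It is a deliberately simplified model: the growth rate of 300 tokens per step and the use of a hard cap to stand in for compaction are assumptions chosen for illustration, not measurements.

```python
from typing import Optional


def cumulative_input_tokens(
    iterations: int, start: int, growth: int, cap: Optional[int] = None
) -> int:
    """Total input tokens processed when every step re-reads the current context."""
    total = 0
    context = start
    for _ in range(iterations):
        total += context          # the model reads the whole context on this step
        context += growth         # the step's results are appended
        if cap is not None:
            context = min(context, cap)  # simplified stand-in for compaction
    return total


# Hypothetical session: 250 steps, 10k starting context, 300 tokens added per step.
unmanaged = cumulative_input_tokens(250, 10_000, 300)
managed = cumulative_input_tokens(250, 10_000, 300, cap=38_000)
```

Under these assumed numbers the unmanaged run processes more total input tokens than the capped one, and the gap widens with every additional step because the unmanaged context keeps growing while the managed one does not.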

The Agent's Context as a Discipline

Context window discipline is more than a set of techniques. It's a way of thinking about the agent's role. The agent doesn't treat context as a place to dump information. It treats context as a scarce resource to be managed carefully.

The analogy is a chef's mise en place. A chef doesn't dump every ingredient on the counter at once. The chef has only the ingredients needed for the current step in front of them, with the rest organized nearby but not cluttering the workspace. The chef reaches for an ingredient when needed, uses it, and puts it away.

A Facio agent applies the same discipline. The agent has only the information needed for the current reasoning in its context, with the rest available in the workspace but not cluttering the window. The agent reaches for information when needed, uses it, and summarizes or forgets it.

The chef's kitchen runs smoothly because of mise en place. The agent's context runs smoothly because of discipline. Both are about managing a constrained working space effectively.

Bottom Line

Long AI agent sessions accumulate context. The accumulation is inevitable. The discipline is how the agent handles it.

Facio's context window discipline has four pillars: selective loading (don't retrieve what you won't use), aggressive summarization (process raw data into distilled information), strategic forgetting (drop what no longer serves), and checkpoint-driven compaction (reset the window when needed). The combination keeps the agent sharp across hours of work.

The agent without discipline gets slower, more expensive, and more confused as the session progresses. The agent with discipline stays sharp throughout. The work gets finished. The quality holds. The cost stays bounded.

Because long sessions are where AI agents earn their keep. Short sessions are toys. Long sessions are tools. The discipline is what makes long sessions work.


See the context window discipline documentation for compaction triggers, summarization patterns, and checkpoint-driven context reset configurations.
