Facio's Context Window Discipline: How AI Agents Stay Sharp When Conversations Run for Hours
An AI agent's context window is its working memory. It's bounded — typically 32k, 64k, 128k, or 200k tokens depending on the model. Everything the agent knows right now lives in the window: the system prompt, the conversation history, the tool call results, the retrieved memories, the structured outputs. The window is what the model sees when it reasons.
The window is also finite. As a session runs, the context accumulates. Every tool call adds its result to the history. Every user message stays in the buffer. Every recalled memory fills space. By the time the session has been running for an hour, the agent's context might be 80% full. By two hours, it's overflowing and the model is dropping or truncating old content.
The result: the agent gets slower, more expensive, and less accurate. The model spends more compute per token as the context grows. Latency creeps up. Costs climb. And the agent's reasoning quality degrades because the relevant facts are buried in noise.
Facio's context window discipline gives agents the structural patterns to stay sharp across hours of work. The discipline isn't a single feature — it's a set of patterns the runtime and the agent apply together: selective loading, aggressive summarization, strategic forgetting, and checkpoint-driven compaction. Here's how each works and why the combination matters.
The Context Window Pressure Curve
The agent's experience of a long session follows a predictable curve:
Tokens: 0 → 100% of context window
Quality: high → medium → low → confused
Stage 1 (0-30%): Sharp. The agent has plenty of headroom. Reasoning is fast and accurate.
Stage 2 (30-60%): Functional. The agent is still working well, but starting to feel pressure on every retrieval.
Stage 3 (60-80%): Degraded. The agent's responses get longer as it cites more context. Latency increases.
Stage 4 (80-95%): Strained. The model is truncating old content to fit new. Key facts may be lost.
Stage 5 (95-100%): Failing. The model can't fit the request and the system prompt together. Errors spike.
The agent that doesn't manage its context hits Stage 4 by minute 60 of a complex session. The agent that does manage its context stays in Stage 1 or 2 throughout, because the discipline keeps the active context small even as the work accumulates.
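The stage boundaries above can be written as a small lookup. This is a minimal sketch for illustration only: the function name `pressure_stage` is hypothetical and not part of Facio's runtime, and the cutoffs are simply the percentages listed in the curve.

```python
def pressure_stage(used_tokens: int, window_tokens: int) -> str:
    """Map context utilization onto the five stages of the pressure curve."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    utilization = used_tokens / window_tokens
    # Upper bounds follow the stage ranges listed in the curve.
    if utilization < 0.30:
        return "Stage 1: Sharp"
    if utilization < 0.60:
        return "Stage 2: Functional"
    if utilization < 0.80:
        return "Stage 3: Degraded"
    if utilization < 0.95:
        return "Stage 4: Strained"
    return "Stage 5: Failing"
```

For example, 12k tokens in a 128k window falls in Stage 1, while 110k tokens in the same window falls in Stage 4.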
The Discipline: Four Pillars
Facio's context window discipline rests on four pillars. Each addresses a specific aspect of context pressure.
Pillar 1: Selective Loading
The agent doesn't load everything into the context. It loads only what it needs for the current step:
# Wrong: Load everything into context
exec(command="cat large-documentation-set.md")
# Result: 50,000 tokens of documentation now in context
# Right: Load only what's needed
exec(command="grep -A 5 'authentication' large-documentation-set.md | head -50")
# Result: 500 tokens of relevant content in context
The selective loading discipline applies to every tool:
- read_file with offset/limit. Read only the section of the file the agent is investigating, not the whole file.
- exec with head, tail, grep, awk. Extract the relevant slice of large outputs.
- grep instead of read_file for content search. Find the lines that match; don't read the whole file.
- recall with specific queries. Query memory for what the agent needs, not for "everything about X."
The discipline is: if a tool can return less, use that option. The context is too valuable to waste on data the agent won't immediately use.
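To make the "return less" idea concrete, here is a rough Python sketch of two bounded readers. Both helper names (`read_slice`, `grep_with_context`) are hypothetical illustrations, not Facio tool implementations, and the second is only an approximation of `grep -A N PATTERN | head -M` (it omits the `--` separators real grep prints between groups).

```python
import itertools
import re
from typing import Iterable, List


def read_slice(lines: Iterable[str], offset: int = 0, limit: int = 50) -> List[str]:
    """Return at most `limit` lines starting at zero-based `offset`."""
    return list(itertools.islice(lines, offset, offset + limit))


def grep_with_context(
    lines: List[str], pattern: str, after: int = 5, limit: int = 50
) -> List[str]:
    """Keep each matching line plus `after` trailing lines, capped at `limit` lines."""
    regex = re.compile(pattern)
    selected: List[str] = []
    keep_until = -1  # index of the last line to keep after the most recent match
    for index, line in enumerate(lines):
        if regex.search(line):
            keep_until = index + after
        if index <= keep_until:
            selected.append(line)
            if len(selected) >= limit:
                break
    return selected
```

Either helper bounds the amount of text returned regardless of how large the underlying file is, which is the property the pillar depends on.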
Pillar 2: Aggressive Summarization
When the agent retrieves a large body of information, it summarizes before passing to the next step:
# Wrong: Pass the full search results to the next reasoning step
web_search(query="EU AI Act compliance for healthcare")
# 50,000 characters of search results now in context
# Right: Summarize before continuing
web_search(query="EU AI Act compliance for healthcare")
# Summarize the top 5 results into a 1,500-character summary
# Continue with the summary, not the raw results
The summarization happens automatically through several patterns:
Tool result truncation. The runtime truncates large tool results and provides a note: "Result truncated; full output available in /tmp/agent/tool-output-X.txt if needed."
Auto-summarization of retrieved content. When recall returns multiple items, the agent summarizes them into a single composite answer before adding to context.
Conversation compaction. Periodically, the runtime offers to compact the conversation history — replacing the last N turns with a structured summary that preserves key facts.
The discipline is: raw data doesn't stay in context. Processed information does.
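The tool result truncation pattern might look roughly like the sketch below. It is an assumption-laden illustration: the function name, the character-based limit, and the archive file naming are all hypothetical, chosen to mirror the truncation note quoted above rather than to describe the runtime's actual behavior.

```python
import tempfile
from pathlib import Path


def truncate_tool_result(
    result: str, max_chars: int, archive_dir: Path, call_id: str
) -> str:
    """Keep the head of a large result in context; archive the full text on disk."""
    if len(result) <= max_chars:
        return result  # small results pass through untouched
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"tool-output-{call_id}.txt"
    archive_path.write_text(result, encoding="utf-8")
    note = f"\n[Result truncated; full output available in {archive_path} if needed.]"
    return result[:max_chars] + note


if __name__ == "__main__":
    # Hypothetical usage: a 90,000-character log is cut down to 200 characters plus a note.
    with tempfile.TemporaryDirectory() as tmp:
        shown = truncate_tool_result(
            "log line\n" * 10_000, max_chars=200, archive_dir=Path(tmp), call_id="demo"
        )
        print(shown[-120:])
```

Nothing is lost in this scheme: the full output stays on disk, and only the portion that enters the context is bounded.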
Pillar 3: Strategic Forgetting
Not all context is created equal. The agent identifies context that can be safely forgotten and forgets it:
# Temporary diagnostic information
exec(command="kubectl describe pod api-server-7d4f...")
# 2,000 tokens of pod description
# Used to diagnose, then... is it still relevant 10 minutes later?
# Right: Forget the diagnostic after use
# The agent extracted the relevant fact ("image pull error")
# The full description is dropped from active context
# Available in tool output archive if needed
Strategic forgetting applies to:
- Diagnostic output. Once the agent has extracted the cause, the raw diagnostic goes.
- Intermediate reasoning. The reasoning chain that produced a decision is summarized; the intermediate steps go.
- Exploratory queries. Web searches and fetches that didn't lead anywhere are dropped after the conclusion is reached.
- Old conversation turns. Once the conversation has moved past a topic, the early turns can be compacted.
The discipline is: the agent's context is working memory, not archive memory. Items move from working to archive once their utility passes.
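The move from working memory to archive can be sketched as a tiny data structure. The class and method names here are hypothetical, and the sketch measures size in characters rather than tokens to stay self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WorkingContext:
    active: Dict[str, str] = field(default_factory=dict)   # what the model sees
    archive: Dict[str, str] = field(default_factory=dict)  # retrievable, outside the window

    def add(self, item_id: str, content: str) -> None:
        """Place raw content into the active context."""
        self.active[item_id] = content

    def forget(self, item_id: str, extracted_fact: str) -> None:
        """Archive the raw content and keep only the extracted fact in context."""
        self.archive[item_id] = self.active[item_id]
        self.active[item_id] = extracted_fact

    def active_chars(self) -> int:
        """Total size of the active context, in characters."""
        return sum(len(text) for text in self.active.values())
```

In the pod example above, the agent would call `forget("pod-describe", "image pull error")` once the cause is known: the long description moves to the archive and only the short finding remains active.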
Pillar 4: Checkpoint-Driven Compaction
When the session reaches a natural pause point (or a configured threshold), the agent compacts the context by writing a checkpoint and starting a fresh context window that loads only what's needed to continue:
# At a checkpoint:
# 1. Write a structured checkpoint capturing:
# - Mission
# - Progress (what's done, what's pending)
# - Current state
# - Key facts that need to persist
# - Decisions made
write_file(path="tmp/checkpoint.md", content=...)
# 2. Optionally: trigger a session break
# The next session loads:
# - System prompt (~2k tokens)
# - MEMORY.md (~5k tokens)
# - Checkpoint (~3k tokens)
# - Recent conversation (~2k tokens)
# = ~12k tokens, leaving 80%+ headroom for the new work in a window of 64k tokens or more
The checkpoint-driven compaction is the most aggressive form of context management. It resets the active context entirely, replacing it with a structured summary. The next session starts fresh but informed.
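A checkpoint with the fields listed above could be modeled as follows. This is a sketch under assumptions: the `Checkpoint` class, its Markdown layout, and the `resume_headroom` helper are illustrative inventions, and the token figures are the approximate ones from the pseudocode above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def _bullets(items: List[str]) -> str:
    """Render a list as bullets, with a placeholder when it is empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


@dataclass
class Checkpoint:
    mission: str
    current_state: str
    done: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)
    key_facts: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Serialize the checkpoint as a Markdown document with one section per field."""
        sections = [
            ("Mission", self.mission),
            ("Progress: done", _bullets(self.done)),
            ("Progress: pending", _bullets(self.pending)),
            ("Current state", self.current_state),
            ("Key facts", _bullets(self.key_facts)),
            ("Decisions", _bullets(self.decisions)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections) + "\n"


# Approximate resume load in tokens, taken from the figures above.
RESUME_BUDGET: Dict[str, int] = {
    "system prompt": 2_000,
    "MEMORY.md": 5_000,
    "checkpoint": 3_000,
    "recent conversation": 2_000,
}


def resume_headroom(window_tokens: int) -> float:
    """Fraction of the window still free after loading the resume budget."""
    return 1 - sum(RESUME_BUDGET.values()) / window_tokens
```

With these figures the resumed session loads 12k tokens. That leaves about 81% of a 64k window free, but only about 62% of a 32k window, so the headroom depends on the model in use.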
The Discipline in Practice: A Long Research Session
Consider a research session that runs for 90 minutes:
Minute 0-15: Setup and early research.
The agent starts with ~10k tokens of context (system prompt + memory). It does 5 web searches and 3 page fetches, adding maybe 8k tokens of raw results. It summarizes each result as it processes. Active context: ~14k tokens. Plenty of headroom.
Minute 15-45: Deep dive.
The agent works through 4 source documents (about 12k tokens in total), cross-references them with grep searches, and starts synthesizing findings. Where it can, it uses grep -A 20 to extract relevant passages instead of reading whole documents. Active context: ~32k tokens. Still comfortable.
Minute 45-75: Analysis and writing.
The agent drafts the report section by section. It writes each section to a file with edit_file rather than keeping the draft in context, and it verifies the draft against sources using targeted grep queries. Active context: ~38k tokens. Getting tighter.
Minute 75: First compaction trigger.
The runtime offers a compaction. The agent accepts. The runtime summarizes the last 30 turns into a 2k-token summary. Active context: ~22k tokens. Headroom restored.
Minute 75-90: Final synthesis.
The agent finishes the report, does a final review, and writes the final version. Active context: ~28k tokens. Well within the window.
Total work: 90 minutes, ~250 tool calls, ~120k tokens of raw data processed, ~38k tokens of peak active context (reached just before the minute-75 compaction).
Without discipline, the same session would have hit 80k+ tokens by minute 60, degraded reasoning quality, and possibly failed to complete. With discipline, the agent finishes the work with quality intact.
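The minute-75 compaction can be checked with simple arithmetic. The sketch below is hypothetical: `compact_history` is not a Facio API, and the per-turn size of 600 tokens is an assumption chosen so that 30 turns plus 20k tokens of older context add up to the ~38k figure in the walkthrough.

```python
from typing import List


def compact_history(turn_tokens: List[int], last_n: int, summary_tokens: int) -> List[int]:
    """Replace the last `last_n` entries with a single summary entry."""
    if last_n <= 0 or last_n > len(turn_tokens):
        raise ValueError("last_n must be between 1 and the number of entries")
    return turn_tokens[:-last_n] + [summary_tokens]


if __name__ == "__main__":
    # Assumed breakdown: 20k tokens of older context, then 30 turns of 600 tokens each.
    history = [20_000] + [600] * 30
    compacted = compact_history(history, last_n=30, summary_tokens=2_000)
    print(sum(history), "->", sum(compacted))
```

Under that assumed breakdown, 18k tokens of turns are replaced by a 2k-token summary, which takes the active context from 38k to 22k tokens.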
The Compaction Triggers
Facio's runtime triggers compaction at three thresholds:
Soft trigger (50% of window). The runtime begins suggesting summarization for large tool outputs. The agent can accept or decline.
Hard trigger (75% of window). The runtime requires the agent to compact before processing the next tool call. The agent must summarize or checkpoint to continue.
Emergency trigger (90% of window). The runtime forces a compaction. The agent's pending work is checkpointed; the context is reset to the checkpoint plus system essentials.
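The triggers are calibrated to keep the agent in the productive stages of the pressure curve. In normal operation the hard trigger at 75% fires before the agent reaches Stage 4 (Strained, 80-95%); the emergency trigger at 90% is a backstop for sessions that overshoot the hard threshold anyway.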
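The three thresholds map naturally onto a small decision function. This is a minimal sketch, assuming the runtime compares a utilization ratio against fixed cutoffs; the enum, the function name, and the parameter names are hypothetical, and only the 50/75/90 percentages come from the text.

```python
from enum import Enum


class CompactionAction(Enum):
    NONE = "none"
    SUGGEST = "suggest summarization"       # soft trigger
    REQUIRE = "require compaction"          # hard trigger
    FORCE = "force checkpoint and reset"    # emergency trigger


def compaction_action(
    used_tokens: int,
    window_tokens: int,
    soft: float = 0.50,
    hard: float = 0.75,
    emergency: float = 0.90,
) -> CompactionAction:
    """Pick the strongest trigger whose threshold the current utilization has reached."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    utilization = used_tokens / window_tokens
    if utilization >= emergency:
        return CompactionAction.FORCE
    if utilization >= hard:
        return CompactionAction.REQUIRE
    if utilization >= soft:
        return CompactionAction.SUGGEST
    return CompactionAction.NONE
```

Checking the strongest trigger first means a context that jumps straight past several thresholds in one step still gets the most forceful response that applies.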
The Discipline Doesn't Help Every Case
Honest limitations:
- It doesn't change the model's fundamental attention limits. Even with discipline, very long conversations lose coherence on subtle details. The model still has trouble tracking every fact in a long context; the discipline just reduces the volume.
- It can lose information during compaction. A summary isn't a perfect replacement for the original. The agent may forget a nuance that mattered. The discipline preserves the most important facts but can't preserve everything.
- It requires the agent to be a good summarizer. Bad summarization loses critical details. The discipline works because Facio's agents are trained and prompted to summarize well. With a less capable model, the discipline might introduce errors.
- It adds overhead. Summarizing, compacting, and writing checkpoints consume tool calls and tokens. The discipline trades some efficiency for sustained quality.
- It doesn't preserve cross-session continuity perfectly. A session that resumes from a checkpoint isn't identical to the original session. The agent has the structured summary, not the full reasoning flow. Some continuity is lost.
The Performance Impact
The discipline shows up in four metrics:
Latency. Per-iteration latency stays roughly flat instead of climbing with context length. The agent's response time at minute 90 is similar to its response time at minute 5.
Cost. Per-iteration cost stays roughly flat too, because the agent isn't paying to process a huge context on every step.
Quality. Reasoning quality stays high. The agent isn't confused by the sheer volume of context. The model can focus on what matters.
Reliability. The agent completes long tasks. Without discipline, complex multi-hour workflows often fail to finish; with discipline, far more of them run to completion.
The effects compound. Faster, cheaper, more accurate, more reliable: the discipline improves all four metrics.
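The cost argument can be illustrated with a toy model. The sketch assumes each iteration re-sends the full active context as input and ignores prompt caching; the function name and all numbers are hypothetical, chosen only to show the shape of the difference, not to report measurements.

```python
from typing import Optional


def total_input_tokens(
    start: int, growth_per_step: int, steps: int, cap: Optional[int] = None
) -> int:
    """Sum the context size over all iterations, optionally capping the context."""
    total = 0
    for step in range(steps):
        context = start + growth_per_step * step
        if cap is not None:
            # A cap stands in for the discipline keeping the active context bounded.
            context = min(context, cap)
        total += context
    return total
```

With a 10k-token start, 500 tokens of growth per step, and 100 steps, the uncapped total is 3,475,000 input tokens, while capping the context at 30k brings it to 2,590,000. The gap widens as sessions get longer, because uncapped totals grow quadratically with the number of steps.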
The Agent's Context as a Discipline
Context window discipline is more than a set of techniques. It's a way of thinking about the agent's role. The agent doesn't treat context as a place to dump information. It treats context as a scarce resource to be managed carefully.
The analogy is a chef's mise en place. A chef doesn't dump every ingredient on the counter at once. The chef has only the ingredients needed for the current step in front of them, with the rest organized nearby but not cluttering the workspace. The chef reaches for an ingredient when needed, uses it, and puts it away.
A Facio agent applies the same discipline. The agent has only the information needed for the current reasoning in its context, with the rest available in the workspace but not cluttering the window. The agent reaches for information when needed, uses it, and summarizes or forgets it.
The chef's kitchen runs smoothly because of mise en place. The agent's context runs smoothly because of discipline. Both are about managing a constrained working space effectively.
Bottom Line
Long AI agent sessions accumulate context. The accumulation is inevitable. The discipline is how the agent handles it.
Facio's context window discipline has four pillars: selective loading (don't retrieve what you won't use), aggressive summarization (process raw data into distilled information), strategic forgetting (drop what no longer serves), and checkpoint-driven compaction (reset the window when needed). The combination keeps the agent sharp across hours of work.
The agent without discipline gets slower, more expensive, and more confused as the session progresses. The agent with discipline stays sharp throughout. The work gets finished. The quality holds. The cost stays bounded.
Because long sessions are where AI agents earn their keep. Short sessions are toys. Long sessions are tools. The discipline is what makes long sessions work.
See the context window discipline documentation for compaction triggers, summarization patterns, and checkpoint-driven context reset configurations.