Facio's Reflection Process: How Periodic Memory Curation Keeps AI Agents Honest at Scale
A memory file that grows without curation becomes noise. Facts accumulate from every conversation. Corrections get added next to the original claims. Old project context lingers after the project ends. The agent's passive context fills with information that no longer matters, and the signal-to-noise ratio drops with every passing week.
The first month, MEMORY.md is 200 lines of useful, curated knowledge. By month six, it's 1,500 lines of accumulated context, half of it stale, a quarter contradictory, the rest duplicated. The agent now loads 1,500 lines on every session — and most of it is wrong, outdated, or both.
Facio's Reflection process is the periodic curator that prevents this decay. It runs on a schedule, reviews the agent's memory, identifies what to keep, what to update, what to consolidate, and what to remove. The result: MEMORY.md stays small, accurate, and useful — no matter how long the agent has been running. Here's how the curator works and why automated memory hygiene is a non-negotiable for production agents.
The Problem: Memory Decay
Memory decay is the natural state of any system that accumulates information without curation. For AI agents, the decay has a specific shape:
Staleness. Facts that were true when written become false over time. A user's role changes, a project's status evolves, a tool's name gets updated. The memory file still contains the old information. The agent references it. The answer is wrong.
Contradiction. The user says "we use Postgres" in January. Says "we migrated to MySQL" in June. Both facts land in memory. The agent, with nothing to tell it which is current, picks one, and sometimes it picks the older one. The response is inconsistent with reality.
Duplication. The same fact gets recorded multiple times in slightly different ways. "Customer database: PostgreSQL" / "We use Postgres" / "DB: PG." Three lines saying one thing. The token cost triples.
Drift. Original entries get modified over time. Inline learning adds updates, context shifts, scope expands. The entry no longer says what it used to. By the time someone reads it, the meaning is unclear.
Accumulation. The agent learns new things. They get added. Nothing ever gets removed. The file grows linearly with the agent's lifetime. At year two, the file is so large that loading it eats most of the context window.
These problems compound. A contradictory entry that isn't resolved becomes a recurring source of bad outputs. A duplicated entry burns tokens on every session. A stale entry causes errors that take investigation time to diagnose. Memory decay is a slow, silent tax on every conversation the agent has.
The Reflection Trigger
Facio's Reflection process runs on a schedule that is configurable per agent: after a set number of sessions, after a set time period, or both. Some teams trigger it nightly via cron. Others trigger it after every N conversations.
Reflection is also triggered by specific events:
- Memory file size threshold. When MEMORY.md exceeds a configured size (default 30KB), Reflection runs. Large memory files are a sign of accumulation.
- Inline learning rate. When the agent has been making many inline updates, Reflection runs to verify the additions are consistent with existing entries.
- Manual trigger. The user can request a reflection cycle via ask_approval or a direct command.
The trigger is the signal that something needs curation. The user doesn't have to remember to run it. The agent doesn't have to remember either. The schedule handles it.
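The trigger conditions above can be sketched as a single check. This is a minimal illustration, not Facio's implementation: the class names, field names, and every default except the 30KB size threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TriggerConfig:
    # Only the 30KB size default comes from the article; the rest are hypothetical.
    max_bytes: int = 30 * 1024
    max_sessions: int = 10
    max_age: timedelta = timedelta(days=7)
    max_inline_updates: int = 25


@dataclass
class AgentState:
    memory_bytes: int
    sessions_since_reflection: int
    inline_updates_since_reflection: int
    last_reflection: datetime
    manual_request: bool = False


def reflection_reasons(state: AgentState, cfg: TriggerConfig, now: datetime) -> List[str]:
    """Return every trigger that currently fires; an empty list means no run is due."""
    reasons = []
    if state.manual_request:
        reasons.append("manual")
    if state.memory_bytes > cfg.max_bytes:
        reasons.append("size")
    if state.sessions_since_reflection >= cfg.max_sessions:
        reasons.append("sessions")
    if now - state.last_reflection >= cfg.max_age:
        reasons.append("schedule")
    if state.inline_updates_since_reflection >= cfg.max_inline_updates:
        reasons.append("inline_rate")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, lets the audit log record why a cycle ran.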
What Reflection Does
When Reflection runs, it performs a structured review of the agent's memory. The review has five phases, each with a specific purpose:
Phase 1: Inventory
Reflection reads the entire memory file and creates an inventory of every fact, preference, project note, and lesson. Each item gets categorized:
- User preferences — communication style, language, format choices
- Project facts — project names, contacts, status, deadlines
- Tool knowledge — how to use specific tools, gotchas, patterns
- Lessons learned — corrections received, failures to avoid
- Operational notes — workspace conventions, deployment procedures
- Stale candidates — entries that reference dates, projects, or roles from the past
The inventory is the working list. Reflection now knows what it's curating.
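As a rough sketch of the inventory step, the snippet below groups bullet entries by the section they sit under. It assumes MEMORY.md uses "## " section headings and "- " bullets, and the section names in the mapping are hypothetical. The stale-candidate flag is left out because it depends on dates rather than on where an entry lives.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List


class Category(Enum):
    USER_PREFERENCE = "user_preference"
    PROJECT_FACT = "project_fact"
    TOOL_KNOWLEDGE = "tool_knowledge"
    LESSON_LEARNED = "lesson_learned"
    OPERATIONAL_NOTE = "operational_note"
    UNCATEGORIZED = "uncategorized"


# Hypothetical section names; a real memory file may organize entries differently.
SECTION_TO_CATEGORY = {
    "preferences": Category.USER_PREFERENCE,
    "projects": Category.PROJECT_FACT,
    "tools": Category.TOOL_KNOWLEDGE,
    "lessons": Category.LESSON_LEARNED,
    "operations": Category.OPERATIONAL_NOTE,
}


def build_inventory(memory_text: str) -> Dict[Category, List[str]]:
    """Group each '- ' bullet under the category of its nearest '## ' heading."""
    inventory = defaultdict(list)
    current = Category.UNCATEGORIZED
    for raw in memory_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            heading = line[3:].strip().lower()
            current = SECTION_TO_CATEGORY.get(heading, Category.UNCATEGORIZED)
        elif line.startswith("- "):
            inventory[current].append(line[2:].strip())
    return dict(inventory)
```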
Phase 2: Staleness Detection
Reflection reviews each entry for signs of staleness:
- Time-bound references. Entries that mention "as of 2025" or "the migration we did last quarter" become staleness candidates once 6+ months have passed.
- Project status. Entries that say "X is in progress" or "Y is the current approach" — if the project ended or the approach changed, the entry is stale.
- Role references. Entries that say "user is the lead developer" or "team uses Slack" — if the role or tool changed, the entry is stale.
- Numerical facts. Entries that say "we have 50 customers" — if the number changed, the entry is stale.
Staleness candidates are flagged for verification. Reflection can use recall to check recent conversations for updates, or web_search to verify external facts. The goal isn't to remove everything old — it's to remove what's no longer accurate.
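A simple way to picture the flagging step is a set of text patterns applied only to entries older than six months. The patterns, the function name, and the 180-day cutoff below are illustrative assumptions. A flag means "verify this", not "delete this".

```python
import re
from datetime import date
from typing import List

# Hypothetical patterns, one per staleness signal described above.
PATTERNS = {
    "time_bound": re.compile(r"\bas of \d{4}\b|\blast (?:week|month|quarter|year)\b", re.IGNORECASE),
    "status": re.compile(r"\bin progress\b|\bcurrent(?:ly)?\b", re.IGNORECASE),
    "role_or_tool": re.compile(r"\b(?:user is|team uses)\b", re.IGNORECASE),
    "numeric": re.compile(r"\b\d[\d,]*\s+(?:customers|users|seats)\b", re.IGNORECASE),
}


def staleness_flags(entry: str, last_updated: date, today: date,
                    min_age_days: int = 180) -> List[str]:
    """Return the staleness signals found in an entry old enough to need checking."""
    if (today - last_updated).days < min_age_days:
        return []  # recent entries are not candidates
    return [name for name, pattern in PATTERNS.items() if pattern.search(entry)]
```

An entry that comes back with one or more flags would then go to verification, for example through recall or web_search as described above.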
Phase 3: Contradiction Resolution
Reflection identifies entries that contradict each other. Common contradiction patterns:
- Sequential updates. "We use Postgres" (Jan), "We use MySQL" (Jun). The newer entry supersedes; the older is removed or marked as historical.
- Inconsistent scope. "The API uses REST" vs. "The API uses GraphQL." These can't both be current. Reflection checks the history to find the most recent and consistent claim.
- Inverted facts. "The staging URL is staging.example.com" vs. "We moved staging to staging-v2.example.com." The newer is correct; the older is removed.
For each contradiction, Reflection either:
- Updates the original entry to reflect the newer fact
- Removes the older entry entirely
- Marks one as historical with a date reference, keeping both for context
The decision is logged so the user can see what was resolved and why.
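The "newer entry supersedes" rule can be sketched as follows, assuming each fact has already been reduced to a subject key, a value, and the date it was recorded. That structure is an assumption for the example; extracting keys from free text is the hard part and is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    key: str       # hypothetical normalized subject, e.g. "database"
    value: str
    recorded: date


def resolve_contradictions(facts: List[Fact]) -> Tuple[List[Fact], List[str]]:
    """Keep the newest value per key and log every older value it replaced."""
    by_key: Dict[str, List[Fact]] = {}
    for fact in facts:
        by_key.setdefault(fact.key, []).append(fact)

    current, log = [], []
    for key, group in by_key.items():
        group.sort(key=lambda f: f.recorded)
        newest = group[-1]
        current.append(newest)
        for old in group[:-1]:
            if old.value != newest.value:
                log.append(
                    f"{key}: '{old.value}' ({old.recorded}) superseded by "
                    f"'{newest.value}' ({newest.recorded})"
                )
    return current, log
```

A variant that keeps the older value as a dated historical note, rather than dropping it, would append to `current` instead of only logging.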
Phase 4: Consolidation
Reflection identifies duplicate or near-duplicate entries and consolidates them:
- Exact and near duplicates. Two or more entries saying the same thing, verbatim or in slightly different words. Keep one, remove the others.
- Pattern consolidation. Three entries describing related lessons. Combine them into a single, more comprehensive entry.
- Section organization. Entries that belong together but are scattered. Move them into the right section of MEMORY.md.
Consolidation reduces the memory's token footprint without losing information. A 1,500-line file with 200 facts becomes a 600-line file with 200 facts. Same knowledge, less overhead.
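Here is one hedged sketch of duplicate removal using Python's standard difflib. It catches only surface-level near-duplicates, meaning the same words with different punctuation or casing. Entries such as "We use Postgres" and "DB: PG" share meaning but not text, so they would need a semantic comparison that this example does not attempt. The 0.85 threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher
from typing import List


def _normalize(entry: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", entry.lower()).strip()


def consolidate(entries: List[str], threshold: float = 0.85) -> List[str]:
    """Keep the first entry of each group of near-identical entries."""
    kept: List[str] = []
    kept_normalized: List[str] = []
    for entry in entries:
        norm = _normalize(entry)
        is_duplicate = any(
            SequenceMatcher(None, norm, seen).ratio() >= threshold
            for seen in kept_normalized
        )
        if not is_duplicate:
            kept.append(entry)
            kept_normalized.append(norm)
    return kept
```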
Phase 5: Pruning
Reflection removes entries that no longer serve any purpose:
- Outdated project context. "Project X (completed March 2025) used approach Y." If the project is done and the approach isn't reusable, the entry is removed.
- Superseded lessons. A lesson learned that was later superseded by a better practice. The newer practice stays; the older lesson is removed.
- One-time facts. Information that was useful once but isn't generally relevant. The agent rarely needs to know "we ordered pizza for the launch party." Removed.
- Personal notes. Entries that are too personal or specific to be useful. "User prefers dark mode in their IDE." If the agent never references it, it's noise.
Pruning is the most aggressive phase. The goal is to keep only what's actively useful for the agent's operation.
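The pruning rules can be pictured as a filter that returns both what stays and what goes, with a reason for each removal. The Entry fields below, including the 90-day reference counter, are hypothetical; the article does not say how Facio tracks usage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    text: str
    references_last_90_days: int   # hypothetical usage counter
    project_completed: bool = False
    reusable: bool = True
    superseded: bool = False


def prune(entries: List[Entry]) -> Tuple[List[Entry], List[Tuple[str, str]]]:
    """Split entries into kept entries and (text, reason) pairs for removed ones."""
    kept, removed = [], []
    for entry in entries:
        if entry.superseded:
            removed.append((entry.text, "superseded"))
        elif entry.project_completed and not entry.reusable:
            removed.append((entry.text, "completed project, not reusable"))
        elif entry.references_last_90_days == 0:
            removed.append((entry.text, "unreferenced"))
        else:
            kept.append(entry)
    return kept, removed
```

Keeping the removed list with reasons is what makes the aggressive phase reviewable afterwards.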
What Reflection Doesn't Do
The curator has deliberate limits:
- It doesn't delete history. The full conversation log in memory/history.jsonl is preserved. Reflection cleans up the curated memory, not the historical record. If a fact is removed from MEMORY.md, it can still be found via recall if needed.
- It doesn't add new information. Reflection curates what exists. Adding new facts is the agent's job, via inline learning during conversations.
- It doesn't make value judgments. Reflection removes stale and contradictory entries, but it doesn't decide whether a preference is "important" or a fact is "relevant." That's the agent's and the user's judgment.
- It doesn't break working patterns. If the agent is functioning well with the current memory, Reflection preserves the working entries. It only prunes what's demonstrably stale or contradictory.
- It doesn't surprise the user. Reflection logs every change. The user can review what was removed, updated, and consolidated. Manual override is always possible.
The Reflection Audit Trail
Every Reflection cycle produces a structured log of what changed:
```
# Reflection Log: 2026-06-18 10:00 UTC

## Stale Entries Removed (3)
- "Project Phoenix status: in progress" (last updated 2025-12-01)
- "User role: Marketing Lead" (changed 2026-02-15)
- "API endpoint: api-v1.example.com" (deprecated 2026-04-22)

## Contradictions Resolved (1)
- "Database: PostgreSQL" vs. "Database: MySQL"
  → Resolved: MySQL is current (per 2026-05-08 conversation)
  → Updated first entry to "Database: MySQL (migrated from PostgreSQL May 2026)"

## Duplicates Consolidated (2 sets)
- Three entries about Slack workspace → consolidated to one
- Two entries about deployment procedure → merged into comprehensive entry

## New Entries Added (0)
- No new facts identified; current MEMORY.md is comprehensive

## Summary
- Started: 1,247 lines, 31KB
- Ended: 891 lines, 22KB
- Reduction: 28% smaller, same effective knowledge
```
The audit trail makes Reflection's work visible and reviewable. The user can spot-check what was changed. If Reflection removed something important, the user can restore it. If Reflection kept something stale, the user can flag it for next cycle.
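The summary figures in a log like this are simple arithmetic. The sketch below computes them from before-and-after snapshots. The Snapshot type and function names are made up for illustration and do not describe Facio's log writer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    lines: int
    kilobytes: int


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


def summary_lines(start: Snapshot, end: Snapshot) -> List[str]:
    """Render the three summary bullets for a Reflection log."""
    line_cut = reduction_percent(start.lines, end.lines)
    size_cut = reduction_percent(start.kilobytes, end.kilobytes)
    return [
        f"- Started: {start.lines:,} lines, {start.kilobytes}KB",
        f"- Ended: {end.lines:,} lines, {end.kilobytes}KB",
        f"- Reduction: {line_cut:.1f}% fewer lines, {size_cut:.1f}% fewer kilobytes",
    ]
```

With the sample figures (1,247 lines and 31KB down to 891 lines and 22KB), this gives about 28.5% fewer lines and 29.0% fewer kilobytes, in line with the log's rounded 28%.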
When to Run Reflection
The default schedule depends on agent activity:
- High-activity agents (multiple sessions per day): Daily or every 10 sessions. Memory accumulates fast; curation needs to keep up.
- Medium-activity agents (a few sessions per week): Weekly. Enough to catch staleness before it affects outputs.
- Low-activity agents (occasional use): Monthly or on-demand. Less accumulation, so less frequent curation is fine.
- After major changes: A project completes, a user's role changes, a tool is deprecated. These events trigger an out-of-cycle Reflection.
The schedule is configurable. The user can run Reflection manually via a command, or the system can trigger it automatically based on size thresholds or session counts.
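The activity tiers above could map to a default interval along these lines. The numeric cutoffs (14 and 2 sessions per week) are assumptions chosen to match "multiple sessions per day" and "a few sessions per week". They are not documented Facio values.

```python
from datetime import timedelta


def default_interval(sessions_per_week: float) -> timedelta:
    """Pick a Reflection interval from how often the agent is used."""
    if sessions_per_week >= 14:      # roughly two or more sessions per day
        return timedelta(days=1)
    if sessions_per_week >= 2:       # a few sessions per week
        return timedelta(weeks=1)
    return timedelta(days=30)        # occasional use
```

Event-driven runs (a completed project, a role change) sit outside this function, since they depend on what happened rather than on how often the agent is used.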
What Production Looks Like With Reflection
The difference between an agent with Reflection and one without becomes stark over time:
Without Reflection (Month 12):
- MEMORY.md: 4,200 lines, 95KB
- Per-session context tax: 24,000 tokens (60% of a 40K context window)
- Stale entries: 200+ (outdated project status, old role references, deprecated tool names)
- Contradictions: 30+ (entries that say opposite things)
- Agent reliability: declining (more wrong answers, more "I thought we used X")
- User trust: eroding
With Reflection (Month 12):
- MEMORY.md: 600 lines, 15KB
- Per-session context tax: 4,000 tokens (10% of context window)
- Stale entries: 0 (curated weekly)
- Contradictions: 0 (resolved automatically)
- Agent reliability: high (curated, accurate context)
- User trust: high (the agent is consistently right)
The first scenario describes an agent that's been running for a year. The second describes the same agent with weekly Reflection cycles. The structural difference is the curator.
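The context-tax figures can be approximated with a back-of-the-envelope calculation. The sketch assumes about 4 bytes per token, a common rule of thumb for English text rather than a Facio figure, and the 40K window used in the comparison above.

```python
from typing import Tuple


def context_tax(memory_bytes: int, window_tokens: int = 40_000,
                bytes_per_token: float = 4.0) -> Tuple[float, float]:
    """Estimate (tokens consumed, share of the context window) for a memory file."""
    tokens = memory_bytes / bytes_per_token
    return tokens, tokens / window_tokens


without_tokens, without_share = context_tax(95_000)   # 95KB memory file
with_tokens, with_share = context_tax(15_000)         # 15KB memory file
```

Under that assumption, the 95KB file costs about 23,750 tokens (roughly 59% of the window) and the 15KB file about 3,750 tokens (roughly 9%), which matches the rounded 24,000 and 4,000 figures in the comparison.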
The Compounding Effect
Reflection creates a compounding benefit that gets stronger over time:
- The agent that runs Reflection is more reliable. Users trust it more. They use it more. The relationship deepens.
- The deeper relationship produces more inline learning. More facts get recorded. The memory grows.
- The growing memory is curated. Stale entries are removed. Contradictions are resolved. The signal stays high.
- The high signal produces better agent outputs. Users get more value. They use the agent for more tasks.
- More tasks produce more memory. The cycle compounds.
The agent without Reflection decays. The agent with Reflection improves. The gap widens with every passing month.
Bottom Line
Memory that grows without curation decays. Facts go stale. Contradictions pile up. Duplicates accumulate. The agent's context fills with noise. The cost compounds with every session.
Facio's Reflection process is the structural fix. Periodic curation, structured review phases, audit trails, configurable schedules. The curator runs whether the user remembers to trigger it or not. The curator logs every change. The curator gets smarter about the user's patterns over time.
An agent with Reflection isn't a smarter agent. It's an agent with cleaner inputs. The reasoning capability is the same. The context the reasoning operates over is dramatically better. The result: outputs that reflect current reality, not accumulated cruft.
Because an AI agent's memory is its institutional knowledge. Institutions that don't maintain their knowledge decay. Institutions that do, endure.
See the Reflection documentation for configuration options, schedule tuning, and audit trail format specifications.