Product · Jun 16, 2026

Facio's Two-Layer Memory: How Passive Context and Active Recall Give AI Agents Institutional Knowledge

AI agents with no memory are amnesiacs — they start every session knowing nothing about the user, the project, or the prior work. Facio's two-layer memory architecture combines passive context (always loaded) with active recall (query-based search) to give agents institutional knowledge that grows over time. Here's how the layers work, what each is good for, and why the combination beats either approach alone.

Memory Architecture · Long-Term Recall · Agent Context · Knowledge Management · Institutional Memory

An AI agent with no memory is an amnesiac. It starts every session knowing nothing about the user, the project, the prior decisions, or the lessons learned. The agent repeats the same introductions, asks the same questions, makes the same mistakes, and produces output that ignores everything that came before. It's not a relationship with the user. It's a stranger in a chat window.

Production AI agents need memory. Not just the model's context window (which is bounded and ephemeral) but persistent memory that survives across sessions, accumulates knowledge, and gets smarter over time. The question isn't whether to have memory — it's how to architect it.

Facio uses a two-layer memory architecture that combines always-loaded passive context with query-based active recall. The two layers serve different purposes, work at different speeds, and compose into a memory system that gives agents genuine institutional knowledge. Here's how the layers work, what each is good for, and why the combination beats either approach alone.

Layer 1: Passive Context (MEMORY.md)

The first memory layer is the passive context — the markdown files in the workspace root that are automatically included in every session's context window:

  • MEMORY.md: long-term facts about the user, the project, and the user's preferences and business. Managed by Reflection (Facio's curator) and inline learning.
  • SOUL.md: the agent's persona and communication style. Defines how the agent speaks, what tone it uses, what emoji it avoids.
  • USER.md: the user's profile (name, language, communication style, professional context). Used to personalize every interaction.
  • AGENTS.md: operational instructions and workspace conventions for the agent.
  • WORKSPACE.md: the directory layout convention. Loaded as part of bootstrap context.

These files are loaded into the context window at the start of every session. The agent reads them as part of its initial context, alongside the system prompt. Every session begins with the same institutional knowledge.
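As a rough illustration of that bootstrap step, the sketch below concatenates whichever of the five files exist in a workspace root into one block of text. The file names come from the list above; the function name `build_passive_context`, the load order, and the `## <file>` section headers are assumptions made for this example, not Facio's actual loader.

```python
from pathlib import Path

# File names are taken from the article; the order and the header format
# used below are assumptions for this sketch.
PASSIVE_FILES = ["MEMORY.md", "SOUL.md", "USER.md", "AGENTS.md", "WORKSPACE.md"]


def build_passive_context(workspace_root: str) -> str:
    """Concatenate the passive-context files found in the workspace root."""
    root = Path(workspace_root)
    sections = []
    for name in PASSIVE_FILES:
        path = root / name
        if path.is_file():  # missing files are skipped, not treated as errors
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

The resulting string would be placed next to the system prompt, so every session starts from the same text.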

What passive context is good for:

  • High-frequency, low-cost information. The user's name, language, communication style — the agent uses these in every response. They need to be in every context.
  • User preferences. "Use German in responses." "Use minimal emoji." "Refer to the user by first name." These shape every interaction and should always be loaded.
  • Operational conventions. WORKSPACE.md, AGENTS.md — these are the rules the agent follows. They need to be in context to be followed.
  • Recent context. MEMORY.md includes a "Last conversation" or "Recent activity" section that summarizes what was discussed recently. The agent picks up where the previous session left off.

What passive context is NOT good for:

  • Large fact bases. Loading 10,000 lines of MEMORY.md into every session burns tokens for facts that may not be relevant.
  • Deep historical data. Conversation logs from 6 months ago rarely matter for today's work.
  • Specialized knowledge. Domain-specific reference material that the agent only needs for certain tasks.
  • Things that change frequently. If the data is updated hourly, loading it from MEMORY.md means loading stale data.

Passive context is the agent's "always-on" knowledge — the things it needs to know in every session. It's bounded, curated, and managed.

Layer 2: Active Recall

The second memory layer is active recall — the agent's ability to query a search index of past conversations, memory entries, and workspace content. The recall tool is the interface:

recall(query="authentication flow decision from last month", limit=5)

The tool searches the memory index using hybrid search (BM25 keywords + semantic similarity), returns ranked results with timestamps and relevance scores, and lets the agent access content that's not in the current context window.
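To make the hybrid ranking concrete, here is a minimal sketch of one common way to fuse a keyword score with a semantic score. It is not Facio's implementation: the BM25 parameters, the min-max normalization, the `alpha` weight, and the caller-supplied vectors that stand in for real embeddings are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(scores: list[float]) -> list[float]:
    """Min-max scale to [0, 1] so the two score types are comparable."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Return (doc_index, combined_score) pairs, best match first."""
    keyword = normalize(bm25_scores(query, docs))
    semantic = normalize([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * k + (1 - alpha) * s for k, s in zip(keyword, semantic)]
    return sorted(enumerate(combined), key=lambda pair: pair[1], reverse=True)
```

The weighted sum is only one fusion strategy; reciprocal rank fusion is a frequent alternative when the two score scales are hard to compare.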

The memory index includes:

  • memory/history.jsonl — append-only JSONL log of all past conversations. Never loaded into context, always searchable.
  • memory/search.db — SQLite database with embeddings (when available) and keyword indexes for fast retrieval.
  • Workspace content — markdown files, code, configuration. Indexable but only when explicitly added.
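The append-only history log in the list above can be pictured with a small sketch like the one below. The record fields (`ts`, `role`, `text`) and the helper names are hypothetical; the article does not specify the schema of memory/history.jsonl.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_turn(log_path: str, role: str, text: str) -> dict:
    """Append one conversation turn as a single JSON line (never rewrite old lines)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # assumed field name
        "role": role,
        "text": text,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:  # "a" keeps the log append-only
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_turns(log_path: str) -> list[dict]:
    """Read the log back, one record per non-empty line."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is an independent JSON document, an indexer can process new lines incrementally without re-reading the whole file.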

What active recall is good for:

  • Historical context. "What did we decide about the authentication flow last month?" The agent queries the history, finds the decision, and uses it.
  • Specific facts. "What's the user's tax ID for the German filing?" The agent queries memory, finds the fact, returns it.
  • Past corrections. "What did the user correct me on previously?" The agent queries for corrections, applies the lessons.
  • Cross-session knowledge. Anything that's been written to MEMORY.md, the history log, or indexed workspace content.

What active recall is NOT good for:

  • Real-time context. Recall is a query, not a stream. The agent has to know to ask.
  • Always-needed facts. The agent shouldn't have to recall the user's name on every turn — that belongs in passive context.
  • Time-sensitive data. Recall returns the most relevant stored match, which reflects what was true when it was recorded and may no longer be current. For real-time information, the agent uses web_search or scheduled tools.

Active recall is the agent's "library" — vast, searchable, but requires the agent to know what to look for.

How the Layers Compose

The two layers aren't redundant; they're complementary. The agent uses them in a deliberate sequence:

Step 1: Read passive context. At the start of every session, the agent has MEMORY.md, USER.md, SOUL.md, AGENTS.md, and WORKSPACE.md in context. The agent knows the user, the project conventions, and recent activity.

Step 2: Recognize gaps. During the conversation, the agent encounters a question that passive context can't answer. "What was that regex pattern from the March deployment?" — the agent knows this isn't in the current context.

Step 3: Query active recall. The agent calls recall(query="regex pattern March deployment", limit=5), gets ranked results, and uses the most relevant one.

Step 4: Synthesize and respond. The agent combines the passive context (user preferences, current project state) with the recalled fact (the regex pattern) and produces an answer that leverages both.

This composition is what makes the architecture powerful. Passive context provides the foundation — always loaded, always available, always personalized. Active recall provides the depth — queryable, comprehensive, historical. The agent that uses both is more useful than an agent with either alone.
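The four steps can be sketched as a small function. This is an illustration under stated assumptions: `RecallHit`, `RecallFn`, and `gather_facts` are hypothetical names, the recall tool is passed in as a plain callable, and the substring-based gap check is a stand-in for a judgment that a real agent would leave to the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecallHit:
    text: str
    score: float


# Hypothetical signature standing in for the recall tool: (query, limit) -> hits.
RecallFn = Callable[[str, int], list[RecallHit]]


def gather_facts(question: str, passive_facts: dict[str, str],
                 recall: RecallFn, limit: int = 5) -> list[str]:
    # Step 1: passive context is always present.
    facts = [f"{key}: {value}" for key, value in passive_facts.items()]
    # Step 2: naive gap check; a real agent would let the model decide.
    lowered = question.lower()
    covered = any(key.lower() in lowered for key in passive_facts)
    # Step 3: query recall only when passive context does not cover the question.
    if not covered:
        hits = sorted(recall(question, limit), key=lambda h: h.score, reverse=True)
        facts.extend(hit.text for hit in hits)
    # Step 4: the caller synthesizes an answer from both layers.
    return facts
```

Passing the recall tool in as a parameter keeps the sketch testable with a fake, and it makes the cost structure visible: the query is paid for only when the always-loaded layer falls short.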

What Goes in Each Layer

The decision of "passive or active" is one of the most important memory design decisions. Here's the rule:

Passive context: Information the agent needs in MOST sessions, regardless of task.

  • User identity (name, language, role)
  • Communication preferences (style, format, emoji policy)
  • Operational conventions (workspace layout, file naming)
  • Project fundamentals (project name, key contacts, current state)
  • Recent activity summary (last 1-3 sessions)

Active recall: Information the agent needs SOMETIMES, depending on task.

  • Specific technical details (regex patterns, API endpoints, configuration values)
  • Historical decisions (why we chose X over Y)
  • Past corrections and lessons learned
  • Domain-specific knowledge (regulations, standards, terminology)
  • Older conversation history

The distinction matters because passive context costs tokens on every session. A 1,000-token MEMORY.md is a 1,000-token tax on every conversation. The information in passive context has to earn its place by being useful across many sessions, not just one or two.
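The token tax is simple arithmetic, and a short sketch makes the accumulation visible. The function name and the per-token price parameter are placeholders for illustration; substitute whatever your model provider actually charges.

```python
def passive_context_tax(memory_tokens: int, sessions: int,
                        usd_per_million_tokens: float) -> tuple[int, float]:
    """Return (total tokens spent on passive context, cost in USD)."""
    total_tokens = memory_tokens * sessions
    cost = total_tokens / 1_000_000 * usd_per_million_tokens
    return total_tokens, cost


# Example: a 1,000-token MEMORY.md loaded once a day for a year,
# at a hypothetical price per million input tokens.
tokens, cost = passive_context_tax(1_000, 365, usd_per_million_tokens=3.0)
```

The same calculation also shows why curation pays off: every token pruned from MEMORY.md is saved once per session, for every future session.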

The Memory Lifecycle

Facio's memory system has a lifecycle that's managed by Reflection (the curator process) and inline learning (the agent's runtime updates):

1. CONVERSATION: Agent and user interact. Decisions are made, facts are shared, preferences are revealed.

2. INLINE LEARNING: Agent identifies durable facts (user preferences, project decisions, corrections) and uses edit_file to add them to MEMORY.md. Updates are surgical (one line at a time) and logged.

3. CONVERSATION LOG: Every conversation is appended to memory/history.jsonl. The full record is preserved.

4. INDEXING: The memory search index is updated to include new conversation content. New queries can find new context.

5. REFLECTION (periodic): Reflection reviews MEMORY.md for staleness, contradiction, and noise. Outdated entries are removed. Contradictions are resolved. Patterns are consolidated.

6. PASSIVE CONTEXT UPDATE: MEMORY.md is the source for passive context. Once Reflection has curated it, the next session loads the up-to-date version of the agent's durable knowledge.

The agent participates in step 2 (inline learning during the conversation). Reflection handles step 5 (periodic curation). The runtime handles steps 3, 4, and 6 automatically.
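Step 2 describes surgical, one-line updates. The sketch below shows what such an update could look like as a pure text transformation. It assumes MEMORY.md is organized under `## Section` headings with `- ` bullets, which the article does not state, and `add_memory_line` is a hypothetical helper rather than the edit_file tool itself.

```python
def add_memory_line(memory_md: str, section: str, fact: str) -> str:
    """Insert one bullet under a '## section' heading, leaving other lines untouched."""
    lines = memory_md.splitlines()
    bullet = f"- {fact}"
    if bullet in lines:
        return memory_md  # already recorded; nothing to change
    heading = f"## {section}"
    if heading not in lines:
        # Create the section at the end of the file.
        lines += ["", heading, bullet] if lines else [heading, bullet]
        return "\n".join(lines) + "\n"
    idx = lines.index(heading) + 1
    # Walk to the end of this section (the next heading or end of file).
    while idx < len(lines) and not lines[idx].startswith("## "):
        idx += 1
    # Step back over trailing blank lines so the bullet stays with its section.
    while idx > 0 and lines[idx - 1] == "":
        idx -= 1
    lines.insert(idx, bullet)
    return "\n".join(lines) + "\n"
```

Keeping each update to a single inserted line makes the change easy to log and easy for a later curation pass to review or revert.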

Why the Two-Layer Design Wins

Single-layer memory systems have a recognizable failure mode: they either over-load the context (everything is "passive") or under-serve the agent (everything requires a query).

The "everything passive" anti-pattern. A MEMORY.md file that grows unbounded. After a year, it has 50,000 tokens of accumulated knowledge. Assuming a 200,000-token context window, the agent burns 25% of that window loading the memory file alone. Worse, the agent has to wade through old, possibly-stale, possibly-irrelevant facts to find the current ones. Performance degrades.

The "everything active" anti-pattern. No MEMORY.md. Every piece of information is queried via recall. The agent has to know what to ask for, has to formulate queries, has to wait for results. Simple things like "what's the user's name" require a recall call. The user experience degrades.

The two-layer design avoids both failure modes. Passive context has a budget (Facio's runtime enforces reasonable limits on MEMORY.md size). Active recall has the full history. The agent uses passive context as the foundation and recall for the long tail.
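A budget check of the kind mentioned above might look roughly like this. The article does not say how Facio measures or limits MEMORY.md, so everything here is an assumption: the function name, the limit passed in, and especially the four-characters-per-token estimate, which is only a rough heuristic and not a real tokenizer.

```python
import math


def check_memory_budget(memory_text: str, max_tokens: int,
                        context_window: int, chars_per_token: float = 4.0) -> dict:
    """Estimate the token footprint of MEMORY.md and compare it to a budget."""
    estimated = math.ceil(len(memory_text) / chars_per_token)  # rough estimate only
    return {
        "estimated_tokens": estimated,
        "within_budget": estimated <= max_tokens,
        "window_share": estimated / context_window,
    }
```

A file that fails the check is a candidate for curation: entries that are rarely needed can move out of passive context and remain reachable through recall.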

How Memory Scales Over Time

The architecture is designed to scale with the agent's lifetime:

  • Day 1. MEMORY.md has 200 lines. Active recall has 1 conversation in the index. The agent knows the basics.
  • Month 1. MEMORY.md has 300 lines (some growth from inline learning, some pruning from Reflection). Active recall has 30+ conversations. The agent has a rich history to draw on.
  • Year 1. MEMORY.md has 500 lines (curated to the essentials). Active recall has 365+ conversations, indexed and searchable. The agent has institutional knowledge that rivals a long-time team member.

The growth pattern is what matters. MEMORY.md grows slowly, deliberately, with curation. The history grows linearly with conversation count. The agent's effective knowledge can grow faster than either, because each new fact can be combined with what is already stored when the agent handles a new situation.

What the Two-Layer Memory Doesn't Do

A few honest limitations:

  • The memory is per-agent, not shared. Each Facio agent has its own memory. If multiple agents work on the same project, they need to coordinate via shared files (MEMORY.md in a shared workspace) or accept that they have separate institutional knowledge.
  • Recall is best-effort, not guaranteed. The search index is a hint, not a query language. Some facts are hard to find if the agent doesn't know the right keywords. The agent learns to write good queries through experience.
  • Memory has no real-time contradiction detection. If the user says "X is true" in March and "X is false" in June, the agent has to notice the contradiction itself. Inline learning adds the new fact; the old fact remains in history. Reflection resolves contradictions in MEMORY.md only during its periodic pass, so the agent should still be alert between passes.
  • The architecture doesn't replace a knowledge base. For deep, structured domain knowledge (a regulatory framework, a code library's API), the agent uses a dedicated knowledge base — typically a vector store or a RAG system. Memory is for conversational and operational context.
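For the contradiction case in the list above, one simple policy an agent could apply to recalled entries is "newest fact wins, but flag the conflict". The sketch below assumes each entry carries a `subject`, a `value`, and an ISO 8601 `ts` field. Those field names and the `resolve_conflicts` helper are hypothetical, chosen only because recall results are described as timestamped.

```python
from datetime import datetime


def resolve_conflicts(entries: list[dict]) -> tuple[dict, list[str]]:
    """Keep the most recent entry per subject and list subjects with conflicting values."""
    latest: dict[str, dict] = {}
    values_seen: dict[str, set] = {}
    for entry in entries:
        subject = entry["subject"]
        values_seen.setdefault(subject, set()).add(entry["value"])
        ts = datetime.fromisoformat(entry["ts"])
        current = latest.get(subject)
        if current is None or ts > datetime.fromisoformat(current["ts"]):
            latest[subject] = entry
    conflicts = sorted(s for s, vals in values_seen.items() if len(vals) > 1)
    return latest, conflicts
```

Returning the conflicting subjects alongside the winners lets the agent mention the change to the user instead of silently overriding the older statement.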

Bottom Line

An AI agent with no memory is an amnesiac. An AI agent with only passive context is over-loaded. An AI agent with only active recall is slow. The architecture that works is the one that uses both — passive context for the foundation, active recall for the depth.

Facio's two-layer memory gives agents institutional knowledge that grows over time. Every session begins with the curated essentials in context. Every question about history, decisions, or specialized facts can be answered with a recall query. The agent's effectiveness compounds with the user's investment in the relationship.

Because the value of an AI agent isn't in any single session. It's in the cumulative value of every session the agent has ever had with the user. Memory is what makes that compounding possible.


See the memory architecture documentation for layer configuration, recall query patterns, and the inline learning workflow.
