Facio's Inline Learning: How Agents Update Their Own Memory During Every Conversation
Most AI agents are stateless between sessions. They might have access to a RAG pipeline or a vector database, but their "knowledge" of you — your preferences, your project context, your past decisions — disappears the moment the conversation ends. Every new session starts from scratch, with the human re-explaining what they need and how they work.
Facio takes a different path. Agents don't just retrieve from memory — they write to it, in real time, during conversations. And they do it with surgical precision: not dumping entire conversation logs into a database, but making targeted, deduplicated edits to specific memory files. Here's how the architecture works and why it produces better long-term behavior than retrieval-only approaches.
The Problem with Retrieval-Only Memory
The standard approach to agent memory in 2026 is some form of retrieval-augmented generation: store conversation data in a vector database, query it with semantic search, and inject relevant chunks into the context window when needed. This works for factual recall — "what did we decide about the deployment schedule?" — but it has structural weaknesses:
- No consolidation. Raw conversation chunks retrieved by similarity don't distinguish between a decision that was made, a hypothesis that was disproven, and a passing comment. Everything looks the same to the vector index.
- No deduplication. If a user corrects the agent three times about the same preference, the vector database stores all three corrections as separate entries. Retrieval may surface an outdated correction alongside the current one.
- No active learning. The agent never updates its own knowledge state. It always depends on retrieval quality — which degrades as the corpus grows and noise accumulates.
RAG gives agents access to the past. It doesn't give them the ability to learn from it.
Facio's Two-Layer Memory Architecture
Facio splits memory into two layers that serve different purposes:
| Layer | What it is | How it works |
|---|---|---|
| Passive context | MEMORY.md, USER.md, BUGS.md — markdown files included in every conversation's context window | Always available, always current, human-readable |
| Active recall | search.db — a SQLite search index with BM25 and semantic search | On-demand, queried with the recall tool when passive context isn't enough |
The passive context layer is where inline learning happens. These aren't database records — they're plain markdown files that the agent manages with edit_file (surgical changes) and write_file (full rewrites). The agent reads them, understands them, and — critically — updates them when it learns something new.
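To make the passive layer concrete, here is a minimal sketch of how the three files might be assembled into one block of context at the start of a conversation. The function name `build_passive_context`, the `[FILENAME]` section labels, and the fixed load order are assumptions for illustration; the article does not describe how Facio actually assembles its prompt.

```python
from pathlib import Path

# File names come from the article; the load order is an assumption.
PASSIVE_FILES = ("MEMORY.md", "USER.md", "BUGS.md")


def build_passive_context(memory_dir: Path) -> str:
    """Concatenate the passive memory files into one labeled text block.

    Missing files are skipped, so a fresh workspace yields an empty string.
    """
    sections = []
    for name in PASSIVE_FILES:
        path = memory_dir / name
        if path.exists():
            text = path.read_text(encoding="utf-8").strip()
            # Hypothetical label format so the model can tell the files apart.
            sections.append(f"[{name}]\n{text}")
    return "\n\n".join(sections)
```

Because the files are read fresh each time, an edit made in one conversation shows up in the next one with no indexing step in between.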
When Agents Write to Memory
Inline learning isn't "log everything." The agent follows explicit trigger conditions for when to update memory:
Explicit instructions. The user says "remember this," "always do X," or "never do Y." The agent immediately finds the right file and makes the edit in the same turn, without waiting for batch processing or for the periodic Reflection pass described below.
Corrections. The user corrects the agent's approach or points out a mistake. The agent adds the lesson to MEMORY.md so the same error doesn't recur in future sessions.
Preference discovery. The agent observes a user preference — coding style, communication style, tooling choices — and updates USER.md. "Prefers German responses" becomes a durable fact, not something the human has to re-state.
Environment discovery. The agent learns something about the OS, installed tools, or project conventions and records it in MEMORY.md. "Docker compose v2.24.6 installed, GPU available" is learned once and stays available in every later session until a newer fact replaces it.
Bug discovery. When a bug, limitation, or workaround becomes relevant, the agent reads BUGS.md and updates it if the bug isn't already documented. This is the agent maintaining its own known-issues tracker — without a human filing tickets.
The key architectural decision: the agent writes immediately, not during a nightly batch. By the time the next conversation starts, the knowledge is already in passive context — no retrieval latency, no stale index.
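The trigger conditions above can be read as a small routing table from trigger type to target file. The sketch below is one possible encoding, not Facio's implementation: the `Trigger` enum, `DEFAULT_TARGET`, and `target_file` are hypothetical names. For explicit instructions the article only says the agent "finds the right file", so the sketch requires the caller to supply that choice rather than guessing.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    EXPLICIT_INSTRUCTION = "explicit_instruction"
    CORRECTION = "correction"
    PREFERENCE = "preference"
    ENVIRONMENT = "environment"
    BUG = "bug"


# Trigger-to-file mapping as described in the article.
DEFAULT_TARGET = {
    Trigger.CORRECTION: "MEMORY.md",
    Trigger.PREFERENCE: "USER.md",
    Trigger.ENVIRONMENT: "MEMORY.md",
    Trigger.BUG: "BUGS.md",
}


def target_file(trigger: Trigger, agent_choice: Optional[str] = None) -> str:
    """Return the memory file that an update for this trigger should go to."""
    if trigger is Trigger.EXPLICIT_INSTRUCTION:
        # The article leaves this choice to the agent, so it must be passed in.
        if agent_choice is None:
            raise ValueError("explicit instructions need a file chosen by the agent")
        return agent_choice
    return DEFAULT_TARGET[trigger]
```

For example, `target_file(Trigger.PREFERENCE)` returns `"USER.md"`, while `target_file(Trigger.EXPLICIT_INSTRUCTION, "MEMORY.md")` simply passes the agent's own decision through.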
Surgical Editing: Why edit_file Matters
The edit_file tool is the primary mechanism for inline learning. Unlike write_file, which replaces entire files, edit_file makes targeted replacements — find old_text, replace with new_text. This keeps memory files clean and prevents the agent from accidentally destroying unrelated context.
edit_file(
    path="memory/MEMORY.md",
    old_text="## Important Notes\n",
    new_text="## Important Notes\n- User prefers git over mercurial for all new projects\n"
)
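For readers who want to see the mechanics, a find-and-replace tool of this kind could be sketched as follows. Only the `path`, `old_text`, and `new_text` parameters come from the call above. The `EditError` class, the `apply_edit` helper, and the rule that `old_text` must match exactly once are assumptions made here to keep the edit unambiguous, and Facio's real tool may behave differently.

```python
from pathlib import Path


class EditError(ValueError):
    """Raised when an edit cannot be applied safely (hypothetical error type)."""


def apply_edit(content: str, old_text: str, new_text: str) -> str:
    """Replace exactly one occurrence of old_text with new_text."""
    if not old_text:
        raise EditError("old_text must not be empty")
    matches = content.count(old_text)
    if matches == 0:
        raise EditError("old_text not found; file left unchanged")
    if matches > 1:
        # Assumption: refuse ambiguous edits instead of picking one match.
        raise EditError(f"old_text matches {matches} times; add more context")
    return content.replace(old_text, new_text, 1)


def edit_file(path: str, old_text: str, new_text: str) -> None:
    """Apply a targeted edit to a file on disk, leaving the rest untouched."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    target.write_text(apply_edit(original, old_text, new_text), encoding="utf-8")
```

Failing loudly on zero or multiple matches is what keeps the edit surgical in this sketch: the agent either changes the one place it meant to change or changes nothing.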
The rules the agent follows:
- Keep entries concise and actionable — one line per fact, no prose
- Deduplicate — if the fact already exists, skip or update in place
- Don't save trivial information — inline learning is for durable facts, not session state
- Don't save code blocks or logs — memory files stay small enough to fit in passive context
The result: MEMORY.md grows organically with genuinely useful information, not noise. After weeks of conversations, it's a curated summary of everything that matters — not a dump of everything that happened.
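The rules above could be enforced in code before any edit is made. The sketch below is an illustration under stated assumptions: `add_fact`, `normalize`, the 160-character limit, and the case-insensitive duplicate check are all hypothetical. In practice the agent presumably applies these rules by judgment, which also catches paraphrased duplicates that simple string matching misses.

```python
FENCE = "`" * 3  # a code fence marker, built this way to keep the example tidy


def normalize(line: str) -> str:
    """Reduce a bullet line to a comparison key: no bullet, case, or extra spaces."""
    return " ".join(line.strip().lstrip("- ").lower().split())


def add_fact(content: str, heading: str, fact: str, max_len: int = 160) -> str:
    """Insert a one-line fact under a heading unless it is already recorded.

    `fact` is passed without its leading bullet; the bullet is added here.
    """
    fact = fact.strip()
    if not fact or "\n" in fact or FENCE in fact or len(fact) > max_len:
        raise ValueError("memory entries must be one short line without code")
    existing = {
        normalize(line)
        for line in content.splitlines()
        if line.lstrip().startswith("- ")
    }
    if normalize(fact) in existing:
        return content  # already known: skip rather than duplicate
    marker = heading + "\n"
    if marker not in content:
        raise ValueError(f"heading not found: {heading}")
    return content.replace(marker, f"{marker}- {fact}\n", 1)
```

Calling `add_fact` twice with the same fact returns the file unchanged the second time, which is the "skip or update in place" behavior in its simplest form.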
The Role of Reflection
Inline learning handles immediate, explicit knowledge. But there's a second mechanism that runs periodically: Reflection.
Reflection is a background process that reviews recent conversations, detects patterns, consolidates related facts, and removes stale or outdated entries. It's the housekeeping layer — the agent doesn't wait for Reflection to learn something important, but Reflection ensures the knowledge base doesn't accumulate cruft over time.
The split is deliberate:
- Inline learning = immediate, trigger-driven, surgical. For facts the agent knows it needs to remember right now.
- Reflection = periodic, pattern-detecting, consolidating. For keeping the knowledge base clean across dozens of conversations.
Together, they produce a memory system that's both responsive (inline learning captures every correction immediately) and self-maintaining (Reflection prevents accumulation of stale or redundant information).
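The mechanical part of that housekeeping can be sketched in a few lines. This is only an illustration: `consolidate` and its `stale` parameter are hypothetical, and the sketch covers just duplicate and stale-entry removal. Deciding which entries are stale, and detecting patterns across conversations, would need the agent's own judgment and is not modeled here.

```python
from typing import Iterable


def _key(line: str) -> str:
    """Comparison key for a bullet line: no bullet, case, or extra spaces."""
    return " ".join(line.strip().lstrip("- ").lower().split())


def consolidate(content: str, stale: Iterable[str] = ()) -> str:
    """Drop repeated bullets and bullets listed as stale; keep everything else.

    Duplicates are detected across the whole file, not per heading, and the
    first occurrence wins. Headings and non-bullet lines pass through as-is.
    """
    stale_keys = {_key(entry) for entry in stale}
    seen = set()
    kept = []
    for line in content.splitlines():
        if line.lstrip().startswith("- "):
            key = _key(line)
            if key in seen or key in stale_keys:
                continue
            seen.add(key)
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Running such a pass periodically rather than on every write matches the split described above: inline edits stay cheap, and cleanup happens in bulk.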
What This Looks Like in Practice
A real interaction:
- Human: "Actually, I always want blog posts published immediately, not saved as drafts."
- Agent: Reads USER.md, finds no existing preference about blog publishing. Uses edit_file to add: "- Blog posts: publish immediately, never save as draft"
- Agent: Confirms: "Noted. I'll publish blog posts directly from now on."
- Next session, weeks later: The agent reads USER.md in passive context. It sees the preference. It publishes the post directly, with no retrieval step and no "did the user say something about drafts?" ambiguity.
The loop closed in one turn. The knowledge became durable immediately. And it survived across conversations because it lives in a file that's always loaded.
Why This Beats Pure RAG
Consider the same scenario with a pure RAG system:
- Human gives the preference.
- The conversation is chunked and embedded.
- Next session, the agent queries: "user preference blog draft publish"
- Maybe the vector search returns the right chunk. Maybe it returns a different conversation where "draft" and "blog" overlapped but the context was different.
- The agent reads a raw conversation chunk, tries to extract the preference from conversational context, and may or may not get it right.
In Facio, the agent wrote a clean, deduplicated fact to a file that is always loaded. Next session, that fact is in context deterministically, with no retrieval ambiguity, no embedding drift, and no context pollution from adjacent conversations.
Integration with the Search Index
What about information that doesn't fit in passive context? That's where the recall tool comes in.
MEMORY.md is kept small enough to load into every conversation's context window — typically a few hundred lines. But the search index (search.db) covers ALL past conversations, including those whose details didn't make it into the curated memory file.
When the agent needs to remember something from a conversation three months ago — "which deployment approach did we decide on for the staging environment?" — it queries with recall(query="staging deployment decision"). The search index combines BM25 keyword matching with semantic similarity for robust retrieval across time and topic drift.
The passive layer handles the always-relevant facts. The search layer handles the deep archives. Both are updated by the same inline learning pipeline — the agent writes to the file, the search index updates automatically, and the knowledge is findable by both paths.
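As a rough illustration of the keyword half of that search, the sketch below builds a BM25-ranked index with SQLite's FTS5 extension. It assumes the local SQLite build includes FTS5. The table name `chunks`, its columns, and the Python `recall` function are hypothetical stand-ins rather than Facio's schema, and the semantic-similarity half of the hybrid search is not shown.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, chunks) -> None:
    """Create a full-text table and load (session, body) pairs into it."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(session, body)"
    )
    conn.executemany("INSERT INTO chunks (session, body) VALUES (?, ?)", chunks)


def recall(conn: sqlite3.Connection, query: str, limit: int = 3):
    """Return the best keyword matches for a query, ranked by BM25."""
    terms = query.split()
    if not terms:
        return []
    # Quote each term so punctuation is not parsed as FTS5 query syntax,
    # and join with OR so a chunk does not need to contain every term.
    match = " OR ".join('"' + t.replace('"', '""') + '"' for t in terms)
    rows = conn.execute(
        "SELECT session, body FROM chunks WHERE chunks MATCH ? "
        "ORDER BY bm25(chunks) LIMIT ?",  # lower bm25() value = better match
        (match, limit),
    )
    return rows.fetchall()
```

With an in-memory database (`sqlite3.connect(":memory:")`) and a handful of past conversation chunks loaded through `build_index`, a call such as `recall(conn, "staging deployment decision")` returns the chunks sharing the most distinctive terms first.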
Bottom Line
Memory in AI agents isn't just about storing information — it's about knowing what to store, when to store it, and how to keep it current. Most systems solve the storage problem and call it done.
Facio's inline learning solves the harder problem: agents that recognize when they've learned something important, record it immediately in a structured, deduplicated format, and carry that knowledge forward into every future conversation — without the human having to manage a knowledge base or re-explain themselves every session.
When an agent can update its own understanding in real time, and those updates survive across conversations, you stop having to repeat yourself. The agent doesn't just remember what you said — it remembers what you meant.
See the memory documentation for inline learning trigger conditions, Reflection configuration, and search index architecture.