Product · Jun 25, 2026

Facio's Decision Provenance: How to Explain an AI Agent's Reasoning After the Fact

"Why did the agent do that?" is the question every AI operator eventually has to answer. The answer is rarely obvious from looking at the agent's outputs. Production agents take hundreds of tool calls across complex contexts; reconstructing the reasoning requires structured provenance — what the agent saw, what it knew, what it decided, and why. Facio's decision provenance features turn post-hoc explanation from guesswork into query work. Here's how to make AI agent reasoning auditable after the fact.

Decision Provenance · Explainability · Audit Trail · Observability · Debugging

"Why did the agent do that?"

Every AI operator asks this question eventually. The customer reports that the agent gave a wrong answer. The compliance auditor wants to know which data the agent processed. The on-call engineer is debugging why the agent deployed the wrong version. The product manager wants to understand the agent's behavior patterns.

The answer is rarely obvious. The agent made 30 tool calls, considered 5 alternative approaches, retrieved context from memory and the web, chose which path to take, and produced an output that was the result of all those steps combined. Reconstructing why requires looking at what was in the agent's context, what the agent's reasoning produced at each step, and what inputs influenced each decision.

Without structured provenance, the answer is guesswork. With structured provenance, the answer is a query.

Facio's decision provenance features turn post-hoc explanation from "I think the agent saw X and decided Y because..." into "I queried the agent's decision log, found the relevant decisions, and here's the chain of reasoning with timestamps and context." Here's how the features work, what they enable, and why provenance is the operational answer to the "why" question.

The Post-Hoc Explanation Problem

Production AI agents are complex systems. A single agent session may involve:

  • 50+ tool calls
  • Multiple memory retrievals
  • Several web searches and fetches
  • File reads, edits, and writes
  • Multiple HITL approval interactions
  • Branching reasoning across dozens of decisions

When something goes wrong — the agent gave a wrong answer, made an inappropriate decision, used outdated information — the operator needs to understand what happened. The naive approach is to read the agent's outputs and try to infer the reasoning. The reality is that the outputs don't contain the reasoning; they only contain the result.

The operator's options without provenance:

  • Ask the agent. "Why did you do that?" The agent may not have a reliable answer; the reasoning that produced the original decision is gone.
  • Read the agent's context. If the context is preserved, the operator can reconstruct what the agent saw. But the context is large (often tens of thousands of tokens) and the reasoning steps aren't marked.
  • Re-run the agent. Even if the agent is deterministic, re-running only reproduces the same output, and the operator doesn't know which inputs to vary to test hypotheses.
  • Give up and move on. The operator accepts that the agent's behavior is opaque and focuses on damage control rather than understanding.

None of these options are good. Each one is expensive, time-consuming, or insufficient. The operator ends up with an explanation that's a guess, an approximation, or absent entirely.

The Provenance Question

Provenance is the answer. The provenance question, in full:

Given a decision the agent made:
1. What did the agent see? (What was in the context at the time of the decision?)
2. What did the agent know? (What memories, facts, and prior decisions informed the decision?)
3. What did the agent decide? (What was the decision output?)
4. Why did the agent decide it? (What reasoning produced the decision?)
5. What alternatives did the agent consider? (What other approaches were possible?)
6. What was the agent's confidence? (How certain was the agent?)

Each of these questions is answerable if the right data is captured at the right time. The challenge is structuring the capture so the questions can be answered efficiently.

Facio's provenance model captures every decision the agent makes with enough context to answer these questions later. The capture happens automatically, is structured by the runtime, and is queryable through standard tools.

What Facio Captures

The runtime captures six categories of provenance data for every agent session:

1. Context Snapshots

For each significant decision point (every tool call, every HITL approval, every reasoning milestone), the runtime captures a snapshot of the agent's context:

{
  "timestamp": "2026-06-25T09:42:17Z",
  "iteration": 23,
  "context_snapshot": {
    "system_prompt_hash": "abc123",
    "memory_files": ["MEMORY.md", "USER.md", "SOUL.md"],
    "memory_token_count": 4523,
    "active_context_token_count": 18450,
    "recent_tool_results": [
      {"tool": "web_search", "result_summary": "5 results for 'DSGVO compliance AI agents'", "timestamp": "2026-06-25T09:41:55Z"},
      {"tool": "web_fetch", "result_summary": "Fetched EU AI Act August 2026 enforcement deadline", "timestamp": "2026-06-25T09:42:03Z"}
    ],
    "pending_decisions": ["Synthesize compliance summary", "Choose source citations"]
  }
}

The snapshot tells the operator what was in the agent's working memory at the moment of decision. It is a summary of what was there, not the full context, which would be too large to store per decision.
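Because each snapshot is a small structured summary, two of them can be compared directly to see what changed between decision points. The sketch below is an illustration only: `diff_snapshots` is a hypothetical helper, not a Facio API, and it assumes snapshots carry the field names shown in the example above.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two snapshot records and report what changed between them."""
    a = before["context_snapshot"]
    b = after["context_snapshot"]
    files_a, files_b = set(a["memory_files"]), set(b["memory_files"])
    return {
        # A different hash means the system prompt was edited between the two points.
        "prompt_changed": a["system_prompt_hash"] != b["system_prompt_hash"],
        "memory_files_added": sorted(files_b - files_a),
        "memory_files_removed": sorted(files_a - files_b),
        "context_token_delta": b["active_context_token_count"] - a["active_context_token_count"],
    }


# Two illustrative snapshots; only the fields the diff reads are included.
earlier = {"iteration": 22, "context_snapshot": {
    "system_prompt_hash": "abc123",
    "memory_files": ["MEMORY.md", "USER.md"],
    "active_context_token_count": 17200,
}}
later = {"iteration": 23, "context_snapshot": {
    "system_prompt_hash": "abc123",
    "memory_files": ["MEMORY.md", "USER.md", "SOUL.md"],
    "active_context_token_count": 18450,
}}

changes = diff_snapshots(earlier, later)
print(changes)
```

A diff like this narrows the search: if the prompt hash and memory files are unchanged, the difference in behavior most likely came from tool results or user input.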

2. Decision Records

For every decision the agent makes — what to do next, which tool to use, what to output — the runtime captures a decision record:

{
  "decision_id": "dec-2026-06-25-094217-23",
  "timestamp": "2026-06-25T09:42:17Z",
  "decision_type": "tool_selection",
  "decision": {"tool": "ask_form", "parameters": {"title": "Need compliance details", "fields": [...]}},
  "reasoning_summary": "User asked for compliance summary; need specific use case and jurisdiction to provide accurate guidance",
  "alternatives_considered": [
    {"option": "Use web_search for general compliance info", "rejected_reason": "Too generic; user likely needs specific guidance"},
    {"option": "Proceed with general DSGVO summary", "rejected_reason": "Could be wrong if user is in healthcare vs finance"}
  ],
  "confidence": "high"
}

The decision record captures the decision, the reasoning summary (the agent's stated reason), the alternatives considered, and the agent's confidence. The operator can see why the agent chose what it chose.
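A decision record can be turned into a short human-readable explanation with a few lines of code. This is a minimal sketch, assuming the record shape shown above; `explain_decision` is a hypothetical helper, and the sample record is a shortened copy of the example.

```python
def explain_decision(record: dict) -> str:
    """Render one decision record as a short plain-text explanation."""
    decision = record["decision"]
    # Tool-selection records name a tool; fall back to the raw decision otherwise.
    chosen = decision.get("tool", decision)
    lines = [
        f"{record['decision_id']} ({record['decision_type']}, confidence: {record['confidence']})",
        f"  chose: {chosen}",
        f"  because: {record['reasoning_summary']}",
    ]
    for alt in record.get("alternatives_considered", []):
        lines.append(f"  rejected: {alt['option']} ({alt['rejected_reason']})")
    return "\n".join(lines)


record = {
    "decision_id": "dec-2026-06-25-094217-23",
    "decision_type": "tool_selection",
    "decision": {"tool": "ask_form"},
    "reasoning_summary": "Need specific use case and jurisdiction to provide accurate guidance",
    "alternatives_considered": [
        {"option": "Use web_search for general compliance info", "rejected_reason": "Too generic"},
        {"option": "Proceed with general DSGVO summary", "rejected_reason": "Could be wrong for healthcare vs finance"},
    ],
    "confidence": "high",
}

explanation = explain_decision(record)
print(explanation)
```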

3. Tool Call Logs

Every tool call is logged with structured detail:

{
  "call_id": "call-2026-06-25-094155-22",
  "timestamp": "2026-06-25T09:41:55Z",
  "tool": "web_search",
  "parameters": {"query": "DSGVO compliance AI agents"},
  "result_summary": "5 results, top 3 relevant",
  "duration_ms": 1240,
  "status": "success"
}

The tool call log provides the operational detail. The operator can see which tools the agent used, what the inputs were, a summary of what came back, how long each call took, and whether it succeeded.
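Since every tool call shares the same fields, the logs aggregate easily. The sketch below computes per-tool call counts, failures, and mean duration. It assumes the schema shown above; `summarize_tool_calls` is a hypothetical helper, and the `"error"` status value is an assumption (the example only shows `"success"`).

```python
from collections import defaultdict


def summarize_tool_calls(calls: list) -> dict:
    """Aggregate call count, failure count, and mean duration per tool."""
    stats = defaultdict(lambda: {"calls": 0, "failures": 0, "total_ms": 0})
    for call in calls:
        entry = stats[call["tool"]]
        entry["calls"] += 1
        entry["total_ms"] += call["duration_ms"]
        if call["status"] != "success":  # any non-success status counts as a failure
            entry["failures"] += 1
    return {
        tool: {
            "calls": s["calls"],
            "failures": s["failures"],
            "mean_ms": s["total_ms"] / s["calls"],
        }
        for tool, s in stats.items()
    }


# Illustrative log entries; durations and the failing call are made up.
calls = [
    {"tool": "web_search", "duration_ms": 1240, "status": "success"},
    {"tool": "web_search", "duration_ms": 760, "status": "error"},
    {"tool": "web_fetch", "duration_ms": 500, "status": "success"},
]

summary = summarize_tool_calls(calls)
print(summary)
```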

4. HITL Records

Every HITL interaction is captured in detail:

{
  "hitl_id": "hitl-2026-06-25-094217-23",
  "timestamp": "2026-06-25T09:42:17Z",
  "type": "ask_form",
  "sent_to": "placet",
  "content": {
    "title": "Need compliance details",
    "fields": [{"key": "use_case", "label": "What's your industry?", "type": "select"}, ...]
  },
  "response": null,
  "response_received_at": null,
  "timeout": "2026-06-25T09:52:17Z"
}

When the human responds, the response is added:

{
  "response": {"use_case": "healthcare", "jurisdiction": "EU"},
  "responder": "user",
  "response_latency_seconds": 142
}

The HITL record captures the request, the human's response (if received), and the timing. The operator can see exactly what the human was asked and what they answered.
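The request and response halves of a HITL record can be merged to check timing. The following sketch assumes that `response_latency_seconds` is measured from the request's `timestamp`, which the example does not state. `apply_response` and the derived `answered_before_timeout` field are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamps as shown in the records


def apply_response(request: dict, response: dict) -> dict:
    """Merge a human response into a HITL request and derive timing fields."""
    merged = {**request, **response}
    sent = datetime.strptime(request["timestamp"], TS_FORMAT)
    received = sent + timedelta(seconds=response["response_latency_seconds"])
    merged["response_received_at"] = received.strftime(TS_FORMAT)
    # Hypothetical derived field: did the answer arrive before the deadline?
    deadline = datetime.strptime(request["timeout"], TS_FORMAT)
    merged["answered_before_timeout"] = received <= deadline
    return merged


request = {
    "hitl_id": "hitl-2026-06-25-094217-23",
    "timestamp": "2026-06-25T09:42:17Z",
    "type": "ask_form",
    "response": None,
    "response_received_at": None,
    "timeout": "2026-06-25T09:52:17Z",
}
response = {
    "response": {"use_case": "healthcare", "jurisdiction": "EU"},
    "responder": "user",
    "response_latency_seconds": 142,
}

merged = apply_response(request, response)
print(merged["response_received_at"], merged["answered_before_timeout"])
```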

5. Reasoning Traces

For significant reasoning steps, the runtime captures the agent's intermediate reasoning:

{
  "trace_id": "trace-2026-06-25-094217-23",
  "timestamp": "2026-06-25T09:42:17Z",
  "step_type": "synthesis",
  "step": "Now that I know the user is in healthcare (EU), I need to focus on the EU AI Act's high-risk classification for medical AI systems...",
  "tokens_used": 234
}

The reasoning trace is the agent's "thinking out loud" captured at decision points. Not every step is traced (that would explode the storage), but the significant ones are.

6. Outcome Records

For the agent's final outputs and significant intermediate outputs, the runtime captures an outcome record:

{
  "outcome_id": "outcome-2026-06-25-094517",
  "timestamp": "2026-06-25T09:45:17Z",
  "type": "user_response",
  "content_summary": "Compliance guidance for healthcare AI in EU, citing EU AI Act and DSGVO specific to medical context",
  "decision_chain": [
    "hitl-2026-06-25-094217",
    "recall-2026-06-25-094245",
    "web_search-2026-06-25-094312",
    "dec-2026-06-25-094340"
  ]
}

The outcome record links back to the decision chain that produced it. The operator can trace from any output back through the decisions that produced it.
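Tracing an output back through its chain amounts to resolving each ID against an index of stored records. The sketch below assumes such an index exists as a plain dictionary keyed by record ID. `resolve_chain`, the `kind` and `summary` fields, and the summaries themselves are illustrative placeholders rather than part of Facio's schema.

```python
def resolve_chain(outcome: dict, index: dict) -> tuple:
    """Look up each ID in an outcome's decision_chain, in order.

    Returns (resolved_records, missing_ids) so gaps in the record are visible.
    """
    resolved, missing = [], []
    for record_id in outcome["decision_chain"]:
        record = index.get(record_id)
        if record is None:
            missing.append(record_id)
        else:
            resolved.append(record)
    return resolved, missing


outcome = {
    "outcome_id": "outcome-2026-06-25-094517",
    "decision_chain": [
        "hitl-2026-06-25-094217",
        "recall-2026-06-25-094245",
        "web_search-2026-06-25-094312",
        "dec-2026-06-25-094340",
    ],
}
# Hypothetical index holding only two of the four records.
index = {
    "hitl-2026-06-25-094217": {"kind": "hitl", "summary": "placeholder"},
    "dec-2026-06-25-094340": {"kind": "decision", "summary": "placeholder"},
}

resolved, missing = resolve_chain(outcome, index)
print([r["kind"] for r in resolved], missing)
```

Reporting the missing IDs separately matters for audits: an unresolved link may mean the record has passed its retention window, and that should be surfaced rather than silently skipped.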

Querying Provenance: The Practical Use Cases

The provenance data is only valuable if it's queryable. The use cases below show how operators query it with the standard tools (recall, grep, read_logs):

Use Case 1: Compliance Audit

Question: "For customer X, what data did the agent access and what decisions did it make?"

Query:

recall(query="agent decisions customer X compliance audit", limit=20)
grep(pattern="customer_x", path="memory/", glob="*.jsonl")

Output: A chronological list of every agent decision involving customer X, with the data accessed, decisions made, and human approvals received.
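For readers who want to post-process the matching records offline, the same audit can be sketched in a few lines. This assumes records are stored as JSON Lines (one object per line, as the `*.jsonl` glob suggests) and that each record carries a `customer_id` field. That field name is an assumption, as are the `kind` field and the `audit_trail` helper.

```python
import json


def audit_trail(jsonl_text: str, customer_id: str) -> list:
    """Return one customer's records from JSONL text, oldest first."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("customer_id") == customer_id:
            records.append(record)
    # Same-format UTC ISO-8601 timestamps sort correctly as plain strings.
    return sorted(records, key=lambda r: r["timestamp"])


# Illustrative records, deliberately out of order.
sample = "\n".join([
    '{"timestamp": "2026-06-25T09:42:17Z", "customer_id": "customer_x", "kind": "decision"}',
    '{"timestamp": "2026-06-25T09:41:55Z", "customer_id": "customer_x", "kind": "tool_call"}',
    '{"timestamp": "2026-06-25T09:40:00Z", "customer_id": "customer_y", "kind": "tool_call"}',
])

trail = audit_trail(sample, "customer_x")
print([r["kind"] for r in trail])
```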

Use Case 2: Debugging Wrong Output

Question: "The agent told the customer to use Stripe instead of Mollie. Why?"

Query:

# Find the relevant decision
grep(pattern="Stripe", path="memory/", glob="decision-*.json")
# Get the context at that decision
recall(query="payment provider recommendation Stripe vs Mollie", limit=5)
# Reconstruct the decision chain

Output: The agent's reasoning chain, showing what context led to the Stripe recommendation (for example, an outdated payment provider preference in MEMORY.md that no longer reflected the current state).

Use Case 3: Customer Support Escalation

Question: "Customer says the agent deployed the wrong configuration. What happened?"

Query:

# Find the deployment decision
grep(pattern="deploy.*configuration", path="memory/", glob="decision-*.json")
# Get the tool call log
grep(pattern="kubectl apply", path="memory/", glob="tool-call-*.json")
# Get the HITL approval
grep(pattern="deploy", path="memory/", glob="hitl-*.json")

Output: The deployment decision, the actual kubectl command executed, and the human approval that authorized it. The customer can see exactly what was approved and when.

Use Case 4: Postmortem Analysis

Question: "Last Wednesday's incident — what was the agent doing in the 30 minutes before?"

Query:

read_logs(level="INFO", since="2026-06-17T14:00:00Z", grep="agent")
# Get decisions and tool calls in the relevant window
recall(query="agent decisions 2026-06-17 incident", limit=30)

Output: A timeline of agent activity leading up to the incident, with decisions and actions.
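Once the records are retrieved, cutting them down to the window before the incident is a simple timestamp filter. The sketch below assumes the incident began at 14:30 UTC, 30 minutes after the `since` value in the query. `window_before`, the `event` field, and the sample records are illustrative.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamps as shown in the records


def window_before(records: list, incident_at: str, minutes: int = 30) -> list:
    """Return records from the `minutes` before the incident, oldest first."""
    end = datetime.strptime(incident_at, TS_FORMAT)
    start = end - timedelta(minutes=minutes)
    in_window = [
        r for r in records
        if start <= datetime.strptime(r["timestamp"], TS_FORMAT) < end
    ]
    return sorted(in_window, key=lambda r: r["timestamp"])


# Illustrative records: one too early, one after the incident, two in range.
records = [
    {"timestamp": "2026-06-17T14:25:10Z", "event": "tool_call"},
    {"timestamp": "2026-06-17T13:40:00Z", "event": "decision"},
    {"timestamp": "2026-06-17T14:05:30Z", "event": "decision"},
    {"timestamp": "2026-06-17T14:31:00Z", "event": "tool_call"},
]

timeline = window_before(records, "2026-06-17T14:30:00Z")
print([r["timestamp"] for r in timeline])
```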

Use Case 5: Model Performance Analysis

Question: "Are there patterns in when the agent makes confident vs uncertain decisions?"

Query:

# Get all decision records
recall(query="decision records confidence levels", limit=100)
# Analyze confidence patterns over time and context

Output: Aggregated confidence data showing patterns (e.g., "agent is less confident on multi-step workflows" or "agent overconfident on tool selection for unfamiliar MCP servers").
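The aggregation step left as a comment in the query can be sketched as a count of confidence levels per decision type. This is one possible approach, not Facio's own analysis. Only `"tool_selection"` and `"high"` appear in the decision record example, so the other type and confidence values here are assumptions, as is the `confidence_by_type` helper.

```python
from collections import Counter, defaultdict


def confidence_by_type(records: list) -> dict:
    """Count confidence levels for each decision type."""
    table = defaultdict(Counter)
    for record in records:
        table[record["decision_type"]][record["confidence"]] += 1
    return {decision_type: dict(counts) for decision_type, counts in table.items()}


# Illustrative decision records reduced to the two fields the analysis reads.
records = [
    {"decision_type": "tool_selection", "confidence": "high"},
    {"decision_type": "tool_selection", "confidence": "low"},
    {"decision_type": "tool_selection", "confidence": "high"},
    {"decision_type": "output_synthesis", "confidence": "medium"},
]

table = confidence_by_type(records)
print(table)
```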

The Provenance Architecture

The provenance architecture is designed to be:

Comprehensive. Every decision, tool call, and HITL interaction is captured. No gaps in the record.

Structured. Each capture has a defined schema. The data is queryable by field, not just by full-text search.

Indexed. The provenance data is indexed in the memory search index. Queries can find relevant records by semantic similarity, not just keywords.

Compressed. Storing a full context snapshot for every decision would be too large. The runtime stores summaries and hashes instead, so the context can be reconstructed when needed without keeping redundant copies.

Bounded. The provenance storage has configurable retention (default: 90 days). Old data is archived or purged based on policy. The agent doesn't drown in provenance.

Queryable by humans. The provenance data is designed to be queryable by operators using standard tools (recall, read_logs, grep). No proprietary query language required.

Queryable by the agent itself. The agent can query its own provenance to do self-diagnosis and reflection. The same data serves both human and agent use cases.
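The retention bound described above can be pictured as a cutoff-date partition. The sketch below uses the 90-day default from the text. `split_by_retention` is a hypothetical helper, and what happens to expired records (archive or purge) is left to policy, as the text says.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamps as shown in the records


def split_by_retention(records: list, now: datetime, retention_days: int = 90) -> tuple:
    """Partition records into (keep, expired) around the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    keep, expired = [], []
    for record in records:
        recorded_at = datetime.strptime(record["timestamp"], TS_FORMAT)
        (keep if recorded_at >= cutoff else expired).append(record)
    return keep, expired


# Illustrative records: one recent, one older than 90 days.
records = [
    {"decision_id": "dec-2026-06-25-094217-23", "timestamp": "2026-06-25T09:42:17Z"},
    {"decision_id": "dec-old-example", "timestamp": "2026-03-01T08:00:00Z"},
]

keep, expired = split_by_retention(records, now=datetime(2026, 6, 25, 12, 0, 0))
print(len(keep), len(expired))
```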

What Provenance Doesn't Do

Honest limitations:

  • It doesn't capture the model's internal state. The LLM's neural activations, attention patterns, and internal representations aren't visible. The provenance captures what the agent saw, decided, and did — not how the model's weights produced the decision.
  • It doesn't provide ground truth. The provenance says what the agent did and why. It doesn't say whether the agent's decision was correct. That's still a human judgment.
  • It doesn't replace testing. Provenance helps debug past behavior. It doesn't prevent future bugs. The agent may make the same wrong decision under similar contexts because the model behavior is the same.
  • It can be expensive. Comprehensive provenance consumes storage and processing resources. The cost has to be balanced against the value of being able to answer post-hoc questions.
  • It doesn't help with non-deterministic behavior. If the agent's outputs vary across runs (due to model temperature or randomness), the provenance describes one run, not the possible runs.

The Operational Impact of Provenance

The shift from guesswork to query work has concrete operational impacts:

Faster incident resolution. Without provenance, debugging an agent issue might take hours of reading logs and context. With provenance, the operator queries the relevant records and finds the issue in minutes.

Higher customer trust. When a customer asks "why did the agent do that?", the operator can answer with specifics rather than apologies. The customer sees that the agent's behavior is auditable.

Compliance readiness. Auditors asking "what data did the agent access for customer X?" get a structured answer from the provenance log, not a hand-waving summary.

Continuous improvement. Provenance data feeds back into reflection and improvement. Patterns in agent decisions become visible. Recurring mistakes become identifiable. Fixes become targeted.

Debugging at scale. When multiple agents run multiple workflows, provenance is the only way to debug across the fleet. Per-agent debugging doesn't scale; provenance queries do.

Bottom Line

"Why did the agent do that?" is a question every AI operator eventually has to answer. Without provenance, the answer is guesswork. With provenance, the answer is a query.

Facio's decision provenance features capture the context, decisions, reasoning, tool calls, HITL interactions, and outcomes of every agent session in structured, queryable form. The operator can answer the post-hoc explanation question with specifics: what the agent saw, what it knew, what it decided, why, and what alternatives it considered.

The cost of provenance is storage and processing overhead. The benefit is the ability to understand, debug, and improve agent behavior at scale. For production agents, the benefit outweighs the cost.

An AI agent without provenance is a black box. An AI agent with provenance is a system the operator can understand, trust, and improve. Production-grade agents require production-grade observability. Provenance is how Facio delivers it.


See the provenance documentation for query patterns, retention configuration, and integration with observability platforms.
