
Engineering · May 28, 2026

MCP Spotlight: Aegis DQ — Agentic Data Quality With a Full Audit Trail for Every AI Decision

Aegis DQ gives agents 31 rule types, 6 warehouse adapters, and LLM-powered root-cause analysis with a searchable audit trail for every decision. It goes from policy docs to diagnosed violations in minutes, and its AML demo run cost $0.01 in LLM calls.

MCP Server · Aegis DQ · Data Quality · Audit Trail · Compliance · AI Agents

Server: aegis-dq by Aegis DQ
Stars: 1.2k · License: Apache 2.0 · Latest: v0.7.0
MCP Tracker: glama.ai/mcp/servers/aegis-dq/aegis-dq

Data quality tools are nothing new — but an agentic data quality framework that an AI agent can drive end-to-end? That's a different proposition. Aegis DQ is an open-source framework that goes from business policy documents to diagnosed data violations with LLM root-cause analysis, auto-generated SQL fixes, and — critically — a searchable, per-decision audit trail.

What It Does

Point Aegis at your business docs (policies, schema definitions, SLAs) and your warehouse. It reads the context, generates executable validation rules, runs them across your data, and diagnoses every failure. Each diagnosis includes a plain-English root cause, a severity tier, the impacted regulation, and remediation SQL.

The real-world AML demo says it all: 12 BSA/OFAC policy docs → 55 rules generated → 11 violations detected → all diagnosed → $0.01 total LLM cost with Claude Haiku.

Key Features

Category | Details
Rule types | 31: completeness, uniqueness, validity, referential integrity, statistical, timeliness, cross-table reconciliation, and ML anomaly detection (Z-score, Isolation Forest)
Warehouse adapters | 6: DuckDB, Postgres/Redshift, BigQuery, Databricks, AWS Athena, Snowflake
LLM providers | Anthropic Claude, OpenAI, Ollama (local, zero-cost), AWS Bedrock
Pipeline | 5-node LangGraph: plan → parallel validation (execute, classify, LLM diagnose, RCA) → reconcile → SQL remediate → report
Integrations | GitHub Actions (CI/CD gate), Airflow, dbt, Hermes MCP, REST API
Audit trail | Full-text searchable log of every LLM decision with cost and latency
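
To make the statistical rule types more concrete, here is a minimal sketch of a Z-score anomaly check in plain Python. It is not Aegis DQ's implementation: the helper name `zscore_outliers`, the default threshold of 3.0, and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for points whose |z| exceeds the threshold.

    Hypothetical helper for illustration; not part of the aegis-dq API.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # constant column: no point deviates from the mean
    scored = [(i, v, (v - mu) / sigma) for i, v in enumerate(values)]
    return [item for item in scored if abs(item[2]) > threshold]

# Twenty ordinary order amounts plus one obviously bad value (made-up data).
revenue = [100.0 + i for i in range(20)] + [5000.0]
outliers = zscore_outliers(revenue)
```

A single extreme value inflates the standard deviation it is measured against, so a Z-score rule works best on columns with enough rows that one bad record does not dominate the statistics.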

The Agentic Pipeline

Every aegis run pushes your data through a five-stage LangGraph pipeline:

rules (Python / YAML)
    │
    ▼
  plan ──► parallel_table ──► reconcile ──► remediate ──► report
                 │
         ┌──────────────────┐
         │  per table:      │
         │  execute         │
         │  classify        │
         │  diagnose        │  ← concurrent across all tables
         │  rca             │
         └──────────────────┘
  1. Plan — parse and validate rules, build the execution graph
  2. Parallel execution — fan out per table: run all rules, classify failures by severity, diagnose with LLM, trace root causes
  3. Reconcile — compare results against expected thresholds
  4. Remediate — LLM proposes targeted SQL fixes for each diagnosed failure
  5. Report — structured JSON output, optional Slack notification
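
The five stages above can be sketched in plain Python. This is an illustration of the control flow only, under stated assumptions: it uses `concurrent.futures` rather than LangGraph, the function names and the rule dictionary shape (`name`, `table`, `severity`, `check`, `max_failed`) are hypothetical, and the LLM diagnosis and remediation steps are replaced with placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(rules):
    """Stage 1: group rules by table so tables can be validated independently."""
    by_table = {}
    for rule in rules:
        by_table.setdefault(rule["table"], []).append(rule)
    return by_table

def validate_table(table, rules, rows):
    """Stage 2 (per table): execute each rule and record failures with severity."""
    failures = []
    for rule in rules:
        failed = sum(1 for row in rows if not rule["check"](row))
        if failed:
            failures.append({
                "table": table,
                "rule": rule["name"],
                "severity": rule["severity"],
                "failed_rows": failed,
                "max_failed": rule.get("max_failed", 0),
                # A real pipeline would attach the LLM diagnosis and RCA here.
            })
    return failures

def run_pipeline(rules, data):
    by_table = plan(rules)
    # Fan out: one task per table, executed concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(validate_table, table, table_rules, data.get(table, []))
            for table, table_rules in by_table.items()
        ]
        failures = [f for future in futures for f in future.result()]
    # Stage 3: reconcile against each rule's tolerated failure count.
    confirmed = [f for f in failures if f["failed_rows"] > f["max_failed"]]
    # Stage 4: placeholder where an LLM-proposed SQL fix would go.
    for f in confirmed:
        f["remediation"] = f"-- review rows in {f['table']} failing {f['rule']}"
    # Stage 5: structured report.
    return {"tables_checked": sorted(by_table), "failures": confirmed}

# Made-up rules and rows for demonstration.
rules = [
    {"name": "revenue_non_negative", "table": "orders", "severity": "high",
     "check": lambda row: row["revenue"] >= 0},
    {"name": "email_present", "table": "customers", "severity": "medium",
     "check": lambda row: bool(row["email"])},
]
data = {
    "orders": [{"revenue": 9.99}, {"revenue": -5.00}],
    "customers": [{"email": "a@example.com"}],
}
report = run_pipeline(rules, data)
```

Grouping by table before fanning out keeps each worker's state isolated, which is what makes the per-table step safe to run concurrently.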

Why the Audit Trail Matters

This is where Aegis DQ stands apart from tools like Great Expectations or Monte Carlo. Every LLM decision (raw diagnosis, root cause analysis, and remediation proposal) is logged with:

  • The exact prompt sent and response received
  • Model used, token count, latency, and cost

The log can then be queried two ways:

  • Full-text search via aegis audit search <query>
  • Per-run trajectory inspection via aegis audit trajectory <run-id>
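
As a rough mental model of what such a log holds, the sketch below stores one record per LLM call and supports the two lookups named above. The class names, field names, and sample entries are assumptions for illustration; the actual Aegis DQ storage format and search implementation are not described in this post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditEntry:
    run_id: str
    step: str         # e.g. "diagnose", "rca", "remediate"
    model: str
    prompt: str
    response: str
    tokens: int
    latency_ms: float
    cost_usd: float

class AuditLog:
    """In-memory stand-in for a persistent, searchable audit store."""

    def __init__(self):
        self._entries = []

    def record(self, entry):
        self._entries.append(entry)

    def search(self, query):
        """Case-insensitive substring match over prompt and response text."""
        q = query.lower()
        return [e for e in self._entries
                if q in e.prompt.lower() or q in e.response.lower()]

    def trajectory(self, run_id):
        """All entries for one run, in the order they were recorded."""
        return [e for e in self._entries if e.run_id == run_id]

    def total_cost(self, run_id):
        return sum(e.cost_usd for e in self.trajectory(run_id))

# Made-up entries; model name, token counts, and costs are placeholders.
log = AuditLog()
log.record(AuditEntry("run-1", "diagnose", "example-model",
                      "Why did revenue_non_negative fail?",
                      "Negative revenue values were written by a refund job.",
                      210, 640.0, 0.0004))
log.record(AuditEntry("run-1", "remediate", "example-model",
                      "Propose a SQL fix.",
                      "UPDATE orders SET revenue = 0 WHERE revenue < 0;",
                      180, 590.0, 0.0006))
log.record(AuditEntry("run-2", "diagnose", "example-model",
                      "Why did email_present fail?",
                      "Null emails came from a legacy import.",
                      150, 480.0, 0.0003))

hits = log.search("negative revenue")
steps = [e.step for e in log.trajectory("run-1")]
```

Freezing the dataclass reflects the intent of an audit record: once written, an entry should not change.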

For compliance-heavy environments (AML, GDPR, SOX, FDA), this is the difference between "the AI said the data was fine" and provable, auditable evidence of what the AI did, why it did it, and what it cost.

At Facio, we've built our entire platform around this principle: if an agent makes a decision, you need to know exactly how and why. Aegis DQ applies that same standard to data quality — and that's why it earned this spotlight.

Conversational Data Quality via MCP

Aegis DQ integrates with Hermes (and any MCP-compatible agent) through its MCP server. Once connected, you can run validation pipelines conversationally:

# ~/.hermes/config.yaml
mcp_servers:
  aegis:
    command: aegis
    args: [mcp]
    env:
      ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY}"

Define a pipeline manifest once:

# pipeline.yaml
name: orders-dq
rules: ./rules.yaml
database: ./warehouse.duckdb
kb:
  - ./policy.md
  - ./schema.md
goal: |
  Run all rules. For every failure explain the business impact,
  likely root cause, and a concrete remediation step.

Then just ask your agent: "Load the pipeline at pipeline.yaml and run it."

Hermes picks it up via MCP, Aegis executes the full pipeline, and you get a structured report — no re-explaining context on every run.

Facio Integration

Add Aegis DQ to your Facio agent with the standard MCP server config. Facio's built-in audit trail already captures every tool call and agent decision; adding Aegis DQ extends that traceability to your data validation layer.

{
  "mcpServers": {
    "aegis-dq": {
      "command": "aegis",
      "args": ["mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "${credentials.ANTHROPIC_API_KEY}"
      }
    }
  }
}

Available MCP tools include: load_pipeline, run_validation, generate_rules, audit_trajectory, audit_search, get_rule_templates, and list_warehouses.
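
Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds two such requests in Python. The envelope follows the MCP specification, but the argument names `path` and `query` are assumptions; check the input schema the server returns from `tools/list` for the real parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call

def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# "path" and "query" are hypothetical argument names, not taken from Aegis DQ docs.
load = tool_call("load_pipeline", {"path": "pipeline.yaml"})
search = tool_call("audit_search", {"query": "negative revenue"})

# What would be written to the server over the transport (stdio here).
wire = json.dumps(load)
```

In practice the agent runtime (Hermes or Facio) builds these messages for you; the sketch is only meant to show what a tool invocation looks like on the wire.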

Quickstart

pip install aegis-dq

# Seed a demo database
python -c "
import duckdb
con = duckdb.connect('demo.db')
con.execute(\"CREATE TABLE orders AS SELECT i AS order_id, 'placed' AS status, i * 9.99 AS revenue FROM range(1, 10001) t(i)\")
con.execute(\"UPDATE orders SET revenue = -5.00 WHERE order_id % 500 = 0\")
con.close()
"

# Generate rules from schema (no hand-writing)
export ANTHROPIC_API_KEY=sk-ant-...
aegis generate orders --db demo.db --output rules.yaml

# Run — diagnoses every failure with root cause
aegis run rules.yaml --db demo.db

Or run fully offline with zero LLM cost:

aegis run rules.yaml --db demo.db --no-llm
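
As a sanity check on the seeded demo data, the sketch below rebuilds the same table and counts the rows a non-negative revenue rule should flag. It uses Python's built-in sqlite3 module as a stand-in for DuckDB so it runs without extra installs, and the `revenue < 0` query is an assumption about what a generated validity rule might test, not actual Aegis DQ output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, revenue REAL)")

# Same shape as the quickstart seed: 10,000 orders with revenue = order_id * 9.99.
con.executemany(
    "INSERT INTO orders VALUES (?, 'placed', ?)",
    [(i, i * 9.99) for i in range(1, 10001)],
)

# Same corruption as the quickstart: every 500th order gets negative revenue.
con.execute("UPDATE orders SET revenue = -5.00 WHERE order_id % 500 = 0")

total_rows = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
violations = con.execute(
    "SELECT COUNT(*) FROM orders WHERE revenue < 0"
).fetchone()[0]
con.close()

print(total_rows, violations)  # prints: 10000 20
```

If a run against demo.db reports a different number of negative-revenue rows than 20, the seed step is the first place to look.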

Bottom Line

Aegis DQ brings three things to the agentic data stack that were previously fragmented across commercial tools: LLM-powered root cause analysis with auto-generated SQL fixes, a fully searchable audit trail for every AI decision, and a conversational MCP interface that lets your agent handle data quality end-to-end. With the AML demo run costing $0.01 in LLM calls and the code under Apache 2.0, the price is right too.

If your agents touch data that matters — financial, regulated, customer-facing — this belongs in your MCP stack.


MCP Spotlight is a series covering servers that give AI agents real capabilities. Every server is evaluated for tool quality, auditability, and integration fit with Facio's HITL-first agent runtime.