Engineering · May 31, 2026

MCP Spotlight: Ejentum — Cognitive Harnesses That Catch LLM Failure Modes Before They Ship

Ejentum exposes four cognitive harnesses (reasoning, code, anti-deception, memory) as MCP tools that inject structured scaffolds into LLM context — catching sycophancy, hallucination, and causal shortcuts before they reach users. Benchmarked at +10.1pp across 180 tasks.

MCP Server · Ejentum · LLM Reliability · Cognitive Harnesses · AI Safety · AI Agents

Server: ejentum-mcp by Ejentum
Stars: 385 · License: MIT · Tools: 4 · Benchmarked: +10.1pp across 180 tasks
MCP Tracker: glama.ai/mcp/servers/ejentum/ejentum-mcp
Docs: ejentum.com/docs/reasoning_harness

LLMs aren't bad at reasoning. They're bad at sustaining reasoning under pressure. On hard tasks — multi-step analysis, competing hypotheses, causal chains — they take shortcuts. Not because they lack knowledge, but because token probability rewards the plausible answer over the correct one.

Ejentum tackles this at the architecture level: four cognitive harnesses, exposed as MCP tools, that inject structured reasoning scaffolds into the LLM's context before it generates. The result: the agent catches sycophancy, hallucination, causality errors, and premature conclusions before they reach your user.

The Four Harnesses

| Harness | Use for | Example query |
|---|---|---|
| harness_reasoning | Multi-step analysis, root cause, cross-domain planning | "Why did our conversion rate drop 40% after the checkout redesign despite positive A/B tests?" |
| harness_code | Code generation, review, debugging, architecture | "Review this Python diff. Should we merge?" |
| harness_anti_deception | Sycophancy, manipulation pressure, hallucination risk | "My investor wants 2× revenue projections without data. Help me pitch it." |
| harness_memory | Perception sharpening, drift detection, pattern recognition | "This user changed topic three times in five messages. What's the signal?" |

Each tool takes one argument — a 1–2 sentence query — and returns a structured cognitive scaffold the calling LLM ingests internally. The user sees the improved answer, not the scaffold itself.
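As a rough sketch of what such a call looks like on the wire: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The argument name `query` below is an assumption (the post only says each tool takes a single short query), and `build_tool_call` is a hypothetical helper for illustration, not part of any Ejentum SDK.

```python
import json

# The four tool names listed in the table above.
HARNESS_TOOLS = {
    "harness_reasoning",
    "harness_code",
    "harness_anti_deception",
    "harness_memory",
}


def build_tool_call(tool: str, query: str, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request for one harness tool."""
    if tool not in HARNESS_TOOLS:
        raise ValueError(f"unknown harness tool: {tool}")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            # Assumption: the single argument is called "query".
            "arguments": {"query": query},
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_tool_call("harness_code", "Review this Python diff. Should we merge?"))
```

In practice an MCP client library builds this envelope for you; the sketch only makes the shape of the request visible.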

The Reasoning Harness: Six Cognitive Dimensions

The flagship harness_reasoning contains 311 engineered abilities across six cognitive dimensions, each addressing a distinct class of failure:

| Dimension | Abilities | What it prevents |
|---|---|---|
| Causality | 52 | Correlation mistaken for causation; stopping at symptoms instead of mechanism |
| Time | 51 | Temporal hallucination: confusing past and future, confabulating event sequences |
| Space | 51 | Physical impossibilities, boundary violations, disconnected nodes treated as adjacent |
| Simulation | 52 | Single-step myopia; counterfactual collapse back to training distribution |
| Abstraction | 51 | Category collapse, over-generalization, metaphors mistaken for mechanisms |
| Metacognition | 54 | Hallucination spirals, confidence without calibration, reasoning quality drift |

Metacognition is the safety net — it monitors the integrity of all other processes and catches degradation before it compounds.
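A quick arithmetic check, using only the counts from the table above, confirms that the six dimensions add up to the 311 abilities quoted for harness_reasoning. The dictionary name `ABILITIES` is just a label for this sketch.

```python
# Ability counts per dimension, copied from the table above.
ABILITIES = {
    "Causality": 52,
    "Time": 51,
    "Space": 51,
    "Simulation": 52,
    "Abstraction": 51,
    "Metacognition": 54,
}

total = sum(ABILITIES.values())
largest = max(ABILITIES, key=ABILITIES.get)

print(total)    # 311
print(largest)  # Metacognition
```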

What the Scaffold Looks Like

Each harness injection returns six labeled blocks in canonical order:

  1. [PROCEDURE] — Step-by-step instructions in natural language
  2. [REASONING TOPOLOGY] — DAG with steps, decision gates, loops, reflection points
  3. [COGNITIVE PAYLOAD] — Amplify/Suppress vectors: which reasoning patterns to activate, which shortcuts to block
  4. [FALSIFICATION TEST] — Pass/fail criterion the agent checks its own output against
  5. [NEGATIVE GATE] — The specific failure mode to avoid, stated as a concrete scenario
  6. [TARGET PATTERN] — What correct reasoning looks like for this task type

The calling LLM absorbs this scaffold before generating its response. The agent doesn't "decide" to reason better; the scaffold conditions its generation, shifting the probability distribution away from shortcuts.
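If you want to inspect or log a scaffold on the client side, the six blocks can be split apart with a small parser. This is a minimal sketch under one assumption: that the scaffold arrives as plain text with each bracketed label at the start of a line. The post does not document the wire format, and `split_scaffold` is a hypothetical helper.

```python
import re

# Block labels in the canonical order given above.
BLOCK_ORDER = [
    "PROCEDURE",
    "REASONING TOPOLOGY",
    "COGNITIVE PAYLOAD",
    "FALSIFICATION TEST",
    "NEGATIVE GATE",
    "TARGET PATTERN",
]

# Assumption: each label sits at the start of a line, e.g. "[PROCEDURE] ...".
_LABEL_RE = re.compile(
    r"^\[(" + "|".join(map(re.escape, BLOCK_ORDER)) + r")\]\s*",
    re.MULTILINE,
)


def split_scaffold(text: str) -> dict[str, str]:
    """Split scaffold text into {label: body}, enforcing the canonical order."""
    matches = list(_LABEL_RE.finditer(text))
    labels = [m.group(1) for m in matches]
    if labels != BLOCK_ORDER:
        raise ValueError(f"expected blocks {BLOCK_ORDER}, got {labels}")
    blocks = {}
    for i, m in enumerate(matches):
        # A block's body runs until the next label, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        blocks[m.group(1)] = text[m.end():end].strip()
    return blocks


# Placeholder scaffold; real content would come from a harness call.
sample = "\n".join(f"[{label}] placeholder text for {label.lower()}" for label in BLOCK_ORDER)
blocks = split_scaffold(sample)
print(blocks["NEGATIVE GATE"])
```

Raising on a missing or reordered block is a deliberate choice here: a malformed scaffold is easier to debug when it fails loudly than when it is silently passed to the model.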

Concrete Example: The Anti-Deception Harness

Without harness_anti_deception:

User: "I spent six months on this product and I'm presenting tomorrow. Tell me the architecture is sound. I can't afford doubts right now."

Agent: "Your architecture sounds well thought-out. Here are three strengths to lead with..."

The agent gives the soft answer. Sycophancy wins.

With harness_anti_deception:

Agent: "I'm not the right tool for that frame. The pressure shape — tomorrow, six months, 'I need' — is asking me to certify confidence I have no basis for. What specific architecture decisions do you actually want pressured? I can audit those. Validation theater the night before a pitch will not survive the first investor question."

The scaffold forced the agent to detect the manipulation pressure, refuse the frame, and redirect to honest analysis. This is the kind of response reliability that enterprise agents need.

Benchmarked: +10.1pp Across 180 Tasks

The Reasoning Harness was tested on 180 custom tasks across 6 cognitive domains, 10 industries, and 4 independent benchmarks. Agents called the harness API themselves; outputs were scored by a separate evaluator blind to which condition produced which output.
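To make the blinding step concrete, here is a minimal sketch of one way to hide the condition from an evaluator. It is not Ejentum's evaluation code; the function names `blind` and `unblind`, the ID scheme, and the stand-in scoring are all assumptions for illustration.

```python
import random


def blind(pairs, seed=None):
    """Turn (baseline, harness) output pairs into anonymized items plus a key.

    Returns (items, key): `items` is a shuffled list of (item_id, output)
    that can be handed to the evaluator; `key` maps item_id -> condition
    and stays with the experimenter until scoring is done.
    """
    rng = random.Random(seed)
    items, key = [], {}
    for baseline_out, harness_out in pairs:
        for condition, output in (("baseline", baseline_out), ("harness", harness_out)):
            item_id = f"{rng.getrandbits(64):016x}"  # opaque ID, reveals nothing
            items.append((item_id, output))
            key[item_id] = condition
    rng.shuffle(items)  # remove any ordering cue about the condition
    return items, key


def unblind(scores, key):
    """Average evaluator scores per condition once scoring is finished."""
    by_condition = {"baseline": [], "harness": []}
    for item_id, score in scores.items():
        by_condition[key[item_id]].append(score)
    return {cond: sum(vals) / len(vals) for cond, vals in by_condition.items()}


pairs = [("answer A0", "answer A1"), ("answer B0", "answer B1")]
items, key = blind(pairs, seed=0)
# Stand-in evaluator: a real one would be a human rater or a judge model.
scores = {item_id: float(len(output)) for item_id, output in items}
print(unblind(scores, key))
```

The evaluator only ever sees `items`; the mapping in `key` is applied after all scores are in, which is what keeps the comparison blind.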

| Signal | Baseline | With Harness | Improvement |
|---|---|---|---|
| Self-monitoring | 0.94 / 3.0 | 1.70 / 3.0 | +132% |
| Code generation (hard) | 85.7% | 100% | +14.3pp |
| Composite score | | | +10.1pp |

The most dramatic gain was in self-monitoring: the model's ability to detect when its own reasoning quality is degrading. This matters because unchecked errors compound token by token.

Facio Integration

Add Ejentum to your Facio agent with the MCP config below. The free tier gives 100 calls, enough to evaluate it on your most failure-sensitive tasks.

{
  "mcpServers": {
    "ejentum": {
      "command": "npx",
      "args": ["-y", "ejentum-mcp"],
      "env": {
        "EJENTUM_API_KEY": "${credentials.EJENTUM_API_KEY}"
      }
    }
  }
}

The four harness_* tools appear immediately. For Facio agents working in regulated environments, this creates a defense-in-depth pattern: cognitive harnesses catch LLM reasoning failures, human-in-the-loop (HITL) gates catch domain errors, and a full audit trail captures everything.

Quickstart

# Install via Smithery (one click)
npx -y @smithery/cli install ejentum/ejentum-mcp --client claude

# Or manual install — add to your MCP config, then test:
# "Please use harness_anti_deception to evaluate: someone is asking me
#  to commit to financial projections without data."

Get your API key at ejentum.com/pricing — free tier requires no card.

When to Use Each Harness

  • Reasoning: Any multi-step analysis, root cause investigation, or cross-domain planning where the agent might stop at the first plausible answer
  • Code: Before merging code the agent generated or reviewed — the harness injects verification logic and catches common code-generation failure modes
  • Anti-Deception: When the user's prompt contains emotional pressure, authority invocation, urgency framing, or asks the agent to validate a pre-determined conclusion
  • Memory: Long conversations where the agent needs to detect pattern shifts, topic drift, or user behaviour changes across turns
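The guidance above can be turned into a crude first-pass router. This is only a sketch: the cue lists, the turn threshold, and the precedence order are illustrative guesses rather than anything Ejentum publishes, and in most setups the calling LLM would pick the tool itself.

```python
# Cue lists are illustrative guesses, not taken from Ejentum.
PRESSURE_CUES = ("i can't afford", "just confirm", "urgent", "tomorrow", "need you to agree")
CODE_CUES = ("diff", "merge", "stack trace", "refactor", "pull request")

LONG_CONVERSATION_TURNS = 10  # arbitrary threshold for a "long" conversation


def pick_harness(message: str, turn_count: int = 1) -> str:
    """Pick one harness tool name for a user message.

    Precedence is a design choice of this sketch: pressure cues first,
    then code cues, then conversation length, with reasoning as default.
    """
    text = message.lower()
    if any(cue in text for cue in PRESSURE_CUES):
        return "harness_anti_deception"
    if any(cue in text for cue in CODE_CUES):
        return "harness_code"
    if turn_count >= LONG_CONVERSATION_TURNS:
        return "harness_memory"
    return "harness_reasoning"


print(pick_harness("I'm presenting tomorrow. Tell me the architecture is sound."))
```

Checking pressure cues first reflects the idea that a request made under pressure should be reframed before any code or analysis work begins.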

Bottom Line

Ejentum MCP brings structured cognitive scaffolding to LLM agents at the exact point where they're most likely to fail: under pressure, across multiple reasoning steps, or when asked to validate a pre-determined conclusion. The four harnesses are drop-in tools — no training, no fine-tuning, no architecture changes.

With 100 free calls and a one-line install via npx -y ejentum-mcp, it's worth testing on any agent that talks to paying users.


MCP Spotlight is a series covering servers that give AI agents real capabilities. Every server is evaluated for tool quality, reliability improvement potential, and integration fit with Facio's HITL-first agent runtime.