
Engineering · Jul 5, 2026

MCP Spotlight: Fetch MCP Server — The Anthropic Reference for Safe HTML-to-Markdown Web Fetching With URL Allow-Lists and the Read-Only Default

The official Fetch MCP Server by Anthropic: one tool (fetch), read-only by design, HTML→Markdown conversion with an optional raw mode. MIT-licensed, available as a Python package (run with uvx or pip) or a Docker image. The read-only web access default for AI agents in 2026.

Tags: MCP Server · Fetch · Anthropic · Read-Only · Markdown · AI Agents

Server: mcp-server-fetch by Anthropic
License: MIT · Tools: 1 (fetch) · Transport: stdio (run locally or in Docker)
Output: HTML → Markdown conversion (with a raw option)
Security model: Read-only · domain allow-lists applied at the gateway or Facio level
GitHub: github.com/modelcontextprotocol/servers/tree/main/src/fetch
PyPI: pypi.org/project/mcp-server-fetch/
MCP Tracker: glama.ai/mcp/servers/modelcontextprotocol/fetch

Every agent eventually needs to read the web. The "give the agent a headless browser and let it script its way around" approach is overkill for read-only workflows. The "give the agent nothing" approach blocks the long tail of use cases that benefit from live web data — research, doc lookups, current event awareness, competitive intelligence. The "give the agent raw curl" approach dumps HTML into context without cleaning it up.

The official Fetch MCP Server by Anthropic bridges these extremes. It exposes one tool (fetch), read-only by design, that takes a URL and returns Markdown. It is built on markdownify (or equivalent HTML→MD converters), with an optional raw mode for cases where the agent needs the original HTML. MIT-licensed. Maintained by Anthropic.

This is the read-only web access default for AI agents in 2026. Small surface, strong guarantee, ubiquitous use.

The Single Tool: fetch

The MCP surface is one tool. The tool's input is a URL and options; the output is cleaned Markdown:

{
  "url": "https://docs.stripe.com/mcp",
  "max_length": 50000,
  "start_index": 0,
  "raw": false
}
| Parameter | Purpose |
| --- | --- |
| url | The URL to fetch (required) |
| max_length | Maximum length of returned content (default 5000 chars) |
| start_index | Character offset for pagination (default 0) |
| raw | If true, return raw HTML instead of Markdown |

The result is the page content, converted to clean Markdown. Headings, lists, code blocks, tables, and links all preserved. Navigation, ads, scripts, and styling stripped. The agent gets exactly what it needs to reason over.
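For readers wiring this up by hand, the arguments above travel inside a standard MCP `tools/call` JSON-RPC request. The sketch below only builds that message with the Python standard library. The helper name `build_fetch_call` and the request id are illustrative assumptions, and sending the message over stdio is left to whichever MCP client you use.

```python
import json


def build_fetch_call(url, max_length=5000, start_index=0, raw=False, request_id=1):
    """Build the JSON-RPC message an MCP client would send to call `fetch`.

    `build_fetch_call` is a hypothetical helper for illustration only.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "fetch",
            "arguments": {
                "url": url,
                "max_length": max_length,
                "start_index": start_index,
                "raw": raw,
            },
        },
    }


message = build_fetch_call("https://docs.stripe.com/mcp", max_length=50000)
print(json.dumps(message, indent=2))
```

In practice an MCP client library assembles this envelope for you; the point is that the tool call carries nothing beyond the four documented arguments.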

For long pages, the start_index parameter enables pagination:

1. fetch("https://docs.stripe.com/mcp", max_length=20000)
   → Returns the first 20,000 chars, plus a note if the content was truncated
2. fetch("https://docs.stripe.com/mcp", max_length=20000, start_index=20000)
   → Returns the next 20,000 chars
3. Continue until the response is no longer truncated

The agent doesn't have to know the page length up front; the response signals when more content remains.
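The pagination steps can be sketched as a small loop. This is a minimal illustration, not the server's code: `fetch_all` is a hypothetical helper, and the `fetch` callable stands in for the real tool call. The real tool's response may include extra notice text, so a production client should follow the server's own continuation hint rather than rely on chunk length alone.

```python
def fetch_all(fetch, url, chunk_size=20000, max_calls=50):
    """Collect a long page by calling `fetch` with a growing start_index.

    `fetch` is any callable taking (url, max_length, start_index) and
    returning a string; it is an assumed stand-in for the MCP tool call.
    """
    parts = []
    start_index = 0
    for _ in range(max_calls):  # hard cap so a misbehaving source cannot loop forever
        chunk = fetch(url, max_length=chunk_size, start_index=start_index)
        if not chunk:
            break
        parts.append(chunk)
        if len(chunk) < chunk_size:  # a short chunk means the end was reached
            break
        start_index += len(chunk)
    return "".join(parts)


# Stand-in for the real tool: slices a fixed 45,000-character document.
DOCUMENT = "x" * 45000


def fake_fetch(url, max_length, start_index):
    return DOCUMENT[start_index:start_index + max_length]


page = fetch_all(fake_fetch, "https://docs.stripe.com/mcp")
print(len(page))
```

The `max_calls` cap is a design choice worth keeping in any real client: it bounds how much context a single page can consume.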

The Markdown Conversion: Why It Matters

Raw HTML is unusable for LLM context:

<html>
  <head>
    <link rel="stylesheet" href="...">
    <script src="..."></script>
  </head>
  <body>
    <header>
      <nav>
        <ul>
          <li><a href="/">Home</a></li>
          <li><a href="/docs">Docs</a></li>
        </ul>
      </nav>
    </header>
    <main>
      <h1>Stripe MCP Server</h1>
      <p>The MCP server is the bridge...</p>
      <pre><code class="language-bash">npx -y @stripe/mcp</code></pre>
    </main>
  </body>
</html>

That's ~600 tokens of HTML for ~50 tokens of useful content. The Fetch MCP server converts this to Markdown:

# Stripe MCP Server

The MCP server is the bridge...

```bash
npx -y @stripe/mcp
```

Now it's ~50 tokens of useful content. **~12x token efficiency for the same information.** Across a 100-page research task, this is the difference between context exhaustion and a productive session.
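The arithmetic behind that claim, worked through under the article's own rough figures (about 600 tokens of HTML versus about 50 tokens of Markdown per page). The variable names are illustrative, and real pages will vary widely.

```python
# Rough per-page figures taken from the example above; not measurements.
html_tokens_per_page = 600
markdown_tokens_per_page = 50
pages = 100

ratio = html_tokens_per_page / markdown_tokens_per_page
total_html = html_tokens_per_page * pages
total_markdown = markdown_tokens_per_page * pages
saved = total_html - total_markdown

print(f"ratio: {ratio:.0f}x")
print(f"HTML total: {total_html} tokens, Markdown total: {total_markdown} tokens")
print(f"tokens saved over {pages} pages: {saved}")
```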

The Markdown converter also:

- **Strips ads and tracking** — no marketing noise
- **Preserves code blocks** with language hints
- **Preserves tables** as Markdown tables
- **Preserves links** as Markdown links
- **Handles relative URLs** (resolves to absolute)
- **Converts HTML entities** (e.g., `&amp;` → `&`)

For agents doing doc-aware tasks ("read the Stripe docs, then integrate Stripe Checkout"), the Fetch MCP server turns the read step from "parse HTML" to "read clean Markdown."
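To make the conversion step concrete, here is a toy converter written with only the Python standard library. It is a sketch of the idea, not the server's implementation (the article describes the real server as built on markdownify). The class name `MiniMarkdown` and the set of skipped tags are assumptions, and it handles only headings, paragraphs, and `<pre><code>` blocks.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"head", "script", "style", "nav", "header", "footer"}
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "pre"}
FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


class MiniMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical, for illustration)."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished Markdown blocks
        self.buffer = []     # text collected for the current block
        self.skip_depth = 0  # > 0 while inside a tag whose content is dropped
        self.language = ""   # language hint for the current code block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.buffer = []  # start a fresh block
            if tag == "pre":
                self.language = ""
        elif tag == "code":
            css_class = dict(attrs).get("class") or ""
            if css_class.startswith("language-"):
                self.language = css_class[len("language-"):]

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            text = "".join(self.buffer).strip()
            self.buffer = []
            if not text:
                return
            if tag == "pre":
                self.blocks.append(f"{FENCE}{self.language}\n{text}\n{FENCE}")
            elif tag == "p":
                self.blocks.append(text)
            else:  # h1..h6: the digit in the tag name is the heading level
                self.blocks.append("#" * int(tag[1]) + " " + text)


SAMPLE = """<html><head><script src="x.js"></script></head><body>
<header><nav><ul><li><a href="/">Home</a></li></ul></nav></header>
<main><h1>Stripe MCP Server</h1><p>The MCP server is the bridge...</p>
<pre><code class="language-bash">npx -y @stripe/mcp</code></pre></main>
</body></html>"""

parser = MiniMarkdown()
parser.feed(SAMPLE)
markdown = "\n\n".join(parser.blocks)
print(markdown)
```

Dropping whole subtrees such as `nav` and `script` is what removes most of the token overhead; a production converter also has to deal with lists, tables, links, and malformed markup.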

## The Read-Only Guarantee

The Fetch MCP server is **read-only by design**. It doesn't expose:

- `POST`, `PUT`, `PATCH`, `DELETE` — no mutation
- Cookie/session management — no authenticated state
- Custom headers — no way to send API keys
- Body parameters — no way to POST data

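This is the **safety-by-design** pattern. The agent can fetch any URL, but it cannot send a request body or use a mutating HTTP method through this server. The blast radius is bounded: a malicious prompt or hallucinated tool call can read the most embarrassing docs on the internet, but it can't post to your CMS, delete your records, or send emails. One caveat: a GET request can still trigger side effects on a badly designed endpoint, which is one reason to pair the server with URL policies.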

For multi-tenant or regulated environments, this is the right primitive to expose by default. **Read access is universally useful; write access needs per-server authorization.**
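One way to picture the guarantee is as a narrow input schema: if the tool accepts only the four documented arguments, there is simply no field in which to put a method, header, or body. The check below is a hypothetical gateway-side sketch under that assumption, not the server's own validation code.

```python
# The four arguments documented for the fetch tool.
ALLOWED_ARGUMENTS = {"url", "max_length", "start_index", "raw"}


def validate_fetch_arguments(arguments):
    """Reject any tool call that tries to smuggle in extra fields.

    `validate_fetch_arguments` is an illustrative helper, not a real API.
    """
    unknown = set(arguments) - ALLOWED_ARGUMENTS
    if unknown:
        raise ValueError(f"unsupported arguments: {sorted(unknown)}")
    if "url" not in arguments:
        raise ValueError("url is required")
    return dict(arguments)


ok = validate_fetch_arguments({"url": "https://docs.stripe.com/mcp", "raw": False})
print(ok)

try:
    validate_fetch_arguments({"url": "https://example.com", "method": "POST"})
except ValueError as error:
    print("rejected:", error)
```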

## The Docker Image: Zero-Install

Anthropic publishes the Fetch MCP server as an official Docker image:

```bash
docker run -i --rm mcp/fetch
```

The Docker image:

  • Has no external dependencies (Python + the converter library baked in)
  • Runs as a non-root user
  • Resets state per invocation (no persistent filesystem)
  • Works behind the Docker MCP Gateway

For teams running MCP via the Docker Toolkit, this is a one-line addition to the profile:

docker mcp profile add-server dev-tools --server fetch

The agent instantly has web access with no local Python or uv install, no version conflicts, and no extra dependency surface on the host.

The Allow-List Pattern

For regulated environments, the Fetch MCP server can be paired with allow-list policies at the gateway or Facio level:

| Server | Allowed Domains |
| --- | --- |
| fetch-docs | docs.stripe.com, docs.anthropic.com, docs.github.com, developer.mozilla.org |
| fetch-research | arxiv.org, wikipedia.org, *.gov, *.edu |
| fetch-internal | wiki.acme.com, jira.acme.com, confluence.acme.com |

Multiple Fetch servers, each scoped to a specific domain allow-list. The agent reads from the right Fetch server based on the task. Bounded scope, bounded blast radius, full audit trail per server.
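A minimal sketch of how a gateway might enforce the table above. The `ALLOW_LISTS` mapping and `is_allowed` function are hypothetical and do not reflect Facio's or any gateway's actual configuration format. Patterns are matched against the hostname only, and a bare entry such as `wikipedia.org` is treated as an exact match, so subdomains need their own entry or a wildcard.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical per-server allow-lists mirroring the table above.
ALLOW_LISTS = {
    "fetch-docs": ["docs.stripe.com", "docs.anthropic.com",
                   "docs.github.com", "developer.mozilla.org"],
    "fetch-research": ["arxiv.org", "wikipedia.org", "*.gov", "*.edu"],
    "fetch-internal": ["wiki.acme.com", "jira.acme.com", "confluence.acme.com"],
}


def is_allowed(server, url):
    """Return True if the URL's hostname matches the server's allow-list."""
    host = (urlsplit(url).hostname or "").lower()
    patterns = ALLOW_LISTS.get(server, [])  # unknown servers allow nothing
    return any(fnmatchcase(host, pattern) for pattern in patterns)


print(is_allowed("fetch-docs", "https://docs.stripe.com/mcp"))
print(is_allowed("fetch-research", "https://www.nasa.gov/news"))
print(is_allowed("fetch-research", "https://evil.gov.example.com/"))
```

Matching on the parsed hostname rather than on the raw URL string avoids look-alike tricks such as placing an allowed domain in the path or in a subdomain of another site.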

The Curl Alternative: When to Use Fetch vs. Browser

For the 95% of agent use cases that just need to read web content, the Fetch MCP server is the right primitive. For the 5% that need JavaScript rendering, login flows, or interactive UI, the Playwright MCP server is the right primitive.

The decision tree:

Need to read web content?
  → Static HTML / docs / API responses / articles?
    → Use Fetch MCP (one tool, Markdown output, cheap)
  → JavaScript rendering / login flow / SPA / interactive UI?
    → Use Playwright MCP (20+ tools, browser automation, expensive)

Fetch is the lightweight default. Playwright is the heavyweight escape hatch. Most agents need only Fetch.

Facio Integration

{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}

Facio's audit trail captures every Fetch call with the URL, the response status, the byte count, the conversion mode, and the page title. For a regulated team (financial research, competitive intelligence, compliance audits), this is the complete web-research record: "Agent at 14:32 UTC fetched docs.stripe.com/mcp, 47KB HTML, 5KB Markdown returned, status 200."
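As an illustration of what such a record could look like, here is a small sketch that serializes one audit entry as a JSON line. The `FetchAuditRecord` schema and its field names are assumptions made for this example, not Facio's actual format. The values echo the example sentence above, with the byte counts rounded and the page title used as a placeholder.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FetchAuditRecord:
    """Hypothetical audit entry for one fetch call."""
    time_utc: str
    url: str
    status: int
    html_bytes: int
    markdown_bytes: int
    mode: str   # "markdown" or "raw"
    title: str


record = FetchAuditRecord(
    time_utc="14:32",
    url="https://docs.stripe.com/mcp",
    status=200,
    html_bytes=47_000,      # roughly 47KB of HTML
    markdown_bytes=5_000,   # roughly 5KB of Markdown
    mode="markdown",
    title="Stripe MCP Server",  # placeholder title
)

line = json.dumps(asdict(record))
print(line)
```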

For HITL workflows, the Fetch MCP server is the lowest-friction read tool in the ecosystem:

| Server | Capabilities | Suggested Gate |
| --- | --- | --- |
| fetch (default) | Read any URL, return Markdown | None (autonomous) |
| fetch-research | Read allow-listed research sources | None (autonomous) |
| fetch-internal | Read internal wiki, Jira, Confluence | None (autonomous; already gated by network) |

Because the Fetch server is read-only by design, no destructive-hint annotations are needed. The agent can fetch as much as it wants; it can't break anything.

The interesting HITL pattern is pre-fetch confirmation for sensitive domains. For example, Facio can be configured to:

  • Soft-confirm fetches to *.gov: government sites sometimes have rate limits
  • Soft-confirm fetches to social-media-*.com: terms-of-service concerns
  • Block fetches to private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16): security boundary
  • Block fetches to localhost: SSRF prevention
  • Block fetches to file:// URLs: local file disclosure
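The rules above can be sketched as a single pre-fetch check. This is an assumption-laden illustration, not Facio's policy engine: `check_url` and its three return values are hypothetical, and the social-media rule is omitted. It inspects only the literal URL, so it does not catch a public DNS name that resolves to a private address; a real deployment would also need to check the resolved IP.

```python
import ipaddress
from urllib.parse import urlsplit


def check_url(url):
    """Return "allow", "confirm", or "block" for a candidate fetch URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return "block"  # covers file:// and other non-web schemes
    host = (parts.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".localhost"):
        return "block"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # not an IP literal, so treat it as a DNS name
    if address is not None:
        if address.is_private or address.is_loopback or address.is_link_local:
            return "block"
        return "allow"
    if host.endswith(".gov"):
        return "confirm"  # soft-confirm: ask a human before fetching
    return "allow"


for candidate in [
    "file:///etc/passwd",
    "http://localhost:8080/admin",
    "http://172.31.0.1/",
    "https://www.nasa.gov/",
    "https://docs.stripe.com/mcp",
]:
    print(candidate, "->", check_url(candidate))
```

Using `ipaddress` instead of string prefixes matters for the 172.16.0.0/12 range, which spans 172.16.x.x through 172.31.x.x and is easy to get wrong with a pattern like `172.16.*`.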

For multi-environment setups (research / production / dev), the pattern is multiple Fetch servers, each with its own allow-list policy. The agent switches context per environment.

Quickstart

# Option 1: Docker (recommended)
docker run -i --rm mcp/fetch

# Option 2: uvx (no separate install step)
uvx mcp-server-fetch

# Option 3: pip
pip install mcp-server-fetch
python -m mcp_server_fetch

Configuration:

{
  "mcpServers": {
    "fetch": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp/fetch"]
    }
  }
}

First prompts:

"Fetch the Stripe MCP documentation and summarize the OAuth 2.0 flow"
"Get the current Stripe pricing page and tell me the new tier names"
"Fetch the Anthropic API reference for the /v1/messages endpoint"
"Read the latest RFC 9110 (HTTP Semantics) and find the section on caching"

Use Cases

Doc-aware integration: "Help me integrate Stripe Checkout. Read the current docs and generate code." Agent fetches docs.stripe.com → reads Markdown → generates integration code with current parameters.

Competitive intelligence: "Compare the pricing pages of our top 3 competitors. Summarize the differences." Multi-URL fetch → comparative analysis → structured report.

Research synthesis: "Read the top 10 Google Scholar results for 'multi-agent LLM systems' and summarize the key approaches." Multi-URL fetch → synthesis → structured summary.

Current event awareness: "What's the latest on the EU AI Act? Fetch the official EUR-Lex source and summarize." Single-URL fetch → summary.

Documentation drift detection: "Our docs say X, but does the upstream library still support X? Fetch the current docs and compare." Version-aware fetch → diff → report.

API reference lookup: "I forgot the exact parameter for Stripe's create_payment_intent API. Fetch the docs and tell me." Single-URL fetch → reference lookup.

News aggregation: "Fetch the top 5 tech news sites and tell me the most-discussed story today." Multi-URL fetch → sentiment aggregation → top story.

Compliance research: "What does the GDPR say about data retention for SaaS providers? Fetch the official EUR-Lex source." Authoritative source fetch → compliance summary.

Documentation indexing for AI context: "Fetch our entire /docs site and produce a structured index for the AI's context window." Multi-URL fetch → index building.

Pricing monitoring: "Fetch the competitor's pricing page daily. Alert me if the price of the Pro plan changes." Scheduled fetch → diff → alert.

Status page monitoring: "Fetch our status page and report any active incidents." Status page fetch → incident report.

Library migration: "The moment library is deprecated. Fetch the date-fns docs and tell me the equivalents for our top 5 usage patterns." Cross-library fetch → migration guide.

Schema documentation: "Fetch our OpenAPI spec from /openapi.json and generate Markdown documentation." Structured fetch → doc generation.

Reference checking: "Cite a quote from the EU AI Act. Fetch the exact article to verify." Authoritative source fetch → verification.

Live data feeds: "Fetch today's exchange rates from a public API and convert $100 to EUR." API fetch → conversion → response.

The Read-Only Default

The Fetch MCP Server is the read-only web access default for AI agents in 2026. It offers one tool (fetch), is MIT-licensed and officially maintained by Anthropic, ships as a Docker image, and produces Markdown-first output. It is the right primitive for the 95% of agent workflows that need to read the web without writing to it.

For any agent that participates in doc-aware work — integration, research, competitive intel, compliance verification, library migration — this is the bridge. The agent fetches the URL, reads clean Markdown, and acts on the content. No HTML parsing, no JavaScript rendering, no auth flows, no mutation surface.

For the broader MCP ecosystem, the Fetch pattern is the design lesson every "read-only" MCP server should copy. The smallest useful surface is the safest surface. One tool, clear input, clear output, read-only by design. The agent can use it freely because it can't break anything.

Run docker run -i --rm mcp/fetch (or uvx mcp-server-fetch) and your agent has clean web access.


MCP Spotlight is a series covering servers that give AI agents real capabilities. Every server is evaluated for design clarity, ecosystem impact, and integration fit with Facio's HITL-first agent runtime.