Product · Jun 4, 2026

Facio's Multi-Provider Architecture: How switch_model Enables Dynamic Model Routing

Locking an AI agent to a single model provider is like locking a developer to a single programming language — it works until it doesn't. Facio's switch_model tool lets agents change LLM providers mid-conversation with human approval, enabling cost-optimized routing, provider fallback, and capability-aware task delegation across OpenAI, Anthropic, Google, OpenRouter, and any OpenAI-compatible endpoint.

Multi-Provider · Model Switching · Model Routing · LLM Providers · Cost Optimization

Most AI agent platforms bind you to a single model provider at setup time. You pick OpenAI or Anthropic or Google, configure an API key, and that's what the agent uses for every task — from simple data extraction to complex multi-step reasoning.

The problem with single-provider lock-in is threefold:

  1. Cost inefficiency. A $15/1M-token reasoning model is overkill for "extract the date from this paragraph." A $0.15/1M-token model is underpowered for "analyze this legal contract for liability clauses." But when one fixed model handles every task, you pay one rate for everything.
  2. No fallback. When your provider has an outage — and every provider has outages — your agent stops working. No automatic failover, no degraded mode, just silence.
  3. Capability gaps. Different models excel at different things. Claude is often preferred for prose. GPT-4o is fast at structured extraction. Gemini offers very long context windows. Locking to one means accepting its weaknesses alongside its strengths.

Facio's switch_model tool solves all three by making provider and model selection a runtime decision — not a startup configuration. Here's how the architecture works.

The Architecture: Provider-Neutral Runtime, Swappable Models

Facio's runtime is provider-neutral. The agent doesn't know or care which LLM is behind the current session — it calls the same tools, follows the same instructions, and produces the same output format regardless of whether the underlying model is Claude, GPT, Gemini, or an OpenRouter-routed model.

The switch_model tool exposes this neutrality to the agent:

switch_model(action="switch", model="claude-sonnet-4-7")

The agent proposes a switch. The human approves (required for every model change). The switch takes effect immediately for all subsequent messages in the session. No restart. No reconfiguration. No context loss.

The model parameter accepts fuzzy matching — the agent can say "claude-sonnet", "gpt-4o", "gemini flash", or "opus 4.7" — and the runtime resolves to the closest matching model across all configured providers.
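To make the fuzzy-matching idea concrete, here is a minimal sketch of how a loose name could be resolved against a model catalog. It is not Facio's actual matching logic: the `CATALOG` contents, the `resolve_model` helper, the scoring formula, and the 0.75 threshold are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical catalog (provider -> model IDs); names are illustrative only.
CATALOG = {
    "anthropic": ["claude-sonnet-4-7", "claude-opus-4-7"],
    "openai": ["gpt-4o", "gpt-4o-mini"],
    "google": ["gemini-2.5-pro", "gemini-2.5-flash"],
}


def tokens(name: str) -> list[str]:
    """Lowercase a model name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def resolve_model(query: str, catalog=CATALOG, threshold: float = 0.75):
    """Return (provider, model_id) for the closest match, or None."""
    q = tokens(query)
    best, best_score = None, 0.0
    for provider, models in catalog.items():
        for model in models:
            m = tokens(model)
            # Fraction of query tokens that appear in the model ID.
            overlap = sum(t in m for t in q) / max(len(q), 1)
            # Character-level similarity breaks ties between candidates.
            similarity = difflib.SequenceMatcher(
                None, " ".join(q), " ".join(m)
            ).ratio()
            score = overlap + 0.5 * similarity
            if score > best_score:
                best, best_score = (provider, model), score
    return best if best_score >= threshold else None
```

Under these assumptions, `resolve_model("opus 4.7")` picks the Anthropic Opus entry and `resolve_model("gemini flash")` picks the Google Flash entry, while a name with no token overlap, such as `"llama"`, resolves to `None` rather than to a surprising near-miss.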

Provider Discovery: What's Available Right Now?

Before switching, the agent needs to know what's available. The list action surfaces all configured providers and their models:

switch_model(action="list")

The runtime returns every provider with a configured API key — OpenAI, Anthropic, Google, OpenRouter, and any OpenAI-compatible custom endpoints. The agent sees the full menu and can make an informed routing decision.

This matters because API keys are configured via the credential store — the agent can check manage_credentials(action="list") to see which providers are set up, then switch_model(action="list") to see their available models. The agent discovers its own capability surface without hardcoding provider names or model IDs.
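The two-step discovery described above can be sketched as a filter over credential names. The `PROVIDER_KEYS` table and the `list_models` helper are hypothetical stand-ins, not Facio's internals; the point is only that the menu is derived from which keys exist, never from their values.

```python
# Hypothetical mapping from credential names to a provider and its models.
PROVIDER_KEYS = {
    "OPENAI_API_KEY": ("openai", ["gpt-4o", "gpt-4o-mini"]),
    "ANTHROPIC_API_KEY": ("anthropic", ["claude-sonnet-4-7", "claude-opus-4-7"]),
    "GOOGLE_API_KEY": ("google", ["gemini-2.5-pro"]),
}


def list_models(configured: set[str]) -> dict[str, list[str]]:
    """Return models only for providers whose credential name is configured."""
    return {
        provider: list(models)
        for key, (provider, models) in PROVIDER_KEYS.items()
        if key in configured
    }


# Only Google and Anthropic keys are configured in this example,
# so OpenAI models never appear in the menu.
menu = list_models({"GOOGLE_API_KEY", "ANTHROPIC_API_KEY"})
```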

Three Patterns That Make Model Switching Production-Ready

Pattern 1: Cost-Optimized Task Routing

The agent analyzes the task complexity and routes to the cheapest capable model:

# Simple data extraction → cheapest capable model
switch_model(action="switch", model="gpt-4o-mini")
# ... extract pricing data from 50 product pages ...

# Complex reasoning → flagship model
switch_model(action="switch", model="claude-opus-4-7")
# ... analyze competitive positioning across extracted data ...

The same session uses two different models at two different price points — without the human manually editing a config file between tasks. The agent made the routing decision, the human approved the switches, and the total token spend dropped by routing simple work to the cheap model.
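A rough worked example shows where the savings come from. The sketch below reuses the two price points quoted earlier ($0.15 and $15 per 1M tokens); the model names, capability tiers, and token counts are made-up assumptions, and real prices vary by provider and over time.

```python
# Hypothetical price table in USD per 1M tokens.
PRICES = {"cheap-model": 0.15, "flagship-model": 15.00}
# Assumed capability tiers: a higher tier can handle harder tasks.
TIERS = {"cheap-model": 1, "flagship-model": 3}


def cheapest_capable(required_tier: int) -> str:
    """Pick the lowest-priced model whose tier meets the requirement."""
    candidates = [m for m, tier in TIERS.items() if tier >= required_tier]
    return min(candidates, key=PRICES.__getitem__)


def cost(model: str, token_count: int) -> float:
    """Cost in USD for a given number of tokens."""
    return PRICES[model] * token_count / 1_000_000


# (task, required tier, tokens): bulk extraction plus a short analysis.
workload = [("extract", 1, 2_000_000), ("analyze", 3, 200_000)]

routed = sum(cost(cheapest_capable(tier), n) for _, tier, n in workload)
flagship_only = sum(cost("flagship-model", n) for _, _, n in workload)
```

With these numbers, routing costs about $3.30 ($0.30 for extraction plus $3.00 for analysis), against about $33.00 when everything runs on the flagship model.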

Pattern 2: Provider Fallback

When the primary provider returns errors or times out, the agent falls back to an alternative:

# Primary: Anthropic
switch_model(action="switch", model="claude-sonnet-4-7")
# ... Anthropic returns 503 ...

# Agent detects the failure, switches to fallback
switch_model(action="switch", model="gpt-4o")
# ... continues work on OpenAI ...

# Later, when Anthropic is back:
switch_model(action="switch", model="claude-sonnet-4-7")

This is provider-agnostic resilience. The agent doesn't need a separate fallback configuration or a load balancer. It detects the failure at runtime and proposes a switch; once a human approves, work continues. Because every switch passes through the approval gate, failover is only as fast as the person approving it.
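The fallback pattern can be sketched as a loop over candidate models in which every change is gated by an approval callback. `ProviderUnavailable`, `run_with_fallback`, and the two stubs are hypothetical names for illustration and say nothing about how Facio detects failures internally.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider call on an outage, such as a 503 or a timeout."""


def run_with_fallback(prompt, models, call_model, approve):
    """Try models in order; any change of model must be approved first."""
    current = models[0]
    for candidate in models:
        if candidate != current:
            if not approve(current, candidate):
                continue  # rejected: skip this candidate, try the next one
            current = candidate
        try:
            return current, call_model(current, prompt)
        except ProviderUnavailable:
            continue  # this provider failed: move on to the next candidate
    raise ProviderUnavailable("no approved model could serve the request")


# Demonstration stubs: every Claude call fails, every switch is approved.
def fake_call(model, prompt):
    if model.startswith("claude"):
        raise ProviderUnavailable(model)
    return f"{model} answered: {prompt}"


approvals = []


def fake_approve(old, new):
    approvals.append((old, new))
    return True


used, reply = run_with_fallback(
    "summarize the report", ["claude-sonnet-4-7", "gpt-4o"], fake_call, fake_approve
)
```

In this run the primary call fails, one approval is requested, and the request completes on the second model.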

Pattern 3: Capability-Aware Task Delegation (with Spawn)

Combined with spawn, model switching enables capability-aware parallel execution:

# Parent agent stays on Claude for orchestration

# Sub-agent 1: Code generation → Claude (strongest at code)
spawn(task="Refactor the authentication module", model="claude-sonnet-4-7")

# Sub-agent 2: Document summarization → Gemini (longest context window)
spawn(task="Summarize this 200-page PDF", model="gemini-2.5-pro")

# Sub-agent 3: Data extraction → GPT-4o-mini (cheapest capable model)
spawn(task="Extract all email addresses from these HTML files", model="gpt-4o-mini")

Three sub-agents, three different models, three different providers — running in parallel, each using the model best suited to its task. The parent agent orchestrated the routing. The human approved the initial spawns. And the total cost is a fraction of running everything through a single flagship model.
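As a rough illustration of the fan-out, the sketch below dispatches the three tasks concurrently with Python's standard thread pool. The `spawn` function here is a stand-in stub that only reports its assignment; it is not Facio's spawn tool, and how real sub-agents are scheduled is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def spawn(task: str, model: str) -> str:
    """Stand-in for a sub-agent run; a real one would call the model."""
    return f"[{model}] done: {task}"


# Task-to-model assignments taken from the example above.
ASSIGNMENTS = [
    ("Refactor the authentication module", "claude-sonnet-4-7"),
    ("Summarize this 200-page PDF", "gemini-2.5-pro"),
    ("Extract all email addresses from these HTML files", "gpt-4o-mini"),
]

# Submit all sub-tasks at once, then collect results in submission order.
with ThreadPoolExecutor(max_workers=len(ASSIGNMENTS)) as pool:
    futures = [pool.submit(spawn, task, model) for task, model in ASSIGNMENTS]
    results = [future.result() for future in futures]
```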

The Approval Gate: Why Every Switch Requires Human Confirmation

Model switching is powerful — which is exactly why it requires human approval. The agent proposes; the human confirms. No silent provider changes, no undisclosed cost shifts, no unexpected model behavior.

This is the same human-in-the-loop (HITL) principle that gates destructive manage_mcp operations and credential deletions. The agent has the intelligence to make the routing decision ("this task needs a cheaper model" or "Anthropic is down, let's switch to Google"), but the human retains the authority to approve or reject.

The approval flow:

  1. Agent calls switch_model(action="switch", model="...")
  2. Human sees: "Facio wants to switch to claude-sonnet-4-7. Approve?"
  3. Human approves → switch takes effect immediately
  4. All subsequent messages in the session use the new model

If the human rejects or the review times out, the agent stays on its current model and can propose an alternative.
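The approval flow above reduces to a small state transition: the session's model changes only on explicit approval, and both rejection and timeout leave it untouched. The `Decision`, `Session`, and `request_switch` names below are hypothetical, a minimal sketch rather than Facio's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    TIMED_OUT = "timed_out"


@dataclass
class Session:
    model: str


def request_switch(session: Session, target: str, decision: Decision) -> bool:
    """Apply the switch only on explicit approval; otherwise keep the model."""
    if decision is Decision.APPROVED:
        session.model = target
        return True
    # Rejection and timeout are treated the same: nothing changes.
    return False
```

Treating a timeout like a rejection keeps the default safe: an unanswered prompt never changes the provider or the cost profile.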

Provider Configuration: API Keys Without Agent Access

Multi-provider support means multiple API keys. Facio's credential store handles this without exposing any of them to the agent:

# Human configures keys once via /settings or ask_form
GOOGLE_API_KEY       → stored in credential store
OPENROUTER_API_KEY   → stored in credential store
ANTHROPIC_API_KEY    → stored in credential store

The agent sees that these providers are available (via switch_model(action="list")) but never sees the keys. It can route tasks across providers whose credentials it cannot access — the same credential-store architecture that keeps secrets out of agent context windows extends naturally to multi-provider routing.
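One way to picture this separation is a store with two faces: the runtime can resolve a secret by name, while the agent only ever receives a list of names. The `CredentialStore` class and `agent_view` function below are hypothetical and only illustrate the boundary, not Facio's actual credential store.

```python
class CredentialStore:
    """Toy store: secrets go in, but only names come out on the agent side."""

    def __init__(self):
        self._secrets = {}

    def set(self, name: str, value: str) -> None:
        self._secrets[name] = value

    def names(self) -> list[str]:
        """Agent-safe view: credential names only, never values."""
        return sorted(self._secrets)

    def resolve(self, name: str) -> str:
        """Runtime-only path, used when building the outgoing API request."""
        return self._secrets[name]


def agent_view(store: CredentialStore) -> dict:
    """What the agent's context receives: which providers are configured."""
    return {"configured": store.names()}


store = CredentialStore()
store.set("ANTHROPIC_API_KEY", "sk-example-not-a-real-key")
store.set("GOOGLE_API_KEY", "example-google-key")
view = agent_view(store)
```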

OpenRouter and Custom Endpoints

Facio supports any OpenAI-compatible endpoint, which means:

  • OpenRouter — route to hundreds of models through a single API, including open-source models like Llama, Mistral, and Qwen
  • Self-hosted models — point Facio at a local vLLM or Ollama instance
  • Enterprise proxies — corporate API gateways that wrap internal models behind OpenAI-compatible interfaces

The agent doesn't distinguish between these and first-party providers. switch_model(action="switch", model="llama-4") works the same way as switching to GPT-4o — the runtime resolves the model name against the configured endpoints and routes accordingly.

This means an agent can use frontier models from Anthropic for reasoning, open-source models from OpenRouter for high-volume extraction, and a self-hosted model behind a corporate proxy for sensitive internal data — all in the same session, all behind the same approval gate.
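Resolving a model name against configured endpoints could look roughly like the lookup below. The `Endpoint` structure, the `route` helper, and the model lists are assumptions, not Facio's configuration format. The base URLs are the commonly documented OpenAI-compatible addresses for OpenRouter and a default local Ollama install, so check them against your own setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    models: tuple[str, ...]


# Hypothetical endpoint table; model names are illustrative.
ENDPOINTS = [
    Endpoint("openrouter", "https://openrouter.ai/api/v1", ("llama-4", "qwen-3")),
    Endpoint("local-ollama", "http://localhost:11434/v1", ("mistral",)),
]


def route(model: str, endpoints=ENDPOINTS) -> Endpoint:
    """Return the first configured endpoint that serves the given model."""
    for endpoint in endpoints:
        if model in endpoint.models:
            return endpoint
    raise LookupError(f"no configured endpoint serves {model!r}")
```

Because every entry speaks the same OpenAI-compatible protocol, the caller only needs the base URL and the model name; nothing downstream has to know whether the endpoint is hosted or local.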

Bottom Line

Single-provider lock-in made sense when AI agents were experiments. In production, it's a cost and reliability bottleneck. You shouldn't pay flagship-model prices for simple extraction tasks. You shouldn't go offline because one provider has an outage. And you shouldn't accept a model's weaknesses in areas where a different model is clearly stronger.

Facio's switch_model gives agents the ability to route tasks to the right model at the right time — with human approval on every switch. Combined with the credential store (for provider-agnostic key management) and spawn (for capability-aware parallel execution), it turns multi-provider from a configuration headache into a runtime capability.

The agent doesn't just use one model. It uses the right model for each job.


See the model switching documentation for provider configuration, fuzzy matching rules, and combined routing patterns with spawn.
